Flash MX 2004 RSS Reader Pg.4

source: http://www.thegoldenmean.com

4 — What NOT to do…

A Blunt Instrument

The next five pages are all about writing the parsing engine for our Reader. We won’t even launch Flash until page Ten. We are going to write an ActionScript 2.0 class for parsing RSS, so we'll be working with a text editor through page Nine.

We will begin by examining alternative approaches in order to arrive at the most efficient solution.

Document Structures

Presented below are two skeleton RSS documents so you can study the form. Both were derived from one of my favorite ’blogs (generated by MovableType). I shortened each to two entries and replaced all the real content with the word “content”; I just want to show you the form of the XML. The nodes I am interested in extracting (set in bold on the original page) are the <item> nodes and their <title>, <link> and <description> children. Note the similarities, but also note the differences.

version 1:

<?xml version="1.0" encoding="utf-8"?>

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:cc="http://web.resource.org/cc/"
xmlns="http://purl.org/rss/1.0/">

  <channel rdf:about="content">
    <title>content</title>
    <link>content</link>
    <description>content</description>
    <dc:language>en-us</dc:language>
    <dc:creator></dc:creator>
    <dc:date>content</dc:date>
    <admin:generatorAgent rdf:resource="content" />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="content" />
        <rdf:li rdf:resource="content" />
      </rdf:Seq>
    </items>
  </channel>

  <item rdf:about="content">
    <title>content</title>
    <link>content</link>
    <description>content</description>
    <dc:subject>content</dc:subject>
    <dc:creator>content</dc:creator>
    <dc:date>content</dc:date>
  </item>
  <item rdf:about="content">
    <title>content</title>
    <link>content</link>
    <description>content</description>
    <dc:subject>content</dc:subject>
    <dc:creator>content</dc:creator>
    <dc:date>content</dc:date>
  </item>

</rdf:RDF>

version 2:

<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>content</title>
    <link>content</link>
    <description>content</description>
    <language>content</language>
    <copyright>content</copyright>
    <lastBuildDate>content</lastBuildDate>
    <pubDate>content</pubDate>
    <generator>content</generator>
    <docs>content</docs>
    <item>
      <title>content</title>
      <description>content</description>
      <link>content</link>
      <guid>content</guid>
      <category>content</category>
      <pubDate>content</pubDate>
    </item>
    <item>
      <title>content</title>
      <description>content</description>
      <link>content</link>
      <guid>content</guid>
      <category>content</category>
      <pubDate>content</pubDate>
    </item>
  </channel>
</rss>

One solution, but don’t wander off the path!

Before you launch your text editor, let’s look at what not to do! When I made the first version of the Reader, I was familiar with stepping through the XML hierarchy using the basic XML techniques: the relationships between child and parent nodes, node names and node values. We reference the contents of an XML document as the first child of the XML object. Let’s say we store our XML object in a variable named “_xml”. If we examine the version 1 skeleton example above, that would make the contents of the document be
_xml.firstChild;
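
For context, here is a minimal sketch of how such an object might be created and loaded from a frame script. The feed URL is a placeholder, and setting ignoreWhite is my assumption: without it, the white space between tags is kept as text nodes and firstChild may not be the node you expect.

```actionscript
// Sketch only: "feed.xml" is a placeholder URL, not a real feed.
var _xml:XML = new XML();
// Discard white-space-only text nodes so that firstChild is an element.
_xml.ignoreWhite = true;
_xml.onLoad = function(success:Boolean):Void {
   if (success) {
      // For the version 1 skeleton this should trace "rdf:RDF".
      trace(this.firstChild.nodeName);
   } else {
      trace("could not load the feed");
   }
};
_xml.load("feed.xml");
```

Note that nodeName includes the namespace prefix, so the root of the version 1 skeleton reports itself as “rdf:RDF” rather than “RDF”.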

The first child of the document is the <rdf:RDF> node. All the content I am interested in is nested inside that node. So we step down a level:
_xml.firstChild.firstChild

Now we find that the <channel> node and multiple <item> nodes are all child nodes of <rdf:RDF>. We can examine those child nodes, and if one happens to have the nodeName “item” we can do something with it:

var rootNode:XMLNode = _xml.firstChild;   // <rdf:RDF> in the version 1 skeleton
// <channel> and every <item> are direct children of the root node,
// so those are the child nodes we have to loop over
var kiddos:Number = rootNode.childNodes.length;
for (var i:Number = 0; i<kiddos; i++) {
   if (rootNode.childNodes[i].nodeName == "item") {
      //do something
   }
}

I don’t know about you, but that sort of code makes my head swim. And what is “do something”? Probably it involves stuffing the contents of some child nodes of item into an array. Now the code gets even worse. Witness:

var titleArray:Array = [];
var linkArray:Array = [];
var descriptArray:Array = [];
var rootNode:XMLNode = _xml.firstChild;   // <rdf:RDF> in the version 1 skeleton
var kiddos:Number = rootNode.childNodes.length;
for (var i:Number = 0; i<kiddos; i++) {
   if (rootNode.childNodes[i].nodeName == "item") {
      // positions 0, 1 and 2 are <title>, <link> and <description> in version 1
      titleArray.push(rootNode.childNodes[i].childNodes[0].firstChild.nodeValue);
      linkArray.push(rootNode.childNodes[i].childNodes[1].firstChild.nodeValue);
      descriptArray.push(rootNode.childNodes[i].childNodes[2].firstChild.nodeValue);
   }
}

It works. Ugly and convoluted it may be, but the sample above will reliably traverse the XML tree, find the <item> nodes and push the text content of three child nodes into arrays. Unless…

Problems with RSS Versions

Unless the structure of the document changes. The problem with this approach is that it is extremely inflexible: it demands that we know the structure exactly. If we try to load a version of RSS other than the one we built our parser for, it will fail:
if <item> is in a different place (version 2 nests it inside <channel>), it breaks
if <title>, <link> and <description> are in a different order (version 2 puts <description> before <link>), it breaks
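
To see both failures at once, here is a small sketch that parses a cut-down copy of the version 2 skeleton from a string and then walks it the version 1 way, expecting <item> directly under the root and <link> in position 1. The variable names are mine, chosen only for this illustration.

```actionscript
// Sketch only: a tiny RSS 2.0 document built inline, so nothing has to load.
var rss2:XML = new XML();
rss2.ignoreWhite = true;
rss2.parseXML("<rss version=\"2.0\"><channel><title>feed</title>"
   + "<item><title>t</title><description>d</description><link>l</link></item>"
   + "</channel></rss>");

var rssRoot:XMLNode = rss2.firstChild;   // <rss>
var found:Number = 0;
for (var n:Number = 0; n<rssRoot.childNodes.length; n++) {
   if (rssRoot.childNodes[n].nodeName == "item") {
      found++;
   }
}
trace(found);   // 0, because the only child of <rss> is <channel>

// Even after stepping down to the <item> by hand, position 1 is not the link.
var itemNode:XMLNode = rssRoot.firstChild.childNodes[1];
trace(itemNode.childNodes[1].nodeName);   // description
```

Neither problem raises an error. The first simply finds nothing, and the second would quietly file descriptions in the link array, which is arguably worse.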

How can we tell what version we are loading? Not all versions announce their version number in some convenient place.
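
For the two skeletons on this page, one possible test is to look at the root node. This is only a sketch: guessVersion is a hypothetical helper name, and the check assumes the feed uses the conventional rdf prefix, which other feeds are free to change.

```actionscript
// Sketch only: guessVersion is a hypothetical helper, not part of the Reader.
function guessVersion(doc:XML):String {
   var rootNode:XMLNode = doc.firstChild;
   if (rootNode.nodeName == "rdf:RDF") {
      // RDF-based feeds such as the version 1 skeleton
      return "1.0";
   }
   if (rootNode.nodeName == "rss" && rootNode.attributes.version != undefined) {
      // the version 2 skeleton carries version="2.0" on its root node
      return rootNode.attributes.version;
   }
   return "unknown";
}
```

Calling trace(guessVersion(_xml)) once the document has loaded would report which branch matched. Every additional format needs another branch, which is exactly the kind of bookkeeping that gets out of hand.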

My first try at writing an RSS reader took pages of code and required careful examination of various documents to find the differences around which to build conditionals to discern the document version. Depending on the outcome of the conditional tests, it passed the XML on to one of two different parsers. It worked, but it was very inefficient and slow and it was certainly not very adaptive to different document structures. This was without a doubt not the way to do things! I needed a parser that was more flexible, something that would search the whole document quickly and find matches to nodes at whatever level they lurked. In short I needed something I came to call “Recursive Node Matching”.
