FlashRSS Reader Pg.7

source: http://www.thegoldenmean.com

7 — In-Line Extraction

Version Two: Using Recursion without Arrays

Restating The Objective:

We want to present data using a TextArea component in a Flash Movie, formatted with HTML tags and structured like this:

<headline><a href="LINK">TITLE</a></headline><p>DESCRIPTION</p>

(where the words in all caps above are the actual content from an RSS document). There might be three or thirty entries. The script shouldn’t care. We simply want to target the <item> nodes and from them extract the text data contained in their <title>, <link> and <description> child nodes.

The Modified Method

One-Pass Parsing with “for-in”

If you survived the previous page, you will recall that we got a reasonably robust RSS parsing engine going, but wished it could pull out the actual data we want in one pass, without burdening system resources with arrays whose only reason for existence is temporary storage.

The first recursive loop from page six actually does its job quite rapidly and efficiently, so let’s keep that. This time when it hits an <item> tag let’s have it pry out the data immediately before moving on to the next <item> tag, building the final output string as it goes, and returning that long complex string at the very end.

As on page six, this page of the tutorial discusses only the key points of the parsing code. It does not attempt to present the entire Class and doesn’t deal at all with the Flash movie. Please download the project files and look at the complete implementation after reading this page of the tutorial.

As with the example on page six, this approach requires two functions (or methods), one whose task is to seek every <item> node in the XML Object being considered, and another to find specific child nodes and extract the content of those nodes. We begin with getContent() which ought to look quite familiar from the previous example:

private function getContent (node:XMLNode, name:String):String {
  //initialize String variable
  var content:String = "";
  //start with the first child of the node that was passed in
  var c:XMLNode = node.firstChild;
  while (c) {
    //text nodes are of type 3 - don't waste cycles looking at them
    if (c.nodeType != 3) {
      //check for a match to the node we want
      if (c.nodeName == name) {
        //if it matches, get the data from the subnodes by passing
        //the current node's childNodes (which is an array) to the
        //"getNodeText()" method
        var itemTitle:String = getNodeText(c.childNodes, "title");
        var itemLink:String = getNodeText(c.childNodes, "link");
        var itemDescr:String = getNodeText(c.childNodes, "description");
        //add formatting and update the variable that will get
        //passed to the TextArea component
        content += "<headline><a href='"+itemLink+
            "' target='_blank'>"+itemTitle+"</a></headline><p>"+
            itemDescr+"</p>";
      }
      //here's the recursive bit:
      //call getContent on the current node
      content += getContent(c, name);
    }
    //examine the next node if there is one
    c = c.nextSibling;
  }
  //send the final string back to the method that called it
  return content;
}

What’s New:

That chunk of code would look a lot scarier if it weren’t so similar to the one we looked at in depth on the previous page. This one is almost an old friend now! The goal of this method is to return a complex string consisting of the data we are interested in, wrapped in some formatting tags for the TextArea component that will display it. This in itself is quite significant: the method outlined on page six returned an Array at its conclusion, whereas this version returns a String.

The first order of business then is to make sure the variable which will hold that string is initialized, which is done like this:
var content:String = "";

Things look familiar for a while, and then we encounter three nearly identical lines that invoke a method named getNodeText(). We are going to examine that method as soon as we are done with getContent(), but I’ll bet you have already figured that this is the part where the text data is located and extracted from the child nodes.

The next line might at first look more daunting than it really is. It is simply combining the data which has been extracted from the current <item> node with formatting tags to conform to the goal stated at the top of this page. When displayed in the TextArea component this will link the headline text to the actual weblog entry. If the summary interests you, click the headline to read the entire entry.
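To make the concatenation described above concrete, here is a minimal sketch that builds one entry from hard-coded values. The three sample values are invented for illustration; in the real method they come from the current <item> node.

```actionscript
// ActionScript 2.0 frame script. The sample values below are
// assumptions made up for this sketch, not data from a real feed.
var itemTitle:String = "Hello";
var itemLink:String = "http://example.com/1";
var itemDescr:String = "First post";

// same pattern as in getContent(): link wrapped around the title,
// followed by the description in its own paragraph
var entry:String = "<headline><a href='"+itemLink+
    "' target='_blank'>"+itemTitle+"</a></headline><p>"+
    itemDescr+"</p>";

trace(entry);
// <headline><a href='http://example.com/1' target='_blank'>Hello</a></headline><p>First post</p>
```

Because getContent() uses += rather than =, each matching <item> appends another entry of this shape to the end of the string.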

The concluding lines are more or less identical to the example on page six except that it is a String that gets updated and returned instead of an Array.

Looking For Something In Something

Back on page five I mentioned a snippet of code by Grant Skinner that he called “indexOn()”. I didn’t use it for the main pass because it isn’t recursive, but now it shines at locating the title, link and description nodes. I have used the basics of Grant’s technique here:

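private function getNodeText (child:Array, searchTerm:String):String {
  //look at each entry in the childNodes array
  for (var i in child) {
    //on a match, return the text held by that node's text child
    if (child[i].nodeName == searchTerm) {
      return child[i].firstChild.nodeValue;
    }
  }
  //no matching node was found
  return "";
}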

Pretty short, isn’t it? You may have wondered why the data type for the “child” argument is Array. When this utility method is invoked inside the getContent() method, it is passed the childNodes of the <item> node as one of its arguments. By definition, childNodes is an Array.

Using a for-in statement this method searches the node seeking a match for a search term and, when it finds a match, returns the text content as a string.
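One case the short version above does not cover is an element with no text at all, such as <description/>. Such a node has no firstChild, so there is nothing to read nodeValue from. The following is a minimal sketch of a more defensive variant; the name getNodeTextSafe() is hypothetical and not part of the original project, and it is written as a plain function so it can be tried in a frame script.

```actionscript
// Hypothetical defensive variant of getNodeText() (ActionScript 2.0).
// Inside the Class it would carry the "private" keyword like the original.
function getNodeTextSafe (child:Array, searchTerm:String):String {
  for (var i in child) {
    var n:XMLNode = child[i];
    if (n.nodeName == searchTerm) {
      // an empty element has no text node to read from
      if (n.firstChild != null) {
        return n.firstChild.nodeValue;
      }
      return "";
    }
  }
  // no node with the requested name was found
  return "";
}
```

Whether the extra checks are worth it depends on how much you trust the feeds you intend to read.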

for-in statements are absolutely magic. Quoting Colin Moock:

A for-in statement is a specialized loop used to list the properties of an object. Unlike other loops, which repeat a series of statements until a given test expression is false, a for-in loop iterates once for each property in the specified object. Therefore, for-in statements do not need an explicit update statement because the number of loop iterations is determined by the number of properties in the object being inspected.

Not mentioned in the statement above is that for-in statements execute fast! So we invoke a for-in statement on the childNodes array, use what’s called the “enumerator” ("i" in this case) to look up each node in turn, compare that node’s nodeName against what we are seeking ("searchTerm") and, when a match is found, return the text ("nodeValue") of its first child. For-in statements run “backwards” (that is, from end to beginning), but in this case it makes no difference because an <item> normally contains only one node of each name.
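If it helps to see what the enumerator actually holds, here is a small sketch that walks the childNodes of a single <item>. The XML string is invented for illustration. No expected output is listed, since the order of enumeration is exactly what you may want to observe for yourself.

```actionscript
// ActionScript 2.0 frame script. The sample XML is an assumption
// made up for this sketch.
var doc:XML = new XML();
doc.ignoreWhite = true;
doc.parseXML("<item><title>Hello</title><link>http://example.com/1</link>" +
    "<description>First post</description></item>");

// childNodes of the <item> node is an Array of XMLNode objects
var kids:Array = doc.firstChild.childNodes;

for (var i in kids) {
  // i is the property name (an array index), not the node itself,
  // so the node has to be looked up with kids[i]
  trace(i + " -> " + kids[i].nodeName);
}
```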

That’s all there is to it. One method probes the XML seeking a match for <item> nodes and, when it finds one, invokes another method to pull out specific morsels. Because all the variables are local variables inside functions, they are cleaned up (“garbage collected”) when the functions finish executing. No residue remains to tie up system resources. This is executed remarkably quickly too! What could be any better than this?
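To try the two methods together without the rest of the Class, they can be pasted into a frame script as plain functions and run against a small in-line feed. This is only a sketch under stated assumptions: the feed string is invented, the “private” keyword is dropped because the functions live on the timeline rather than in a Class, and trace() stands in for the TextArea component.

```actionscript
// ActionScript 2.0 frame script. The feed below is made up for illustration.
var feed:String = "<rss><channel><title>Sample</title>" +
    "<item><title>First</title><link>http://example.com/1</link>" +
    "<description>One</description></item>" +
    "<item><title>Second</title><link>http://example.com/2</link>" +
    "<description>Two</description></item>" +
    "</channel></rss>";

var doc:XML = new XML();
doc.ignoreWhite = true;   // skip whitespace-only text nodes
doc.parseXML(feed);

// find a named child and return its text
function getNodeText (child:Array, searchTerm:String):String {
  for (var i in child) {
    if (child[i].nodeName == searchTerm) {
      return child[i].firstChild.nodeValue;
    }
  }
  return "";
}

// walk the tree recursively, formatting every node called "name"
function getContent (node:XMLNode, name:String):String {
  var content:String = "";
  var c:XMLNode = node.firstChild;
  while (c) {
    if (c.nodeType != 3) {
      if (c.nodeName == name) {
        var itemTitle:String = getNodeText(c.childNodes, "title");
        var itemLink:String = getNodeText(c.childNodes, "link");
        var itemDescr:String = getNodeText(c.childNodes, "description");
        content += "<headline><a href='"+itemLink+
            "' target='_blank'>"+itemTitle+"</a></headline><p>"+
            itemDescr+"</p>";
      }
      content += getContent(c, name);
    }
    c = c.nextSibling;
  }
  return content;
}

trace(getContent(doc, "item"));
```

The trace should show one headline and paragraph pair per <item>, in document order. Note that the channel’s own <title> is ignored, because only nodes named "item" trigger the extraction.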

“Better” is certainly a subjective term, but the next page demonstrates a way of doing things that is certainly much easier, and I think an argument could be made that easier == better! Now that we have suffered through building our own parsing engine, let’s take advantage of the effort someone else put into making our lives easier. Prepare to meet XPath!
