Selecting XML Nodes with JavaScript, or the Peril of getElementsByTagName

May 10, 2008

There are two popular camps for Ajax data formats right now: XML and JSON. Both have their advantages and disadvantages. The purpose of this article is to show you how to parse XML effectively in the browser.

Super Fast Beginner’s Primer

Case 1: Node (or un-namespaced node, null-namespaced node): a node without a prefix, such as child here: <child/>
Case 2: Default namespaced node: a node without a prefix, where a parent node (or the node itself) has an xmlns attribute, like both root and child here: <root xmlns="http://example.com/"><child/></root>
Case 3: Namespaced node: a node with a prefix, where a parent node (or the node itself) declares an xmlns attribute with that prefix attached, like both child and root here: <prefix:root xmlns:prefix="http://example.com/"><prefix:child/></prefix:root>
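If you want to tell the three cases apart at runtime, looking at a node's namespaceURI and prefix is usually enough. This is only a rough sketch: classifyNode is a made-up helper name, and it assumes the node exposes the standard DOM Level 2 properties namespaceURI and prefix. The falsy checks are there because some parsers report an empty string rather than null when there is no namespace or prefix.

```javascript
// Hypothetical helper (not from any library): return the primer case number
// for a node, based only on its namespaceURI and prefix properties.
function classifyNode(node) {
  if (!node.namespaceURI) {
    return 1; // Case 1: no namespace at all
  }
  if (!node.prefix) {
    return 2; // Case 2: namespace comes from a default xmlns declaration
  }
  return 3; // Case 3: namespace is bound to an explicit prefix
}

// Plain objects stand in for real DOM nodes so the sketch runs anywhere.
var plainChild = { namespaceURI: null, prefix: null };
var defaultChild = { namespaceURI: 'http://example.com/', prefix: null };
var prefixedChild = { namespaceURI: 'http://example.com/', prefix: 'prefix' };

var cases = [
  classifyNode(plainChild),
  classifyNode(defaultChild),
  classifyNode(prefixedChild)
];
```

With a real document you would pass in elements such as oDoc.documentElement.firstChild instead of the stand-in objects.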

/End Primer

Parsing XML in the browser can be a tricky beast. There are many different wrong ways to do it, which can leave you cold and naked in a snowstorm if you’re not careful. So, let’s put on the metaphorical electric one-sie of standards based code and let the power of Edison heat our JavaScript code like the innards of a tauntaun.

If there is one thing you can take away from this article, it’s that the problems with XML in JavaScript have already been solved, and there is library code out there to do the job for you. But libraries aren’t a substitute for knowledge (abstraction is a dangerous thing during education), so let’s learn why these problems are occurring so we can wrinkle our gray matter and increase our productivity at the same time.

Use Cases

These are the main use cases that come up when selecting a node inside of an XML document:

Case 1: Selecting un-namespaced nodes (or nodes in the null namespace):

        <root><child/></root>

This one is easy. If you can guarantee that your XML will never have any namespaces, you’re home free. Take your get out of jail free card and run for the hills. Using this assumption, you can query nodes inside of your XML Document object using nothing other than getElementsByTagName(). Lucky bastard.

        // assume oDocEl is the documentElement inside of an XML Document
        var correctForCase1 = oDocEl.getElementsByTagName('child');

Case 2: Selecting default namespaced nodes:

        <root xmlns="http://example.com/"><child/></root>

Tread lightly, this is about to get serious. Historically, I had thought that the solution described for Case 1 would be sufficient here in most cases. I had learned a while back that Internet Explorer treats node names (namespace prefix and local name together) as one string. So, the method for Case 1 should work for Internet Explorer, especially for nodes without a prefix. In Firefox, you’d have to use getElementsByTagNameNS(), but that would be just a simple wrapper.

Then I met an Internet Explorer exception. The only unique thing about this installation of Internet Explorer 7 was that it had MSXML 6 installed, when all the other computers I had tested on were using MSXML 3. The obvious conclusion here is that MSXML 6 won’t select child nodes for Case 2.

        // With MSXML 6 this does not select the default-namespaced child nodes
        var incorrectForCase2 = oDocEl.getElementsByTagName('child');

Here’s the right way to select nodes for Case 2. Fair warning, to keep the code examples here simple, this solution requires Sarissa (sarissa.js and sarissa_ieemu_xpath.js) to be included on the page prior to usage.

        // assume oDoc is an XML Document object.
        oDoc.setProperty("SelectionNamespaces", "xmlns:whatever='http://example.com/'");
        var oDocEl = oDoc.documentElement;
        var correctForCase2A = oDocEl.selectNodes('whatever:child');
        var correctForCase2B = oDocEl.selectSingleNode('whatever:child');

Note how we’ve mapped what was the default namespace (without a prefix) to be a namespace WITH a prefix during the node selection.
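If your XML uses more than one namespace, the SelectionNamespaces value takes several declarations separated by spaces. Here is a small sketch of building that string from a prefix map. buildSelectionNamespaces is a hypothetical helper name, not part of Sarissa or MSXML, and it assumes the URIs contain no single quotes.

```javascript
// Hypothetical helper: turn a { prefix: namespaceURI } map into the
// space-separated declaration string that SelectionNamespaces expects.
function buildSelectionNamespaces(prefixMap) {
  var parts = [];
  for (var prefix in prefixMap) {
    if (Object.prototype.hasOwnProperty.call(prefixMap, prefix)) {
      parts.push("xmlns:" + prefix + "='" + prefixMap[prefix] + "'");
    }
  }
  return parts.join(' ');
}

var selectionNamespaces = buildSelectionNamespaces({
  whatever: 'http://example.com/'
});

// On a real XML Document object you would then call:
// oDoc.setProperty("SelectionNamespaces", selectionNamespaces);
```

The prefixes in the map are the ones you will use in your selectNodes expressions, not necessarily the ones that appear in the XML.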

It should be noted that when the resultant XML has a namespace attached (Case 2 and 3), Firefox works fine using getElementsByTagNameNS. IE doesn’t include support for that method, however, so we’re forced to find a more complete solution.
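For illustration, here is roughly what a hand-rolled wrapper around getElementsByTagNameNS could look like when you can't pull in a library. Treat it as a sketch under assumptions: getElementsByNamespace is a made-up name, and the fallback assumes each element exposes namespaceURI plus either localName (standard DOM) or baseName (the MSXML equivalent).

```javascript
// Hypothetical helper: find descendant elements by namespace URI and local
// name. Uses getElementsByTagNameNS when the node offers it, otherwise walks
// the tree by hand.
function getElementsByNamespace(root, namespaceURI, localName) {
  if (typeof root.getElementsByTagNameNS === 'function') {
    return root.getElementsByTagNameNS(namespaceURI, localName);
  }
  var matches = [];
  (function walk(node) {
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      var child = children[i];
      if (child.nodeType === 1) { // 1 is ELEMENT_NODE
        var name = child.localName || child.baseName;
        // Treat null and '' alike, since parsers differ on "no namespace".
        if ((child.namespaceURI || '') === (namespaceURI || '') && name === localName) {
          matches.push(child);
        }
        walk(child);
      }
    }
  })(root);
  return matches;
}
```

Note that one branch returns a live NodeList and the other a plain array; both support length and index access, which is all most selection code needs.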

Case 3: Select a non-default namespaced node:

        <prefix:root xmlns:prefix="http://example.com/"><prefix:child/></prefix:root>

As I mentioned in Case 2, normally (pre-MSXML 6), you’d be able to perform a getElementsByTagName('prefix:child') in IE and use getElementsByTagNameNS in Firefox as usual. But that has changed now. We need to set up the SelectionNamespaces property for IE, and we’ll use Sarissa to take it cross-browser for us.

        // assume oDoc is an XML Document object.
        oDoc.setProperty("SelectionNamespaces", "xmlns:whatever='http://example.com/'");
        var oDocEl = oDoc.documentElement;
        var correctForCase3A = oDocEl.selectNodes('whatever:child');
        var correctForCase3B = oDocEl.selectSingleNode('whatever:child');
        // Note, this is the same code as Case 2 (which is a good thing)

Note that we did not have to use the same prefix that was defined by the result XML. We can map it to whatever we want (literally).
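The same free choice of prefix applies if you use the standard DOM Level 3 XPath API (document.evaluate) directly in browsers that support it. The following is a minimal sketch, not how Sarissa does it internally: makeNamespaceResolver and selectNodesStandard are hypothetical names, and the numeric constant is used so the snippet does not depend on a global XPathResult.

```javascript
// Hypothetical helper: build a resolver that maps OUR prefixes to namespace
// URIs, regardless of which prefixes the XML itself uses.
function makeNamespaceResolver(prefixMap) {
  return function (prefix) {
    return Object.prototype.hasOwnProperty.call(prefixMap, prefix)
      ? prefixMap[prefix]
      : null;
  };
}

// Hypothetical helper: evaluate an XPath expression and return an array.
function selectNodesStandard(oDoc, contextNode, expression, prefixMap) {
  var ORDERED_NODE_SNAPSHOT_TYPE = 7; // same value as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  var result = oDoc.evaluate(
    expression,
    contextNode,
    makeNamespaceResolver(prefixMap),
    ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

// Usage on a real XML Document object:
// selectNodesStandard(oDoc, oDoc.documentElement, 'whatever:child',
//                     { whatever: 'http://example.com/' });
```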

Why is this important?

Because most libraries don’t handle Case 2 and Case 3, which are important parts of XML. Here’s some code straight from YUI 2.5.1 (DataSource component):

    // Line 1394
    var xmlNode = result.getElementsByTagName(key);
    if(xmlNode && xmlNode.item(0) && xmlNode.item(0).firstChild) {
        data = xmlNode.item(0).firstChild.nodeValue;
    }
    else {
           data = "";
    }

Notice how they just do getElementsByTagName. For shame :( jQuery doesn’t handle Case 2 or Case 3 either. (Proof is an exercise to the reader :P) So, if you have XML data sources with namespaces, it would do you well to use the solution presented in this article, or you’re going to have headaches later.

Springer’s Final Word

Don’t use getElementsByTagName. If you do, PLEASE include a note saying that your code isn’t going to support namespaced XML. Branch your selection code to check if Sarissa has been included on the page, and use Sarissa for namespaced XML if it’s there. It’s not fun to be pigeonholed into the simplest case of XML.
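Here is one way that branching could look. It is a sketch under assumptions rather than a drop-in implementation: selectByNamespace is a made-up name, the 'ns' prefix is arbitrary, and the first branch assumes Sarissa (with sarissa_ieemu_xpath.js) has made setProperty and selectNodes available on the document.

```javascript
// Hypothetical helper: choose a selection strategy based on what the page
// and the document object offer.
function selectByNamespace(oDoc, namespaceURI, localName) {
  var root = oDoc.documentElement;
  if (typeof Sarissa !== 'undefined' && typeof oDoc.setProperty === 'function') {
    // Namespace-aware XPath via Sarissa / MSXML.
    oDoc.setProperty('SelectionNamespaces', "xmlns:ns='" + namespaceURI + "'");
    return root.selectNodes('.//ns:' + localName);
  }
  if (typeof root.getElementsByTagNameNS === 'function') {
    // Namespace-aware DOM method, where the browser provides it.
    return root.getElementsByTagNameNS(namespaceURI, localName);
  }
  // Last resort: NOT namespace aware, so this only really supports Case 1.
  return root.getElementsByTagName(localName);
}
```

The last branch is exactly the situation that deserves the warning note: callers should know they have fallen back to un-namespaced selection.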

Zach Leatherman is a builder for the web at Font Awesome and the creator/maintainer of Eleventy (11ty), an award-winning open source site generator. At one point he became entirely too fixated on web fonts. He has given 85 talks in nine different countries at events like Beyond Tellerrand, Smashing Conference, Jamstack Conf, CSSConf, and The White House. Formerly part of CloudCannon, Netlify, Filament Group, NEJS CONF, and NebraskaJS.

Peter

12 Aug 2008

I recently volunteered myself to make a little web mashup for someone. I'm not really a web developer. I just finally reached the point of testing it on Other Browsers, blissfully unaware of this particular little feature... The comment about being cold and naked in a snowstorm was much appreciated ;-) As was the technical stuff, of course.

Zach Leatherman

My pleasure. It's nice to see someone take the time to say thanks :)

Mamta

04 Nov 2008

This was really helpful. Saved my day. Thanks a lot!

teddy

31 Aug 2010

Shameful plug for Sarissa!

If you'd like me to do a post on how Sarissa handles this internally, let me know. If we don't use tools, we're just animals.

alex

24 Jun 2011

Out of curiosity, what software do you use to edit your xml files? I have tried a few free products and have trialled liquid xml (http://www.liquid-technolog... also, do you have a preference?

To be honest, I do so little raw XML editing anymore that I'd probably just use Notepad++ (or Sublime Text 2). Can't remember the last time I edited an XML file though, not counting XHTML of course.

Seth Brame

12 Aug 2011

I'm still learning from you, but I'm making my way to the top as well. I certainly love reading all that is posted on your site. Keep the stories coming. I liked it!