I got about a dozen emails recently about my use of the NameTable in my XmlReader post. Charles Cook tried it out and noticed about a 10% speedup. I also received a number of poo-poo emails that said "use XPath" or "don't bother" and "the performance is good enough."
Sure, if that works for you, that's great. Of course, always measure before you make broad statements. That said, here's a broad statement. Using an XmlReader will always be faster than the DOM and/or XmlSerializer. Always.
Why? Because what do you think is underneath the DOM and inside of XmlSerialization? An XmlReader of course.
For documents larger than about 50k, you're looking at at least one order of magnitude faster when plucking a single value out. When grabbing dozens of values, the gap widens.
Moshe is correct in pointing out that a nice middle ground perf-wise is the XPathReader (for a certain subset of XPath). There are a number of nice XmlReader implementations that fill the space between XmlTextReader and XPathDocument by providing more-than-XmlReader functionality:
BTW, I would also point out that an XmlReader is what I call a "cursor-based pull implementation." While it's similar to the SAX parsers in that it exposes the infoset rather than the angle brackets, it's not SAX.
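To make "cursor-based pull" a bit more concrete, here is a minimal sketch. The PullDemo class, the ElementNames method and the inline sample XML are made up for illustration. The point is that your code owns the loop and asks for the next node with Read(), whereas a SAX parser would call you back.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PullDemo
{
    // Hypothetical helper: returns the name of every element, in document order.
    public static List<string> ElementNames(string xml)
    {
        List<string> names = new List<string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            // The caller drives the cursor: each Read() moves it to the next node.
            while (reader.Read())
            {
                // The reader exposes typed nodes (the infoset), not raw angle brackets.
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }
}
```

Calling PullDemo.ElementNames on a small feed-like string gives you the element names in the order the cursor reached them.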
Now, all that said, what was the deal with my Nametable usage? Charles explains it well, but I will expand. You can do this if you like:
XmlTextReader tr =
if (tr.NodeType == XmlNodeType.Element && tr.LocalName == "enclosure")
The tr.LocalName == "enclosure" check does a string compare as you look at each element. Not a big deal, but it adds up over hundreds or thousands of executions when spinning through a large document.
The NameTable is used by XmlDocument, XmlReader(s), XPathNavigator, and XmlSchemaCollection. It's a table that maps a string to an object reference. This is called "atomization" (think atom, think small): each distinct name is stored once. If these classes see "enclosure" more than once, they reuse the same object reference rather than keeping n copies of the "enclosure" string internally.
It's not exactly like a Hashtable: adding a string that has already been atomized simply hands back the existing object reference.
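If you want to see atomization by itself, without a reader involved, a small sketch like this should do it. The AtomizationDemo class and its method name are made up for the example. The two strings are built at run time so they start out as separate objects with equal contents.

```csharp
using System;
using System.Xml;

public static class AtomizationDemo
{
    // Hypothetical helper: true when two equal-but-distinct strings atomize to one reference.
    public static bool AddReturnsSameReference()
    {
        NameTable table = new NameTable();

        // Two separate string objects holding the same characters.
        string first = new string("enclosure".ToCharArray());
        string second = new string("enclosure".ToCharArray());
        bool distinctBefore = !Object.ReferenceEquals(first, second);

        // Add returns the atomized string; the second call finds the existing entry.
        string atomA = table.Add(first);
        string atomB = table.Add(second);

        return distinctBefore && Object.ReferenceEquals(atomA, atomB);
    }
}
```

Once both names have been through the same table, comparing them is a pointer comparison rather than a character-by-character one.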
object enclosure = tr.NameTable.Add("enclosure");
if (tr.NodeType == XmlNodeType.Element &&
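For anyone who wants something they can paste and run, here is a fuller sketch of the same pattern. The EnclosureCounter class, the CountEnclosures method and the inline XML are my own stand-ins, and finishing the condition with Object.ReferenceEquals is an assumption about how the comparison above ends. The key move is to atomize the name once, before the loop, so the reader hands back that very same object for every matching element.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EnclosureCounter
{
    // Hypothetical helper: counts <enclosure> elements using reference comparison.
    public static int CountEnclosures(string xml)
    {
        int count = 0;
        using (XmlTextReader tr = new XmlTextReader(new StringReader(xml)))
        {
            // Atomize once, up front, in the reader's own NameTable.
            object enclosure = tr.NameTable.Add("enclosure");

            while (tr.Read())
            {
                // LocalName comes out of the same NameTable, so a reference
                // check is enough; no string compare per element.
                if (tr.NodeType == XmlNodeType.Element &&
                    Object.ReferenceEquals(tr.LocalName, enclosure))
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Note that the atomized name has to come from the reader's NameTable (or one you passed to the reader); a string atomized in some other table would never be reference-equal.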
The easiest way, IMHO, to think about it is this:
This kind of optimization makes a big perf difference (~10% depending) when using an XmlReader. It makes less of one when using an XPathDocument because you are using Select(ing)Nodes in a loop.
Stealing Charles' words: "...because it involves very little extra code it is perhaps an optimization worth making prematurely."
Even the designers agree: "...using the XmlNameTable gives you enough of a performance benefit to make it worthwhile especially if your processing starts to spans multiple XML components in a piplelining scenario and the XmlNameTable is shared across them i.e. XmlTextReader->XmlDocument->XslTransform."
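A rough sketch of that sharing idea, covering just the XmlTextReader->XmlDocument leg of the pipeline: the SharedNameTableDemo class, its method name and the sample XML shape are hypothetical. Both components get the same NameTable through their constructors, so a name atomized once is the same object on both sides.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SharedNameTableDemo
{
    // Hypothetical helper: true when the document reuses the atom created up front.
    // Assumes the root element's first child is an <enclosure> element.
    public static bool DocumentReusesAtom(string xml)
    {
        NameTable shared = new NameTable();
        object enclosure = shared.Add("enclosure");

        XmlDocument doc = new XmlDocument(shared);
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml), shared))
        {
            // The reader and the document both atomize names into 'shared'.
            doc.Load(reader);
        }

        XmlNode firstChild = doc.DocumentElement.FirstChild;
        return Object.ReferenceEquals(doc.NameTable, shared)
            && Object.ReferenceEquals(firstChild.LocalName, enclosure);
    }
}
```

The same NameTable instance could be handed to further components downstream, which is the scenario the quote describes.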
Oleg laments: "...that something needs to be done to fix this particular usage pattern of XmlReader to not ignore great NameTable idea."
Conclusion: The NameTable is there for a reason, no matter what System.Xml solution you use. This is the correct and useful pattern, and not using it is just silly. If you're going to develop a habit, why not make it a best-practice habit?
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.