I've noticed a lot of folks who still do COM development with MSXML (2, 3, or 4) using MSXML2.DOMDocument or MSXML2.FreeThreadedDOMDocument in wrongheaded ways. I wanted to make folks aware of a few tips and thoughts about these two components, gathered from experience and from elsewhere around the net.
MSXML exposes DOMDocument and FreeThreadedDOMDocument. They are DIFFERENT, and choosing one over the other (depending GREATLY on how you use them) can make a 7x to 10x difference in performance. XML is very powerful, but remember that even though it's easy to use [Xml.Load("somexml.xml")], it's also easy to GREATLY slow your code down with VERY few lines of code.
DOMDocument
These objects use what's known as the "rental" threading model. That means they can be accessed from any thread, but only by one thread at a time. As long as you're not trying to share DOM objects between threads, you're fine with these objects.
Best Practices for DomDocument
- When using single-threaded EXEs (VB6, etc.) and manipulating XML within a single transaction, you're working on a single thread -> Use DomDocument.
- When using Classic ASP and manipulating XML within a single page request, you're working on a single thread -> Use DomDocument.
FreeThreadedDOMDocument
The "free-threaded" DOM document exposes the same interface as the "rental" threaded document, but it can be safely shared across threads in the same process. Free-threaded documents are generally slower than rental documents because of the extra thread-safety work they do. Use them when you want multiple threads to share one document at the same time, so that each thread doesn't have to load its own copy.
If you do need to share objects between threads, you have two choices:
1. Use "FreeThreadedDOMDocument", which exposes all the same interfaces as DOMDocument but is thread-safe (with a corresponding performance hit due to internal locking and synchronization). It can be safely stored in ASP Application state on IIS.
2. For C++ people: change your threads to use single-threaded apartments (COINIT_APARTMENTTHREADED) and then marshal interface pointers between your threads (see CoMarshalInterThreadInterfaceInStream, paired with CoGetInterfaceAndReleaseStream on the receiving thread).
Best Practices for FreeThreadedDomDocument
- When using Classic ASP and storing XML in the Application or Session object (which is a questionable practice, can affect performance, and is not recommended for the inexperienced) -> Use FreeThreadedDomDocument.
- I can't think of any good reason to use FreeThreadedDomDocument in a single-threaded EXE (unless you're marshalling it off somewhere).
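Note: The fastest way to load an XML document into a DOM is to use the default "rental" threading model (which means the DOM document can be used by only one thread at a time) with validateOnParse, resolveExternals, and preserveWhiteSpace all disabled, like this (in JScript, for example):
var doc = new ActiveXObject("MSXML2.DOMDocument");
doc.async = false;              // load synchronously so the tree is ready when load() returns
doc.validateOnParse = false;    // skip DTD/schema validation
doc.resolveExternals = false;   // don't fetch external DTDs or entities
doc.preserveWhiteSpace = false; // drop insignificant white space between elements
if (!doc.load("somexml.xml")) throw new Error(doc.parseError.reason); // load() returns false on failure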
If you have an element-heavy XML document that contains a lot of white space between elements and is stored in Unicode, it can actually be smaller in memory than on disk. Files with a more balanced ratio of elements to text content end up at about 1.25 to 1.5 times the UCS-2 disk-file size when in memory. Files that are very data-dense, such as an attribute-heavy, XML-persisted ADO recordset, can end up at more than twice the disk-file size when loaded into memory.
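The ratios above can be turned into a quick back-of-the-envelope estimator. This is only a sketch in modern JavaScript, not anything MSXML provides; `estimateDomMemory` and the profile names are hypothetical. It assumes a mostly-ASCII file whose UCS-2 size is about twice its UTF-8/ANSI size on disk, and treating the "more than twice" figure for dense files as relative to that same UCS-2 size is also an assumption. The element-heavy, whitespace-heavy case is left out because no ratio is given for it.

```javascript
// Rough DOM memory estimate using the rules of thumb quoted above.
// Assumption: a mostly-ASCII file saved as UTF-8 or ANSI doubles in size
// when expressed as UCS-2 (two bytes per character).
// Assumption: the "more than twice" figure for dense files is measured
// against the same UCS-2 size; the text does not say so explicitly.
const RATIOS = {
  balanced: { low: 1.25, high: 1.5 }, // balanced mix of elements and text
  dense: { low: 2.0, high: null },    // attribute-heavy data: lower bound only
};

function estimateDomMemory(diskBytes, profile, storedAsUcs2 = false) {
  const ratio = RATIOS[profile];
  if (!ratio) {
    throw new Error(`unknown profile: ${profile}`);
  }
  // Normalize the on-disk size to its UCS-2 equivalent.
  const ucs2Bytes = storedAsUcs2 ? diskBytes : diskBytes * 2;
  return {
    low: ucs2Bytes * ratio.low,
    high: ratio.high === null ? null : ucs2Bytes * ratio.high,
  };
}

// A 3 MB mostly-ASCII file with a balanced layout.
const MB = 1024 * 1024;
const est = estimateDomMemory(3 * MB, "balanced");
console.log(est.low / MB, est.high / MB); // 7.5 9
```

Treat the result as an order-of-magnitude guide only; the real footprint depends on the document's shape and the MSXML version.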
NOTE: These tips are for the COM MSXML components and do NOT represent best practices in .NET, which has much richer and more nuanced XML support. Also note that loading XML into a DOM is fairly slow anywhere, since you're building a fully parsed in-memory tree. An example of an inefficient operation would be loading a 3000K XML file into a DOM and then calling SelectNodes("//somenode") just to retrieve a single value.
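To see why the `//somenode` query is the wasteful part of that example, here is a toy model in modern JavaScript. It is not MSXML and says nothing about MSXML's actual speed; the tree shape, the `catalog`/`header`/`record` element names, and the helpers `buildDoc`, `findAllDescendants`, and `findByPath` are all hypothetical. It just counts how many nodes each strategy examines: a `//` descendant search must look at every node in the document, while a rooted path only looks at the children along one branch.

```javascript
// Toy model (not MSXML): each node is { name, children }.

// Descendant search, like the XPath "//name": every node is examined.
function findAllDescendants(root, name, stats) {
  const matches = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    stats.visited++;
    if (node.name === name) matches.push(node);
    for (const child of node.children) stack.push(child);
  }
  return matches;
}

// Rooted path, like "/catalog/header/somenode": only the children at each
// step are examined, stopping at the first child with the right name.
function findByPath(root, path, stats) {
  stats.visited++;
  if (root.name !== path[0]) return null;
  let current = root;
  for (const step of path.slice(1)) {
    let next = null;
    for (const child of current.children) {
      stats.visited++;
      if (child.name === step) {
        next = child;
        break;
      }
    }
    if (next === null) return null;
    current = next;
  }
  return current;
}

// Hypothetical document: a small header followed by many records.
function buildDoc(recordCount) {
  const records = [];
  for (let i = 0; i < recordCount; i++) {
    records.push({
      name: "record",
      children: [
        { name: "id", children: [] },
        { name: "value", children: [] },
      ],
    });
  }
  return {
    name: "catalog",
    children: [
      { name: "header", children: [{ name: "somenode", children: [] }] },
      ...records,
    ],
  };
}

const doc = buildDoc(1000);
const descStats = { visited: 0 };
findAllDescendants(doc, "somenode", descStats);
const pathStats = { visited: 0 };
findByPath(doc, ["catalog", "header", "somenode"], pathStats);
console.log(descStats.visited, pathStats.visited); // 3003 3
```

In MSXML terms, the second strategy corresponds to asking for a specific rooted path (for example with selectSingleNode) instead of `//`. Either way, the 3000K file still has to be parsed into a DOM first, which is the other half of the cost.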