XmlValidatingReader problems over derived XmlReaders

March 12, 2006 Posted in Web Services | XmlSerializer | Bugs

This whole validation ickiness deserved two posts, so I didn't mention it in the last XmlValidatingReader post.

The XML format that I'm parsing and validating isn't the savviest of formats, as it was created years ago, before the XML Schema specification was complete. While it has a namespace and it's an official specification, the instance documents don't have a namespace. They are entirely "unqualified." So, basically, I'm trying to validate XML documents without a namespace against a schema that expects namespaces.

Additionally, the elementFormDefault is set to "unqualified." There's a great explanation of what elementFormDefault means here.
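To make the elementFormDefault point concrete, here is a minimal sketch, not the real schema. It assumes a tiny stand-in schema with a global FOO element and a local BAR element, and runs three shapes of document through a plain XmlTextReader feeding an XmlValidatingReader. The class name ElementFormDefaultSketch, the schema text, and the helper CountProblems are all hypothetical. With an "unqualified" schema, only the third shape (root qualified, children not) should come through with no problems reported.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical demo class; the schema below is a stand-in, not the real specification.
public class ElementFormDefaultSketch
{
    const string Ns = "http://thenamespaceiwant";

    // Assumed schema: FOO is a global element, so it lives in the target namespace.
    // BAR is a local element, so under elementFormDefault="unqualified" it has no namespace.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'" +
        " targetNamespace='" + Ns + "' elementFormDefault='unqualified'>" +
        "<xs:element name='FOO'><xs:complexType><xs:sequence>" +
        "<xs:element name='BAR' type='xs:string'/>" +
        "</xs:sequence></xs:complexType></xs:element>" +
        "</xs:schema>";

    int problems;

    // Counts every validation event (errors and warnings alike) and echoes it.
    void OnValidation(object sender, ValidationEventArgs e)
    {
        problems++;
        Console.WriteLine("  {0}: {1}", e.Severity, e.Message);
    }

    // Validates one document and returns the number of validation events raised.
    public int CountProblems(string xml)
    {
        problems = 0;
        XmlTextReader source = new XmlTextReader(new StringReader(xml));
        // XmlValidatingReader is the .NET 1.1 API; it is marked obsolete from 2.0 on.
        XmlValidatingReader validator = new XmlValidatingReader(source);
        validator.ValidationType = ValidationType.Schema;
        validator.Schemas.Add(Ns, new XmlTextReader(new StringReader(Xsd)));
        validator.ValidationEventHandler += new ValidationEventHandler(OnValidation);
        while (validator.Read()) { }
        validator.Close();
        return problems;
    }

    public static void Main()
    {
        ElementFormDefaultSketch sketch = new ElementFormDefaultSketch();
        string[] docs = {
            "<FOO><BAR>text</BAR></FOO>",                          // entirely unqualified
            "<FOO xmlns='" + Ns + "'><BAR>text</BAR></FOO>",       // every element qualified
            "<x:FOO xmlns:x='" + Ns + "'><BAR>text</BAR></x:FOO>"  // only the root qualified
        };
        foreach (string doc in docs)
        {
            Console.WriteLine(doc);
            Console.WriteLine("  problems reported: {0}", sketch.CountProblems(doc));
        }
    }
}
```

The handler counts warnings as well as errors on purpose: a root element for which no schema is found may surface only as a warning, so counting errors alone could make an unqualified document look valid.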

The documents come in like this:

<FOO>
<BAR>text</BAR>
</FOO>

Before I'd looked hard at the schema, I had assumed that I could load them with an XmlNamespaceUpgradeReader. This is a derivation of XmlTextReader that does nothing but lie about the namespace of every element. I'm using System.Xml on .NET 1.1.

public class XmlNamespaceUpgradeReader : XmlTextReader
{
    string oldNamespaceUri;
    string newNamespaceUri;

    public XmlNamespaceUpgradeReader( TextReader reader, string oldNamespaceUri, string newNamespaceURI ):base( reader )
    {
        this.oldNamespaceUri = oldNamespaceUri;
        this.newNamespaceUri = newNamespaceURI;
    }

    public override string NamespaceURI
    {
        get
        {
            // we are assuming XmlSchemaForm.Unqualified, therefore
            // we can't switch the NS here
            if ( this.NodeType != XmlNodeType.Attribute && 
                base.NamespaceURI == oldNamespaceUri )
            {
                return newNamespaceUri;
            }
            else 
            {
                return base.NamespaceURI;
            }
        }
    }
}

For example, if I did this:

XmlTextReader reader = new XmlNamespaceUpgradeReader(
    File.OpenText("MyLameDocument.xml"), 
    String.Empty, 
    "http://thenamespaceiwant"); 

XmlDocument doc = new XmlDocument();
doc.Load(reader);
Console.WriteLine(doc.OuterXml);

I would end up with this resulting XML:

<FOO xmlns="http://thenamespaceiwant">
<BAR>text</BAR>
</FOO>

Seemed like this would validate. Well, not so much. The document, as you can see, is fine. It's exactly what you'd expect. But then I remembered/noticed that the schema was elementFormDefault="unqualified", meaning that only the root node needs the namespace. So...

public class XmlRootNamespaceUpgradeReader : XmlTextReader
{
    string oldNamespaceUri;
    string newNamespaceUri;
 
    public XmlRootNamespaceUpgradeReader( TextReader reader, string oldNamespaceUri, string newNamespaceURI ):base( reader )
    {
        this.oldNamespaceUri = oldNamespaceUri;
        this.newNamespaceUri = newNamespaceURI;
    }
 
    public override string NamespaceURI
    {
        get
        {
            // we are assuming XmlSchemaForm.Unqualified, therefore
            // we can't switch the NS here
            if ( Depth == 0 && this.NodeType != XmlNodeType.Attribute && 
                    base.NamespaceURI == oldNamespaceUri )
            {
                return newNamespaceUri;
            }
            else 
            {
                return base.NamespaceURI;
            }
        }
    }
 
    public override string Prefix
    {
        get
        {
            if(Depth == 0 && this.NodeType == XmlNodeType.Element)
            {
                return "x";
            }
            return base.Prefix; // defer to the underlying reader rather than returning null
        }
    }
 
}

...which results in a document like this:

<x:FOO xmlns:x="http://thenamespaceiwant">
<BAR>text</BAR>
</x:FOO>

This document should now validate, and in fact it does in my test applications. When the document is loaded directly from a test file, it works fine. When I run it directly through one of the extended "fake-out" XmlTextReaders, it doesn't work. It's as if my readers don't exist at all, even though their code does indeed execute.

To be clear:

Original Doc -> XmlTextReader -> XmlValidatingReader -> doesn't validate (as expected)
Original Doc -> XmlNamespaceUpgradeReader -> XmlValidatingReader -> doesn't validate (but it should!)
Original Doc -> XmlNamespaceUpgradeReader -> XmlDocument -> write to file -> read from file -> XmlValidatingReader -> doesn't validate (as expected, it's "overqualified")
Original Doc -> XmlRootNamespaceUpgradeReader -> XmlDocument -> write to file -> read from file -> XmlValidatingReader -> DOES VALIDATE (as expected)

Why don't the "fake-out" XmlTextReaders work when chained together and feeding the XmlValidatingReader directly, but they do work when there's an intermediate format?

A few things about the XmlValidatingReader in .NET 1.1 (since it's obsolete in 2.0). While its constructor takes the abstract class XmlReader, internally it insists on an XmlTextReader. This is documented, but buried IMHO. Reflector shows us:

XmlTextReader reader1 = reader as XmlTextReader;
if (reader1 == null)
{
    throw new ArgumentException(Res.GetString("Arg_ExpectingXmlTextReader"), "reader");
}
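That check can be seen from the outside with a small sketch. It assumes nothing beyond the behavior Reflector shows above: a reader that is not an XmlTextReader (an XmlNodeReader here) should be refused with an ArgumentException, while an XmlTextReader is accepted. The class ExpectsTextReaderSketch and its IsRejected helper are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class showing which readers XmlValidatingReader will accept.
public class ExpectsTextReaderSketch
{
    // Returns true if the XmlValidatingReader constructor refuses the given reader.
    public static bool IsRejected(XmlReader reader)
    {
        try
        {
            XmlValidatingReader validator = new XmlValidatingReader(reader);
            validator.Close();
            return false;
        }
        catch (ArgumentException)
        {
            // The constructor signature says XmlReader, but the implementation
            // insists on an XmlTextReader.
            return true;
        }
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<FOO><BAR>text</BAR></FOO>");

        // XmlNodeReader derives from XmlReader but not from XmlTextReader.
        Console.WriteLine("XmlNodeReader rejected: {0}",
            IsRejected(new XmlNodeReader(doc)));

        Console.WriteLine("XmlTextReader rejected: {0}",
            IsRejected(new XmlTextReader(new StringReader("<FOO/>"))));
    }
}
```

Note that a class derived from XmlTextReader passes this check, which is why the "fake-out" readers get through the constructor without complaint.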

<conjecture>When a class takes an abstract base class - the one it "should" - but really requires a specific derivation/implementation internally, it's a good hint that the OO hierarchy wasn't completely thought out and/or a refactoring that was going to happen in a future version never happened.</conjecture>

Regardless, System.Xml in .NET 2.0 is much nicer, and as well thought-out as System.Xml 1.x was, 2.0 is considerably more thought out. However, I'm talking about 1.1.

<suspicion>I take this little design snafu as a strong hint that the XmlValidatingReader in .NET 1.1 has carnal knowledge of XmlTextReader and is probably making some assumptions about the underlying stream and doing some caching rather than taking my fake-out XmlReader's word for it.</suspicion>

If you're on, or were on, the System.Xml team let me know what the deal is and I'll update this post.

I know that the XmlRootNamespaceUpgradeReader works because the XML is correct when it's written out to an intermediate. However, the InfoSet that the XmlValidatingReader acts on is somehow not the same. How did we solve it? Since XmlValidatingReader needs an XmlTextReader that is more "legit," we'll give it one:

Original Doc -> XmlRootNamespaceUpgradeReader -> XmlDocument -> CloneReader -> XmlValidatingReader -> DOES VALIDATE

This is cheesy, but if a better way is found at least it's compartmentalized and I can fix it in one place. We quickly run through the input XmlTextReader, write the Infoset out to a MemoryStream and return a "fresh" XmlTextReader and darn it if it doesn't work just fine.

/// <summary>
/// Makes a complete, fresh in-memory COPY of an XmlReader. This is needed
/// because the XmlValidatingReader takes only XmlTextReaders and isn't fooled
/// by our XmlNamespaceUpgradeReader.
/// </summary>
/// <param name="reader">An unread reader to copy; it is consumed in the process.</param>
/// <returns>A new XmlTextReader positioned at the start of the copy.</returns>
protected XmlTextReader CloneReader(XmlTextReader reader)
{
    MemoryStream m = new MemoryStream();
    XmlTextWriter writer = new XmlTextWriter(m, Encoding.UTF8);
    // WriteNode advances the reader itself, so pairing it with Read() in a
    // loop can skip nodes. From the reader's initial state, one call copies
    // every node through to the end of the input.
    writer.WriteNode(reader, false);
    writer.Flush();
    m.Seek(0, SeekOrigin.Begin);
    return new XmlTextReader(m);
}

Madness. Many thanks to Tomas Restrepo for his help and graciousness while debugging this issue!

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.