Scott Hanselman

XmlValidatingReader problems over derived XmlReaders

March 12, '06 Comments [3] Posted in Web Services | XmlSerializer | Bugs

This whole validation ickiness deserved two posts, so I didn't mention it in the last XmlValidatingReader post.

The XML format that I'm parsing and validating isn't the savviest of formats, as it was created years ago before the XML Schema specification was complete. While the schema has a namespace and it's an official specification, the instance documents don't have a namespace. They are entirely "unqualified." So, basically I'm trying to validate XML documents without a namespace against a schema that expects namespaces.

Additionally, the elementFormDefault is set to "unqualified." There's a great explanation of what elementFormDefault means here.

The documents come in like this:

<FOO>
  <BAR>text</BAR>
</FOO>

Before I'd looked hard at the schema, I had assumed that I could load them with an XmlNamespaceUpgradeReader. This is a derivation of XmlTextReader that does nothing but lie about the namespace of every element. I'm using System.Xml on .NET 1.1.

public class XmlNamespaceUpgradeReader : XmlTextReader
{
    string oldNamespaceUri;
    string newNamespaceUri;

    public XmlNamespaceUpgradeReader( TextReader reader, string oldNamespaceUri, string newNamespaceURI ):base( reader )
    {
        this.oldNamespaceUri = oldNamespaceUri;
        this.newNamespaceUri = newNamespaceURI;
    }

    public override string NamespaceURI
    {
        get
        {
            // we are assuming XmlSchemaForm.Unqualified, therefore
            // we can't switch the NS here
            if ( this.NodeType != XmlNodeType.Attribute &&
                base.NamespaceURI == oldNamespaceUri )
            {
                return newNamespaceUri;
            }
            else
            {
                return base.NamespaceURI;
            }
        }
    }
}

For example, if I did this:

XmlTextReader reader = new XmlNamespaceUpgradeReader(
    File.OpenText("MyLameDocument.xml"),
    String.Empty,
    "http://thenamespaceiwant"); 


XmlDocument doc = new XmlDocument();

doc.Load(reader);

Console.WriteLine(doc.OuterXml);

I would end up with this resulting XML:

<FOO xmlns="http://thenamespaceiwant">
  <BAR xmlns="http://thenamespaceiwant">text</BAR>
</FOO>

Seemed like this would validate. Well, not so much. The document, as you can see, is fine. It's exactly what you'd expect. But then I remembered/noticed that the schema was elementFormDefault="unqualified", meaning that only the root node needs the namespace. So...

public class XmlRootNamespaceUpgradeReader : XmlTextReader
{
    string oldNamespaceUri;
    string newNamespaceUri;

    public XmlRootNamespaceUpgradeReader( TextReader reader, string oldNamespaceUri, string newNamespaceURI ):base( reader )
    {
        this.oldNamespaceUri = oldNamespaceUri;
        this.newNamespaceUri = newNamespaceURI;
    }

    public override string NamespaceURI
    {
        get
        {
            // we are assuming XmlSchemaForm.Unqualified, therefore
            // only the root element (Depth == 0) gets the new NS
            if ( Depth == 0 && this.NodeType != XmlNodeType.Attribute &&
                    base.NamespaceURI == oldNamespaceUri )
            {
                return newNamespaceUri;
            }
            else
            {
                return base.NamespaceURI;
            }
        }
    }

    public override string Prefix
    {
        get
        {
            // only the root element gets the made-up "x" prefix
            if(Depth == 0 && this.NodeType == XmlNodeType.Element)
            {
                return "x";
            }
            // everything else keeps whatever prefix it had (usually none);
            // returning null here can trip up consumers that expect String.Empty
            return base.Prefix;
        }
    }
}

...which results in a document like this:

<x:FOO xmlns:x="http://thenamespaceiwant">
  <BAR>text</BAR>
</x:FOO>

This document should now validate, and in fact it does in my test applications. When the document is loaded directly from a test file it works fine. When I run it directly through one of the extended "fake-out" XmlTextReaders, it doesn't work. It's as if my readers don't exist at all, even though their code does indeed execute.

To be clear:

Original Doc -> XmlTextReader -> XmlValidatingReader -> doesn't validate (as expected)
Original Doc -> XmlNamespaceUpgradeReader -> XmlValidatingReader -> doesn't validate (but it should!)
Original Doc -> XmlNamespaceUpgradeReader -> XmlDocument -> write to file -> read from file -> XmlValidatingReader -> doesn't validate (as expected, it's "overqualified")
Original Doc -> XmlRootNamespaceUpgradeReader -> XmlDocument -> write to file -> read from file -> XmlValidatingReader -> DOES VALIDATE (as expected)

Why don't the "fake-out" XmlTextReaders work when chained together and feeding the XmlValidatingReader directly, but they do work when there's an intermediate format?

A few things about the XmlValidatingReader in .NET 1.1 (since it's obsolete in 2.0). While its constructor takes the abstract class XmlReader, internally it insists on an XmlTextReader. This is documented, but buried IMHO. Reflector shows us:

XmlTextReader reader1 = reader as XmlTextReader;
if (reader1 == null)
{
    throw new ArgumentException(Res.GetString("Arg_ExpectingXmlTextReader"), "reader");
}

<conjecture>When a class takes an abstract base class - the one it "should" - but really requires a specific derivation/implementation internally, it's a good hint that the OO hierarchy wasn't completely thought out and/or a refactoring that was going to happen in a future version never happened.</conjecture>

Regardless, System.Xml in .NET 2.0 is much nicer, and as well thought-out as System.Xml 1.x was, 2.0 is considerably more so. However, I'm talking about 1.1.

<suspicion>I take this little design snafu as a strong hint that the XmlValidatingReader in .NET 1.1 has carnal knowledge of XmlTextReader and is probably making some assumptions about the underlying stream and doing some caching rather than taking my fake-out XmlReader's word for it.</suspicion> 

If you're on, or were on, the System.Xml team let me know what the deal is and I'll update this post.

I know that the XmlRootNamespaceUpgradeReader works because the XML is correct when it's written out to an intermediate. However, the InfoSet that the XmlValidatingReader acts on is somehow not the same. How did we solve it? Since XmlValidatingReader needs an XmlTextReader that is more "legit," we'll give it one.

Original Doc -> XmlRootNamespaceUpgradeReader -> XmlDocument -> CloneReader -> XmlValidatingReader -> DOES VALIDATE

This is cheesy, but if a better way is found at least it's compartmentalized and I can fix it in one place. We quickly run through the input XmlTextReader, write the Infoset out to a MemoryStream and return a "fresh" XmlTextReader and darn it if it doesn't work just fine.

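/// <summary>
/// Makes a complete, fresh in-memory COPY of an XmlReader. This is needed
/// because the XmlValidatingReader takes only XmlTextReaders and isn't fooled
/// by our XmlNamespaceUpgradeReader.
/// </summary>
/// <param name="reader">The reader to copy; it is consumed to the end.</param>
/// <returns>A new XmlTextReader over the copied XML.</returns>
protected XmlTextReader CloneReader(XmlTextReader reader)
{
    MemoryStream m = new MemoryStream();
    XmlTextWriter writer = new XmlTextWriter(m,Encoding.UTF8);

    // WriteNode copies the current node (and its children) and leaves the
    // reader on the next sibling, so Read() is called once up front. Calling
    // Read() on every pass would skip a top-level node each time around.
    reader.Read();
    while (!reader.EOF)
    {
        writer.WriteNode(reader,false);
    }
    writer.Flush();
    m.Seek(0,SeekOrigin.Begin);
    XmlTextReader returnedReader = new XmlTextReader(m);
    return returnedReader;
}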

Madness. Many thanks to Tomas Restrepo for his help and graciousness while debugging this issue!

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Loading XmlSchema files out of Assembly Resources

March 12, '06 Comments [2] Posted in XML

I've been doing some validating of documents against an XSD lately. Validation is pretty straightforward: you take any XmlTextReader, wrap it in an XmlValidatingReader and read through it. The ValidationEventHandler will call you back if there's any trouble. You can poke around in the document if you like, while the validation happens, but when I'm just validating I do a while(reader.Read()) as you'll see.

I have a PILE of .XSD files - 64 of them - that represent a single specification. I load the most-leaf node to load the whole spec:

XmlSchemaCollection schemas = new XmlSchemaCollection();
XmlReader reader = new XmlTextReader("TheMainSchema.xsd");
XmlSchema schema = XmlSchema.Read(reader, null);
schemas.Add(schema);

XmlReader readerDoc = new XmlTextReader("TheFileYouWantToValidate.xml");
XmlValidatingReader newReader = new XmlValidatingReader(readerDoc);
newReader.Schemas.Add(schemas);
newReader.ValidationEventHandler += new ValidationEventHandler(OnValidate);

// reading to the end is what drives the validation
while ( newReader.Read() );
newReader.Close();
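The snippet above wires up an OnValidate callback without showing it. Here's a minimal sketch of what one might look like, assuming all you want is to collect the problems and check afterwards. The ValidationCollector class and its member names are made up for illustration; only the handler signature comes from System.Xml.Schema.

```csharp
using System;
using System.Collections;
using System.Xml.Schema;

// Hypothetical helper (not from the original post): gathers whatever the
// validating reader reports so the caller can inspect it after reading.
public class ValidationCollector
{
    public readonly ArrayList Errors = new ArrayList();
    public readonly ArrayList Warnings = new ArrayList();

    // Same signature as the ValidationEventHandler delegate.
    public void OnValidate(object sender, ValidationEventArgs e)
    {
        string text = e.Message;
        if (e.Exception != null)
        {
            // XmlSchemaException carries the position of the offending node.
            text = String.Format("({0},{1}) {2}",
                e.Exception.LineNumber, e.Exception.LinePosition, e.Message);
        }

        if (e.Severity == XmlSeverityType.Error)
            Errors.Add(text);
        else
            Warnings.Add(text);
    }

    // True as long as no error-level event has been reported.
    public bool IsValid
    {
        get { return Errors.Count == 0; }
    }
}
```

You'd hook it up with `newReader.ValidationEventHandler += new ValidationEventHandler(collector.OnValidate);` and look at `collector.IsValid` once the while loop finishes.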

I wanted an assembly that was self-contained and would hold all 64 of these XSD files internally as resources, and I didn't want to put them in a temp directory.

I added all the schemas to the project, right clicked "Properties" and set them all to Embedded Resources. When you request an embedded resource you need to ask for the file using the original file name as well as the namespace. Use Reflector to determine what the ultimate fully qualified resource name is if you have trouble.
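If you don't have Reflector handy, the assembly can tell you its own resource names. A small sketch, assuming you only care about the .xsd resources; the ResourceNames class and FindSchemas method are names I made up, while Assembly.GetManifestResourceNames is the real call doing the work.

```csharp
using System;
using System.Collections;
using System.Reflection;

// Hypothetical helper (not from the original post).
public class ResourceNames
{
    // Returns the fully qualified names of every embedded resource whose
    // name ends in ".xsd", exactly as GetManifestResourceStream expects them.
    public static string[] FindSchemas(Assembly assembly)
    {
        ArrayList found = new ArrayList();
        foreach (string name in assembly.GetManifestResourceNames())
        {
            if (name.ToLower().EndsWith(".xsd"))
            {
                found.Add(name);
            }
        }
        return (string[])found.ToArray(typeof(string));
    }
}
```

Dump the result of `ResourceNames.FindSchemas(Assembly.GetExecutingAssembly())` to the console once and you'll see exactly which prefix to use.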

It's easy to pull the main schema out of its resource and pass the Stream into XmlSchema.Read. It's slightly less obvious how to get that schema to resolve its imports.

Schemas may reference other schemas like this:

<xsd:schema targetNamespace="foofoo"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:include schemaLocation="SomeIncludedSchemas.xsd"/>
   <xsd:include schemaLocation="SomeIncludedSchemas2.xsd"/>

In this, and most, cases schemaLocation refers to a relative file. However it could refer to a URL, or some custom scheme. Personally I find the "relative filename" style to be the most flexible. I don't like to bake too much knowledge about the outside world into my schemas. On this project, it's a (light) requirement that we use the specification schemas unchanged.

Assembly a = Assembly.GetExecutingAssembly();
Stream stream = a.GetManifestResourceStream("MyNamespace.TheMainSchema.xsd");

XmlSchema x = XmlSchema.Read(stream,
    new ValidationEventHandler(SchemaValidationEventHandler));

x.Compile(
    new ValidationEventHandler(SchemaValidationEventHandler),
    new MyCustomResolver(a));

schemas.Add(x);

Note the instance call to XmlSchema.Compile. The XmlSchema class will use the default resolver (an XmlUrlResolver, which goes looking on the file system) and fail to find the other 63 schemas. So, I pass in a custom resolver that will find the correct schema given the URI (the value in the schemaLocation attribute) and return it, in this example, as a stream.

private class MyCustomResolver : XmlUrlResolver
{
    private const string MYRESOURCENAMESPACE = "MyNamespace.{0}";
    private Assembly resourceAssembly = null;

    public MyCustomResolver(Assembly resourceAssembly)
    {
        // the single-string constructor takes the parameter name
        if (resourceAssembly == null)
           throw new ArgumentNullException("resourceAssembly");
        this.resourceAssembly = resourceAssembly;
    }

    override public object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if(absoluteUri.IsFile)
        {
            // keep just the file name and look it up as an embedded resource
            string file = Path.GetFileName(absoluteUri.AbsolutePath);
            Stream stream = resourceAssembly.GetManifestResourceStream(
               String.Format(MYRESOURCENAMESPACE, file));
            return stream;
        }
        return null;
    }
}

Here we just grab the relative filename from out of the file:/// URI that we're passed into GetEntity each time a schemaLocation needs to be resolved. Works like a charm. I wrap the whole thing in a factory method and cache the compiled XmlSchemaCollection so we don't load and compile this more than once.
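To make that mapping concrete, here's the same string-wrangling pulled out on its own so you can see what a given schemaLocation turns into. It's only a sketch: SchemaLocationMapper and ToResourceName are names I made up, and "MyNamespace" stands in for whatever your project's default namespace is.

```csharp
using System;
using System.IO;

// Hypothetical helper (not from the original post): shows how a resolved
// schemaLocation URI maps onto an embedded resource name.
public class SchemaLocationMapper
{
    // Returns "<resourceNamespace>.<file name>" for file: URIs, or null
    // for anything else (http:, custom schemes and so on).
    public static string ToResourceName(Uri absoluteUri, string resourceNamespace)
    {
        if (!absoluteUri.IsFile)
        {
            return null;
        }
        // Throw away the directory part; only the file name identifies the resource.
        string file = Path.GetFileName(absoluteUri.AbsolutePath);
        return resourceNamespace + "." + file;
    }
}
```

For example, `ToResourceName(new Uri("file:///C:/specs/SomeIncludedSchemas.xsd"), "MyNamespace")` gives `MyNamespace.SomeIncludedSchemas.xsd`. One thing to watch: AbsolutePath is the escaped form of the path, so a file name with spaces in it would need unescaping before it matches a resource name.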

There are a few ways one might want to extend this. I've seen folks build custom URI schemes like assembly:/// and embed them in the schemas, but eh, who has the time. This is simpler, IMHO, works for relative file locations, and didn't take 10 minutes to write.

Quote of the day: I'm not a control freak, I'm a control enthusiast.

ZEB (Zero Email Bounce) and a new Outlook Rule

March 10, '06 Comments [8] Posted in Musings | Tools

I am always made uncomfortable when I see an email inbox with 1000's of emails. I wonder how folks can handle the psychic weight of all those emails. I continue to try to effectively implement Getting Things Done as I've mentioned before in my systems of organization post.

I also try to get to ZEB (Zero Email Bounce) every day or so. This is when you "bounce" up against zero emails in your inbox. Omar reminded me of the importance of this. This doesn't mean that you've done all your tasks, instead it means you know what your tasks are.

The image at right is my Outlook this moment. I've got an Outlook Search Folder called "Email ZEB" that finds all my Red or Yellow Flagged emails, anywhere in Outlook. I've got four other folders, @Action, @WaitingFor, @Someday and @Snooze.

Right now I've got 16 Action Items to schedule. I've got 44 Red or Yellow Flagged items to watch, and a number of items that are waiting for action from other folks.

Remember that your inbox is not storage, it's a list of what hasn't been categorized yet. If you've got 5000 emails in your inbox, select all the ones that are older than one month and make a folder called Storage. Dump them all in there and you'll have a good start. Get yourself a nice Outlook Search tool like X1 or Google Desktop or MSN Desktop and don't worry, you WILL be able to find stuff again.

I think it's funny that we all know the human brain can't comfortably hold more than 7 digits at once (hence the length of a phone number) but we think that having 5000 emails in our inbox "makes sure things aren't dropped."

Do it, Drop it, Delegate it or Defer it. That's what you should be saying when you read an email.

Another great way I got my ever-increasing Inbox down to zero items (if only for a moment) was to make another Inbox, just for items that I was cc'ed on.

I hate the "Reply to all" culture, where folks cover their own butts by cc'ing others to make sure an item "doesn't get forgotten." I'm guilty of it as well, but 9 times out of 10, an email that I'm cc'ed on will not turn into an Action Item.

Notice that "Inbox - cc'ed" doesn't show up in my Favorite Folders. Again, this decreases its hold on me and the pressure one feels when they see a folder name go bold, indicating that more (potential) work has shown up.

I've found this technique, while always evolving, to be fairly effective in keeping me from stressing out TOO much.

What is your mail-handling style?

Hanselminutes Podcast 9

March 10, '06 Comments [0] Posted in Podcast | ASP.NET | Ruby | XML | Tools

My ninth Podcast is up. This episode is about ASP.NET, John Lam's Ruby CLR Bridge, and useful tools.

We're listed in the iTunes Podcast Directory, so I encourage you to subscribe with a single click (two in Firefox) with the button below. For those of you on slower connections there are lo-fi and torrent-based versions as well.

Subscribe to my Podcast in iTunes

Our sponsors are Automated QA, PeterBlum and the .NET Dev Journal.

Do take a look at TestComplete from Automated QA. It integrates with Visual Studio 2005 and I'm going to try to get a formal review of their stuff probably next week, particularly their functional Web Testing and Recording.

As I've said before this show comes to you with the audio expertise and stewardship of Carl Franklin. The name comes from Travis Illig, but the goal of the show is simple. Avoid wasting the listener's time. (and make the commute less boring)

  • Each show will include a number of links, and all those links will be posted along with the show on the site. There were 15 sites mentioned in this ninth episode, some planned, some not. We're still using Shrinkster.com on this show.
  • The basic MP3 feed is here, and the iPod friendly one is here. There are a number of other ways you can get it (streaming, straight download, etc) that are all up on the site just below the fold. I use iTunes, myself, to listen to most podcasts, but I also use FeedDemon and its built-in support.
  • Note that for now, because of bandwidth constraints, the feeds always have just the current show. If you want to get an old show (and because many Podcasting Clients aren't smart enough to not download the file more than once) you can always find them at http://www.hanselminutes.com.
  • I have, and will, also include the enclosures to this feed you're reading, so if you're already subscribed to ComputerZen and you're not interested in cluttering your life with another feed, you have the choice to get the 'cast as well.
  • If there's a topic you'd like to hear, perhaps one that is better spoken than presented on a blog, or a great tool you can't live without, contact me and I'll get it in the queue!

Enjoy. Who knows what'll happen in the next show?

Now playing: Ricky Gervais, Steve Merchant, and Karl Pilkington - Ricky Gervais Show: Season 2, Episode 1

Can't Upgrade a Blackberry 7290 to latest system software

March 8, '06 Comments [2] Posted in XML

I had an older 7280 "Blue" BlackBerry that died, so our IT org pulled a "Black" BlackBerry out of a drawer. I went to http://www.cingular.com/bbdownloads for the latest firmware so I could run Google Local for Mobile

(Which, BTW, is unbelievably brilliant. I wish I knew what flavor of Java they were using to allow them to have such vast phone support. Do check it out if you have a data plan.)

Anywho, when I launched the Crackberry's Desktop Manager after installing the System Software Upgrade Package, you are supposed to get a prompt telling you that your system is out of date and would you like the new stuff. Nothing. It was as if the Desktop Manager couldn't see the new package, or more likely, that it didn't realize that the new package supported this particular Blackberry.

Regmon and Filemon led me to a file called vendor.xml in C:\Program Files\Common Files\Research In Motion\AppLoader. It's got stuff like this:

<vendor id="0x66" Name="Cingular Wireless">
      <bundle id="System" version="4.0.0.201">
  <devicehwid>0x80000503 0x90000503 0x80000403 0x94000503 0x94000403 0x94000903</devicehwid>
      </bundle>
   </vendor>
...etc...

Now, in the registry, under HKEY_LOCAL_MACHINE\SOFTWARE\Research In Motion\AppLoader\SearchPaths\{54DF9FA9-C79E-4BFC-94DE-C56456F9452A} there's a HardwareID listed in decimal, 2617246979 which is 0x9C000503 in hex. It also notes that my BlackBerry is system software version 4.0.0.201.
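The decimal-to-hex step and the "is my device in that list?" check are easy to do by hand, but here's a sketch of both. To be clear, this is just my guess at the kind of comparison involved, not how the Desktop Manager actually does it; HardwareIdCheck and its methods are made-up names.

```csharp
using System;

// Hypothetical helper (not from the original post).
public class HardwareIdCheck
{
    // Formats a hardware ID the way vendor.xml writes them, e.g. 0x9C000503.
    public static string ToHex(uint id)
    {
        return "0x" + id.ToString("X8");
    }

    // True if the whitespace-separated <devicehwid> text contains the ID.
    public static bool IsListed(string deviceHwIds, uint id)
    {
        string[] tokens = deviceHwIds.Split(' ', '\t', '\r', '\n');
        foreach (string token in tokens)
        {
            if (token.Length == 0)
            {
                continue; // skip the gaps left by repeated whitespace
            }
            // Base 16 parsing accepts the leading "0x".
            if (Convert.ToUInt32(token, 16) == id)
            {
                return true;
            }
        }
        return false;
    }
}
```

`HardwareIdCheck.ToHex(2617246979)` gives `0x9C000503`, and `IsListed` returns false for that ID against the devicehwid list shown above, which fits with the upgrade prompt never showing up.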

I wanted to get to 4.0.2.93 and I could see that those files were over in C:\program files\common files\research in motion\shared\loader files.

So, I could go into the vendor.xml file and add the new version of the System software and the mapping to my Device's Hardware ID. Sigh.

Conclusion: Rather than add a new mapping I renamed the vendor.xml file to vendor.foo. Upgraded and everything's lovely. Who has the patience, really?

Elapsed (wasted) time: 9 mins. Damn BlackBerry and their -600 million dollars.

DISCLAIMER: It's your tuckus, not mine, if this violates your IT org's policies, or your ISP's policies. I'm just talking, you're the one who has to take responsibility for pressing buttons and turning dials.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.