Scott Hanselman

Loading XmlSchema files out of Assembly Resources

March 12, 2006 Comment on this post [2] Posted in XML

I've been validating documents against an XSD lately. Validation is pretty straightforward: you take an XmlTextReader, wrap it in an XmlValidatingReader, and read through it. The ValidationEventHandler will call you back if there's any trouble. You can poke around in the document while the validation happens if you like, but when I'm just validating I do a while(reader.Read()), as you'll see.

I have a PILE of .XSD files (64 of them) that represent a single specification. I load the main schema, which pulls in all the others, to load the whole spec:

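// using System.Xml; using System.Xml.Schema;
XmlSchemaCollection schemas = new XmlSchemaCollection();
XmlReader reader = new XmlTextReader("TheMainSchema.xsd");
// Read the main schema; its xsd:include elements pull in the rest of the spec
XmlSchema schema = XmlSchema.Read(reader, null);
reader.Close();
schemas.Add(schema);

XmlReader readerDoc = new XmlTextReader("TheFileYouWantToValidate.xml");
XmlValidatingReader newReader = new XmlValidatingReader(readerDoc);
newReader.Schemas.Add(schemas);
// OnValidate (your handler) is called for each validation error or warning
newReader.ValidationEventHandler += new ValidationEventHandler(OnValidate);

// Nothing to do per node; just read to the end so the whole document is validated
while (newReader.Read()) { }
newReader.Close();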
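XmlValidatingReader and XmlSchemaCollection were marked obsolete in .NET 2.0 in favor of XmlReaderSettings and XmlSchemaSet. For comparison, here's a minimal sketch of the same read-to-the-end validation loop on the newer API. The SchemaValidation class and Validate method are hypothetical names, and the schema and document are passed in as strings only to keep the sketch self-contained; in real use you'd point XmlReader.Create at your files.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaValidation
{
    // Validates xml against xsd and returns the validation messages.
    // An empty list means the document is valid.
    public static List<string> Validate(string xsd, string xml)
    {
        var errors = new List<string>();

        var schemas = new XmlSchemaSet();
        using (var schemaReader = XmlReader.Create(new StringReader(xsd)))
        {
            // null: use whatever targetNamespace the schema itself declares
            schemas.Add(null, schemaReader);
        }

        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        // Collect problems instead of throwing on the first one.
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Just spin through the document; the handler above hears about problems.
            while (reader.Read()) { }
        }
        return errors;
    }
}
```

Attaching a ValidationEventHandler is what keeps the reader from throwing on the first error; without one, XmlReader throws an XmlSchemaValidationException instead.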

I wanted an assembly that was self-contained and would hold all 64 of these XSD files internally as resources, and I didn't want to put them in a temp directory.

I added all the schemas to the project, right-clicked each one, chose Properties, and set its Build Action to Embedded Resource. When you request an embedded resource, you ask for it by its original file name prefixed with the namespace (for example, MyNamespace.TheMainSchema.xsd). If you have trouble, use Reflector to find the ultimate fully qualified resource name.
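If Reflector isn't handy, the assembly can also tell you its own resource names. Here's a small sketch using Assembly.GetManifestResourceNames; the ResourceNames class is a hypothetical helper, and the naming pattern in the comment is what C# projects usually produce rather than a guarantee.

```csharp
using System;
using System.Reflection;

public static class ResourceNames
{
    // Prints and returns the name of every manifest resource embedded in the assembly.
    // In a C# project an embedded file is usually named
    // "<default namespace>.<folder>.<file name>", e.g. "MyNamespace.TheMainSchema.xsd".
    public static string[] Dump(Assembly assembly)
    {
        if (assembly == null)
            throw new ArgumentNullException("assembly");

        string[] names = assembly.GetManifestResourceNames();
        foreach (string name in names)
        {
            Console.WriteLine(name);
        }
        return names;
    }
}
```

Calling ResourceNames.Dump(Assembly.GetExecutingAssembly()) from the assembly that holds the schemas prints the exact strings GetManifestResourceStream expects.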

It's easy to pull the main schema out of its resource and pass the Stream into XmlSchema.Read. It's slightly less obvious how to get that schema to resolve its imports.

Schemas may reference other schemas like this:

<xsd:schema targetNamespace="foofoo"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:include schemaLocation="SomeIncludedSchemas.xsd"/>
  <xsd:include schemaLocation="SomeIncludedSchemas2.xsd"/>
  ...
</xsd:schema>

In this case, as in most, schemaLocation refers to a relative file. However, it could refer to a URL or some custom scheme. Personally I find the "relative filename" style the most flexible; I don't like to bake too much knowledge about the outside world into my schemas. On this project, it's a (light) requirement that we use the specification schemas unchanged.

Assembly a = Assembly.GetExecutingAssembly();
Stream stream = a.GetManifestResourceStream("MyNamespace.TheMainSchema.xsd");
 
XmlSchema x = XmlSchema.Read(stream,
    new ValidationEventHandler(SchemaValidationEventHandler));
 
x.Compile(
    new ValidationEventHandler(SchemaValidationEventHandler), 
    new MyCustomResolver(a));
 
schemas.Add(x);

Note the instance call to XmlSchema.Compile. By default the includes are resolved with a plain XmlUrlResolver, which looks for them on the file system and fails to find the other 63 schemas. So I pass in a custom resolver that finds the correct schema given the URI (the value in the schemaLocation attribute) and returns it, in this example, as a stream.

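// using System; using System.IO; using System.Reflection; using System.Xml;
private class MyCustomResolver : XmlUrlResolver
{
    // Embedded resource names are "<namespace>.<original file name>"
    private const string MYRESOURCENAMESPACE = "MyNamespace.{0}";
    private Assembly resourceAssembly = null;

    public MyCustomResolver(Assembly resourceAssembly)
    {
        if (resourceAssembly == null)
            throw new ArgumentNullException("resourceAssembly");
        this.resourceAssembly = resourceAssembly;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // A relative schemaLocation arrives here resolved to a file:/// URI;
        // keep just the file name and look it up as an embedded resource.
        if (absoluteUri.IsFile)
        {
            string file = Path.GetFileName(absoluteUri.AbsolutePath);
            Stream stream = resourceAssembly.GetManifestResourceStream(
                String.Format(MYRESOURCENAMESPACE, file));
            // null if no resource has that name
            return stream;
        }
        // Other schemes (http:, custom) aren't handled here
        return null;
    }
}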

Here we just grab the filename from the file:/// URI that's passed into GetEntity each time a schemaLocation needs to be resolved. Works like a charm. I wrap the whole thing in a factory method and cache the compiled XmlSchemaCollection so we don't load and compile it more than once.
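The same resolver trick can be tried without an assembly full of resources. This sketch swaps in XmlSchemaSet (the .NET 2.0 replacement for XmlSchemaCollection and XmlSchema.Compile) and a hypothetical InMemorySchemaResolver that looks each file name up in a dictionary instead of calling GetManifestResourceStream; the class names and the dictionary are assumptions for illustration. The GetEntity logic mirrors MyCustomResolver: take the file name out of the file:/// URI and hand back a Stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

// Hypothetical stand-in for MyCustomResolver: schemas come from a dictionary
// keyed by file name instead of from embedded resources.
public class InMemorySchemaResolver : XmlUrlResolver
{
    private readonly IDictionary<string, string> schemasByFileName;

    public InMemorySchemaResolver(IDictionary<string, string> schemasByFileName)
    {
        if (schemasByFileName == null)
            throw new ArgumentNullException("schemasByFileName");
        this.schemasByFileName = schemasByFileName;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // A relative schemaLocation shows up here already resolved to a file:/// URI.
        if (absoluteUri.IsFile)
        {
            string file = Path.GetFileName(absoluteUri.AbsolutePath);
            string schemaText;
            if (schemasByFileName.TryGetValue(file, out schemaText))
                return new MemoryStream(Encoding.UTF8.GetBytes(schemaText));
        }
        // Unknown files and non-file URIs are not handled in this sketch.
        return null;
    }
}

public static class SchemaLoader
{
    // Adds and compiles the main schema; its xsd:include targets come from the resolver.
    public static XmlSchemaSet Load(string mainSchema, IDictionary<string, string> otherSchemas)
    {
        var set = new XmlSchemaSet();
        set.XmlResolver = new InMemorySchemaResolver(otherSchemas);
        using (var reader = XmlReader.Create(new StringReader(mainSchema)))
        {
            set.Add(null, reader);
        }
        set.Compile();
        return set;
    }
}
```

With a "Child.xsd" entry in the dictionary and a main schema containing an xsd:include of Child.xsd, the compiled set should contain the global elements from both files.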

There are a few ways one might extend this. I've seen folks build custom URI schemes like assembly:/// and embed them in the schemas, but eh, who has the time? This is simpler, IMHO, works for relative file locations, and took less than 10 minutes to write.
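The factory-plus-cache idea might look something like this. It's a sketch that assumes .NET 4 or later for Lazy&lt;T&gt; and uses XmlSchemaSet rather than XmlSchemaCollection; CachedSchemas and LoadCount are hypothetical names, and the inline schema string stands in for reading TheMainSchema.xsd (and its includes) out of the assembly's resources.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Xml;
using System.Xml.Schema;

public static class CachedSchemas
{
    // Counts how often the expensive load actually ran (for illustration only).
    private static int loadCount;

    // Lazy<T> defers the load until first use; this mode lets only one thread run it.
    private static readonly Lazy<XmlSchemaSet> schemas =
        new Lazy<XmlSchemaSet>(LoadAndCompile, LazyThreadSafetyMode.ExecutionAndPublication);

    // The factory: every caller gets the same compiled XmlSchemaSet.
    public static XmlSchemaSet Schemas
    {
        get { return schemas.Value; }
    }

    public static int LoadCount
    {
        get { return loadCount; }
    }

    private static XmlSchemaSet LoadAndCompile()
    {
        Interlocked.Increment(ref loadCount);

        // Stand-in for TheMainSchema.xsd read from the assembly's resources.
        const string mainSchema =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">" +
            "<xs:element name=\"root\" type=\"xs:string\"/>" +
            "</xs:schema>";

        var set = new XmlSchemaSet();
        using (var reader = XmlReader.Create(new StringReader(mainSchema)))
        {
            set.Add(null, reader);
        }
        set.Compile();
        return set;
    }
}
```

LazyThreadSafetyMode.ExecutionAndPublication makes sure only one thread runs the load even if several ask for Schemas at the same moment.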

Quote of the day: I'm not a control freak, I'm a control enthusiast.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

About Newsletter

Hosting By

Hosted on Linux using .NET in an Azure App Service

ZEB (Zero Email Bounce) and a new Outlook Rule

March 10, 2006 Comment on this post [8] Posted in Musings | Tools

I am always made uncomfortable when I see an email inbox with thousands of emails. I wonder how folks handle the psychic weight of all those emails. I continue to try to implement Getting Things Done effectively, as I've mentioned before in my systems of organization post.

I also try to get to ZEB (Zero Email Bounce) every day or so. This is when you "bounce" up against zero emails in your inbox. Omar reminded me of the importance of this. It doesn't mean that you've done all your tasks; it means you know what your tasks are.

The image at right is my Outlook at this moment. I've got an Outlook Search Folder called "Email ZEB" that finds all my Red or Yellow Flagged emails, anywhere in Outlook. I've got four other folders: @Action, @WaitingFor, @Someday and @Snooze.

Right now I've got 16 Action Items to schedule. I've got 44 Red or Yellow Flagged items to watch, and a number of items that are waiting for action from other folks.

Remember that your inbox is not storage; it's a list of what hasn't been categorized yet. If you've got 5000 emails in your inbox, select all the ones that are older than one month and make a folder called Storage. Dump them all in there and you'll have a good start. Get yourself a nice Outlook search tool like X1, Google Desktop or MSN Desktop and don't worry: you WILL be able to find stuff again.

I think it's funny that we all know the human brain can't comfortably hold more than 7 digits at once (hence the length of a phone number) but we think that having 5000 emails in our inbox "makes sure things aren't dropped."

Do it, Drop it, Delegate it or Defer it. That's what you should be saying when you read an email.

Another great way I got my ever-increasing Inbox down to zero items (if only for a moment) was to make another Inbox, just for items that I was cc'ed on.

I hate the "Reply to all" culture, where folks cover their own butts by cc'ing others to make sure an item "doesn't get forgotten." I'm guilty of it as well, but 9 times out of 10, an email that I'm cc'ed on will not turn into an Action Item.

Notice that "Inbox - cc'ed" doesn't show up in my Favorite Folders. Again, this decreases its hold on me and the pressure one feels when they see a folder name go bold, indicating that more (potential) work has shown up.

I've found this technique, while always evolving, to be fairly effective in keeping me from stressing out TOO much.

What is your mail-handling style?

Hanselminutes Podcast 9

March 10, 2006 Comment on this post [0] Posted in Podcast | ASP.NET | Ruby | XML | Tools

My ninth Podcast is up. This episode is about ASP.NET, John Lam's Ruby CLR Bridge, and useful tools.

We're listed in the iTunes Podcast Directory, so I encourage you to subscribe with a single click (two in Firefox) with the button below. For those of you on slower connections there are lo-fi and torrent-based versions as well.

Our sponsors are Automated QA, PeterBlum and the .NET Dev Journal.

Do take a look at TestComplete from Automated QA. It integrates with Visual Studio 2005 and I'm going to try to get a formal review of their stuff probably next week, particularly their functional Web Testing and Recording.

As I've said before this show comes to you with the audio expertise and stewardship of Carl Franklin. The name comes from Travis Illig, but the goal of the show is simple. Avoid wasting the listener's time. (and make the commute less boring)

Each show will include a number of links, and all those links will be posted along with the show on the site. There were 15 sites mentioned in this ninth episode, some planned, some not. We're still using Shrinkster.com on this show.
The basic MP3 feed is here, and the iPod-friendly one is here. There are a number of other ways you can get it (streaming, straight download, etc.) that are all up on the site just below the fold. I use iTunes myself to listen to most podcasts, but I also use FeedDemon and its built-in support.

Some other clients are Doppler (which also supports Windows CE), FireAnt, Nimiq, and PrimeTime Podcast.

Note that for now, because of bandwidth constraints, the feeds always have just the current show. If you want to get an old show (and because many Podcasting Clients aren't smart enough to not download the file more than once) you can always find them at http://www.hanselminutes.com.
I have also included, and will continue to include, the enclosures in this feed you're reading, so if you're already subscribed to ComputerZen and you're not interested in cluttering your life with another feed, you can get the 'cast here as well.
If there's a topic you'd like to hear, perhaps one that is better spoken than presented on a blog, or a great tool you can't live without, contact me and I'll get it in the queue!

Enjoy. Who knows what'll happen in the next show?

Now playing: Ricky Gervais, Steve Merchant, and Karl Pilkington - Ricky Gervais Show: Season 2, Episode 1

Can't Upgrade a Blackberry 7290 to latest system software

March 08, 2006 Comment on this post [2] Posted in XML

I had an older 7280 "Blue" BlackBerry that died, so our IT org pulled a "Black" BlackBerry out of a drawer. I went to http://www.cingular.com/bbdownloads for the latest firmware so I could run Google Local for Mobile.

(Which, BTW, is unbelievably brilliant. I wish I knew what flavor of Java they were using to allow them to have such vast phone support. Do check it out if you have a data plan.)

Anywho, when I launched the Crackberry's Desktop Manager after installing the System Software Upgrade Package, I was supposed to get a prompt telling me that my system was out of date and asking whether I wanted the new stuff. Nothing. It was as if the Desktop Manager couldn't see the new package, or, more likely, didn't realize that the new package supported this particular BlackBerry.

Regmon and Filemon led me to a file called vendor.xml in C:\Program Files\Common Files\Research In Motion\AppLoader. It's got stuff like this:

<vendor id="0x66" Name="Cingular Wireless">
    <bundle id="System" version="4.0.0.201">
        <devicehwid>0x80000503 0x90000503 0x80000403 0x94000503 0x94000403 0x94000903</devicehwid>
    </bundle>
</vendor>
...etc...

Now, in the registry, under HKEY_LOCAL_MACHINE\SOFTWARE\Research In Motion\AppLoader\SearchPaths\{54DF9FA9-C79E-4BFC-94DE-C56456F9452A} there's a HardwareID listed in decimal, 2617246979, which is 0x9C000503 in hex, a value that doesn't appear in the devicehwid list above. It also notes that my BlackBerry is at system software version 4.0.0.201.
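The decimal-to-hex step and the lookup are easy to check in code: 2617246979 = 0x9C × 2^24 + 0x503 = 2617245696 + 1283, which is 0x9C000503. This hypothetical HardwareIdCheck helper (it uses LINQ to XML, so it assumes .NET 3.5 or later) formats the registry value the way vendor.xml writes it and looks for it in the devicehwid lists. It only checks whether the ID appears; it makes no claim about how Desktop Manager itself reads the file.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class HardwareIdCheck
{
    // Formats the registry's decimal HardwareID the way vendor.xml writes it,
    // e.g. 2617246979 -> "0x9C000503".
    public static string ToVendorHex(uint hardwareId)
    {
        return "0x" + hardwareId.ToString("X8");
    }

    // True if any <devicehwid> element in the given vendor.xml text lists the ID.
    public static bool IsListed(string vendorXml, uint hardwareId)
    {
        string wanted = ToVendorHex(hardwareId);
        XElement root = XElement.Parse(vendorXml);
        return root.DescendantsAndSelf("devicehwid")
            .SelectMany(e => e.Value.Split(new[] { ' ', '\t', '\r', '\n' },
                                           StringSplitOptions.RemoveEmptyEntries))
            .Any(id => string.Equals(id, wanted, StringComparison.OrdinalIgnoreCase));
    }
}
```

Run against the snippet above, IsListed(vendorXml, 2617246979) comes back false, which fits the Desktop Manager not offering the upgrade.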

I wanted to get to 4.0.2.93 and I could see that those files were over in C:\program files\common files\research in motion\shared\loader files.

So, I could go into the vendor.xml file and add the new version of the System software and the mapping to my device's Hardware ID. Sigh.

Conclusion: Rather than add a new mapping I renamed the vendor.xml file to vendor.foo. Upgraded and everything's lovely. Who has the patience, really?

Elapsed (wasted) time: 9 mins. Damn BlackBerry and their -600 million dollars.

DISCLAIMER: It's your tuckus, not mine, if this violates your IT org's policies, or your ISP's policies. I'm just talking, you're the one who has to take responsibility for pressing buttons and turning dials.

Xml and the Nametable

March 07, 2006 Comment on this post [6] Posted in XmlSerializer

I got a number (~dozen) of emails about my use of the NameTable in my recent XmlReader post. Charles Cook tried it out and noticed about a 10% speedup. I also received a number of poo-poo emails that said "use XPath" or "don't bother" and "the performance is good enough."

Sure, if that works for you, that's great. Of course, always measure before you make broad statements. That said, here's a broad statement. Using an XmlReader will always be faster than the DOM and/or XmlSerializer. Always.

Why? Because what do you think is underneath the DOM and inside of XmlSerialization? An XmlReader of course.

For documents larger than about 50k, you're looking at being at least an order of magnitude faster when plucking out a single value. When grabbing dozens of values, the gap increases.

Moshe is correct in pointing out that a nice middle ground, perf-wise, is the XPathReader (for a certain subset of XPath). There are a number of nice XmlReader implementations that fill the space between XmlTextReader and XPathDocument by providing more-than-XmlReader functionality:

BTW, I would also point out that an XmlReader is what I call a "cursor-based pull implementation." While it's similar to the SAX parsers in that it exposes the infoset rather than the angle brackets, it's not SAX.

Now, all that said, what was the deal with my Nametable usage? Charles explains it well, but I will expand. You can do this if you like:

XmlTextReader tr = 
   new XmlTextReader("http://feeds.feedburner.com/ScottHanselman");
while (tr.Read()) 
{
    if (tr.NodeType == XmlNodeType.Element && tr.LocalName == "enclosure")
    {
        while (tr.MoveToNextAttribute())
        {
            Console.WriteLine(String.Format("{0}:{1}", 
               tr.LocalName, tr.Value));
        }
    }
}

The tr.LocalName == "enclosure" check does a string compare on every element you look at. Not a big deal, but it adds up over hundreds or thousands of executions when spinning through a large document.

The NameTable is used by XmlDocument, XmlReader(s), XPathNavigator, and XmlSchemaCollection. It's a table that hands back one shared string object for each distinct name. This is called "atomization": each name becomes a single atom. If the parser sees "enclosure" more than once, it uses that one object reference rather than keeping n separate "enclosure" strings internally.

It's not exactly like a Hashtable: you hand the NameTable a string, and if that string has already been atomized, it returns the existing object reference.
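A quick way to see atomization for yourself is to hand a NameTable two different string objects with the same characters. This is a sketch: AtomizationDemo is a hypothetical class, but NameTable.Add and NameTable.Get are the real System.Xml methods.

```csharp
using System;
using System.Xml;

public static class AtomizationDemo
{
    // Two different string objects with the same characters atomize to one instance.
    public static bool AddReturnsSharedAtom()
    {
        XmlNameTable nameTable = new NameTable();
        string first = nameTable.Add("enclosure");

        // A separate string object with identical characters.
        string copy = new string("enclosure".ToCharArray());
        string second = nameTable.Add(copy);

        // copy is its own object, but Add hands back the atom stored the first time.
        return !Object.ReferenceEquals(copy, first) && Object.ReferenceEquals(second, first);
    }

    // Get only looks names up; it never adds them.
    public static bool GetOnlyFindsAddedNames()
    {
        XmlNameTable nameTable = new NameTable();
        nameTable.Add("enclosure");
        return nameTable.Get("enclosure") != null && nameTable.Get("link") == null;
    }
}
```

Both methods return true: the second Add gives back the atom from the first call, and Get returns null for a name that was never added.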

XmlTextReader tr = 
   new XmlTextReader("http://feeds.feedburner.com/ScottHanselman");
object enclosure = tr.NameTable.Add("enclosure");
while (tr.Read())
{
    if (tr.NodeType == XmlNodeType.Element &&
        Object.ReferenceEquals(tr.LocalName, enclosure))
    {
        while (tr.MoveToNextAttribute())
        {
            Console.WriteLine(String.Format("{0}:{1}", 
               tr.LocalName, tr.Value));
        }
    }
}

The easiest way, IMHO, to think about it is this:

If you know that you're going to look for an element or attribute with a specific name within any System.Xml class that has an XmlNameTable, preload or warn the parser that you'll be watching for these names.
When you do a comparison between the current element or attribute and your target, use Object.ReferenceEquals. Instead of a string comparison, you'll just be asking "are these the same object" - which is about the fastest thing that the CLR can do.

Yes, you can use == rather than Object.ReferenceEquals, but the latter makes your intent totally clear, while the former is more vague.
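Whether == means a reference comparison depends on the compile-time types of its operands, which is part of why ReferenceEquals reads more clearly. In the NameTable sample above, enclosure is declared as object, so tr.LocalName == enclosure would be a reference comparison (and the compiler warns about a possible unintended reference comparison); declared as string, the same == would compare characters. The hypothetical EqualityDemo below shows the difference.

```csharp
using System;

public static class EqualityDemo
{
    // Compares two strings with the same characters but different identities, three ways.
    public static bool[] Compare()
    {
        string a = "enclosure";
        string b = new string("enclosure".ToCharArray()); // distinct object, same text

        bool stringEquals = a == b;                    // string ==: compares characters
        bool objectEquals = (object)a == (object)b;    // object ==: compares references
        bool refEquals = Object.ReferenceEquals(a, b); // always compares references

        return new[] { stringEquals, objectEquals, refEquals };
    }
}
```

Compare() returns { true, false, false }: only the string-typed == looks at the characters.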

This kind of optimization makes a big perf difference (~10%, depending) when using an XmlReader. It makes less of a difference with an XPathDocument, because there you're selecting nodes in a loop.

Stealing Charles' words: "...because it involves very little extra code it is perhaps an optimization worth making prematurely."

Even the designers agree: "...using the XmlNameTable gives you enough of a performance benefit to make it worthwhile especially if your processing starts to span multiple XML components in a pipelining scenario and the XmlNameTable is shared across them i.e. XmlTextReader->XmlDocument->XslTransform."
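The pipelining scenario in that quote can be sketched by giving an XmlReader and an XmlDocument the same NameTable, through XmlReaderSettings.NameTable and the XmlDocument(XmlNameTable) constructor. SharedNameTableDemo and RootNameIsAtom are hypothetical names; the check at the end just confirms that the DOM's element name is the very string object that was atomized up front.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SharedNameTableDemo
{
    // Parses xml into an XmlDocument through an XmlReader, both sharing one NameTable,
    // and reports whether the root element's LocalName is the pre-atomized string.
    public static bool RootNameIsAtom(string xml, string rootName)
    {
        NameTable nameTable = new NameTable();

        // Atomize the name we plan to compare against before any parsing happens.
        object rootAtom = nameTable.Add(rootName);

        // The reader atomizes every name it reads into the shared table...
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.NameTable = nameTable;

        // ...and the DOM built from it uses the same table.
        XmlDocument doc = new XmlDocument(nameTable);
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }

        return Object.ReferenceEquals(doc.DocumentElement.LocalName, rootAtom);
    }
}
```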

Oleg laments: "...that something needs to be done to fix this particular usage pattern of XmlReader to not ignore great NameTable idea."

Conclusion: The NameTable is there for a reason, no matter what System.Xml solution you use. This is the correct and useful pattern and not using it is just silly. If you're going to develop a habit, why not make it a best-practice habit?
