Scott Hanselman

Loading XmlSchema files out of Assembly Resources

March 12, 2006. Posted in XML

I've been validating documents against an XSD lately. Validation is pretty straightforward: you take any XmlTextReader, wrap it in an XmlValidatingReader, and read it through. The ValidationEventHandler will call you back if there's any trouble. You can poke around in the document while the validation happens if you like, but when I'm just validating I do a while(reader.Read()), as you'll see.

I have a PILE of .XSD files - 64 of them - that represent a single specification. Loading the most-leaf schema loads the whole spec:

// Load the main schema from disk and add it to a collection.
XmlSchemaCollection schemas = new XmlSchemaCollection();
XmlReader reader = new XmlTextReader("TheMainSchema.xsd");
XmlSchema schema = XmlSchema.Read(reader, null);
schemas.Add(schema);

// Wrap the document reader in a validating reader that uses those schemas.
XmlReader readerDoc = new XmlTextReader("TheFileYouWantToValidate.xml");
XmlValidatingReader newReader = new XmlValidatingReader(readerDoc);
newReader.Schemas.Add(schemas);
newReader.ValidationEventHandler += new ValidationEventHandler(OnValidate);

// Reading to the end drives the validation; OnValidate fires on any problem.
while (newReader.Read()) { }
newReader.Close();
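The OnValidate handler isn't shown above. Here is a minimal sketch of what one might look like, assuming you want to collect problems rather than stop at the first one. The ValidationCollector class and its Validate helper are hypothetical names made up for this illustration; the helper runs the same read loop over in-memory strings so it can be tried without any files.

```csharp
#pragma warning disable 0618 // XmlValidatingReader/XmlSchemaCollection are marked obsolete in newer frameworks
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper (not from the original post): gathers validation messages.
public class ValidationCollector
{
    public readonly List<string> Errors = new List<string>();
    public readonly List<string> Warnings = new List<string>();

    public bool IsValid { get { return Errors.Count == 0; } }

    // Matches the ValidationEventHandler delegate signature.
    public void OnValidate(object sender, ValidationEventArgs e)
    {
        string where = "";
        if (e.Exception != null)
        {
            where = String.Format(" (line {0}, position {1})",
                e.Exception.LineNumber, e.Exception.LinePosition);
        }

        if (e.Severity == XmlSeverityType.Error)
            Errors.Add(e.Message + where);
        else
            Warnings.Add(e.Message + where);
    }

    // Runs the same validation loop as the post, but over strings.
    public static ValidationCollector Validate(string xsd, string xml)
    {
        ValidationCollector collector = new ValidationCollector();

        XmlSchemaCollection schemas = new XmlSchemaCollection();
        schemas.Add(XmlSchema.Read(new StringReader(xsd), collector.OnValidate));

        XmlValidatingReader reader =
            new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.Schema;
        reader.Schemas.Add(schemas);
        reader.ValidationEventHandler += collector.OnValidate;

        try
        {
            while (reader.Read()) { } // reading to the end drives validation
        }
        finally
        {
            reader.Close();
        }
        return collector;
    }
}
```

Checking IsValid after the read loop tells you whether anything was reported as an error; warnings are kept separately so they don't fail the document.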

I wanted an assembly that was self-contained and would hold all 64 of these XSD files internally as resources, and I didn't want to put them in a temp directory.

I added all the schemas to the project, selected them, right-clicked, chose "Properties", and set the Build Action of each to Embedded Resource. When you request an embedded resource you need to ask for it by its fully qualified name: the namespace followed by the original file name. Use Reflector to determine the ultimate fully qualified resource name if you have trouble.
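If you'd rather not open Reflector, the assembly can list its own resource names. This is a small sketch, assuming all you want is to see the names that end in ".xsd"; ResourceLister and FindResources are hypothetical names for this illustration, while Assembly.GetManifestResourceNames is the real framework call doing the work.

```csharp
using System;
using System.Reflection;

// Hypothetical helper (not from the original post).
public static class ResourceLister
{
    // Returns the fully qualified names of embedded resources ending in the given suffix.
    public static string[] FindResources(Assembly assembly, string suffix)
    {
        if (assembly == null) throw new ArgumentNullException("assembly");
        if (suffix == null) throw new ArgumentNullException("suffix");

        string[] all = assembly.GetManifestResourceNames();
        return Array.FindAll(all, delegate(string name)
        {
            return name.EndsWith(suffix, StringComparison.OrdinalIgnoreCase);
        });
    }
}
```

Calling ResourceLister.FindResources(Assembly.GetExecutingAssembly(), ".xsd") and printing the result shows the exact strings that GetManifestResourceStream expects.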

It's easy to pull the main schema out of its resource and pass the Stream into XmlSchema.Read. It's slightly less obvious how to get that schema to resolve its imports.

Schemas may reference other schemas like this:

<xsd:schema targetNamespace="foofoo"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:include schemaLocation="SomeIncludedSchemas.xsd"/>
   <xsd:include schemaLocation="SomeIncludedSchemas2.xsd"/>

In this case, and in most cases, schemaLocation refers to a relative file. However, it could refer to a URL or to some custom scheme. Personally I find the "relative filename" style to be the most flexible. I don't like to bake too much knowledge about the outside world into my schemas. On this project, it's a (light) requirement that we use the specification schemas unchanged.
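To make the "relative file" idea concrete, here is a small sketch of how a relative schemaLocation combines with the URI of the including schema, and how the bare file name can be recovered from the result. SchemaLocationDemo, Resolve, and FileNameOf are hypothetical names, and the C:/schemas path in the usage note is an assumed example location, not anything from the real project.

```csharp
using System;
using System.IO;

// Hypothetical helper (not from the original post).
public static class SchemaLocationDemo
{
    // Combines the including schema's URI with a schemaLocation value.
    // A relative value is resolved against baseUri; an absolute one wins outright.
    public static Uri Resolve(Uri baseUri, string schemaLocation)
    {
        return new Uri(baseUri, schemaLocation);
    }

    // Returns the bare file name for file: URIs, or null for anything else.
    public static string FileNameOf(Uri absoluteUri)
    {
        if (!absoluteUri.IsFile)
            return null;
        return Path.GetFileName(absoluteUri.AbsolutePath);
    }
}
```

For example, resolving "SomeIncludedSchemas.xsd" against new Uri("file:///C:/schemas/TheMainSchema.xsd") gives a file: URI in the same folder, and FileNameOf hands back just the file name. One caveat: AbsolutePath is percent-encoded, so a file name containing spaces would come back with %20 in it.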

Assembly a = Assembly.GetExecutingAssembly();
Stream stream = a.GetManifestResourceStream("MyNamespace.TheMainSchema.xsd");

XmlSchema x = XmlSchema.Read(stream,
    new ValidationEventHandler(SchemaValidationEventHandler));

x.Compile(
    new ValidationEventHandler(SchemaValidationEventHandler),
    new MyCustomResolver(a));

schemas.Add(x);

Note the instance call to XmlSchema.Compile. By default the schema is compiled with an XmlUrlResolver, which looks for the imports on the file system and fails to find the other 63 schemas. So I pass in a custom resolver that finds the correct schema given the URI (built from the value in the schemaLocation attribute) and returns it, in this example, as a Stream.

private class MyCustomResolver : XmlUrlResolver
{
    // {0} is replaced with the schema's file name to form the resource name.
    private const string MYRESOURCENAMESPACE = "MyNamespace.{0}";
    private Assembly resourceAssembly = null;

    public MyCustomResolver(Assembly resourceAssembly)
    {
        if (resourceAssembly == null)
            throw new ArgumentNullException("resourceAssembly");

        this.resourceAssembly = resourceAssembly;
    }

    // Called once for each schemaLocation that needs to be resolved.
    override public object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri.IsFile)
        {
            // Keep only the file name and look it up as an embedded resource.
            string file = Path.GetFileName(absoluteUri.AbsolutePath);
            Stream stream = resourceAssembly.GetManifestResourceStream(
                String.Format(MYRESOURCENAMESPACE, file));
            return stream;
        }
        return null;
    }
}

Here we just grab the file name out of the file:/// URI that is passed to GetEntity each time a schemaLocation needs to be resolved. Works like a charm. I wrap the whole thing in a factory method and cache the compiled XmlSchemaCollection so we don't load and compile it more than once.
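The factory method itself isn't shown here, so this is only a rough sketch of one way the caching could be done, assuming a single schema collection per process. SchemaCache, SchemaLoader, and GetSchemas are hypothetical names; the loader delegate stands in for the load-and-compile code above.

```csharp
#pragma warning disable 0618 // XmlSchemaCollection is marked obsolete in newer frameworks
using System;
using System.Xml.Schema;

// Hypothetical helper (not from the original post).
public static class SchemaCache
{
    // Stands in for whatever code loads and compiles the schemas.
    public delegate XmlSchemaCollection SchemaLoader();

    private static readonly object sync = new object();
    private static XmlSchemaCollection cached;

    // Runs the loader on the first call only; later calls return the same instance.
    public static XmlSchemaCollection GetSchemas(SchemaLoader loader)
    {
        if (loader == null) throw new ArgumentNullException("loader");

        lock (sync)
        {
            if (cached == null)
                cached = loader();
            return cached;
        }
    }
}
```

The lock keeps two threads from both running the loader on first use. If the loader throws, nothing is cached and the next call tries again.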

There are a few ways one might want to extend this. I've seen folks build assembly URI schemes like assembly:/// and embed those in the schemas, but eh, who has the time? This is simpler, IMHO, works for relative file locations, and didn't take 10 minutes to write.

Quote of the day: I'm not a control freak, I'm a control enthusiast.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

March 12, 2006 15:21
This is perhaps the one instance where Google would fail you. I couldn't find any reasonable search (such as "xmlvalidatingreader load schema from resource") that would lead you to a quicker solution than having to write this on your own. I wrote a small article about this ages ago:
http://www.tomergabel.com/XML+Schema+Inclusion+In+Resources.aspx

I'd appreciate any ideas on how to make this more accessible - it makes no sense that people should have to re-invent this every other Tuesday.

March 13, 2006 3:50
If you're using VS 2003, a silly programmer trick makes dealing with embedded resources a little less frustrating: http://www.palladiumconsulting.com/blog/sebastian/?p=4


Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.