Scott Hanselman

Subtle Behaviors in the XML Serializer can kill

May 25, '06 Comments [3] Posted in XmlSerializer | Bugs

Dan Maharry is/was having a heck of a time with the XmlSerializer after upgrading an application from .NET 1.1 to .NET 2.0.

Given this XSD/schema:

<element name="epp" type="epp:eppType" />
<complexType name="eppType">
  <choice>
    <element name="hello" />
    <element name="greeting" type="epp:greetingType" />
  </choice>
</complexType>

The .NET 1.1 framework serializes a greeting element thusly (actually by incorrect and lucky behavior in the 1.x serializer):

<?xml version="1.0" encoding="utf-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
  <greeting>
    <svID>Test</svID>
    <svDate>2006-05-04T11:01:58.1Z</svDate>
  </greeting>
</epp>

Although everything seemed fine initially in .NET 2.0, he started getting this instead:

<?xml version="1.0" encoding="utf-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
  <hello d2p1:type="greetingType" xmlns:d2p1="http://www.w3.org/2001/XMLSchema-instance">
    <SvID>Test</SvID>
    <svDate>2006-05-04T10:55:07.9Z</svDate>
  </hello>
</epp>

Dan worked with MS Support and filed a bug in the Product Feedback labs and attached an example if you'd like to download it.

Unfortunately, this isn't a bug. The elements in the original schema are ordered so that the generated XmlElement attributes stack in the same order, and that order produces the wrong semantics:

   [System.Xml.Serialization.XmlTypeAttribute(Namespace = "urn:ietf:params:xml:ns:epp-1.0", TypeName = "eppType")]
   [System.Xml.Serialization.XmlRootAttribute("epp", Namespace = "urn:ietf:params:xml:ns:epp-1.0", IsNullable = false)]
   public class EppType
   {
      private object item;

      [System.Xml.Serialization.XmlElementAttribute("hello", typeof(object))]
      [System.Xml.Serialization.XmlElementAttribute("greeting", typeof(GreetingType))]
      public object Item
      {
         get
         {
            return this.item;
         }
         set
         {
            this.item = value;
         }
      }
   }

The problem is that the semantics of the schema and the resulting XmlSerializer attributes say "This object can be either an object or a GreetingType." Well, a GreetingType IS an object, so the 2.0 serializer happily obliges.
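To make the "a GreetingType IS an object" point concrete, here is a minimal sketch of first-match type dispatch. It is a simplified model, not the code the XmlSerializer actually generates: the Candidate and ChoiceDemo names are hypothetical, and GreetingType is an empty stand-in for the generated class.

```csharp
using System;

// Empty stand-in for the generated GreetingType class.
public class GreetingType { }

// One (element name, CLR type) pair, playing the role of a single XmlElementAttribute.
public class Candidate
{
    public readonly string ElementName;
    public readonly Type ClrType;

    public Candidate(string elementName, Type clrType)
    {
        ElementName = elementName;
        ClrType = clrType;
    }
}

public static class ChoiceDemo
{
    // Returns the element name of the first candidate whose type accepts the value.
    public static string PickElementName(object item, params Candidate[] candidates)
    {
        foreach (Candidate c in candidates)
        {
            if (c.ClrType.IsInstanceOfType(item))
            {
                return c.ElementName;
            }
        }
        return null; // no candidate matched
    }

    public static void Main()
    {
        Candidate hello = new Candidate("hello", typeof(object));
        Candidate greeting = new Candidate("greeting", typeof(GreetingType));
        GreetingType item = new GreetingType();

        // typeof(object) accepts every non-null value, so it wins when tested first.
        Console.WriteLine(PickElementName(item, hello, greeting)); // hello
        Console.WriteLine(PickElementName(item, greeting, hello)); // greeting
    }
}
```

Whenever the catch-all object test runs before the GreetingType test, the more specific type never gets a chance to match.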

Reversing those two lines in the XSD and regenerating the CS file with XSD.EXE expresses the correct intent: "This object can be a GreetingType or any other kind of object." With that change, the expected (original) output is achieved. If Dan can't change the original schema (which is likely wrong) then he'll have to change the generated code to get the semantics he wants. Not a bad thing, actually. I did the same thing with the code generated from the OFX schemas.

Using a previously published tip called HOW TO: Debug into a .NET XmlSerializer Generated Assembly I add an app.config with these lines:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.diagnostics>
    <switches>
      <add name="XmlSerialization.Compilation" value="1" />
    </switches>
  </system.diagnostics>
</configuration>

Then I check the contents of the temp directory by going to Start|Run, typing "%temp%" and pressing Enter. I then sort by Date Modified.

[Image: the contents of my temp folder]

I run the test program twice, once the original way and once with the lines reversed (my "fix") and diff the generated .cs files in BeyondCompare.

[Image: BeyondCompare diff of the two generated .cs files]

You can see from the picture above exactly where the difference is, in the middle of a series of if/elseifs that basically are saying "what kind of object is this?"

The XmlSerializer is glorious and wonderful until it's totally not. I know that's not going to make Dan or his team feel better, but hang in there, it gets better the more you use it.

UPDATE: Dan has an interesting update that points out that the order of the attributes generated isn't regular, nor is the order they come back via reflection. James weighs in as well. My solution worked because there were only two attributes. Nutshell - order matters, but it's not regular.

I'm not defending the XmlSerializer folks, although it may sound like I am. James says "it looks like a bug to me." Personally I think it's less a bug and more a complex and poorly documented edge case that highlights the fundamental differences between the XML type system and the CLR type system. At the edges, it's dodgy at best.

I think where we're all getting nailed here is that the XSD Type System can represent things that the CLR Type System can't. Full stop.

In Schema, xs:choice is a complex thing, much like unions in C. The XmlSerializer chooses to present xs:choice as an Object that you have to downcast yourself. The mapping is uncomfortable at best. However, beyond this one uncomfortable type mapping, there are structures you can present in Schema that simply have no parallel in the CLR and the mappings won't ever be 100%. This is just what happens when translating between type systems. The same thing happened (and still happens) for years with nullable DB columns as simple types got translated into the CLR and we leaned on IsDBNull. With the XmlSerializer they introduced the whole parallel field with a "Specified" suffix.
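For anyone who hasn't run into the "Specified" convention, here is a small sketch of it. The Reading class and the SpecifiedDemo helper are made up for illustration; the only real behavior relied on is that the XmlSerializer treats a bool member named after another member plus "Specified" as the switch that decides whether that member is written.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type with one optional value-type member.
public class Reading
{
    public int Value;

    // Parallel flag: Value is only written when this is true.
    // XmlIgnore keeps the flag itself out of the XML.
    [XmlIgnore]
    public bool ValueSpecified;
}

public static class SpecifiedDemo
{
    // Serializes a Reading to a string so the output can be inspected.
    public static string Serialize(Reading reading)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Reading));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, reading);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Reading omitted = new Reading();
        omitted.Value = 5; // ValueSpecified stays false, so no <Value> element

        Reading written = new Reading();
        written.Value = 5;
        written.ValueSpecified = true;

        Console.WriteLine(Serialize(omitted));
        Console.WriteLine(Serialize(written));
    }
}
```

An int can't be null, so the CLR side needs the extra bool to say "this optional element is absent," which is exactly the kind of seam between the two type systems described above.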

In this instance, if it were me using this schema and dealing with these documents I'd switch over to implementing IXmlSerializable. IXmlSerializable provides coverage for the final few percent that the XmlSerializer doesn't provide.  It doesn't solve the problem of mapping between type-systems, but it at least puts YOU in control of the decisions being made.
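As a rough idea of what that could look like, here is a sketch of an IXmlSerializable type that decides for itself whether to write hello or greeting. It is not Dan's code or the xsd.exe output: EppDocument and GreetingInfo are hypothetical names, only svID is handled, and the reading side assumes documents shaped like the samples earlier in this post.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical, trimmed-down stand-in for the greeting payload.
public class GreetingInfo
{
    public string SvID;
}

[XmlRoot("epp", Namespace = EppDocument.Ns)]
public class EppDocument : IXmlSerializable
{
    public const string Ns = "urn:ietf:params:xml:ns:epp-1.0";

    // null means "write <hello/>", non-null means "write <greeting>".
    public GreetingInfo Greeting;

    public XmlSchema GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        // The serializer has already written <epp>; we only write its content.
        if (Greeting == null)
        {
            writer.WriteStartElement("hello", Ns);
            writer.WriteEndElement();
            return;
        }
        writer.WriteStartElement("greeting", Ns);
        writer.WriteElementString("svID", Ns, Greeting.SvID);
        writer.WriteEndElement();
    }

    public void ReadXml(XmlReader reader)
    {
        Greeting = null;
        bool rootIsEmpty = reader.IsEmptyElement;
        reader.ReadStartElement(); // consume <epp>
        if (rootIsEmpty) return;

        if (reader.MoveToContent() == XmlNodeType.Element)
        {
            if (reader.LocalName == "greeting" && reader.NamespaceURI == Ns)
            {
                Greeting = new GreetingInfo();
                reader.ReadStartElement(); // consume <greeting>
                reader.MoveToContent();
                Greeting.SvID = reader.ReadElementContentAsString("svID", Ns);
                reader.ReadEndElement();   // consume </greeting>
            }
            else
            {
                reader.Skip(); // <hello/> (or anything else) carries no data here
            }
        }
        reader.ReadEndElement(); // consume </epp>
    }

    // Serializes and immediately deserializes, to check both directions agree.
    public static EppDocument RoundTrip(EppDocument doc)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(EppDocument));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, doc);
        return (EppDocument)serializer.Deserialize(new StringReader(writer.ToString()));
    }
}
```

The choice between hello and greeting is now an explicit if statement you wrote, rather than something inferred from attribute order. The cost is that you own the reading code too, including every shape of document you need to accept.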

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.

Thursday, May 25, 2006 2:16:08 PM UTC
I was not happy with the whole XmlSerializationAttribute related stuff in .NET 1.1 (and grandfathered in to 2.0). Not following encapsulation and having to have a setter and getter for each property basically said "code smell" to me. I have been using the IXmlSerializable interface and XmlSchemaProviderAttribute instead (more work, but more control over how the items get serialized [CONSISTENTLY]). Here are some good articles that I have referenced:

XmlSchemaProviderAttribute:
http://shrinkster.com/fdz

IXmlSerializable:
http://shrinkster.com/fdx
http://shrinkster.com/fdy

Type Matching for Xml and .NET:
http://shrinkster.com/fdv
JH
Thursday, May 25, 2006 6:28:42 PM UTC
Since this seems to be the "fling poo at our beloved XmlSerializer" post today, I'll throw another handful out... IMHO, the 2.0 XmlSerializer support for Nullable-of-T (your validator is triggered by C# generic declarations) is awful. If this support were complete, the whole xxxSpecified thing could be ignored for the rest of time as a 1.x hack. As it is, Nullables as optional elements require the use of xsi:nil (worthless for backward compatibility without a PSVI) and Nullables as optional attributes aren't supported at all. Arg! Isn't "MyDateProp == null" more elegant than using MyDatePropSpecified (especially when reusing an XmlSerializable class not constructed by the serializer)?

I filed a bug on this when I first tripped over it, but it wasn't 'til beta2, and it was too late by then.

http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=3b9ee554-d097-433a-a062-7fc2986c779b

I haven't played with the Indigo/WCF XML serialization layer much yet, but hopefully it has better support for nullables. :)
Matt Davis
Friday, May 26, 2006 9:30:30 AM UTC
You're right - it's definitely an edge case. The XSD.exe approach to schema / class mapping, though, has always been to reject schemas that incorporate edge cases it can't hope to accurately map. Perhaps the right thing here is for xsd.exe to throw out xs:choice elements like this one, and refuse to generate classes that it can't guarantee will serialize correctly. I'd certainly rather that xsd.exe stuck to guaranteeing that the classes it constructs can only be used to construct object graphs that are guaranteed to serialize safely to schema-valid XML.

Looks like IXmlSerializable is the answer here...

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.