Scott Hanselman

The Myth of XML Purity?

April 8, '04 Comments [11] Posted in Web Services | XML | Tools

Here's a hypothetical.  Say there is a client I'm working with that needs to return Valid XML from their system.  They've given me XML Schemas and said they are representative of the XML returned.  Since Valid follows Well-Formed, sounds good.

Then someone mentions, "oh, well, we can't guarantee that there won't be some < or > or & in the element content.  But, that's no problem, right?"

I said, "Well, then technically you are not sending us XML.  If you can't escape (or CDATA) out the stray content with < >, then you're not even returning less-than/greater-than delimited files. What if I gave you content like this "123123324","2003-04-05","Scott ",Hans,"elman","Portland?"  We have to agree on some fundamentals here.  The XML 1.0 spec (and all tools based on it) is very specific." (They won't even CDATA the stuff)
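To make the escape-or-CDATA point concrete, here is a minimal Python sketch using only the standard library. The element name `item` and the sample value are made up for illustration; nothing here comes from the client's actual feed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical element content containing characters that are markup in XML.
raw_value = "Fish & Chips < $5"


def is_well_formed(document: str) -> bool:
    """Return True if a conforming XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


unescaped = f"<item>{raw_value}</item>"                 # what the client wants to send
escaped = f"<item>{escape(raw_value)}</item>"           # & and < become &amp; and &lt;
cdata = f"<item><![CDATA[{raw_value}]]></item>"         # only safe if the value has no "]]>"

print(is_well_formed(unescaped))
print(is_well_formed(escaped))
print(is_well_formed(cdata))

# Both fixes hand the consumer the original text back.
print(ET.fromstring(escaped).text == raw_value)
print(ET.fromstring(cdata).text == raw_value)
```

The parser rejects the first document and accepts the other two, which is the whole argument: either fix is a single call on the producing side.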

The response? "Well, that's a purist's viewpoint."

I guess I got too mired in the Judeo-Christian Ethic of "Thou shalt not return malformed XML."

QUESTION: What level of Dante's Inferno would I be relegated to if I pre-process this XML-y (pronounced: 'smelly') to make it well-formed?
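For the curious, a pre-processing step might look something like the sketch below. It is a heuristic, not a parser: it only escapes ampersands that do not already start an entity reference, it would wrongly touch ampersands inside CDATA sections, and it does nothing about stray < or >, which cannot be reliably told apart from real tags with a regular expression. The function name and the sample feed are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that is NOT followed by a named or numeric entity reference ending in ";".
STRAY_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z_][\w.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_stray_ampersands(xml_like: str) -> str:
    """Escape bare ampersands, leaving existing entity references alone."""
    return STRAY_AMPERSAND.sub("&amp;", xml_like)


# Hypothetical feed: one bare ampersand, one that is already escaped.
feed = "<name>Smith & Sons &amp; Co</name>"
cleaned = escape_stray_ampersands(feed)

print(cleaned)
print(ET.fromstring(cleaned).text)
```

Anything this heuristic cannot repair should still fail loudly in the real parser afterwards, rather than being silently accepted.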

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Thursday, April 08, 2004 7:27:59 PM UTC
When is a rose, not a rose? When it's XML-y.

His comment is a cop-out. If it's defined a certain way, I think you can reasonably expect it to be that way. He might as well tell you that up is down. Why can't he just escape it?

"Here ya go. Here's a cheese burger".
"I asked for tofu. I'm a vegetarian."
"Close enough. Cheese burger has lettuce, a vegetable. Thus it's vegetarian food."
"It's not the same! A cheese burger has cow meat. It's NOT vegetarian food!"
"Well, that's a purist's viewpoint."
Thursday, April 08, 2004 8:22:03 PM UTC
Ah clients - ya gotta love them :) Back in the days when Apache and Microsoft SOAP toolkits were all in beta I had a problem with a J2SE client who insisted booleans had to be encoded as "true" or "false" (the Apache way) while Microsoft's toolkit insisted on "1" and "0". The spec at the time, if I remember correctly, said either way was fine, so we were both right but still incompatible. Since they were the ones paying us you can guess which party had to hack code up...
Thursday, April 08, 2004 8:28:25 PM UTC
Ask them if it will be a problem if you insert some extra 1's and 0's into the binary you deliver. 1's and 0's are perfectly valid in a binary file, and it's part of your current process to add some as your "signature". They will probably have to do some "pre-processing", unless they have a modified runtime.

I know that doesn't fly, because they are the customer, but maybe it will help illustrate the point.
Thursday, April 08, 2004 8:57:06 PM UTC
It's stuff like this that blows the whole "the customer is always right" theory right out of the water.
Thursday, April 08, 2004 9:27:52 PM UTC
Somehow I wonder how hypothetical this is. You could just agree as long as they let you send responses with a few EBCDIC characters in them - hey, they can just preprocess them out, right?
Sadly, it's probably going to come down to who's paying whom.
Thursday, April 08, 2004 11:27:37 PM UTC
Even if they are signing the checks, you do have some measure of control. Just kindly remind them that sending XML-y replies, while manageable, will increase the price of the software and push out the release date.

"Given enough time and resources, anything is possible."
Friday, April 09, 2004 2:34:28 AM UTC
Awww.... the limiting of XML to be a one off integration.

Actually I think that preprocessing is the correct way to handle it, Scott. Some integration partners (or customers) just use a text file that has some structure and lots of angle brackets. It is always a good thing to stop and look around at the state of the whole business community (non-computer industry). We are spoiled in many ways; we have been living and breathing XML, Web Services, WS-I-[insert random letters], etc. over the past few years.

I try to rationalize the use of these technologies *correctly* by myself, internal to my application, versus how someone extends and interfaces with it in a less than desirable fashion. If they do not play nice, they lose the obvious advantages. How people run fiber optic between Portland and Denver is not how you have to connect a phone to the wall. People make trade-offs all the time that make no sense to me in many ways but do to them.

I try to put my preprocessors, adaptors etc in place to isolate me from their less pure methodologies and move on so I can sleep well.
Bart Elia
Monday, April 12, 2004 11:55:38 PM UTC
Aw stop whingein Scott -- this is what customers are for.

They ask and we do. (And they pay, of course)

Just don't let any documentation (or notes or minutes) refer to the data as XML. It's malformed data, or unclean data or non-compliant XML-like data -- whatever, but it's not XML, and it's not even XML-y (that sounds too official) (although it is a cute name, I'll give you that). This data, however, is *not* XML. It never was XML. It just looked a little like XML.

But smile and nod and be happy to pre-process it. They're just the client, they're not expected to know what XML is. That's your job. They came close, sure, but they missed it.

Clients often create much worse things than a few unescaped brackets... by comparison, what they're giving you is gold!


Tuesday, April 13, 2004 8:12:45 AM UTC
If you want to increase the purity of the solution, take your "preprocessing" step and call it a "postprocessing" step for the other system. That way it's producing valid XML which you're consuming. Then when the other system can be fixed more elegantly, the postprocessing can be removed, but at least these tightly-coupled changes occur within the same system.
Wednesday, April 14, 2004 6:11:10 PM UTC
The next time they do that, give 'em a good swift kick in the junk and tell 'em it's from me.

At least you got a schema; we have "mystery XML" where I'm working. We never know what's going to come down the pipe. Last week they decided to substitute the patient's record number for the patient's gender. So what we thought was a simple trinary value (yes, I said trinary: "Male", "Female" and "Unknown", I kid you not) turned into a mishmash. Had we a schema to validate against, we could have said "well, that's not a value we're familiar with, so reject it." Instead, the automated feed created a duplicate row for every patient in the database. And they laughed when I suggested an hourly transaction log backup. Thank god I didn't listen to them.
Thursday, April 15, 2004 5:40:47 AM UTC
Give them a DTD (or an XSD) and tell them if it doesn't validate, it's outside the spec and won't be accepted!

Dammit, we need to make people take XML seriously. Otherwise we're back in the world of half-baked HTML and "whatever IE allows"...

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.