Scott Hanselman

Inception-Style Nested Data Formats

September 30, '14 Comments [43] Posted in Musings

Dan Harper had a great tweet this week where he discovered and then called out a new format from IBM called "JSONx."

"JSONx is an IBM standard format to represent JSON as XML"


Oh my. JSONx is a real thing. Why would one inflict this upon the world?

Well, if you have an existing hardware product that is totally optimized (like wire-speed) for processing XML, and JSON as a data format snuck up on you, the last thing your customers want to hear is that, um, JSON what? So rather than updating your appliance with native wire-speed JSON, I suppose you could just serialize your JSON as XML. Send some JSON to an endpoint, switch to XML really quick, then back to JSON. And there we are.
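If you're curious what JSONx actually looks like: per IBM's DataPower documentation, a JSON document like {"ticker":"IBM"} comes out roughly like this (the namespace URI is from that doc; treat the rest as a sketch):

```xml
<json:object xmlns:json="http://www.ibm.com/xmlns/prod/2009/jsonx">
  <json:string name="ticker">IBM</json:string>
</json:object>
```

Every JSON type gets its own element: json:object, json:string, json:number, and so on.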

Storing a BMW inside another

But still, yuck. Is THIS today's enterprise?

In 2003 I wrote a small Operating System/Virtual Machine in C#. This was for a class on operating systems and virtual CPUs. As a joke when I swapped out my memory to virtual memory pages on disk, I serialized the bytes with XML like this:

<byte>8</byte>

Hilarious, I thought. However, when I showed it to some folks they had no problem with it.

DTDD: Data Transformation Driven Development?

That's too close to Enterprise reality for my comfort. It's a good idea to hone your sense of Code Smell.

Mal Sharmon tweeted in response to the original IBM JSONx tweet and pointed out how bad this kind of Inception-like nested data shuttling can get, when one takes the semantics of a transport and envelope, ignores them, and invents one's own meaning. He offers this nightmarish Gist.

--
HTTP Response Code: 200 OK
--

{"errorCode":"ItemExists","errorMessage":"EJPVJ9161E: Unable to add, edit or delete additional files for the media with the ID fc276024-918b-489d-9b51-33455ffb5ca3."}
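The HTML wrapper itself didn't survive the trip into this page, but based on the description below, its shape was roughly this (a reconstruction, not the literal Gist; the META name and CSS class are guesses):

```html
<html>
  <head>
    <!-- the "real" status, hidden in a META tag (name is a guess) -->
    <meta name="HTTP_STATUS_CODE" content="409" />
  </head>
  <!-- a CSS class carrying its own semantic meaning -->
  <body class="error">
    &quot;errorCode&quot;:&quot;ItemExists&quot;, ...
  </body>
</html>
```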

Here we see an HTML document, presumably returned as the result of an HTTP GET or POST. The response, as seen in the headers, is an HTTP 200 OK.

Within this HTML document is a META tag that says: no, in fact, nay nay, this is not a 200, it's really a 409. HTTP 409 means "Conflict," and in the context of a request it usually means "Hey, I can't do this, it'll cause a conflict. Try again, user."

Then, within the BODY of the HTML, with a Body tag that includes a CSS class that itself carries explicit semantic meaning, is some...wait for it...JSON. And, just for fun, the quotes are HTML entities: &quot;.

What's in the JSON, you say?

{
"errorCode": "ItemExists",
"errorMessage": "EJPVJ9161E: Unable to add, edit or delete additional files for the media with the ID fc276024-918b-489d-9b51-33455ffb5ca3."
}

Error codes and error messages, including a unique error code that came from another XML document downstream. Oh yes.
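In practice, a client consuming that endpoint has to unwind the nesting by hand. Here's a hedged sketch in Python (the payload below is abbreviated and every step is illustrative, not the actual client code):

```python
import html
import json

# Hypothetical payload as it arrives: JSON whose quotes were
# HTML-entity-encoded before being stuffed into the page body.
escaped = ("&quot;errorCode&quot;:&quot;ItemExists&quot;,"
           "&quot;errorMessage&quot;:&quot;EJPVJ9161E: ...&quot;")

unescaped = html.unescape(escaped)         # undo the &quot; entities
error = json.loads("{" + unescaped + "}")  # then parse it as JSON

assert error["errorCode"] == "ItemExists"
```

And remember, the client only gets this far after ignoring the 200 and digging the 409 out of a META tag.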

But why?

Is there a reason to switch between two data formats? Sure: when you are fully aware of the reason, it's a good technical reason, and you do it with intention.

But if your system changes data formats a half-dozen times from the moment the data leaves its authoritative source on its way to the user, you really have to ask yourself (a half-dozen times!) WHY ARE WE DOING THIS?

Are you converting between data formats because "this is our preferred format"? Or because "we had nowhere else to stick it"?

Just because I can take a JSON document, HTML encode it, tunnel it with a SOAP envelope, BASE64 that into a hidden HTML input, and store it in a binary database column DOES NOT MEAN I SHOULD.
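For the record, that whole chain really does round-trip. Here's a sketch in Python; the SOAP envelope and hidden input are plain string templates for illustration, not a real SOAP stack, and every name is made up:

```python
import base64
import html
import json

doc = {"id": 42, "status": "conflict"}

as_json = json.dumps(doc)                              # a JSON document
as_html = html.escape(as_json)                         # HTML encode it
as_soap = ("<soap:Envelope><soap:Body>" + as_html +    # tunnel it in SOAP
           "</soap:Body></soap:Envelope>")
as_b64 = base64.b64encode(as_soap.encode()).decode()   # BASE64 that
as_input = '<input type="hidden" value="' + as_b64 + '" />'  # hidden input

# Peeling the onion back apart recovers the original document, which
# is the nicest thing you can say about the whole arrangement.
inner_b64 = as_input.split('value="')[1].split('"')[0]
inner_soap = base64.b64decode(inner_b64).decode()
inner_html = inner_soap.split("<soap:Body>")[1].split("</soap:Body>")[0]
recovered = json.loads(html.unescape(inner_html))
assert recovered == doc
```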

Sometimes there are whole teams whose entire job is one data format transformation, converting data on its way to the next team.

Sound Off

What kinds of crazy "Data Format Inception" have you all seen at work? Share in the comments!

UPDATE: Here's an insane game online that converts from JSON to JSONx to JsonML until your browser crashes!

P.S. I can't find the owner of the BMW picture above, let me know so I can credit him or her and link back!

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Tuesday, 30 September 2014 01:37:44 UTC
A long time ago, I worked for a company that purchased those DataPower appliances. They're pretty impressive machines that do a lot of the heavy lifting with XML and XSLT.

At that same company, there was a piece of the app that used a SOAP envelope to transfer data to/from a mainframe stack (COBOL, Assembler). At the time I thought, "that's stupid!" but after getting to know how the (Java-based) system dealt with the EBCDIC translation, I knew that SOAP was a better way. Easier to troubleshoot and get going when issues came up.
Tuesday, 30 September 2014 01:41:41 UTC
With the exception of services returning simple value types, and some internal RPC-type services, every SOAP web service designed for integration I've encountered has used a payload of "string" or "xml" and effectively just used SOAP as a transport or security wrapper, because no one has a clue how to version and migrate SOAP.
Craig D
Tuesday, 30 September 2014 03:08:54 UTC
About the picture: I can't tell you who the author is, but what I can tell you is that the sign on the building says "Hair Salon" and it is in Russian.
Alex Kravchenko
Tuesday, 30 September 2014 04:47:33 UTC
I worked at a place that had an internal "message template" standard for what was essentially a home-grown knock-off of a SOAP envelope which they then tunneled inside an actual SOAP envelope, while using none of its actual SOAPness. I was able to rinse off the inner "SOAP" and map all of its properties to actual SOAP properties.

I also worked with a system that occasionally needed to serialize its data to WDDX, an old ColdFusion XML data serialization format. Instead of just serializing when needed, objects in the C# domain model had property setters and getters that immediately serialized and deserialized to and from a single string backing field for the entire object. As I commented at the time, premature serialization is the root of all evil.
Oran
Tuesday, 30 September 2014 07:47:22 UTC
First thought was: "How do they handle an array of objects?" and then I slowly backed out of it.....
Franc Schiphorst
Tuesday, 30 September 2014 08:07:30 UTC
Erm... so how about that situation where people load up entity instances (rows) from a database into perfectly fine entity class instances (objects) which are then converted into instances of other classes for no other reason but 'it's right, at least someone said so', which are then converted into JSON by a service which is then sent to a client which converts this JSON data back to 'objects', which are then used by JS to build a UI which contains a Grid which is then converted to HTML as the browser works with that.

And yes, that's how a lot of people write web applications these days. And it's absurd, but they do it anyway, because it's 'the right way to do it', at least they think it is. My point: it's easy to laugh at the IBM example, but frankly the current webdev community is not being any more clever, on the contrary. So a typical case of pot-kettle-black.

The IBM example might not be what you'd do in that situation, but chances are they have a heck of a lot more experience with backwards compatibility on their systems than you do: DB2, a database which has been around for a long, long time (first release was in '83), can still run applications written decades ago, simply because there are customers who run those apps. Those customers are their bread and butter; they don't write new apps every week, they keep the ones that work around for a long, long time, because rewriting them might take a lot of time and money. The average web developer might think the world consists mostly of developers who write new stuff, but that's simply not true: most software devs do maintenance on existing software, and are thus faced with problems like this: how do you make sure existing large systems, which were perhaps booted up for the first time before the developer was even interested in computers, keep working with modern new clients?

This 'workaround' of json-as-xml might look silly, but it's a way to keep existing applications alive without expensive rewrites; it is perhaps the best way to solve it, as it can make new clients work with existing back-ends (with perhaps a helper converter on that end, but that's it).

Not one of your best articles, Scott.
Tuesday, 30 September 2014 08:08:17 UTC
Be careful of what you ask for...

I worked in a company that made some B2B products. At one time we were tasked to transform/import data from a public data provider to make invoices and such... the format was XML, because that's what the government demands..

Sure enough, the data was XML... with a big fat CDATA chunk of CSV data ... fuck sake.
Martin Kirk
Tuesday, 30 September 2014 08:30:57 UTC
In your lovely HTML/CSS/JSON example I think the worst bit is the 200/409 HTTP status bit. HTTP doesn't care very much what data you send over it, but it does care about its status codes. They're comprehensive, documented, fixed, and understood by many people and technologies.

I've seen many examples where people just use 200 for everything (most probably because that's what the framework they're using does easily) and use their own payload to replace HTTP status codes. It's especially beyond me that the example you show is not ignorant of status codes, but actively decides to circumvent them.

If you're using HTTP, you should use HTTP. If you decide to send some crazy stuff over the wire, that's kind of up to you. That's not to say that I agree with JSON inside CSS inside HTML as a data format.
Tuesday, 30 September 2014 08:31:36 UTC
At work we have an API with JSON embedded inside JSON as a string:

{
"type": "ItemAddedEvent",
"payload": "{\"id\":42,\"name\":\"foobar\"}"
}

Instead of just this:

{
"type": "ItemChangedEvent",
"payload": { "id": 42,"name": "foobar" }
}

Well, at least it's still only JSON...

(There is actually a reason for doing this: to make it easier for the client to deserialize the payload based on the type. But I don't think that reason is good enough to justify such a horrible design...)
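In code, the difference is that the consumer has to parse twice. A minimal sketch (the event names are taken from the comment above; everything else is illustrative):

```python
import json

# Producer side: the payload is serialized separately, then embedded
# as a plain string inside the outer event (the double encoding).
raw = json.dumps({
    "type": "ItemAddedEvent",
    "payload": json.dumps({"id": 42, "name": "foobar"}),
})

event = json.loads(raw)                  # outer parse; payload is still a string
assert isinstance(event["payload"], str)

payload = json.loads(event["payload"])   # inner parse; now it's an object
assert payload == {"id": 42, "name": "foobar"}
```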
Tuesday, 30 September 2014 08:50:10 UTC
When setting up a file transfer to a Java system, I was told by a Java lead developer that the following was not valid XML as it did not have a SOAP header:
<root>
<item key="MyKey" Value="MyValue" />
<item key="MyKey2" Value="MyValue2" />
</root>

His solution was for me to Base64 encode the data as tab delimited and embed it in a static SOAP header.
Fizzelen
Tuesday, 30 September 2014 09:11:18 UTC
There is a need for XML/JSON transformations.

We have a real need for JSON to XML and XML to JSON -- a legacy XML based J2EE app that needs to be extended to support a proper RESTful web service which should ideally support both XML and JSON payloads.

However, the horrible idea that is JSONx does not meet our very real need nor can I imagine it meeting any real need.
Bojan Markovic
Tuesday, 30 September 2014 09:27:33 UTC
http://thedailywtf.com/Articles/Web_0_0x2e_1.aspx
Aaron
Tuesday, 30 September 2014 09:29:42 UTC
The first thing that came to my mind is this hilarious Daily WTF article:
XML vs CSV : The Choice is Obvious

There are so many other lovely examples on their site :D
XML'd XML
Thomas
Tuesday, 30 September 2014 09:37:49 UTC
I agree that at first glance this seems completely ridiculous! However I can think of a Microsoft product that is very similar: the BizTalk Server HL7 Accelerator.

That product is a set of schemas and pipeline components which convert between HL7 (pipe and hat delimited EDI format for the healthcare sector) and XML. It seems ridiculous, but BizTalk is inherently XML-based for mapping, orchestrating, etc. and in that context it makes complete sense to have an XML representation.

I'm not saying JSONx doesn't look crazy, but sometimes there is a bigger picture to consider.
Chris
Tuesday, 30 September 2014 09:46:32 UTC
Scott, you've introduced a real thought provoker with this post. Could the proliferation of data exchange formats be the cause of this? Are we trying to be all things to all people? Have we dug ourselves into a hole because we've been too quick to implement the latest new thing? New/updated data formats may solve specific problems and may improve our ability to do things faster, but have we really improved or has the code just gotten more complicated. Has the availability of high bandwidth, cheap components just made us more lazy? Do we care about efficient streamlined code and simple common data structures? Situations like those mentioned here make me wonder how many resources (people and computing) are wasted to support this mess of "standards"(???) we've gotten ourselves into.
Karl Fleischmann
Tuesday, 30 September 2014 10:21:28 UTC
Completely agree with Frans.

Also I'd add that SQL Server doesn't support JSON (except as a blob) but does support XML... Sometimes legacy is all we have to work with.
Ben
Tuesday, 30 September 2014 10:36:04 UTC
This reminds me of the fabulous Daily WTF article Embedding The Embedded Embedding:
Javascript embedded in HTML embedded in a SQL query embedded in an ASP page to display videos embedded in OBJECT tags embedded in HTML embedded in the database.
Tuesday, 30 September 2014 12:31:05 UTC
BizTalk does the same thing. http://msdn.microsoft.com/en-us/library/dn789173%28v=bts.80%29.aspx

For better or worse, enterprise vendors bet big on XML. Unfortunately (for them) that bet hasn't paid off and now they're faced with either going through a costly rewrite of their products or performing strange conversions like this to protect their investment.
Tuesday, 30 September 2014 13:29:35 UTC
For the honorary Canadian (Scott):

Speaking of nested data formats, here's a fun json <-> jsonX <-> jsonML "game"/browser crash simulator: http://orihoch.uumpa.com/jsonxml/
Kori Francis
Tuesday, 30 September 2014 13:35:51 UTC
This isn't that bad; let me tell you about the time I had to consume the UK Driver and Vehicle Licensing Authority's remote service for checking driver and vehicle data...

The spec seemed innocent enough - XML request and response, nicely detailed document for both. The devil started in the details.

The request and response packets, while looking like XML, had to have a specific layout: new lines in the right place, no comments, certain fields on certain lines, etc. Apparently the DVLA had written its own parser.

Then came the limitations - no more than 100 driver or vehicle requests at a time, no more than one single request a day. That's it.

Ok, so how do we send the request and get the response? Web service? No, that would be way way too easy. They wanted the request file SCPed to their own SSH server end point, with responses SCPed back again at some point in the next 24 hours.

Right, so we could do this over the internet, after all we had to exchange SSH keys with them in order to gain access? No. We had to have a dedicated leased line installed to connect us to the Government Secure Intranet. WHY IN GOD'S NAME?!

This was in 2009; as far as I'm aware it's still running.
Richard Price
Tuesday, 30 September 2014 14:03:26 UTC
I would call this "Overencapsulation" or maybe "Overabstraction". This happens, unfortunately, because the computer world as we know it today is nothing but an abstraction of an abstraction of an abstraction... of the basic, primitive computer systems of yesterday.

In fact, progress in the computer world evolves mostly by taking whatever came first and wrapping it in new covers... this has affected our minds in such a way that it is the only way we know how to operate, and it becomes difficult for us to come up with original ways to do things other than to continue to encapsulate...
Tuesday, 30 September 2014 14:23:22 UTC
It's pretty easy to whip up absurdities from whole cloth, and it's certainly important to try to use the most efficient and effective data formatting. Which, I'd guess, is what IBM did here.
Real business engineering sometimes requires solutions that appear awkward and asinine from the outside because the outside viewer only gets the view available from the porthole. There's usually quite a bit going on that can't be seen but which applies constraints and requirements that contort the eventual solution. Often the most onerous constraints include "it has to work on time and in budget."

Which doesn't mean that people don't sometimes do stupid things either from a sense of frivolity or from ignorance. Or possibly a lack of mastery. And, as per TDWTF, also doesn't preclude laughing at the absurdity of it all.
Grady
Tuesday, 30 September 2014 14:40:39 UTC
Really no sense!!!
Carlos dos Santos
Tuesday, 30 September 2014 14:52:00 UTC
Frans makes a great point. Conversions seem silly from a certain point of view in nearly all cases and yet they are absolutely vital from other points of view. Indeed, one may say that software is all about conversions of data from machine to machine, context to context, and most importantly, machine to human and vice versa.

Still, I have to agree with Scott. JSONx? Yuck!
Tuesday, 30 September 2014 16:07:59 UTC
How about 'classic' ASP served over SAP ALE? Ok, more Frankenstein than Inception...
What could be worse? Hmmm... Base64-encoded, PGP-signed Fortran punchcard scans in TIFF with metadata?
Crosre
Tuesday, 30 September 2014 16:25:15 UTC
Great Article! I took a screenshot of it and sent it as an attachment to my IT Department's efax number.
Erik Moore
Tuesday, 30 September 2014 18:24:10 UTC

Within this HTML document, is a META tag that says, no, in fact, nay nay, this is not a 200, it's really a 409.


This is actually 'industry-standard practice'. For a real-world example which is used just about everywhere, try cXML

Relevant section:


Because cXML is layered above HTTP in most cases, many errors (such as HTTP
404/Not Found) are handled by the transport.


Which would be fine, except that the cXML statuses mimic HTTP statuses. So yeah, a cXML 200 is OK, 204 is No Content, and so on.

So, you can hit a web service, get an HTTP 200, and as a response a gem like this:


<cXML payloadID="9949494@supplier.com"
timestamp="2000-01-12T18:39:09-08:00" xml:lang="en-US">
<Response>
<Status code="200" text="OK"/>
</Response>
</cXML>
Stephen Eilert
Tuesday, 30 September 2014 18:41:45 UTC
I just wrote a client for a 3rd party web service. Their SOAP endpoint returns an XML file, the body of which is a base64-encoded zip file containing another XML file which is the actual result of the request. They have a REST endpoint as well, but I think that one just wraps the outer XML file in JSON...
Rory
Tuesday, 30 September 2014 19:52:39 UTC
Frans B et al,

I think Scott probably understands this. Personally, I don't think he's posting a critique of IBM. I believe he's just highlighting the irony of our situation on a broader scale: the example being that JSON, a lightweight protocol intended to displace and consolidate heavier protocols, actually necessitated and begat another protocol that did not exist beforehand.

We'll never get away from legacy protocols and systems.

-Oisin
Tuesday, 30 September 2014 20:38:37 UTC
I was once tasked with getting an application to talk to a web service. Problem: the programming environment for the application had no facilities for calling web services--or any facility for calling anything outside of itself except stored procedures in SQL Server. So the hops went like this:

Application => Sql 2000 Stored Procedure => Invokes DTS Package => Runs VB Module => Loads .NET Assembly => Invokes Remote Web Service.

My understanding is that this is still running.
Tuesday, 30 September 2014 21:32:32 UTC
Stephen - And that's good? Or we agree it's at least WEIRD?
Scott Hanselman
Tuesday, 30 September 2014 21:43:13 UTC
I can't find the owner of the BMW picture above, let me know so I can credit him or her and link back!

Hm, it's a problem, because it's crazy Russia :) All I can say about his/her car registration number is that it's registered in Moscow Oblast. I can't find it in public databases...
Tuesday, 30 September 2014 22:39:31 UTC
Oh fu-
I misunderstood you. Sorry.
Tuesday, 30 September 2014 22:47:44 UTC
In my opinion, the issue that some people are missing is not that you have to convert formats (that is pretty much expected between systems in a legacy environment) but that they created a new format, JSONx, to represent that conversion.

Say I take in a CSV file and convert it to JSON; do I need a new format called CSVJson? Or if I use an ORM to bring SQL data into JSON, do I need a SQLJson format that lists everything as table and column properties?

Or to further illustrate the point with the given example, why isn't the XML that is created just <ticker>IBM</ticker>?

Trying to create a whole new standardized format for representing JSON as XML is, at least in my opinion, the main issue here, not the fact that you might have a use case where you need to convert from JSON to XML or vice versa. See this website, http://www.utilities-online.info/xmltojson/, which manages to do it without the need for JSONx.

So I think people's examples saying that front-end developers do it, or BizTalk does it, or whatever, are misguided; in those cases they aren't creating a new intermediary standardized format, they are just doing a translation/transformation.
Chris Meyers
Wednesday, 01 October 2014 01:01:13 UTC
I don't like that idea, either. However, it seems to support kinda strongly-typed variables (int vs float, for example).
Johnny
Wednesday, 01 October 2014 09:34:12 UTC
Not completely related but here's another story of data conversion which is ridiculous:

Someone I know is a sales rep for a big company and he goes to stores to write down orders and have them delivered to the store. All sales reps have laptops. But since he's not very good with laptops he prefers to write the orders down. In the evening when he gets home, he copies his written notes into an Excel template. Then he prints the Excel document and faxes it over to the headquarters in Brussels. There sits a guy who receives the fax and types all of it into an e-mail. This e-mail gets sent to the headquarters in Germany where they manually enter it into their tracking system.

Of course, tracking of these orders follows the same path: when the client calls him, he calls Brussels HQ, they call HQ in Germany to get an update on the order and then all the way back down the chain.
Wednesday, 01 October 2014 11:30:57 UTC
I agree with Chris Meyers. Also, I think people are talking about three different things: translation, transformation, and serialization. One question is: is IBM, for some reason, translating (only changing formats) JSON into JSONx instead of just XML? I would assume that the JSONx format has the same rules as JSON. XML has quite a few rules that probably aren't part of the JSONx standard, so JSONx would not care about DTDs and things of that sort to be valid. That is simply a guess as to why it might make sense; the other guess would be that they would love to get a patent on a "new" technology.

As for transformation within the same format (object, JSON, XML, etc.), this is done all the time. It is rare that the data you want to use in an app is stored in exactly the shape you want to use it in for every scenario.

As for serialization ("flattening data structures to a text format"): unless you are using a language like Node, or you are directly exposing a database that stores objects as JSON, you are likely going to be doing some serialization to expose your data, and it's up to you what formats you serialize your data in. JSON seems to be the common option these days.
Adam Wright
Wednesday, 01 October 2014 12:50:22 UTC
Hi Scott, will you take a look at the monstrous SessionAuthenticationModule in .NET 4.5?

What it does:
1) Binary serialize ClaimsPrincipal (let's call it CP_bin)
2) Encrypt and compress CP_bin --> CP_bin_crypt_zip
3) Base64 encode CP_bin_crypt_zip --> CP_bin_crypt_zip_base64
4) Place CP_bin_crypt_zip_base64 in an XML --> CP_bin_crypt_zip_base64_xml
5) Base64 encode CP_bin_crypt_zip_base64_xml --> CP_bin_crypt_zip_base64_xml_base64
6) Write CP_bin_crypt_zip_base64_xml_base64 into HTTP response cookie, which by this time weighs several KB and spans multiple cookies (split into 2KiB chunks by ChunkedCookieHandler)
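To see why the cookie balloons, here's a rough sketch of that layering, in Python rather than .NET. There's no real ClaimsPrincipal or encryption here; pickle and gzip stand in for the binary serializer and compression step, and everything is illustrative only:

```python
import base64
import gzip
import pickle

# 'claims' stands in for the ClaimsPrincipal and its long claim-type URIs.
claims = {"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name": "alice"}

step1 = pickle.dumps(claims)                              # 1) binary-serialize
step2 = gzip.compress(step1)                              # 2) compress (no encryption here)
step3 = base64.b64encode(step2)                           # 3) Base64
step4 = b"<SecurityToken>" + step3 + b"</SecurityToken>"  # 4) wrap in XML
step5 = base64.b64encode(step4)                           # 5) Base64, again

# Each Base64 pass inflates the data by about a third, and there are two.
assert len(step5) > len(step1)
```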

Many people faced frustration with the large cookies (e.g. http://stackoverflow.com/questions/7219705/large-fedauth-cooke-fedauth4-with-only-7-claims)

How it might be improved:
1) Have a custom serializer that shortens the claims namespace
2) Base85 encode rather than Base64
3) Do we really need XML with a fully qualified namespace here?
4) Escape the characters that are invalid in cookies, rather than Base64-encoding the whole thing
Wednesday, 01 October 2014 13:19:47 UTC
You mean: http://en.wikipedia.org/wiki/Jason_X
Chris
Wednesday, 01 October 2014 15:33:28 UTC
As silly as it seems, this has context. It allows older DB2 systems to process JSON. I don't think IBM would advocate using this in any current production environment.
Delmania
Wednesday, 01 October 2014 21:05:14 UTC
Here's one I recorded for posterity: The worst API ever.
Thursday, 02 October 2014 13:36:40 UTC
"Just because I can take a JSON document, HTML encode it, tunnel it with a SOAP envelope, BASE64 that into a Hidden HTML input, and store it in a binary database column..."

Sure, but did you do it with a partner on the same keyboard?
Robert
Monday, 20 October 2014 21:19:24 UTC
A small nitpick: the markup around the 8 you serialized to XML needs to be escaped by your blog engine (it's in the HTML but obviously not visible).
Comments are closed.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.