Scott Hanselman

Returning DataSets from WebServices is the Spawn of Satan and Represents All That Is Truly Evil in the World

June 1, '04 Comments [11] Posted in TechEd | Web Services | XML

(Nah, I don't really believe that, but it's a good title, no?  DataSets have their place, just not as publicly visible Business Objects or from publicly accessible WebServices.)

Barry Gervin commented on my "quick bash at DataSets" and noted that I didn't explain my reasoning.  In his post, Barry commented on Harry Pierson's statement that one shouldn't use DataSets in a Web Service because they aren't compatible with non-.NET platforms.  Barry says, "This isn't true. A DataSet is just XML."  Well, of course it's XML, but that's like me saying, hey, take this sentence, it's in the ASCII character set (who cares if it isn't English): Le "DataSet" n'est pas votre ami si vous faites des Services de Web.  Well, Barry can understand that, but I no hablo French. ;)

DataSets are bowls, not fruit.  Do you really want to return bowls?

A DataSet is an object, right?  But it's not a Domain Object, it's not an "Apple" or "Orange" - it's an object of type "DataSet."  A DataSet is a bowl (one that knows about the backing Data Store).  A DataSet is an object that knows how to HOLD Rows and Columns.  It's an object that knows a LOT about the Database.  But I don't want to return bowls.  I want to return Domain Objects, like "Apples."

"Use Strongly Typed DataSets," you say.  "They are the same as Objects, and look how intellisquish works now!"

No, they still aren't Domain Objects.  A Strongly Typed DataSet is just a bowl with a picture of an Apple on it.  "Look there's an Apple INSIDE - we've broken it down into columns!"  DataSets are a shoddy replacement for a good Domain Model (and that includes Strongly Typed DataSets).

Barry has a very good argument for the use of DataSets on his site, and I won't go through his list agreeing and disagreeing with various points.  I will say this, however: it seems that his arguments support the use of DataSets in a Data Access Layer - not in a Business Object Layer.  THAT I would support.  Additionally, I understand the usefulness of DataSets in a classic (intranet) Client-Server WinForms app with lots of DataBinding.

Returning DataSets from a publicly accessible Web Service is a BAD IDEA©.

Now, why shouldn't we return DataSets from WebServices?  DataSets and their serialized XML format include a pile of information that has little to do with the Domain Model itself.  DataSets may be DiffGrams, they may or may not include schema, and they represent "Sets of Data."  They are an object of one type, DataSet.  Whether there is a Java version of a DataSet object available doesn't matter.  They are late-bound by nature, as even a Strongly Typed DataSet encapsulates conversion of types back and forth from SqlDataTypes to typical CLR types and calls to Rows["Apple"].  DataSets are the Class equivalent of a Variant - an Object that can be any kind of Object - only serializable as XML.  Returning an object of type DataSet or Typed DataSet via a publicly accessible Web Service would succeed only in confusing a Java person, stymieing any chance of interop, and giving them more ammo to use against .NET.
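
To make the bowl-versus-fruit difference concrete, here is a minimal sketch that writes the same data two ways: as a DataSet in DiffGram mode, and as a plain class through XmlSerializer.  The Apple class, its fields, and the WirePayloads helper are hypothetical names made up for illustration, and DataSet.WriteXml is used here only as a stand-in for what a Web Service would put on the wire, so treat the output as an approximation.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml.Serialization;

// Hypothetical domain object; the name and fields are illustrative only.
public class Apple
{
    public string Variety { get; set; }
    public int WeightInGrams { get; set; }
}

public static class WirePayloads
{
    // The "bowl": a one-row DataSet written in DiffGram mode.
    public static string DataSetXml()
    {
        var table = new DataTable("Apple");
        table.Columns.Add("Variety", typeof(string));
        table.Columns.Add("WeightInGrams", typeof(int));
        table.Rows.Add("Fuji", 180);

        var bowl = new DataSet("Fruit");
        bowl.Tables.Add(table);

        using (var writer = new StringWriter())
        {
            bowl.WriteXml(writer, XmlWriteMode.DiffGram);
            return writer.ToString();
        }
    }

    // The "fruit": the same data as a plain class, serialized directly.
    public static string DomainObjectXml()
    {
        var serializer = new XmlSerializer(typeof(Apple));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, new Apple { Variety = "Fuji", WeightInGrams = 180 });
            return writer.ToString();
        }
    }
}
```

Printing both strings side by side should show the DataSet version wrapped in a diffgram element with row bookkeeping attributes, while the Apple version is just an Apple element with two children that any XML toolkit can bind to a class.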

To be clear, I WOULD architect a system that included DataSets if I felt that they provided an exceptional value.  I'm just promoting that folks BE AWARE of the ramifications of their decisions.


Tangential aside: There are some yummy best practices up on TheServerSide.NET.  Here are my favorites:

  • Using a DataReader vs. a DataSet: The DataReader was of course faster. It was faster by 16% in this particular case.
  • SqlDataReader vs. OleDbDataReader: Going with native drivers is always better. The SqlDataReader was 115% faster than going through OLE DB.
  • DataReader Column Reference - By Name, Ordinal, or GetString(): The order of speed? dr[0] was the fastest, followed by dr["ProductName"], followed by dr.GetString(0) as it has to do the conversion.
  • Inline (DataReader) vs. Controls (DataGrid): The inline script was 233% faster than a DataGrid component.
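
For anyone who hasn't seen the three column-reference styles from the list side by side, here is a small sketch.  It uses DataTable.CreateDataReader so it runs without a database; the ReaderAccess class, the Products table, and the sample rows are assumptions made up for the example, and it only shows the syntax, not the timings quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ReaderAccess
{
    // Builds an in-memory reader so the sketch needs no database connection.
    public static IDataReader OpenProducts()
    {
        var table = new DataTable("Products");
        table.Columns.Add("ProductName", typeof(string));
        table.Rows.Add("Chai");
        table.Rows.Add("Chang");
        return table.CreateDataReader();
    }

    public static List<string> ReadNames()
    {
        var names = new List<string>();
        using (IDataReader dr = OpenProducts())
        {
            // Resolve the column name to an ordinal once, outside the loop.
            int nameOrdinal = dr.GetOrdinal("ProductName");
            while (dr.Read())
            {
                object byOrdinal = dr[0];                // indexer by position, returns object
                object byName = dr["ProductName"];       // indexer by name, looked up per call
                string typed = dr.GetString(nameOrdinal); // typed accessor, returns string

                // All three styles read the same value from the current row.
                if (!typed.Equals(byOrdinal) || !typed.Equals(byName))
                    throw new InvalidOperationException("accessors disagree");
                names.Add(typed);
            }
        }
        return names;
    }
}
```

Calling ReaderAccess.ReadNames() returns the two product names.  Caching the ordinal keeps the readability of a named column while avoiding a name lookup on every row.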

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.

Tuesday, June 01, 2004 7:49:55 PM UTC
I am glad you expanded on your thoughts and I agree with them!
The one scenario I support DataSets in (and it is a biggie) is lowering the development environment for integration / customization, especially combined with databinding in a WinForms environment. I am looking forward to seeing how XML databinding will increase developer productivity (read: fewer bugs per x lines of code).

There are many evils with using DataSets as an integration schema but it IS a well known entity. C# is a dream next to assembly, C or C++ but it has overhead that almost all say is worth the productivity improvement. Whenever we are faced with a proprietary extension, we have to be careful. DataSets are just another one.
Tuesday, June 01, 2004 8:27:46 PM UTC
My response: Are DataSets the Spawn Of Satan? - http://donxml.com/allthingstechie/archive/2004/06/01/762.aspx

XPathNavigator BABY!
Tuesday, June 01, 2004 8:34:58 PM UTC
The really damning statement in Barry's rebuttal is "Somebody in the J2EE world has actually gone to the extent of creating a similar type of base object in Java that can deserialize the dataset in most of it's [sic] glory." He makes it pretty clear that reverse engineering the serialization format isn't straightforward, and the one non-.Net implementation he knows of doesn't even cover all the edge cases. Well, that's because you have to REVERSE ENGINEER the format - it's not documented. Anyway, so maybe that's fine for Java clients, but what about every other language out there? DataSet doesn't interop, period. It probably never will.
Tuesday, June 01, 2004 10:42:41 PM UTC
"DataReader Column Reference - By Name, Ordinal, or GetString(): The order of speed? dr[0] was the fastest, followed by dr["ProductName"], followed by dr.GetString(0) as it has to do the conversion."

Well, that's not really completely fair. Obviously dr[0] is always going to be faster than dr["ProductName"].

But what about dr[0].ToString() vs. dr["ProductName"].ToString() vs. dr.GetString(0)?

Maybe I'm just biased because I use dr.Get<type>() a lot, but I think it's the clearest way. First you set up variables for the column ordinals: int c_ProductName = dr.GetOrdinal("ProductName"). Then you use dr.GetString(c_ProductName), etc.
Richard P
Wednesday, June 02, 2004 2:43:05 AM UTC
Check http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnservice/html/service02112003.asp for a discussion on using web services with DataSets.

Summarizing, you can return a DataSet that can be consumed by a Java client quite easily.

On the other hand, if you need to update data returned by a webservice, if you don't use a DataSet you need to invent your own diffgram, as you need a way to serialize the changes made to the original data.
Wednesday, June 02, 2004 2:46:35 AM UTC
Ok - doing a prep for a DevDayish thing tomorrow but after that, I'll have to post for you two web services that return datasets vs. custom object serialized XML - they really aren't that different. Singleton objects are the toughest to reproduce out of a dataset, but anything with a collection - and I can get almost char by char. The J2EE thingy just creates similar typed objects from the same xsd used to create your .NET Dataset - and then support all the same constructs like original values, changes i.e. diffgram.

Whoever told you 16% faster on a datareader didn't turn off index building & constraint enforcement in the dataset to compare apples to apples.
Wednesday, June 02, 2004 2:58:14 AM UTC
The main advantage of thinking datasets for SOA apps or webservices is that you focus on data and not on your domain model.

If you want to return an 'Order' instance from a webservice, you probably don't want to return an instance of the same class you have in your domain model. But you could, and if you do, you'll get into trouble. For example: do you want to return the whole Order.Customer data or just a couple of fields from the Customer instance? What are you going to do with cycles in your object graph? What happens when your domain model changes?

So, the right way of doing it is to create an 'OrderReturnedFromWebService' class with the 'view' of your object model you need to retrieve.

If you use a DataSet you need to explicitly describe the structure of your data, so you know what you are returning. It's a way to prevent programmers from doing bad things and exposing your inner object model.
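
The 'OrderReturnedFromWebService' idea from this comment could be sketched roughly as below.  The Customer and Order classes, their fields, and the From helper are all hypothetical; the point is only that the wire class is flat, has no cycles, and exposes just the fields a caller needs.

```csharp
using System;

// Hypothetical domain classes; none of these names come from a real library.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string InternalNotes { get; set; } // detail that should stay inside the service
}

public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
    public Customer Customer { get; set; }
}

// The wire "view": flat, no object cycles, only the fields the caller needs.
public class OrderReturnedFromWebService
{
    public int OrderId { get; set; }
    public decimal Total { get; set; }
    public string CustomerName { get; set; }

    // Copies the chosen fields out of the domain model.
    public static OrderReturnedFromWebService From(Order order)
    {
        return new OrderReturnedFromWebService
        {
            OrderId = order.Id,
            Total = order.Total,
            CustomerName = order.Customer.Name
        };
    }
}
```

Because the mapping is explicit, the domain model can change without changing the contract, as long as From keeps filling in the same three fields.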



Wednesday, June 02, 2004 1:57:52 PM UTC
Hey Scott,

I generally agree with your points. If we use a strongly typed dataset, we can get the schema of the dataset by using asmx?schema. Hopefully we split up the wsdl and xsd and the location of the dataset's schema is more apparent in the wsdl. Strongly typed with a wsdl first approach == ok to good. Regular dataset with implementation first approach == not so good to bad idea.
Thursday, June 03, 2004 2:21:49 AM UTC
So, what do you use to pass back data from a Web Service?
Thursday, June 03, 2004 3:22:59 AM UTC
You'll never save Barry. I've tried. He's an old PowerBuilder guy and passing typed datasets around lets him program using the PB datawindow paradigm.

There's no getting around the fact that datasets are really just in memory relational databases. Relational models are not always the best object models.

Tuesday, July 26, 2005 3:53:45 PM UTC
Scott,
You actually haven't replied to the people asking: if NO DataSets on WebServices, what then?
Al

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.