Scott Hanselman

Humor: Marriage and Video Games

January 24, '06 Comments [5] Posted in Musings
Sponsored By

I don't usually post stuff like this, but it was hilarious and a bit subtle and right in line with my humor. Starts slow, but stick with it. This is Tripod performing at a comedy festival.

 

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Sponsored By
Hosting By
Dedicated Windows Server Hosting by SherWeb

XSLT Stylesheet Performance on Big Ass Documents

January 24, '06 Comments [8] Posted in XML | Tools

Like it or not, when it comes time to start transforming XML data, folks turn to stylesheets. Sure, it'd be nice if we could write XmlReader/XmlWriter transforms, or if one of these Streaming XML Transformation languages would really take off. But for now, you know it, and I know it - folks love their XSLT.

Anyway, we had a large XML document on the order of 250 megs, sometimes larger. It was being processed in a batch job using MSXSL.exe, a command-line tool that invokes the "newest" version of MSXML on your system, starting with MSXML4, then moving backwards to MSXML3, then 2.6. The job sometimes ran out of memory, using as much as a gig, and it took 15 minutes or more. The stylesheet was written three years ago in a very procedural way, but XSLT is meant to be written declaratively, with templates that match on the input elements as the processor finds them.

  • Original XSLT with MSXSL using MSXML4 – crashes with an out-of-memory exception
  • Original XSLT with NXSLT 1.6 (.NET 1.1) – private bytes level out around 1G
      Source document load time: 16059.870 milliseconds
      Stylesheet load/compile time: 204.672 milliseconds
      Stylesheet execution time: 683552.000 milliseconds

This stylesheet wasn't very optimized and looked kinda like this:

<xsl:template match="/">
 <xsl:for-each select="$x">
  <xsl:variable name="Something">
   <xsl:call-template name="CreateSomething">
    <xsl:with-param name="Row" select="."/>
   </xsl:call-template>
  </xsl:variable>
  <xsl:value-of select="$Something"/>
 </xsl:for-each>
</xsl:template>

...which is sub-optimal. Not only that, but the variable Something holds the entire result of the template (as a result tree fragment) rather than letting it "flow" out as data is transformed. The transform actually had two input files: the main one, and a small one containing configuration and some other details, which was selected into a variable:

<xsl:variable name="Foo" select="document('foo.xml')"/>

The stylesheet was rewritten to be more template-focused, along these lines:

<xsl:template match="Row" >
   <xsl:apply-templates select="$x[@SomeID = $someID]"/>
</xsl:template>

After this rewrite, the stylesheet sped up by about 66% and no longer ran out of memory. However, it was still using MSXSL, and we wanted to try a few other processors. I did try Saxon and a few Java/C++ processors, but they ran out of memory, so don't pick on me for not including their numbers; this post is primarily a test of the various Microsoft XSLT options. All of these timings were generated with the -t option, which all of these utilities support.

  • Improved XSLT with MSXSL using MSXML4 – private bytes level out around 300M
      Source document load time: 41920 milliseconds
      Stylesheet document load time: 18.37 milliseconds
      Stylesheet compile time: 3.692 milliseconds
      Stylesheet execution time: 174327 milliseconds
  • Improved XSLT with NXSLT 1.6 (.NET 1.1) – private bytes level out around 550M
      Source document load time:     17893.370 milliseconds
      Stylesheet load/compile time:    462.974 milliseconds
      Stylesheet execution time:     629697.700 milliseconds

Interestingly, but not unexpectedly, the .NET 1.1 XSLT engine used by NXSLT is slower than the unmanaged code in MSXML: about 630 seconds of stylesheet execution versus about 174 seconds for MSXML4, roughly 3.6x slower. A lot of XSLT wonks have apparently said, since the release of .NET 1.1, that when you have to do some hard-core (large) XSLT you should still use MSXSL.

We had two questions at this point: what if we used MSXML6, and what if we used .NET 2.0 (whose XSLT engine was greatly improved)?

However, MSXSL.exe hasn't been updated to support MSXML6 yet (the site says coming soon), and while I could have switched to VBScript or whatever, I figured why not just add the support to the MSXSL source (which is available here). I couldn't find an updated SDK header with the MSXML6 declarations, so I just hacked them together from the registry. The general gist is at the bottom of this post.

Anyway, I made a version of MSXSL that tries for MSXML6 and falls back to 4, etc. Then I grabbed Oleg's NXSLT2, his friendly command-line utility for .NET 2.0's XSLT.
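The "try MSXML6, then fall back to 4, then 3, then 2.6" behavior boils down to walking an ordered list of version-specific ProgIDs and taking the first one that's registered. Here's a minimal, portable sketch of just that selection logic; it is not the actual MSXSL code. The isRegistered probe is a hypothetical stand-in for a real registry or CLSIDFromProgID check, and the 4.0/3.0/2.6 ProgID strings are assumed to follow the same naming pattern as the 6.0 ones shown later in this post.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical probe: on Windows this would be something like a
// CLSIDFromProgID() call or a registry lookup. It's injected here so
// the selection logic can be exercised anywhere.
using ProgIdProbe = std::function<bool(const std::wstring&)>;

// Newest first, so the first registered entry wins.
// Assumption: the non-6.0 ProgIDs follow the same naming pattern.
const std::vector<std::wstring>& msxmlCandidates() {
    static const std::vector<std::wstring> candidates = {
        L"MSXML2.DOMDocument.6.0",
        L"MSXML2.DOMDocument.4.0",
        L"MSXML2.DOMDocument.3.0",
        L"MSXML2.DOMDocument.2.6",
    };
    return candidates;
}

// Returns the newest registered DOMDocument ProgID, or nothing if none is found.
std::optional<std::wstring> pickNewestMsxml(const ProgIdProbe& isRegistered) {
    for (const auto& progId : msxmlCandidates()) {
        if (isRegistered(progId)) {
            return progId;
        }
    }
    return std::nullopt;
}
```

With a probe that only reports 4.0 and 3.0 as installed, pickNewestMsxml returns MSXML2.DOMDocument.4.0, the same fall-back the stock MSXSL.exe does. Keeping the order in a table means adding a new version is a one-line change.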

You may ask why I'm using this command-line stuff. Well, Oleg has kindly seen fit to maintain "command-line compatibility" with MSXSL.exe which makes swapping out command-line processors within our batch process very easy.

  • Improved XSLT with NXSLT2 (.NET 2.0) - private bytes level out around 500M
      Stylesheet load/compile time:   4596.000 milliseconds  
      Transformation time:           53248.000 milliseconds  
      Total execution time:          59064.000 milliseconds
  • Improved XSLT with (custom) MSXSL using MSXML6 - private bytes level out around 300M
      Source document load time:     33677 milliseconds    
      Stylesheet document load time: 4.685 milliseconds    
      Stylesheet compile time:       3.774 milliseconds    
      Stylesheet execution time:     200952 milliseconds  

In a nutshell: .NET 2.0 was about 10x faster than .NET 1.1, and MSXML6 was about 15% slower than MSXML4 in stylesheet execution time. This, of course, was with one specific funky stylesheet and one rather big ass file. Either way, we are sticking with the MSXML4 stuff for now, but looking forward to .NET 2.0's support for this particular style (pun intended) of madness.
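If you want to check those nutshell numbers against the raw -t output, the arithmetic is just ratios of the timings listed above. Here's a quick C++ sketch; the constants are copied from the "Improved XSLT" runs and nothing else is assumed. One caveat: NXSLT2's output above doesn't list a separate source document load time, so comparing total times is the fairer apples-to-apples check for .NET 1.1 vs 2.0.

```cpp
// Timings in milliseconds, copied from the -t output of the "Improved XSLT" runs.
// NXSLT 1.6 (.NET 1.1): load, compile, and execution are reported separately.
constexpr double kNet11Load    = 17893.370;
constexpr double kNet11Compile = 462.974;
constexpr double kNet11Exec    = 629697.700;
// NXSLT2 (.NET 2.0): total execution time as reported.
constexpr double kNet20Total   = 59064.000;
// Stylesheet execution times: stock MSXSL with MSXML4, custom MSXSL with MSXML6.
constexpr double kMsxml4Exec   = 174327.0;
constexpr double kMsxml6Exec   = 200952.0;

// How many times faster `fast` is than `slow`.
constexpr double speedup(double slow, double fast) { return slow / fast; }

// How much slower `after` is than `before`, as a percentage.
constexpr double percentSlower(double before, double after) {
    return (after - before) / before * 100.0;
}

// Total .NET 1.1 time, summed so it can be compared with NXSLT2's total.
constexpr double net11Total() { return kNet11Load + kNet11Compile + kNet11Exec; }
```

Run through those functions, total time drops from about 648 seconds on .NET 1.1 to about 59 seconds on .NET 2.0 (roughly 11x), and MSXML6's execution time comes out about 15.3% above MSXML4's, which lines up with the "10x" and "15%" figures above.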

Updating MSXSL to choose MSXML6: I cracked open the source for MSXSL. I couldn't find the new MSXML6.H, so I added this to msxmlinf.hxx:

typedef class XSLTemplate60 XSLTemplate60;
#ifdef __cplusplus
class DECLSPEC_UUID("88d96a08-f192-11d4-a65f-0040963251e5")
XSLTemplate60;
#endif
typedef class DOMDocument60 DOMDocument60;
typedef class FreeThreadedDOMDocument60 FreeThreadedDOMDocument60;
#ifdef __cplusplus
class DECLSPEC_UUID("88d96a05-f192-11d4-a65f-0040963251e5")
DOMDocument60;
class DECLSPEC_UUID("88d96a06-f192-11d4-a65f-0040963251e5")
FreeThreadedDOMDocument60;
#endif

Then I updated the static array and factory in msxmlinf.cxx to check for the version-specific ProgID:

const MSXMLInfo::StaticInfo MSXMLInfo::s_staticInfo60 =
{
    VERSION_60,
    L"6.0",

    "{88d96a08-f192-11d4-a65f-0040963251e5}",
    &__uuidof(XSLTemplate60),
    L"MSXML2.XSLTemplate.6.0",

    &__uuidof(DOMDocument60),
    L"MSXML2.DOMDocument.6.0",

    &__uuidof(FreeThreadedDOMDocument60),
    L"MSXML2.FreeThreadedDOMDocument.6.0",
};

...along with a few other things. Email me if you want the source; I don't think I'm allowed to redistribute it. Anyway, when I ran it the first time I got an "Access Denied 0x80004005" error and stared at it for a while. Andy Phenix said, "Didn't they tighten security and break some stuff in MSXML6?" They did: MSXML6 turns off the document() function by default. The fix involved using IXMLDOMDocument2 and explicitly allowing the document() function so the stylesheet could load our 'foo.xml':

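// pStylesheet is the stylesheet's IXMLDOMDocument2; AllowDocumentFunction is off by default in MSXML6
VARIANT FreakingTrue;
VariantInit(&FreakingTrue);
FreakingTrue.vt = VT_BOOL;
FreakingTrue.boolVal = VARIANT_TRUE;
BSTR propName = SysAllocString(L"AllowDocumentFunction"); // setProperty expects a real BSTR
HRESULT hr = pStylesheet->setProperty(propName, FreakingTrue);
SysFreeString(propName);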

Once we turned on the document() feature, everything worked great. However, I wasn't sure whether MSXML4 or MSXML6 was doing the work. (I ran filemon.exe, regmon.exe, and procexp.exe, and it WAS in fact loading msxml6.dll.) I noticed some cleverness, again from Oleg, that lets an XSLT stylesheet detect which vendor and (if MSFT) which version of the XSLT engine is being used. I'd reprint it, but you should go visit his site anyway.

Thanks to Krishnan and Andy for their hard work on the new stylesheet and performance testing.


Media Center Games

January 23, '06 Comments [2] Posted in Javascript | Speaking | PDC | Gaming

Russell Beattie has an interesting writeup on the whole "Casual Gaming" trend. Browser-delivered games are a huge business. The last time I visited the MSN Games site there were 200,000 people playing. The Disney Channel has my niece and nephew captivated with cheesy, but free, Flash games.

It seems like there'd be great potential for games on Windows Media Center Edition. There's MCE Sudoku from MediaCenterWare (KMS), and there's MCE Peaks from 10 Foot Games. BTW, KMS Software has released a number of nice titles for the Media Center.

Why aren't there more games for the Media Center PC? Perhaps ease of development? I poked around at the code for MCE Sudoku (review), and it's pretty hairy stuff. To be clear, it's a very nice implementation that was clearly written by folks who know and enjoy the game. However, look at what's involved: ActiveX controls and 22 JavaScript libraries. Media Center development is AJAX. Sure, AJAX is cool, but let's be serious. It's hard. It's 1997. It's MacGyver development at its best, except instead of baling wire and paper clips we use ActiveX controls and JavaScript.

One of the surprises of the Xbox 360 has been the Xbox Live Arcade service. Its clever implementation of micro-payments and its frequent updates make the slightly-less-often-updated Online Spotlight within the Media Center PC look a lot less useful.

My conclusion: if there isn't a service like Xbox Live Arcade for the Media Center PC crowd, gems like MCE Sudoku are going to get lost in the shuffle. I'm actually a little surprised that MSN Games hasn't worked out a deal to create and distribute 10-foot versions of the myriad games available at MSN. They're mostly Flash, so why isn't the Media Center an attractive option? Why write MCE applications in MC-HTML when I could port existing scalable Flash apps? I'd love to have a pile of Disney games for the kids to play, not to mention the strictly educational market.

Casey's got it right: there will be a lull (that's where we are now), then folks will start writing apps for the Media Center using WPF/WinFX and WinFX XAML Browser Applications, and things will really take off. Until that day comes later this year, I'll continue to enjoy Charlie's PDC presentation, which I sadly can't view via my Xbox 360 (I tried). I'll get around to installing the February CTP and seeing how it goes.


Flickr and DasBlog and geo-tagging and EXIF and on and on and on

January 23, '06 Comments [7] Posted in Z | DasBlog | Javascript | Tools

I was just trying out Flickr for some Z pictures and I noticed:

There's such a fantastic amount of work going on with geotagging and photo sharing, but it all feels so skunkworks. Maybe that's just what metadata feels like, but even the geo-tagging name=value pairs that are "tunnelled" within Flickr's geotagged photos seem hacky. There's a lot of equipment needed to pull this stuff off.

How long until GPS units are so tiny that they're built into any small digital camera and the JPEG's EXIF data is automatically geo-tagged? It's interesting to me that Wi-Fi is in Nikons, but I can't seem to find a camera that includes a GPS. It'd sure be shiny. It's so hard now that only Phillip Torrone can do it. Sigh. These pictures ARE NOT geo-tagged. :)
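For the curious, "automatically geo-tagged" EXIF mostly means the camera writes a handful of GPS tags. In the EXIF spec, GPSLatitude and GPSLongitude are each three rationals (degrees, minutes, seconds), with a separate GPSLatitudeRef/GPSLongitudeRef of N/S or E/W. Here's a rough C++ sketch of converting a decimal-degree fix into that shape. The struct and function names are made up for illustration, and storing seconds with a fixed denominator of 100 is an assumption; real cameras pick their own precision.

```cpp
#include <cmath>
#include <cstdint>

// One EXIF RATIONAL: an unsigned numerator/denominator pair.
struct Rational {
    std::uint32_t num;
    std::uint32_t den;
};

// Hypothetical container for what would land in GPSLatitude + GPSLatitudeRef
// (or GPSLongitude + GPSLongitudeRef).
struct ExifGpsCoord {
    Rational dms[3];  // degrees, minutes, seconds
    char ref;         // 'N'/'S' for latitude, 'E'/'W' for longitude
};

// Converts a signed decimal-degree value to EXIF's degrees/minutes/seconds form.
// Seconds are kept to 1/100 of a second (an illustrative choice).
ExifGpsCoord toExifGps(double decimalDegrees, bool isLatitude) {
    ExifGpsCoord out{};
    if (isLatitude) {
        out.ref = decimalDegrees < 0 ? 'S' : 'N';
    } else {
        out.ref = decimalDegrees < 0 ? 'W' : 'E';
    }
    // Work in hundredths of a second so rounding happens exactly once.
    const auto total = static_cast<std::uint64_t>(
        std::llround(std::fabs(decimalDegrees) * 3600.0 * 100.0));
    const std::uint64_t degrees = total / (3600 * 100);
    const std::uint64_t minutes = (total / (60 * 100)) % 60;
    const std::uint64_t centiSeconds = total % (60 * 100);
    out.dms[0] = {static_cast<std::uint32_t>(degrees), 1};
    out.dms[1] = {static_cast<std::uint32_t>(minutes), 1};
    out.dms[2] = {static_cast<std::uint32_t>(centiSeconds), 100};
    return out;
}
```

For example, toExifGps(-122.25, false) yields 122/1, 15/1, 0/100 with a ref of 'W'. The sign of the coordinate moves into the Ref tag because the rationals themselves are unsigned.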

UPDATE: RichB points me to the 3.2MP Ricoh Camera/GPS combo. A little funky because the GPS is a CF Card in the bottom of the camera, but still very cool. Definitely better than a 1MP GPS-enabled camera phone.



Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.