Scott Hanselman

Maintaining PermaLinks when moving from .Text to DasBlog

January 25, 2006 Comment on this post [6] Posted in DasBlog
Sponsored By

I believe very strongly in the concept of the PermaLink. They aren't TempLinks, they are meant to be as permanent as possible. When people want their hair done, they get perms, not temps.

When I moved this blog from Radio Userland back in September of 2003, I was worried that my 17 readers at the time wouldn't find the new blog. So, using Clemens' schema, I maintained my permalinks (mostly) via a funky redirect mechanism. If you should still find some old Radio urls and schmutz from my old blog, you'll be redirected to the old content that was imported here.

I've seen a number of folks migrate their content from .Text to DasBlog, and there are a half-dozen utilities to do it; it's not too hard. What is more difficult/tricky is maintaining your old permalinks. Joshua Flanagan has contributed some stuff to the DasBlog project recently (that we'll make public later) but he also made some changes to the default web.config that can help folks maintain their URLs.

DasBlog has a very flexible RegularExpression-based URL Mapper (again, there's lots out there, not just DasBlog's) that will allow you to add matches, take them apart, and reassemble the results. Here are two for mapping .Text URLs to DasBlog. The end result is that requests for your old (non-existent) .Text pages are redirected to DasBlog.

This one should be used if you didn't use the old .Text PostID as the permalink GUID in DasBlog, so it's date-based. This is what Joshua added. Note that you'll likely already have the "<newtelligence.DasBlog.UrlMapper>" section, so just add the <add> element. Note also that I've split the regex up for formatting purposes, but the matchExpression should have NO whitespace.

  <!-- Translates
    FROM: /blog/archive/2004/07/27/194.aspx
    TO: /blog/default.aspx?date=2004-07-27
  -->
  <newtelligence.DasBlog.UrlMapper>
    <add matchExpression="(?&lt;basedir&gt;.*?)/archive/
                (?&lt;year&gt;\d{4})/(?&lt;month&gt;\d{2})/(?&lt;day&gt;\d{2})/
                (?&lt;postid&gt;\d+)\.aspx"
         mapTo="{basedir}/default.aspx?date={year}-{month}-{day}" />        
  </newtelligence.DasBlog.UrlMapper>

This one should be used if you DID use the old .Text PostID as the permalink GUID in DasBlog, which is my preferred mechanism. The GUIDs in DasBlog just need to be unique, and in this case we can toss the date info in the URL. 

   <!-- Translates
     FROM: /blog/archive/2004/07/27/194.aspx
     TO: /blog/permalink.aspx?guid=194
   -->
  <newtelligence.DasBlog.UrlMapper>
    <add matchExpression="(?&lt;basedir&gt;.*?)/archive/
                (?&lt;year&gt;\d{4})/(?&lt;month&gt;\d{2})/(?&lt;day&gt;\d{2})/
                (?&lt;postid&gt;\d+)\.aspx"
         mapTo="{basedir}/permalink.aspx?guid={postid}" />        
  </newtelligence.DasBlog.UrlMapper>
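
If you want to sanity-check a matchExpression/mapTo pair before dropping it into web.config, here is a minimal sketch in Python that mimics the two mappings above. It is not DasBlog's code. Two assumptions to flag: Python spells named groups (?P<name>...) where .NET uses (?<name>...), and map_url is a hypothetical helper that fills the {name} placeholders from the captured groups, which is only my reading of how mapTo behaves.

```python
import re

# The pattern from the post, joined onto one line with no whitespace.
# Assumption: only the named-group spelling changes, (?P<name>...) in Python
# versus (?<name>...) in .NET.
MATCH_EXPRESSION = (
    r"(?P<basedir>.*?)/archive/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"
    r"(?P<postid>\d+)\.aspx"
)

# The two mapTo templates shown in the post.
DATE_MAP_TO = "{basedir}/default.aspx?date={year}-{month}-{day}"
GUID_MAP_TO = "{basedir}/permalink.aspx?guid={postid}"


def map_url(url, map_to):
    """Hypothetical helper: rewrite url with map_to, or return None if no match."""
    match = re.match(MATCH_EXPRESSION, url)
    if match is None:
        return None
    # Fill each {name} placeholder with the text captured by that group.
    return map_to.format(**match.groupdict())


old_url = "/blog/archive/2004/07/27/194.aspx"
print(map_url(old_url, DATE_MAP_TO))
print(map_url(old_url, GUID_MAP_TO))
```

Feeding it the sample URL from the XML comments should give back the two TO addresses listed there, and a URL that doesn't look like an old .Text archive link comes back as None.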

You can shape your URLs any way you like, whether you brought your Radio content over like Jeff Sandquist or migrated content from other (homegrown) blogs to DasBlog.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

facebook bluesky subscribe
About   Newsletter
Hosting By
Hosted on Linux using .NET in an Azure App Service

Microsoft Expression Graphic Designer and Microsoft Expression Interactive Designer - Whew!

January 24, 2006 Comment on this post [1] Posted in Musings

Noticed via MikeG that Microsoft Expression Graphic Designer (CTP) and Microsoft Expression Interactive Designer (CTP) were out today. I was a smidge confused about which was which and for what, so I asked a smartie involved with the project what the difference was. Here's (paraphrased) what he said if you're confused as well.

  • Interactive Designer (code name Sparkle): new, 100% managed WPF (Avalon) codebase for making XAML and animations. Used for UI design.
  • Graphic Designer (code name Acrylic): 100% unmanaged, acquired codebase for doing vector/bitmap graphics and exporting to XAML. Used for graphics.

You'll likely remember that Microsoft bought Creature House Expression 3 back in 2003. That was code-named Acrylic, and it is the father of the Expression Graphic Designer. Vector-based art is tantamount to voodoo if you ask me. I can barely make icons.

There are some nice videos and demos to check out.

Vista-style Task Switching on XP

January 24, 2006 Comment on this post [5] Posted in Musings

Coming very soon to a Windows XP desktop near you...

[Screenshot: Sshot-1]

Humor: Marriage and Video Games

January 24, 2006 Comment on this post [5] Posted in Musings

I don't usually post stuff like this, but it was hilarious and a bit subtle and right in line with my humor. Starts slow, but stick with it. This is Tripod performing at a comedy festival.

 

XSLT Stylesheet Performance on Big Ass Documents

January 23, 2006 Comment on this post [8] Posted in XML | Tools

Like it or not, when it comes time to start transforming XML data, folks turn to stylesheets. Sure, it'd be nice if we could write XmlReader/XmlWriter transforms or if one of these Streaming XML Transformation languages would really take off. But for now, you know it, and I know it: folks love their XSLT.

Anyway, we had a large XML document that was on the order of 250 megs, sometimes larger. It was running in a batch process using MSXSL.exe, a command-line tool that invokes the "newest" version of MSXML that's on your system, starting with MSXML4, then moving backwards to MSXML3, then 2.6. It was running out of memory, sometimes using as much as a gig. It was also taking 15 minutes or more. The stylesheet was written three years ago, in a very procedural way. XSLT is meant to be written in a more declarative way, with templates that match on the input elements as they find them.

  • Original XSLT with MSXSL using MSXML4 – crashes with a memory exception
  • Original XSLT with NXSLT 1.6 (.NET 1.1) – Private bytes level out around 1G
      Source document load time: 16059.870 milliseconds
      Stylesheet load/compile time: 204.672 milliseconds
      Stylesheet execution time: 683552.000 milliseconds

This stylesheet wasn't very optimized and was kinda:

<xsl:template match="/">
 <xsl:for-each select="$x">
  <xsl:variable name="Something">
   <xsl:call-template name="CreateSomething">
    <xsl:with-param name="Row" select="."/>
   </xsl:call-template>
  </xsl:variable>
  <xsl:value-of select="$Something"/>
 </xsl:for-each>
</xsl:template>

...which is sub-optimal. Not only that, but the variable Something is holding the results of the template rather than allowing it to "flow" out as data is transformed. This transform actually had two input files: the main one, and another small one that contained configuration and some other details and was selected into a variable.

<xsl:variable name="Foo" select="document('foo.xml')"/>

The stylesheet was rewritten to be more template-focused, à la:

<xsl:template match="Row" >
   <xsl:apply-templates select="$x[@SomeID = $someID]"/>
</xsl:template>

After this change/re-write, the stylesheet was sped up by about 66% and didn't run out of memory. However, it was still using MSXSL and we wanted to try a few other processors. I did try Saxon and a few Java/C++ parsers but they ran out of memory, so don't pick on me for not including their numbers, as this post is primarily a test of the various Microsoft XSL/T options. All these timings are generated with the -t option that all these utilities support.

  • Improved XSLT with MSXSL using MSXML4 – private bytes level out around 300M
      Source document load time: 41920 milliseconds
      Stylesheet document load time: 18.37 milliseconds
      Stylesheet compile time: 3.692 milliseconds
      Stylesheet execution time: 174327 milliseconds
  • Improved XSLT with NXSLT 1.6 (.NET 1.1) – private bytes level out around 550M
      Source document load time:     17893.370 milliseconds
      Stylesheet load/compile time:    462.974 milliseconds
      Stylesheet execution time:     629697.700 milliseconds

Interestingly, but not unexpectedly, the .NET 1.1 XSLT transformations used by NXSLT are slower than the original unmanaged code in MSXML. A lot of XSLT wonks have apparently said, after the release of .NET 1.1, that when you have to do some hard-core (large) XSLT you should still use MSXSL.

We had two questions at this point: what if we used MSXML6, and what if we used .NET 2.0 (whose XSLT engine was greatly improved)?

However, MSXSL.exe hasn't been updated to support MSXML6 yet (the site says coming soon), and while I could go to a VBScript or whatever, I figured why not just add the support to the source of MSXSL (which is available here). I couldn't find the updated SDK header files for MSXML.H so I just hacked it together from the registry. The general gist is at the bottom of this post.

Anyway, I made a version of MSXSL that tries for MSXML6, and falls back to 4, etc. Then I got Oleg's NXSLT2 friendly command-line 2.0 stuff.
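
The "try the newest, then fall back" idea is simple enough to sketch in a few lines of Python. This is not the MSXSL source, just an illustration. Assumptions: the real tool probes COM for each version, whereas here a plain set of installed ProgIDs stands in for that probe; pick_progid is a made-up name; and only the 6.0 ProgID appears in this post, so the older ones are my assumption based on the same naming pattern.

```python
# ProgIDs ordered newest first, mirroring the fallback order described above.
# Only the 6.0 entry is taken from this post; the rest are assumed names.
PROGIDS_NEWEST_FIRST = [
    "MSXML2.XSLTemplate.6.0",
    "MSXML2.XSLTemplate.4.0",
    "MSXML2.XSLTemplate.3.0",
    "MSXML2.XSLTemplate.2.6",
]


def pick_progid(installed, candidates=PROGIDS_NEWEST_FIRST):
    """Return the first candidate found in `installed`, or None if none are."""
    for progid in candidates:
        if progid in installed:
            return progid
    return None


# A machine with MSXML4 and MSXML3 but no MSXML6 falls back to 4.0.
print(pick_progid({"MSXML2.XSLTemplate.3.0", "MSXML2.XSLTemplate.4.0"}))
```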

You may ask why I'm using this command-line stuff. Well, Oleg has kindly seen fit to maintain "command-line compatibility" with MSXSL.exe which makes swapping out command-line processors within our batch process very easy.

  • Improved XSLT with NXSLT2 (.NET 2.0) - private bytes level out around 500M
      Stylesheet load/compile time:   4596.000 milliseconds  
      Transformation time:           53248.000 milliseconds  
      Total execution time:          59064.000 milliseconds
  • Improved XSLT with (custom) MSXSL using MSXML6 - private bytes level out around 300M
      Source document load time:     33677 milliseconds    
      Stylesheet document load time: 4.685 milliseconds    
      Stylesheet compile time:       3.774 milliseconds    
      Stylesheet execution time:     200952 milliseconds  

Nutshell: .NET 2.0 was 10x faster than .NET 1.1. MSXML6 was 15% slower than MSXML4. This, of course, was with one specific funky stylesheet and one rather big ass file. Either way, we are sticking with the MSXML4 stuff for now, but looking forward to .NET 2.0's support for this particular style (pun intended) of madness.
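
For anyone who wants to check the nutshell numbers, here is the arithmetic as a small Python sketch. The timings are copied from the "Improved XSLT" runs above. The helper names are mine, and comparing .NET 1.1 "execution time" against .NET 2.0 "transformation time" is an assumption, since the two tools label their phases differently.

```python
# Timings in milliseconds, copied from the "Improved XSLT" runs in this post.
NET11_EXECUTION_MS = 629697.700   # NXSLT 1.6 (.NET 1.1), stylesheet execution
NET20_TRANSFORM_MS = 53248.000    # NXSLT2 (.NET 2.0), transformation time
MSXML4_EXECUTION_MS = 174327      # MSXSL with MSXML4, stylesheet execution
MSXML6_EXECUTION_MS = 200952      # custom MSXSL with MSXML6, stylesheet execution


def speedup(slow_ms, fast_ms):
    """How many times faster the fast run is than the slow run."""
    return slow_ms / fast_ms


def percent_slower(baseline_ms, other_ms):
    """How much longer other_ms took than baseline_ms, as a percentage."""
    return (other_ms - baseline_ms) / baseline_ms * 100.0


print(f".NET 2.0 vs .NET 1.1: {speedup(NET11_EXECUTION_MS, NET20_TRANSFORM_MS):.1f}x faster")
print(f"MSXML6 vs MSXML4: {percent_slower(MSXML4_EXECUTION_MS, MSXML6_EXECUTION_MS):.1f}% slower")
```

On these figures the speedup lands a bit above the rounded 10x, and the MSXML6 slowdown sits right around the 15% quoted.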

Updating MSXSL to choose MSXML6: I cracked open the source for MSXSL.  I couldn't find the new MSXML6.H so I added this to msxmlinf.hxx:

typedef class XSLTemplate60 XSLTemplate60;
#ifdef __cplusplus
class DECLSPEC_UUID("88d96a08-f192-11d4-a65f-0040963251e5")
XSLTemplate60;
#endif
typedef class DOMDocument60 DOMDocument60;
typedef class FreeThreadedDOMDocument60 FreeThreadedDOMDocument60;
#ifdef __cplusplus
class DECLSPEC_UUID("88d96a05-f192-11d4-a65f-0040963251e5")
DOMDocument60;
class DECLSPEC_UUID("88d96a06-f192-11d4-a65f-0040963251e5")
FreeThreadedDOMDocument60;
#endif

Then I updated the static array and factory in msxmlinf.cxx to check for the version specific ProgID:

const MSXMLInfo::StaticInfo MSXMLInfo::s_staticInfo60 =
{
    VERSION_60,
    L"6.0",

    "{88d96a08-f192-11d4-a65f-0040963251e5}",
    &__uuidof(XSLTemplate60),
    L"MSXML2.XSLTemplate.6.0",

    &__uuidof(DOMDocument60),
    L"MSXML2.DOMDocument.6.0",

    &__uuidof(FreeThreadedDOMDocument60),
    L"MSXML2.FreeThreadedDOMDocument.6.0",
};

...along with a few other things. Email me if you want the source; I don't think I'm allowed to redist this. Anyway, when I ran it the first time I got an "Access Denied 0x80004005" and stared at it for a while. Andy Phenix said, "Didn't they tighten security and break some stuff in MSXML6?" The fix involved using IXMLDOMDocument2 and explicitly allowing the document() function to load our 'foo.xml':

VARIANT FreakingTrue;
FreakingTrue.vt = VT_BOOL;
FreakingTrue.boolVal = VARIANT_TRUE;
pStylesheet->setProperty(L"AllowDocumentFunction", FreakingTrue);

Once we turned on the document() feature, everything worked great. However, I wasn't sure if MSXML4 or MSXML6 was doing the work. (I ran filemon.exe and regmon.exe as well as procexp.exe, and it WAS in fact loading msxml6.dll.) I noticed some cleverness, again from Oleg, that allows the XSLT stylesheet to actually detect what vendor and (if MSFT) version of the XSLT engine is being used. I'd reprint it, but you should go visit his site anyway.

Thanks to Krishnan and Andy for their hard work on the new stylesheet and performance testing.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.