Scott Hanselman

Mod your Prius to get 180 MPG

June 27, '05 Comments [2] Posted in XML
Sponsored By

What a cool idea. You'd get the best of both worlds: I could go to and from work and never hit the gasoline engine, and when it's time to drive to Redmond, I'd get 100 MPG on the freeway, hitting the gas engine only occasionally. If only it were $1,200 and not $12,000... From Mike's List:

A startup near Los Angeles, California, called Energy Control Systems Engineering has created a system for tricking a Toyota Prius into delivering up to 180 miles per gallon in the city, and up to 100 miles per gallon on the freeway. The system involves extra batteries, which must be charged by the user, and a hack that tricks the Prius into thinking its batteries always have a full charge. The company plans to sell the hack via a company called Clean-Tech for about $12,000.

 

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

facebook twitter subscribe
About   Newsletter
Sponsored By
Hosting By
Dedicated Windows Server Hosting by SherWeb

The GrokTalks are Up - PodCast Them

June 27, '05 Comments [5] Posted in DasBlog | XML

How to Watch the Videos

You can download the videos directly from their links, but you can also Podcast them (have them automatically download from an agent) using any Podcasting Application pointed to our RSS feed.

I recommend DrizzleCast, as it will download the videos using only idle bandwidth via Microsoft's built-in BITS technology. If you are using an RSS reader like FeedDemon, you may already have this functionality built in, so check your software's help files.

My GrokTalk

The post for my GrokTalk, 10 Utilities in 10 Minutes, is up. The streaming version is here, and the downloadable one is here, but I recommend you Podcast the whole lot of them!

Some Tech

Vertigo gave me an XML file with the GrokTalks listed, and I wrote a quick XSLT to transform that XML directly into a dasBlog-consumable entry for each talk! I've uploaded the XML and the XSLT for those interested. It's pretty cool, since Omar and the gang added RSS enclosure (Podcasting) support to dasBlog.

Enjoy!
Scott Hanselman

Attachments:
grok_talk_videos.xml (6.97 KB)
groktalktodasblog.xslt (1.57 KB)


The Recorded Version of my TechEd 2005 Code Generation Talk is Available

June 25, '05 Comments [7] Posted in ASP.NET | Corillian | TechEd | Web Services | Javascript | Speaking

Cool! Looks like the recorded version of my TechEd 2005 talk "ARC305: Code Generation - Architecting a New Kind of Reuse" is up with the rest of the TechEd talks and available for viewing. They sure do a nice job recording all the sessions. It includes the PPT slides and switches to video during the demos to show you what I was doing on my system at the time.

The Microsoft Producer program that they used doesn't render Firefox-friendly HTML, so sorry about that.

Hopefully those of you who visit my site for the technical content can put up with the tiny bits of humor early on. There are also pauses where folks laughed, but since I was the one miked, you can't hear them. Forgive me. It was likely funny at the time.

Here's the URL for those of you with IE and decent bandwidth. Note, it will take a second to load a bunch of stuff in the background with Javascript and AJAX.

http://microsoft.sitestream.com/TechEd2005/ARC/ARC305.htm

I'll see what I can do about getting a downloadable version if you like.

It also turns out that they want me to do the session again as a Webcast in September as part of the "Best of TechEd" Webcast series. I'm not clear on the details, but it should be posted soon at http://www.microsoft.com/events/series/teched2005.mspx.

You should be able to get to all the TechEd 2005 recorded sessions here, but again, not with Firefox.


Stripping Out Empty XmlElements in a Performant Way and the Bus Factor

June 24, '05 Comments [6] Posted in PDC | XmlSerializer

We have a system that uses templates to create XML. Something like:

<root>
   <foo>{CUSTOMTEMPLATETHING1}</foo>
   <bar>{CUSTOMTEMPLATETHING2}</bar>
</root>

And the result might be:

<root>
   <foo>text content</foo>
   <bar></bar>
</root>

Notice that <bar> has "" as its content. For a string that's OK, but for a DateTime or Decimal, not so much. In those cases (and arguably in strings, when String.IsNullOrEmpty is your primary semantic need) it'd be preferable, for the XmlSerializer and any other consumers, to have those elements stripped out.
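
To see why an empty element bites typed consumers, here's a quick illustration in Python (not our C# stack; the field names are just the example above, and the date format is an arbitrary stand-in):

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# An empty element is fine for a string field, but a typed consumer
# has nothing to parse.
doc = ET.fromstring("<root><foo>text content</foo><bar></bar></root>")
bar = doc.find("bar")
print(repr(bar.text))  # None -- no content at all

try:
    datetime.strptime(bar.text or "", "%Y-%m-%d")
except ValueError as e:
    print("typed parse fails:", e)
```

This is exactly the failure mode the XmlSerializer hits when an empty element maps to a DateTime or Decimal field, which is why stripping the element entirely is the friendlier option.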

So, we created what we called the Rectifier. You can feel free to ponder the root, or roots, of the word. The early versions of the Rectifier used an uber-regular expression to strip these tags out of the source string. This system returns a full XML document string, not an XmlReader or IXPathNavigable.

I heard a cool quote yesterday at the Portland NerdDinner while we were planning the CodeCamp.

"So you've got a problem, and you've decided to solve it with Regular Expressions. Now you've got two problems."

Since the documents we passed through this system were between 10k and 100k, the performance of the RegEx, especially when compiled and cached, was fine. We didn't give it a thought for years. It worked, and it worked well. It looked like this:

private static Regex regex = new Regex(@"\<[\w-_.: ]*\>\<\!\[CDATA\[\]\]\>\</[\w-_.: ]*\>|\<[\w-_.: ]*\>\</[\w-_.: ]*\>|<[\w-_.: ]*/\>|\<[\w-_.: ]*[/]+\>|\<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""\>\</[\w-_.: ]*\>|<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""[\s]*/\>|\<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""\>\<\!\[CDATA\[\]\]\>\</[\w-_.: ]*\>",RegexOptions.Compiled);

Stuff like this has what I call a "High Bus Factor." That means if the developer who wrote it is hit by a bus, you're screwed. It's nice to create a solution that anyone can sit down and start working on and this isn't one of them.

Then, lately, some folks started pushing larger amounts of data through this system, in excess of 1.5 megs, and this Regular Expression started taking 4, 8, even 12 seconds to finish on these giant XML strings. We'd hit the other side of the knee of the exponential performance curve that you see with string processing like this.
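
In spirit, the string-munging approach boils down to something like this much-simplified Python sketch (ours, purely for illustration; the production pattern above also handles CDATA sections, namespaces, and more, which is exactly why it ballooned):

```python
import re

# Simplified illustration only: strip <tag/> and <tag></tag> leaf
# elements from an XML *string*, repeating the pass so parents that
# become empty get stripped too.
EMPTY = re.compile(r"<([\w.:-]+)\s*/>|<([\w.:-]+)\s*></\2>")

def strip_empty(xml):
    prev = None
    while prev != xml:
        prev, xml = xml, EMPTY.sub("", xml)
    return xml

print(strip_empty("<root><foo>text</foo><bar></bar></root>"))
# -> <root><foo>text</foo></root>
```

Easy to write, and fast enough on small inputs, but every pass rescans the whole string, which is where the pain starts on multi-megabyte documents.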

So, Patrick had the idea to use XmlReaders and create an XmlRectifyingReader or XmlPeekingReader: basically a fake reader that holds a real reader internally and "peeks" ahead to see if we should skip empty elements. It's a complicated problem when you consider nesting, CDATA sections, attributes, namespaces, etc. Because XmlReaders are forward-only, you have to hold a lot of state as you move forward, since there's no way to back up. We gave up on this idea, since we wanted a fix within a day, but it remains, in our opinion, a cool idea we'd like to try. We wanted to do something like xs.Deserialize(new XmlRectifyingReader(new StringReader(inputString))). But the real issue was performance, over elegance.

Then we figured we'd do an XmlReader/XmlWriter thing like:

using(StringWriter strw = new StringWriter())
{
    XmlWriter writer = new XmlTextWriter(strw);
    XmlReader reader = new XmlTextReader(new StringReader(input));
    reader.Read();
    RectifyXmlInternal(reader, writer); //This is US
    reader.Close();
    writer.Close();
    return strw.ToString();
}

We still have the unfortunate overhead of the strings, but that's what the previous input and output were, so we need, for now, to maintain the existing interface. So, we read the XML in, atom by atom, storing little bits of state, and write out only those tags that we figure aren't empty. We read in a bit, write out a bit, etc. It's recursive, maintaining depth, and it's iterative as we go over siblings. The Attribute class is the best we could come up with to store everything about an attribute as we find it. We tried to grab the attributes as strings, or as one big string, but the XmlReader doesn't support that coarse a style.

private class Attribute
{
    public Attribute(string l, string n, string v, string p)
    {
        LocalName = l;
        Namespace = n;
        Value = v;
        Prefix = p;
    }

    public string LocalName = string.Empty;
    public string Namespace = string.Empty;
    public string Value = string.Empty;
    public string Prefix = string.Empty;
}

 

internal static void RectifyXmlInternal(XmlReader reader, XmlWriter writer)
{
    int depth = reader.Depth;

    while (!reader.EOF)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.EntityReference:
                writer.WriteEntityRef(reader.Name);
                break;
            case XmlNodeType.XmlDeclaration:
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            case XmlNodeType.DocumentType:
                writer.WriteDocType(reader.Name,
                    reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"),
                    reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.EndElement:
                if(depth > reader.Depth)
                    return;
                break;
        }

        if(reader.IsEmptyElement || reader.EOF) return;
        else if(reader.IsStartElement())
        {
            string name = reader.Name;
            string localName = reader.LocalName;
            string prefix = reader.Prefix;
            string uri = reader.NamespaceURI;

            ArrayList attributes = null;

            if(reader.HasAttributes)
            {
                attributes = new ArrayList();
                while(reader.MoveToNextAttribute())
                    attributes.Add(new Attribute(reader.LocalName, reader.NamespaceURI, reader.Value, reader.Prefix));
            }

            bool CData = false;
            reader.Read();
            if(reader.NodeType == XmlNodeType.CDATA)
            {
                CData = true;
            }
            if(reader.NodeType == XmlNodeType.CDATA && reader.Value.Length == 0)
            {
                reader.Read();
            }
            if(reader.NodeType == XmlNodeType.EndElement && reader.Name.Equals(name))
            {
                reader.Read();
                if (reader.Depth < depth)
                    return;
                else
                    continue;
            }
            writer.WriteStartElement(prefix, localName, uri);
            if (attributes != null)
            {
                foreach(Attribute a in attributes)
                    writer.WriteAttributeString(a.Prefix, a.LocalName, a.Namespace, a.Value);
            }
            if(reader.IsStartElement())
            {
                if(reader.Depth > depth)
                    RectifyXmlInternal(reader, writer);
                else
                    continue;
            }
            else
            {
                if (CData)
                    writer.WriteCData(reader.Value);
                else
                    writer.WriteString(reader.Value);
                reader.Read();
            }
            writer.WriteFullEndElement();
            reader.Read();
        }
    }
}

The resulting "rectified" (empty-element-stripped) XML is byte-for-byte identical to the XML created by the original Regular Expression, so we succeeded in keeping compatibility. On small strings of XML, less than 100 bytes, the performance is about 2x slower because of all the overhead. However, as the size of the XML approaches the middle of the bell curve that represents our typical size (10k to 100k), this technique overtakes Regular Expressions in a big way. Initial tests are between 7x and 10x faster in our typical scenario. When the XML gets to 1.5 megs, this technique can process it in sub-second times. So the Regular Expression behaves in an O(c^n) way, and this technique (scary as it is) behaves more like O(n log n).

This lesson taught me that manipulating XML as if it were a string is often easy and quick to develop, but manipulating the infoset with really lightweight APIs like the XmlReader will almost always make life easier.
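
For the curious, the core "rectify" idea can be sketched in a few lines of Python with the in-memory xml.etree API (the names here are ours, not the production code; note this is DOM-style rather than the streaming reader/writer approach above, and it cascades, so a parent emptied by the pass is removed too):

```python
import xml.etree.ElementTree as ET

def rectify(elem):
    # Walk bottom-up and drop any child with no children, no text,
    # and no attributes -- the "empty elements" the Rectifier targets.
    for child in list(elem):
        rectify(child)
        if len(child) == 0 and not (child.text or "").strip() and not child.attrib:
            elem.remove(child)

root = ET.fromstring("<root><foo>text content</foo><bar></bar></root>")
rectify(root)
print(ET.tostring(root, encoding="unicode"))
# -> <root><foo>text content</foo></root>
```

The C# version above does the same pruning but in a single forward pass over the reader, which is what keeps memory flat and the 1.5-meg case sub-second.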

I'd be interested in hearing Oleg or Kzu's opinions on how to make this more elegant and performant, and if it's even worth the hassle. Our dream of an XmlPeekingReader or XmlRectifyingReader to do this all in one pass remains...


ASP.NET 2.0 XmlDataSource's XPath doesn't support namespaces

June 23, '05 Comments [2] Posted in ASP.NET | Learning .NET | TechEd | Speaking | XML

I'm working (again) on the XML chapter of our upcoming book. The book is all about ASP.NET 2.0, but XML is such an important part of ASP.NET that this chapter gets bigger and bigger. I've been updating it from the original Beta 1 version these last few months and noticed that namespace qualification for the XmlDataSource is still broken/incomplete, as it was last September. I talked to a bunch of people at TechEd, including a number of very helpful devs and PMs who were very much interested in resolving this issue. Unfortunately, it looks like this'll be one of those features that won't make it into the final release, which means one of us will have to write our own.

The basic problem is this (from the book draft):

One unfortunate caveat of the new XmlDataSource is that its XPath attribute does not support documents that use namespace qualification. Examples in this chapter use the Books.xml file with a default namespace of http://examples.books.com. It is very common for XML files to use multiple namespaces, including a default namespace. As you learned when you created an XPathDocument and queried it with XPath, the namespace in which an element exists is very important.

The regrettable reality is that there is no way to use a namespace-qualified XPath expression, or to make the XmlDataSource control aware of a list of prefix/namespace pairs via the XmlNamespaceManager class. However, the XPath function used in the ItemTemplate of the templated DataList control can take an XmlNamespaceManager as its second parameter and query XML returned from the XmlDataSource - as long as the control does not include an XPath attribute with namespace qualification, or you omit the attribute altogether. That said, in order for these examples to work, you must remove the namespaces from your source XML and use XPath queries that include no namespace qualification, as shown in Listing xx-xx.

I was hoping to avoid having any caveats like this in the book, but this one will stay until there's a solution to the problem. It'd be nice if someone (Oleg, kzu, Don, me, you?) could add a namespace-aware XmlDataSource control to the Mvp.Xml project and have it ready by 2.0 ship.

As it is currently, you do this:

<asp:datalist id="DataList1" DataSourceID="XmlDataSource1" runat="server">
    <ItemTemplate>
        <p><b><%# XPath("author/first-name") %> 
              <%# XPath("author/last-name")%></b>
                    wrote <%# XPath("title") %></p>
    </ItemTemplate>
</asp:datalist>
<asp:xmldatasource id="XmlDataSource1" runat="server"
    datafile="~/Books.xml" 
    xpath="//bookstore/book"/>

And the root problem is that the xpath attribute above can't use namespace-qualified XPath expressions like //b:bookstore/b:book, because there's no way to pass an XmlNamespaceManager to the XmlDataSource. Note that this doesn't apply to the TemplatedControl.XPath expression: you CAN pass in an XmlNamespaceManager like this: <%# XPath("b:author/b:first-name", myNamespaceMgr) %>. The only bummer is that there's no completely declarative way to do this; you have to have the XmlNamespaceManager in the code-behind.
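
The underlying issue is the same in any XML API: a document with a default namespace is simply invisible to unqualified path expressions until you supply a prefix-to-URI map. Here's the effect in Python's xml.etree (not ASP.NET; the document is a minimal stand-in using the http://examples.books.com namespace from the book example):

```python
import xml.etree.ElementTree as ET

books = ET.fromstring(
    '<bookstore xmlns="http://examples.books.com">'
    '<book><title>A Title</title></book>'
    '</bookstore>')

# Unqualified path: no match, because every element lives in the
# default namespace.
print(books.findall("book"))  # -> []

# With a prefix->URI map (the moral equivalent of an
# XmlNamespaceManager), the query works.
ns = {"b": "http://examples.books.com"}
print(books.findall("b:book", ns))
```

The XmlDataSource's XPath attribute has no equivalent of that second argument, which is the whole complaint.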

In an ideal world, there'd be a number of ways to let the XmlDataSource know about namespaces and prefixes. Here are some ideas, in order of preference:

  • "Infer" the namespace/prefixes and create an XmlNamespaceManager instance and associate it with the control automatically. Perhaps this calls for an XmlNamespaceInferringReader that creates an XmlNamespaceManager as a side effect (this wouldn't be hard, methinks)?
  • Pass in the prefixes/namespaces declaratively like Christopf wants:
     <asp:xmldatasource runat="server" id="xds1"
      datafile="~/app_data/namespacebooks.xml"
      xpath="/ba:store/dx:book">
      <asp:namespace prefix="ba" name="http://bracketangles.net/names" />
      <asp:namespace prefix="dx" name="http://donxml.com/names" />
      <asp:namespace prefix="ha" name="http://hanselman.com/names" />
    </asp:xmldatasource>
  • Have an event in the code-behind like this, where you can associate a NamespaceManager:
    private void OnXmlDataSource1ExecutingXPath(object sender, XmlDataSourceXPathEventArgs e) {
       NameTable table = new NameTable();
       e.NamespaceManager = new XmlNamespaceManager(table);
       e.NamespaceManager.AddNamespace("a", "b");
       e.NamespaceManager.AddNamespace("myns1", "http://www.example.org/namespace1");
       e.NamespaceManager.AddNamespace("myns2", "http://www.example.org/namespace2");
    }

Any of these would be preferable to the alternatives, which are, as I see them:

  • Not using documents with namespaces
  • Not using the XPath attribute of the XmlDataSource
  • Using an XSLT Transformation to strip out namespaces before using the result in the XmlDataSource. Yikes.
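
That last alternative, stripping namespaces before binding, doesn't have to be XSLT; the same ugly workaround can be done as a tree rewrite. A rough Python sketch of the idea (ours, shown only to illustrate the workaround, and why it deserves the "Yikes"):

```python
import xml.etree.ElementTree as ET

def strip_namespaces(elem):
    # ElementTree stores a qualified tag as "{uri}local"; discard the
    # "{uri}" part so unqualified paths match again.
    for e in elem.iter():
        if "}" in e.tag:
            e.tag = e.tag.split("}", 1)[1]

doc = ET.fromstring(
    '<bookstore xmlns="http://examples.books.com">'
    '<book><title>A Title</title></book></bookstore>')
strip_namespaces(doc)
print(ET.tostring(doc, encoding="unicode"))
# -> <bookstore><book><title>A Title</title></book></bookstore>
```

It "works," but you've thrown away exactly the information that made the document well-defined in the first place, which is the point of the next paragraph.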

It's a bummer because, in my opinion, if you're using XML without namespaces, you're just pushing around less-than/greater-than-delimited files. Considering that so much effort was put into making schemas for most (all?) of the ASP.NET config files and such, it's a shame if a control ships without support for XML namespaces.

We're making the book very approachable for the beginner and intermediate dev, but there will be call-outs with gotchas like this that will hopefully save the advanced developer a lot of time. Also, if you're an advanced 1.1 dev, there is a lot of direct "it used to work like this, be careful because now..." exploration. I hope it'll save you time. It should be in bookstores just before ASP.NET 2.0 itself is.


Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.