Scott Hanselman

Postprocessing AutoClosed SGML Tags with the SGMLReader

February 15, 2006. Posted in XML | Bugs

Chris Lovett's SGMLReader is an interesting and complex piece of work. It's more complex than my brain can hold, which is good, since he wrote it and not I. It's able to parse SGML documents like HTML. However, it derives from XmlReader, so it tries (and succeeds) to look like an XmlReader. As such, it Auto-Closes Tags. Remember that SGML doesn't have to have closing tags. Specifically, it doesn't need closing tags on primitive/simple types.

Sometimes I need to parse an OFX 1.x document, a financial format that is SGML and looks like this:

<OFX>    
<SIGNONMSGSRQV1>
<SONRQ>   
 <DTCLIENT>20060128101000 
 <USERID>654321
 <USERPASS>123456
 <LANGUAGE>ENG 
  <FI>    
   <ORG>Corillian
   <FID>1001 
  </FI>
 <APPID>MyApp  
 <APPVER>0500  
</SONRQ>
...etc...

Notice that ORG and DTCLIENT and all the other simple types have no end tags, but complex types like FI and SONRQ do have end tags. The SgmlReader class attempts to automatically insert end tags (to close the element) as I use the XmlReader.Read() method to move through the document. However, he can't figure out where the right place for an end tag is until he sees an end element go by. Then he says, oh, crap! There's </FI>! I need to empty my stack of start elements in reverse order. This is lovely for him, but gives me a document that looks (in memory) like this:

<OFX>    
<SIGNONMSGSRQV1>
<SONRQ>   
  <DTCLIENT>20060128101000 
  <USERID>654321
    <USERPASS>123456
     <LANGUAGE>ENG 
        <FI>    
          <ORG>Corillian
           <FID>1001
</FID>
          </ORG>
        </FI>
     </LANGUAGE>
    </USERPASS>
  </USERID>
 </DTCLIENT>

...etc...

...which totally isn't the structure I'm looking for. I could write my own SgmlReader that knows more about OFX, but really, who has the time. So, my buddy Paul Gomes and I did this.

NOTE: There's one special tag in OFX called MSGBODY that is a simple type but always has an end tag, so we special cased that one. Notice also that we did all this WITHOUT changing the SgmlReader. It's just passed into the method as "reader."

protected internal static void AutoCloseElementsInternal(SgmlReader reader, XmlWriter writer)
{
    // Atomize the one simple type that always carries its own end tag.
    object msgBody = reader.NameTable.Add("MSGBODY");

    object previousElement = null;
    Stack elementsWeAlreadyEnded = new Stack();

    while (reader.Read())
    {
        switch ( reader.NodeType )
        {
            case XmlNodeType.Element:
                // Remember the most recently written start tag.
                previousElement = reader.LocalName;
                writer.WriteStartElement(reader.LocalName);
                break;
            case XmlNodeType.Text:
                if(Strings.IsNullOrEmpty(reader.Value) == false)
                {
                    writer.WriteString( reader.Value.Trim());
                    if (previousElement != null && !previousElement.Equals(msgBody))
                    {
                        // Close the simple type ourselves, right after its text.
                        writer.WriteEndElement();
                        elementsWeAlreadyEnded.Push(previousElement);
                    }
                }
                // Assert(false, ...) so the assertion actually fires on an empty text node.
                else Debug.Assert(false, "big problems?");
                break;
            case XmlNodeType.EndElement:
                if(elementsWeAlreadyEnded.Count > 0
                    && Object.ReferenceEquals(elementsWeAlreadyEnded.Peek(),
                       reader.LocalName))
                {
                    // Synthetic end tag for an element we already closed: swallow it.
                    elementsWeAlreadyEnded.Pop();
                }
                else
                {
                    writer.WriteEndElement();
                }
                break;
            default:
                writer.WriteNode(reader,false);
                break;
        }
    }
}

We store the name of the most recently written start tag. If we write out a node of type XmlNodeType.Text, we immediately write out our own EndElement and push the start tag's name on a stack. Then, when the SgmlReader starts to auto-close and sends us synthetic EndElements, we ignore each one whose name is at the top of our stack, since we already closed it. Any other EndElement is a real one from the document, so we write it out.
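To watch that bookkeeping run without SgmlReader installed, here is a minimal sketch that replays the same logic over a hand-built node list. The `Node` struct and the `AutoCloseSketch` class are hypothetical stand-ins, and the list in `Main` is my assumption of what the reader reports for the `<FI>` fragment above. The names here are plain strings rather than NameTable atoms, so the sketch compares by value instead of using Object.ReferenceEquals.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AutoCloseSketch
{
    // Hypothetical stand-in for one node as reported by SgmlReader.Read():
    // Text holds the element name, or the value for a text node.
    public struct Node
    {
        public readonly XmlNodeType Type;
        public readonly string Text;
        public Node(XmlNodeType type, string text) { Type = type; Text = text; }
    }

    public static string Rewrite(IEnumerable<Node> nodes)
    {
        string previousElement = null;
        var elementsWeAlreadyEnded = new Stack<string>();
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            foreach (Node node in nodes)
            {
                switch (node.Type)
                {
                    case XmlNodeType.Element:
                        previousElement = node.Text;
                        writer.WriteStartElement(node.Text);
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(node.Text.Trim());
                        if (previousElement != null && previousElement != "MSGBODY")
                        {
                            // Close the simple type ourselves, right after its text.
                            writer.WriteEndElement();
                            elementsWeAlreadyEnded.Push(previousElement);
                        }
                        break;
                    case XmlNodeType.EndElement:
                        if (elementsWeAlreadyEnded.Count > 0
                            && elementsWeAlreadyEnded.Peek() == node.Text)
                        {
                            // Late, synthetic close for something we already ended.
                            elementsWeAlreadyEnded.Pop();
                        }
                        else
                        {
                            // Real close of a complex type.
                            writer.WriteEndElement();
                        }
                        break;
                }
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Assumed node stream for: <FI><ORG>Corillian<FID>1001</FI>
        var nodes = new List<Node>
        {
            new Node(XmlNodeType.Element, "FI"),
            new Node(XmlNodeType.Element, "ORG"),
            new Node(XmlNodeType.Text, "Corillian "),
            new Node(XmlNodeType.Element, "FID"),
            new Node(XmlNodeType.Text, "1001 "),
            new Node(XmlNodeType.EndElement, "FID"), // synthetic
            new Node(XmlNodeType.EndElement, "ORG"), // synthetic
            new Node(XmlNodeType.EndElement, "FI"),  // real
        };
        Console.WriteLine(Rewrite(nodes));
    }
}
```

In this run ORG and FID come out as siblings inside FI instead of nested in each other, because both synthetic closes are popped off the stack and only the real `</FI>` reaches the writer.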

The resulting OFX document now looks like this:

<OFX>
<SIGNONMSGSRQV1>
 <SONRQ>
  <DTCLIENT>20060128101000</DTCLIENT>
  <USERID>654321</USERID>
  <USERPASS>123456</USERPASS>
  <LANGUAGE>ENG</LANGUAGE>
  <FI>
   <ORG>Corillian</ORG>
   <FID>1001</FID>
  </FI>
  <APPID>MyApp</APPID>
  <APPVER>0500</APPVER>
 </SONRQ>
...etc...

...and we can deal with it just like any other Xml Fragment, in our case, just allowing it to continue along its way in the XmlReader/XmlWriter Pipeline.

Thanks to Craig Andera for the reminder about Object.ReferenceEquals(), it's nicer than elementsWeAlreadyEnded.Peek() == (object)reader.LocalName.
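A reference comparison is only safe here because the names being compared are, presumably, all atomized through the reader's NameTable, so equal names share one string instance. Here is a small sketch of that behavior using the stock System.Xml NameTable; the `AtomizedNames` class and `SameInstance` helper are hypothetical names for illustration.

```csharp
using System;
using System.Xml;

public static class AtomizedNames
{
    // Adds both strings to one NameTable and reports whether the
    // atomized results are the very same object.
    public static bool SameInstance(string first, string second)
    {
        var table = new NameTable();
        string a = table.Add(first);
        string b = table.Add(second);
        return Object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        // Built at run time, so it is a separate object from the literal.
        string runtimeCopy = new string("MSGBODY".ToCharArray());

        // Two distinct objects with equal contents.
        Console.WriteLine(Object.ReferenceEquals("MSGBODY", runtimeCopy));

        // After atomization both names resolve to one instance.
        Console.WriteLine(SameInstance("MSGBODY", runtimeCopy));
    }
}
```

The flip side is that a name which never went through the same NameTable would fail the ReferenceEquals check even when its text matches, which is why the simulated stream in a test harness needs a value comparison instead.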

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Using the XmlSerializer to Read and Write XML Fragments

February 15, 2006. Posted in ASP.NET | XmlSerializer

This may not be interesting to you if you don't like the XmlSerializer. I use it for lots of stuff and I'm currently using it in the middle of an XmlReader/XmlWriter pipeline to grab off little object chunks via reader.ReadOuterXml (a .NET 1.1 poor man's ReadSubtree). My objects are generated from schemas, and those schemas have namespaces. However, the Xml Fragments I'm grabbing off do not have namespaces, and sometimes the whole document doesn't either (don't ask, sometimes life doesn't turn out how you'd like, eh?).

So, I need to be able to read and write Xml Fragments into and out of objects via the XmlSerializer. This uses a few techniques I've covered before like the XmlFragmentWriter (Yes there are other ways) and the XmlNamespaceUpgradeReader. I added a few properties like SuppressAllNamespaces and JustRoot to really make these bare XmlFragments.

[Test]
public void TestRoundTrip()
{
    AccountType acct = new AccountType();
    acct.AvailableBalance = 34.33M;
    acct.AvailableBalanceSpecified = true;
    acct.Number = "54321";
    acct.Description = "My Checking";

    XmlSerializer ser = new XmlSerializer(typeof(AccountType));

    //***WRITE***
    StringBuilder sb = new StringBuilder();
    using(StringWriter sw = new StringWriter(sb))
    {
        XmlFragmentWriter fragWriter = new XmlFragmentWriter(sw);
        fragWriter.SuppressAllNamespaces = true;
        ser.Serialize(fragWriter,acct);
        fragWriter.Close();
    }

    string result = sb.ToString();

    //***READ***
    AccountType acctReborn = null;
    using(StringReader sr = new StringReader(result))
    {
        acctReborn = ser.Deserialize(
            new XmlNamespaceUpgradeReader(sr,
            String.Empty,
            "http://banking.corillian.com/Account.xsd")) as AccountType;
    }

    Assert.IsTrue(acctReborn.AvailableBalance == 34.33M);
}
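For comparison, one of the "other ways" can be sketched with nothing but stock framework classes. This is not the XmlFragmentWriter approach from the test above. It assumes a hypothetical `Account` class that declares no XML namespace, so it covers the bare-fragment write and a plain read, but not the namespace upgrade that a schema-generated type needs on the way back in.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the schema-generated AccountType;
// unlike the real one, it declares no XML namespace.
public class Account
{
    public string Number { get; set; }
    public decimal AvailableBalance { get; set; }
}

public static class FragmentRoundTrip
{
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Account));

    public static string ToFragment(Account account)
    {
        // No <?xml ...?> declaration at the top of the output.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        // An empty prefix/namespace pair suppresses the xmlns:xsi and
        // xmlns:xsd attributes the serializer would otherwise emit.
        var noNamespaces = new XmlSerializerNamespaces();
        noNamespaces.Add("", "");

        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            Serializer.Serialize(writer, account, noNamespaces);
        }
        return output.ToString();
    }

    public static Account FromFragment(string fragment)
    {
        using (var reader = new StringReader(fragment))
        {
            return (Account)Serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        var acct = new Account { Number = "54321", AvailableBalance = 34.33M };
        string fragment = ToFragment(acct);
        Console.WriteLine(fragment);

        Account reborn = FromFragment(fragment);
        Console.WriteLine(reborn.AvailableBalance == 34.33M);
    }
}
```

The trade-off is that this only works when the type and the fragment agree about namespaces. Once the type carries a schema namespace and the incoming fragment does not, something like the XmlNamespaceUpgradeReader is still needed on the read side.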

Enjoy, improve, give back. File Attachment: SerializationFragmentTest.zip (5 KB)

CoComment Support for DasBlog

February 14, 2006. Posted in ASP.NET | DasBlog | Javascript

I just checked in CoComment support for DasBlog. If you want it, feel free to get the source for DasBlog via anonymous CVS after it syncs on SourceForge. Otherwise, it'll be out soon in a DasBlog 1.9 release after we get BlogML support finished (started).

CoComment is the blog comment aggregation service that everyone is all agog about. It's interesting that so many blogging services are so willing to include a JavaScript bookmarklet hack in their blogging engines rather than include RSS Comments support.

CoComment is clever, sure, but it's a screenscraping web-based comment aggregator that is doing the work that FeedDemon and other aggregators SHOULD be doing. SharpReader does a lovely job supporting RSS Comments as does RSS Bandit. They also support the CommentAPI (which does have one BIG problem.) Again, doesn't all of this seem simpler than CoComment's method?

DasBlog supports RSS Comments out of the box, but I think we have a few problems around GUIDs internally...however, only Luke has ever complained to me.

RSS Comments is a great idea that would solve the very problem CoComment aims to solve. I didn't/don't need another Web 2.0 application when there's a perfectly good spec waiting to be implemented. End of rant.

Linksys Firmware DD-WRT BitTorrent optimizations

February 14, 2006. Posted in Musings

After updating my Linksys Router to DD-WRT I turned on syslogd, the router's equivalent of a Windows Event Log. You turn on the syslogd service within the Web Management interface, then point it to the IP address of a machine running a Syslog listener. For my main Windows machine I use Kiwi Syslog Daemon.

I noticed a lot of connections being dropped and a lot of complaints of bad ICMP (ping) packets. I also noticed that my aggregate BitTorrent throughput had dropped considerably. Seems that the default 512 connections that the firmware is configured with wasn't going to work out.

After poking around about a dozen sites and forums I added these settings to my rc_startup (the router's "autoexec.bat"). These settings are an aggregate of three different sites' suggestions and have worked fine for me. Your mileage may vary.

Use PuTTY to SSH into your router. Log in as 'root' with your router's admin password. At this point you can screw up your router, so be warned. These routers have NVRAM (nonvolatile RAM), so we'll write an "rc_startup" to it. You can also have an rc_shutdown if you like.

To check the value of your rc_startup do this:

nvram get rc_startup

If you don't have one, you can add a startup script like this. These settings ignore some irritating warnings, lower the TCP timeouts to values that are more reasonable if you seed torrents, and raise the connection limit to the max of 4096.

nvram set rc_startup="
echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
echo 1 > /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses
echo '600 1800 120 60 120 120 10 60 30 120' > /proc/sys/net/ipv4/ip_conntrack_tcp_timeouts
echo 4096 > /proc/sys/net/ipv4/ip_conntrack_max
"

nvram commit

reboot

These values have worked nicely for me the last few days. There are more details on startup scripts at the WRT Wiki. If you've got good values that have worked for you, add them in the comments of this post.

WatirNUt - Portable Watir Tests integrated with NUnit

February 14, 2006. Posted in ASP.NET | Ruby | Watir | NUnit | Nant

More innovation around Watir and NUnit over here at Corillian today. Dustin Woodhouse, our build guru and former QA guy, is releasing "WatirNUt," an NUnit/Watir integration with a slightly different view of the world than Travis Illig's Paraesthesia.Test.Ruby library up on CodeProject. Both use variations on Patrick's and my ExtractResource stuff.

Dustin's stuff includes both a GUI and Console wrapper around his main engine. He embeds everything in a generated NUnit assembly - one whose code you never see. 

From Dustin's site:

WatirNUt is a utility that creates a portable, testable NUnit binary wrapper around watir test scripts and supporting files. This binary can easily be executed in NAnt's <nunit2> task, with aggregated results displayed in your web dashboard.

WatirNUt gathers information about your test suites, including the files needed to support them, and uses this information to generate NUnit test fixtures to run your test scripts. Any number of scripts can be included, and any number of supporting files can be associated with each script. WatirNUt compiles the NUnit test fixtures under a single namespace provided by you, and embeds all the scripts and supporting files as resources.

When you use any NUnit runner on the generated assembly, the Watir tests run and their results are fed and formatted back into NUnit for use in CruiseControl build reports or whatever you like.

Also, take a look at Brent Strange's QAInsight.net, as Brent is looking at SW Explorer Automation, a possible .NET competitor to Watir. Brent is one of Corillian's Senior QA Engineers.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.