Scott Hanselman

ASMX SoapExtension to Strip out Whitespace and New Lines

September 17, '07 Comments [4] Posted in ASP.NET | Web Services | XML

Someone asked...

[I've got] a WebService with a WebMethod of the form.  Very Simple.

[WebMethod]
public XmlNode HelloWorld () {
                XmlDocument document = new XmlDocument();
                document.LoadXml("<a><b><c><d>Hello World</d></c></b></a>");
                return document;
}

What comes back in the response is something like

<a>
                <b>
                                <c>
                                                <d>Hello World</d>
                                </c>
                </b>
</a>

Where each level of indentation is actually only 2 characters.   I would like it to come back just like it is entered in the LoadXml call [no unneeded whitespace and no unneeded new lines.]

This is an old problem. Basically if you look at SoapServerProtocol.GetWriterForMessage, they...

return new XmlTextWriter(new StreamWriter(message.Stream, new UTF8Encoding(false), bufferSize));

...just make one. You don't get to change the settings. Of course, in WCF this is easy, but this person was using ASMX.
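
To make "the settings" concrete, here is a minimal sketch of what you could control if you created the writer yourself. It is not ASMX code: the WriterSettingsDemo class and its Write helper are hypothetical names for illustration, and it uses XmlWriter.Create with XmlWriterSettings rather than the XmlTextWriter that ASMX constructs.

```csharp
using System;
using System.Text;
using System.Xml;

public static class WriterSettingsDemo
{
    // Hypothetical helper: serialize the same document with or without indentation.
    public static string Write(bool indent)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml("<a><b><c><d>Hello World</d></c></b></a>");

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = indent;            // the knob ASMX does not expose
        settings.OmitXmlDeclaration = true;  // keep the output to just the elements

        StringBuilder output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            document.Save(writer);
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Write(true));   // one element per line, indented
        Console.WriteLine(Write(false));  // everything on a single line
    }
}
```

With Indent set to false the whole document comes out on one line, which is what the person asking wanted from the web method.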

Enter the SoapExtension. In the web.config I'll register one.

<system.web>
    <webServices>
      <soapExtensionTypes>
        <add type="ASMXStripWhitespace.ASMXStripWhitespaceExtension, ASMXStripWhitespace"
             priority="1"
             group="High" />
      </soapExtensionTypes>
    </webServices>
...

And my class will derive from SoapExtension. There are lots of good details in this MSDN article by George Shepard from a while back.

Here's my quick implementation. Basically we're just reading in the stream of XML that was just output by the ASMX infrastructure (specifically the XmlSerializer that used the XmlTextWriter we saw above).

using System;
using System.IO;
using System.Web.Services.Protocols;
using System.Text;
using System.Xml;

namespace ASMXStripWhitespace
{
    public class ASMXStripWhitespaceExtension : SoapExtension
    {
        // Fields
        private Stream newStream;
        private Stream oldStream;

        public MemoryStream YankIt(Stream streamToPrefix)
        {
            streamToPrefix.Position = 0L;
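            XmlTextReader reader = new XmlTextReader(streamToPrefix);
            // Skip whitespace-only nodes; otherwise WriteNode copies the existing indentation through unchanged.
            reader.WhitespaceHandling = WhitespaceHandling.None;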

            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = false;
            settings.NewLineChars = "";
            settings.NewLineHandling = NewLineHandling.None;
            settings.Encoding = Encoding.UTF8;
            MemoryStream outStream = new MemoryStream();
            using(XmlWriter writer = XmlWriter.Create(outStream, settings))
            {
                do
                {
                    writer.WriteNode(reader, true);
                }
                while (reader.Read());
                writer.Flush();
            }
           
            ////debug
            //outStream.Seek(0, SeekOrigin.Begin);
            ////outStream.Position = 0L;
            //StreamReader reader2 = new StreamReader(outStream);
            //string s = reader2.ReadToEnd();
            //System.Diagnostics.Debug.WriteLine(s);

            //outStream.Position = 0L;
            outStream.Seek(0, SeekOrigin.Begin);
            return outStream;
        }

        // Methods
        private void StripWhitespace()
        {
            this.newStream.Position = 0L;
            this.newStream = this.YankIt(this.newStream);
            this.Copy(this.newStream, this.oldStream);
        }

        private void Copy(Stream from, Stream to)
        {
            TextReader reader = new StreamReader(from);
            TextWriter writer = new StreamWriter(to);
            // Write, not WriteLine: WriteLine would append a trailing new line to the message.
            writer.Write(reader.ReadToEnd());
            writer.Flush();
        }

        public override void ProcessMessage(SoapMessage message)
        {
            switch (message.Stage)
            {
                case SoapMessageStage.BeforeSerialize:
                case SoapMessageStage.AfterDeserialize:
                    return;

                case SoapMessageStage.AfterSerialize:
                    this.StripWhitespace();
                    return;
                case SoapMessageStage.BeforeDeserialize:
                    this.GetReady();
                    return;
            }
            throw new Exception("invalid stage");
        }

        public override Stream ChainStream(Stream stream)
        {
            this.oldStream = stream;
            this.newStream = new MemoryStream();
            return this.newStream;
        }

        private void GetReady()
        {
            this.Copy(this.oldStream, this.newStream);
            this.newStream.Position = 0L;
        }

        public override object GetInitializer(Type t)
        {
            return typeof(ASMXStripWhitespaceExtension);
        }

        public override object GetInitializer(LogicalMethodInfo methodInfo, SoapExtensionAttribute attribute)
        {
            return attribute;
        }

        public override void Initialize(object initializer)
        {
            //You'd usually get the attribute here and pull whatever you need off it.
            ASMXStripWhitespaceAttribute attr = initializer as ASMXStripWhitespaceAttribute;
        }
    }

    [AttributeUsage(AttributeTargets.Method)]
    public class ASMXStripWhitespaceAttribute : SoapExtensionAttribute
    {
        // Fields
        private int priority;

        // Properties
        public override Type ExtensionType
        {
            get { return typeof(ASMXStripWhitespaceExtension); }
        }

        public override int Priority
        {
            get { return this.priority; }
            set { this.priority = value;}
        }
    }
}

The order in which things happen is important. The overridden ChainStream is where we are handed the original stream and return a new MemoryStream of our own for ASMX to use in its place. The ProcessMessage switch is our opportunity to "get ready" (at BeforeDeserialize) and to "strip whitespace" (at AfterSerialize).
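
The stream swap can be hard to picture, so here is a rough simulation of it using only System.IO. Nothing in it touches ASMX: ChainStreamDemo and Run are hypothetical names, the two MemoryStreams merely stand in for the streams that ChainStream receives and returns, and the transform is a placeholder for the whitespace stripping.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChainStreamDemo
{
    // Hypothetical simulation of the buffer swap; no ASMX types are involved.
    public static string Run(string serialized, Func<string, string> transform)
    {
        Encoding utf8 = new UTF8Encoding(false);

        // Stand-in for the stream ASMX passes to ChainStream (the real response).
        MemoryStream oldStream = new MemoryStream();
        // Stand-in for the stream ChainStream returns; the serializer writes here.
        MemoryStream newStream = new MemoryStream();

        // Serialization: the framework writes the response into our buffer.
        byte[] payload = utf8.GetBytes(serialized);
        newStream.Write(payload, 0, payload.Length);

        // AfterSerialize: rewind the buffer, rewrite it, and copy it to the real stream.
        newStream.Position = 0;
        string buffered = new StreamReader(newStream, utf8).ReadToEnd();
        byte[] rewritten = utf8.GetBytes(transform(buffered));
        oldStream.Write(rewritten, 0, rewritten.Length);

        return utf8.GetString(oldStream.ToArray());
    }

    public static void Main()
    {
        // A trivial transform stands in for the whitespace stripping.
        Console.WriteLine(Run("  <a>Hello World</a>  ", s => s.Trim()));
    }
}
```

The point of the sketch is only the ordering: nothing reaches the real stream until the buffered copy has been rewritten.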

If you want a method to use this, you have to add the attribute, in this case "ASMXStripWhitespace", to that method. Notice the attribute class just above. You can pass things into it if you like, or override its standard properties.

public class Service1 : System.Web.Services.WebService
{
   [WebMethod]
   [ASMXStripWhitespace]
   public XmlNode HelloWorld () {
         XmlDocument document = new XmlDocument();
         document.LoadXml("<a><b><c><d>Hello World</d></c></b></a>");
         return document;
   }
}

The real work happens in YankIt, where we just set up our own XmlWriter and spin through the reader, writing out to the memory stream using the new settings of no new line chars and no indentation. Notice that the reader.Read() is a Do/While and not just a While. We don't want to lose the root node.
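
As a standalone illustration of the idea in YankIt, here is a minimal sketch that works on strings rather than streams, so it can be run in a console app without a web service. The WhitespaceStripDemo class and its Strip helper are hypothetical names, not part of the extension above. It also assumes the reader is told to ignore whitespace-only nodes (XmlReaderSettings.IgnoreWhitespace), so that indentation already present in the input is dropped rather than copied through.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class WhitespaceStripDemo
{
    // Hypothetical helper: same idea as YankIt, but string in, string out.
    public static string Strip(string xml)
    {
        XmlReaderSettings readerSettings = new XmlReaderSettings();
        readerSettings.IgnoreWhitespace = true;   // drop whitespace-only nodes while reading

        XmlWriterSettings writerSettings = new XmlWriterSettings();
        writerSettings.Indent = false;            // add no indentation of our own
        writerSettings.OmitXmlDeclaration = true; // keep the demo output to just the elements

        StringBuilder output = new StringBuilder();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            // From the reader's initial state, WriteNode copies everything through to the end of the input.
            writer.WriteNode(reader, true);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string indented = "<a>\n  <b>\n    <c>\n      <d>Hello World</d>\n    </c>\n  </b>\n</a>";
        Console.WriteLine(Strip(indented));
        // prints <a><b><c><d>Hello World</d></c></b></a>
    }
}
```

Text content such as "Hello World" is left alone; only the whitespace between tags is removed.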

Monday, September 17, 2007 8:15:13 AM UTC
Am I the first one to comment?
Monday, September 17, 2007 11:52:57 AM UTC
You mentioned that in WCF this was easy, could you include a WCF implementation as well as a compare/contrast?
Monday, September 17, 2007 12:42:10 PM UTC
I really like the way that you encapsulated this in a simple attribute. I recently wrote an article about creating a more accurate JSON serializer based off the XmlAttributes that serialized objects in the same way for both JSON and XML. [http://coderjournal.com/2007/08/creating-a-more-accurate-json-net-serializer/] And I like the way that you implemented the attribute to make it easy to turn the code on and off as needed. I think I am going to do the same for my JSON serializer.

Nick
Friday, September 21, 2007 2:53:56 PM UTC
Has anyone seen a component like this for eliminating whitespace from javascript? Runtime would be good, but I'm thinking more of a precompiler-like tool that takes my JS sources, strips out all non-essential whitespace, and saves to a file that gets published to my site.

The ASP.NET AJAX Toolkit ships with both formatted javascript files (for debugging, study, etc.), as well as a stripped-down version of the same code. I'd like to know if they have a tool to do that.
Craig Boland
