Scott Hanselman

An Xml Tidy in PowerShell or Formatting Xml with Indenting with PowerShell

July 3, '06 · Posted in PowerShell | XML

I like my XML pretty. There's no format-xml cmdlet or tidy-xml in PowerShell, so here's my first try:

#Name me tidy-xml.ps1
# - this crap written by Scott Hanselman
[System.Reflection.Assembly]::LoadWithPartialName("System.Xml") > $null
$PRIVATE:tempString = ""
if ($args[0].GetType().Name -eq "XmlDocument")
{
 $PRIVATE:tempString = $args[0].get_outerXml()
}
if ($args[0].GetType().Name -eq "String")
{
 $PRIVATE:tempString = $args[0]
}
$r = new-object System.Xml.XmlTextReader(new-object System.IO.StringReader($PRIVATE:tempString))
$sw = new-object System.IO.StringWriter
$w = new-object System.Xml.XmlTextWriter($sw)
$w.Formatting = [System.Xml.Formatting]::Indented
do { $w.WriteNode($r, $false) } while ($r.Read())
$w.Close()
$r.Close()
$sw.ToString()

In PowerShell, XML is sometimes handled as a string and sometimes as [xml]. This script takes either a string or [xml] but always returns a string. The final [xml] cast is up to you, and if you do it, the tidying is moot, because parsing the text discards the indentation. For example:

PS> $a = "<foo><bar>asdasd</bar></foo>"
PS> ./tidy-xml $a
<foo>
  <bar>asdasd</bar>
</foo>
PS> $b = [xml]"<foo><bar>asdasd</bar></foo>"
PS> ./tidy-xml $b
<foo>
  <bar>asdasd</bar>
</foo>
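
To see why the final [xml] cast makes the tidying moot, here is a small sketch. The $pretty string is a hypothetical stand-in for tidy-xml's output; parsing it back into an XmlDocument drops the whitespace-only text between elements, so OuterXml prints the compact form again.

```powershell
# Hypothetical stand-in for tidy-xml's indented output
$pretty = "<foo>`n  <bar>asdasd</bar>`n</foo>"

# Casting to [xml] parses the text; by default (PreserveWhitespace is false)
# the whitespace-only text between elements is not kept
$doc = [xml]$pretty

# Serializing again gives the compact, un-indented markup
$doc.OuterXml   # <foo><bar>asdasd</bar></foo>
```

If you really need the indentation to survive a round trip, load into an XmlDocument whose PreserveWhitespace property is set to $true instead of using the cast.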

I also wanted to support these scenarios, but couldn't get them working. Thoughts? Remember that I need to normalize the input to a single string for the StringReader constructor.

#couldn't because it returned an Object[] of strings and it got sloppy fast
get-content foo.xml | tidy-xml

#couldn't because it (oddly) returned an ArrayList of strings and it got sloppy fast
get-content foo.xml -ov c
tidy-xml $c
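
One workaround for both cases, sketched under the assumption that foo.xml holds a single XML document: join the lines yourself so the script receives one string. The $lines array below is a hypothetical stand-in for what Get-Content foo.xml returns.

```powershell
# Hypothetical stand-in for Get-Content foo.xml: an array with one string per line
$lines = @('<foo>', '  <bar>asdasd</bar>', '</foo>')

# -join collapses the array into the single string StringReader needs
$joined = $lines -join [Environment]::NewLine

# The result is one string that parses as one document
$joined.GetType().Name      # String
([xml]$joined).foo.bar      # asdasd
```

With the original script that would look like ./tidy-xml ((Get-Content foo.xml) -join [Environment]::NewLine). On PowerShell 3.0 and later, Get-Content foo.xml -Raw returns the whole file as one string directly.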

Enjoy (or improve!)

UPDATE: Here's a better version that includes a number of best-practice changes as well as support for taking objects IN from the pipeline (as I originally wanted):

#The following cases work
#
#PS>$a
#<foo><bar>this is A</bar></foo>
#PS>$b.get_OuterXml()
#<foo><bar>this is B</bar></foo>
#PS>Get-Content foo.xml
#<foo>
#   <bar>this is C</bar>
#</foo>
#
#Now try the following.
#PS>sal ti tidy-xml
#PS>$a | ti
#PS>$b | ti
#PS>$c | ti
#PS>ti $a
#PS>ti $b
#PS>ti $c
#PS>$a, $b | ti
#PS>$a, $c | ti
#PS>$c, $b | ti
#PS>$a, $b, $c | ti
#
#What doesn't work is passing multiple inputs as arguments:
#tidy-xml $a, $b # doesn't work
#
#Uhm, I think I would have to change my logic "completely" to actually get that to work...
#(after refactoring "process" block...)
#
#Name me tidy-xml.ps1
# - some of this crap written by Scott Hanselman
function Tidy-Xml {
    begin {
        $private:str = ""
       
        # recursively concatenate strings from passed-in arrays of schmutz
        # not sure how to improve this...
        function ConcatString ([object[]] $szArray) {
            # return string
            $private:rStr = ""

            # Recursively call itself, if a string is also of array or a collection type
            foreach ($private:sz in $szArray) {
                if (($private:sz.GetType().IsArray) -or `
                    ($private:sz -is [System.Collections.IList])) {
                    $private:rStr += ConcatString($private:sz)
                }
                elseif ($private:sz -is [xml]) {
                    $private:rStr += $private:sz.Get_OuterXml()
                }
                else {
                    $private:rStr += $private:sz
                }
            }
            return $private:rStr;
        }
       
        # Original "Tidy-Xml" portion
        function FormatXmlString ($arg) {
            # ignore parse errors
            trap { continue; }
           
            # out-null hides output of the assembly load
            [System.Reflection.Assembly]::LoadWithPartialName("System.Xml") | out-null

            $PRIVATE:tempString = ""
            if ($arg -is [xml]){
                $PRIVATE:tempString = $arg.get_outerXml()
            }
            if ($arg -is [string]){
                $PRIVATE:tempString = $arg
            }

            # the ` tick mark is a line-continuation char
            $r = new-object System.Xml.XmlTextReader(`
                new-object System.IO.StringReader($PRIVATE:tempString))
            $sw = new-object System.IO.StringWriter
            $w = new-object System.Xml.XmlTextWriter($sw)
            $w.Formatting = [System.Xml.Formatting]::Indented

            do { $w.WriteNode($r, $false) } while ($r.Read())

            $w.Close()
            $r.Close()
            $sw.ToString()
        }
    }
   
    process {
        # Strings that don't yet form well-formed XML are buffered here
        # and handled in the "end" block

        # this checks for objects that have been piped in
        if ($_) {
            # check if whatever we have appended is a valid XML or not
            $private:xmlStr = ($private:str + $_) -as [xml]
           
            if ($private:xmlStr -ne $null) {
                FormatXmlString([xml]$private:xmlStr)
                # clear the string not to be handled in "end" block
                $private:str = $null
            } else {
                if ($_ -is [string]) {
                    $private:str += $_
                } elseif ($_ -is [xml]) {
                    FormatXmlString($_)
                }
                # for an array or a collection type,
                elseif ($_.Count) {
                    # iterate each item in the collection and append;
                    # reset $line first so earlier collections aren't appended again
                    $private:line = ""
                    foreach ($i in $_) {
                        $private:line += $i
                    }
                    $private:str += $private:line
                }
            }
        }
    }

    end {
        if ([string]::IsNullOrEmpty($private:str)) {
            $private:szXml = $(ConcatString($args)) -as [xml]
            if (! [string]::IsNullOrEmpty($private:szXml)) {
                FormatXmlString([xml]$private:szXml)
            }
        } else {
            FormatXmlString([xml]$private:str)
        }
    }
}

Thanks to MonadBlog for the updates! There's definitely some room for refactoring the begin/process/end blocks, but it's more functional this way.
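
Along the lines of that begin/process/end refactoring, here is one possible compact shape, offered as a sketch rather than a drop-in replacement. It assumes PowerShell 2.0 or later (for [CmdletBinding()] and advanced-function parameters); the names Format-XmlDocument and Format-OneXml are hypothetical, and it swaps XmlTextWriter for XmlWriter.Create with XmlWriterSettings. Because the parameter is typed [object[]], the tidy-xml $a, $b case noted in the comments above binds as an array, and each complete document is formatted on its own.

```powershell
function Format-XmlDocument {
    [CmdletBinding()]
    param(
        # Strings, [xml] documents, or arrays of either, by argument or pipeline
        [Parameter(ValueFromPipeline = $true, Position = 0)]
        [object[]] $InputObject
    )

    begin {
        # Text collected so far that does not yet form a complete document
        $buffer = ''

        # Hypothetical helper: indent one parsed document and return it as a string
        function Format-OneXml([xml] $doc) {
            $settings = New-Object System.Xml.XmlWriterSettings
            $settings.Indent = $true
            $settings.OmitXmlDeclaration = $true
            $sw = New-Object System.IO.StringWriter
            $w = [System.Xml.XmlWriter]::Create($sw, $settings)
            $doc.Save($w)
            $w.Close()
            $sw.ToString()
        }
    }

    process {
        foreach ($item in $InputObject) {
            if ($item -is [xml]) {
                # Already parsed: format it straight away
                Format-OneXml $item
                continue
            }
            # Strings (e.g. one line from Get-Content) are appended to the buffer
            $buffer += [string]$item + "`n"
            # -as returns $null instead of throwing while the document is incomplete
            $doc = $buffer -as [xml]
            if ($null -ne $doc) {
                Format-OneXml $doc
                $buffer = ''
            }
        }
    }

    end {
        # Anything left over never became well-formed XML
        if ($buffer.Trim()) {
            Write-Error 'Leftover input is not well-formed XML.'
        }
    }
}
```

The buffer is re-parsed every time a line arrives, so a large file piped in line by line is parsed many times over; for big files, passing one string (for example from Get-Content -Raw on PowerShell 3.0 and later) is cheaper.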

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Tuesday, July 04, 2006 4:29:51 AM UTC
Hello there, Scott,
I have modified your source a bit and pasted it on http://pastehere.com/?dcuqpi as well as some comments

The code now looks like complete crap thanks to my modification... almost unreadable... sorry about botching your source.

Tuesday, July 04, 2006 5:03:56 AM UTC
Nonsense, I love it. Thanks for the modifications, I'll add them to the post and annotate the changes for everyone's (and my) ongoing education.
Scott Hanselman
Tuesday, July 04, 2006 9:48:49 PM UTC
Ah, you actually did post it. ;) Thanks

Btw, I have also posted the "COMPLETE" tabExpansion function at "http://pastehere.com/?armhfr", but I have broken one piece of functionality ($host.ui.rawUi.[tab] doesn't work, meaning nested properties aren't expanded properly...)

Tuesday, July 04, 2006 10:57:58 PM UTC
I have just fixed the multi-level tab expansion and "about_[tab]" problems at http://pastehere.com/?gbrsgj

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.