Scott Hanselman

Back to Basics - Trust Nothing as User Input Comes from All Over

June 24, 2009 | Posted in ASP.NET | Back to Basics

There was an interesting bug recently that was initially blamed on Bing. Basically, someone searched for something, clicked the first result, and got a YSOD (Yellow Screen of Death).

They were searching Bing.com for this term:

"Eugene Myers's O(ND) Diff algorithm"

When they clicked on a link that looked like a good result, they got a scary YSOD like this:


Server Error in '/' Application.


'/t:tracking/t:referrer[@url='http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&form=qblh']' has an invalid token.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.Xml.XPath.XPathException: '/t:tracking/t:referrer[@url='http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&form=qblh']' has an invalid token.
Source Error:

Stack Trace:

[XPathException: '/t:tracking/t:referrer[@url='http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&form=qblh']' has an invalid token.]

   MS.Internal.Xml.XPath.XPathParser.ParseStep(AstNode qyInput) +539

...snip...


Eek! That is scary. Because the user clicked a link on Bing and the next thing they got was an error, they figured it was Bing that caused it. Well, indirectly. What went wrong here?

The target site the user was visiting is tracking their visitors, as many sites do and should. When you visit a site from another, HTTP includes a header called "Referer" (yes, it's actually misspelled in the spec, and is misspelled in reality. Welcome to the Web.)

Since they were visiting from here:

http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&form=qblh

...then that was the referrer. However, the trouble happened when the program blindly took the HTTP Referer header and built an XPath expression using it directly as input.

It appears that this website is storing its tracking details in an XML file, and the programmer is trying to do a lookup on the referrer so he/she can increment a visit.

Notice that they've wrapped the string in single quotes, but the original search included an additional single quote in the string "Eugene Myers's." The resulting concatenated XPath isn't valid XPath, and the system fails.
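
To make the failure concrete, here is a minimal Python sketch of the kind of concatenation that would produce the expression in the error message. The site's real code isn't shown anywhere, so the function name `build_xpath_naively` is hypothetical and the logic is an assumption reconstructed from the exception text.

```python
# Assumed reconstruction: the site's real (ASP.NET) code is not available.
# The Referer value is the one from the Bing search in this post.
referer = "http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&form=qblh"


def build_xpath_naively(url: str) -> str:
    """Hypothetical helper: paste untrusted input straight into an XPath string."""
    return "/t:tracking/t:referrer[@url='" + url + "']"


xpath = build_xpath_naively(referer)
print(xpath)

# Look only at the predicate between the square brackets.
predicate = xpath[xpath.index("[") + 1 : xpath.rindex("]")]

# A well-formed predicate would contain exactly two single quotes.
quote_count = predicate.count("'")
print("single quotes in predicate:", quote_count)

# An XPath parser ends the string literal at the second quote,
# then trips over whatever is left.
first = predicate.index("'")
second = predicate.index("'", first + 1)
literal = predicate[first : second + 1]
leftover = predicate[second + 1 :]
print("literal the parser sees:", literal)
print("leftover it cannot parse:", leftover)
```

The apostrophe in "myers's" closes the literal early, so everything after it is stray text as far as the parser is concerned.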

Just in case you care, the same problem happens to this poor site when searching from Google:

http://www.google.com/search?q=Eugene+Myers's+O(ND)+Diff+algorithm

Yields:


Server Error in '/' Application.

'/t:tracking/t:referrer[@url='http://www.google.com/search?q=eugene myers's o(nd) diff algorithm']' has an invalid token.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.Xml.XPath.XPathException: '/t:tracking/t:referrer[@url='http://www.google.com/search?q=eugene myers's o(nd) diff algorithm']' has an invalid token.


What's the Back to Basics lesson?  Well, there are a few:

  • Trust no user input.
  • Input comes from many locations.
    • There's explicit input like Form POSTs, but also implicit input like HTTP Referers and Cookies.
  • "Injection" attacks aren't just about SQL Injection.
    • You can inject things into XPath and Regular expressions just as easily and possibly bring down or hang sites, as well as potentially expose private information.
    • Any time you take a string from input of any kind and concatenate it into any language, you're giving bad people the opportunity to be bad.
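
Here is a rough Python sketch of two ways to avoid the concatenation problem. It is not the site's code and not the only fix. The namespace URI, the sample XML, the `visits` attribute, and all function names are made up for illustration. The first idea is to not build a query from input at all and to compare attribute values in code. The second is to quote the value as a proper XPath 1.0 string literal (XPath 1.0 has no escape character inside literals, so a value containing both quote types has to be assembled with `concat()`).

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = "urn:example:tracking"  # hypothetical namespace for the t: prefix

# Hypothetical tracking file; the real one is not shown in the post.
TRACKING_XML = f"""<tracking xmlns="{NS}">
  <referrer url="http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&amp;form=qblh" visits="1"/>
  <referrer url="http://example.com/" visits="4"/>
</tracking>"""


def find_referrer(root: ET.Element, url: str) -> Optional[ET.Element]:
    """Option 1: compare the untrusted value as data, never as query text."""
    for node in root.iter(f"{{{NS}}}referrer"):
        if node.get("url") == url:
            return node
    return None


def record_visit(root: ET.Element, url: str) -> int:
    """Increment the visit count for url, adding a node if it is new."""
    node = find_referrer(root, url)
    if node is None:
        node = ET.SubElement(root, f"{{{NS}}}referrer", {"url": url, "visits": "0"})
    visits = int(node.get("visits", "0")) + 1
    node.set("visits", str(visits))
    return visits


def xpath_string_literal(value: str) -> str:
    """Option 2: turn any string into a valid XPath 1.0 string expression."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote types present: split on single quotes and glue the
    # pieces back together with concat(), quoting each ' as "'".
    pieces = []
    for i, part in enumerate(value.split("'")):
        if i > 0:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"


root = ET.fromstring(TRACKING_XML)
referer = "http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&form=qblh"
print(record_visit(root, referer))
print("/t:tracking/t:referrer[@url=" + xpath_string_literal(referer) + "]")
```

Option 1 walks every referrer node, which is fine for a small file but won't scale forever. Option 2 keeps the XPath lookup, but only helps if every call site remembers to use it, and the `concat()` form needs a full XPath 1.0 engine (ElementTree's limited XPath subset doesn't support it).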

Interesting (and obscure) stuff!

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

June 24, 2009 0:09
Apparently these web sites have no global exception handling in place either. Just no excuse for that.
June 24, 2009 0:30
I can't agree with Lee because I can imagine that the pressure to produce results was vast on the project team. And I guess that this just slipped. If I am not greatly mistaken, this blew up in the request pipeline, so handling that would require some actual work.

And here comes a quote from Douglas Adams:

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.


On my current project my paranoia of exactly such a thing happening has produced some strange user input validation. So I am really contemplating where the golden middle is.
June 24, 2009 1:11
I'd have to agree with Lee on this one. No matter what happens, you don't want your user to ever see an error like that. It's so easy to make a quick exception handling routine in your page class. Also, if you don't have one of those, you'll never know what errors your site is getting. It's being purposefully ignorant and it's really not acceptable.
June 24, 2009 1:20
>> If I a not greatly mistaken this blew in the request pipeline so handling that would require some actual work.

Oh yes, I forgot, this is a Webforms app we're talking about. God forbid we should have to do some actual work. ;)

By the way -- exposing the stack trace like this is a far worse security risk than having your XPath parser choke.
June 24, 2009 1:22
Interesting - obvious in hindsight, of course.
June 24, 2009 1:47
I think beyond the fact that this is a web page or from an http referer, quote (or rather delimiter) safety is one of the first things you should test for in any parser. I agree with Lee and Chad that trapping unhandled exceptions is a freebie even if you don't log the information for posterity. It's pretty simple to avoid vomiting the YSOD to a user no matter what the timeline.

Good reminder, however, that user input is not to be trusted and that you should always parse and validate any input from any source with exceptional paranoia; not just to prevent hackers, but in an attempt to catch as many of these little types of gotchas as possible.

I always enjoy conversations with developers, QA, and product managers that explore the edge cases with questions like "What happens if we give it all spaces, high-order characters, all quotes, or just a whole bunch of junk?" It seems to me that many "edge cases" occur all too commonly to be ignored as often as they are.
June 24, 2009 4:01
You can't always trust the referer string that the webserver sees. I've had plenty of situations where spammers spoof the referer string with a URL to a spam / pr0n / etc site.
June 24, 2009 7:10
I think it is funny that:

<compilation debug="true"/>

is on within an internet facing application.

Also wonder why they never thought of installing ELMAH.

Who knows, maybe they will get some feedback from a user with Scott's blog post and fix their mistakes.

June 24, 2009 7:29
I saw an interesting case of "not trusting any input" a few years ago. I was parsing Apache logs, and discovered that someone had changed their referrer to have a javascript redirect to an inappropriate site. I suppose it was to get people who use Web-based log analysis tools.

J.Ja
June 24, 2009 7:42
In a related tangent, this feeds into people switching configuration options based on HTTP Referer or hostname.

People think that determining (say) the DB to connect to based on hostname is a fine idea. What they're forgetting is that the hostname is supplied by the client too.
June 24, 2009 8:27
Sometimes we humans miss small things in the big picture.
June 24, 2009 17:06
I think that at least part of the fault lies with the framework here. To avoid SQL injection, we can use parameterized queries so that user input cannot be misinterpreted as executable SQL. With XPath, we don't have that option - we're left with the equivalent of:

"SELECT something FROM table WHERE column = '" + userInput.Replace("'", "''") + "'"


Sure, we can use XLinq, but since there's no IQueryable support, it seems that the code could end up traversing the entire XML DOM to find the matching node. (I may be wrong on this - please feel free to correct me if I am!)

Microsoft did start to write an XQuery implementation for .NET, but the site (xqueryservices.com) has been down since 2003. According to the XML Team blog, they were considering Linq to XQuery for a future release, but that seems to have dropped off the radar - the blog makes no mention of it after February 2007.

IMHO, we either need IQueryable support in XLinq, so that a query for a node gets compiled to a proper XPath query with robust literal escaping, or we need an XPath equivalent to the SQL command and parameter constructs.
June 24, 2009 22:10
There is another problem with Bing search - http://www.bing.com/search?q=eugene%20myers%27s%20o(nd)%20diff%20%F1%F3%EF%E5%F0%20algorithm&form=qblh . This URL contains a word in a different language (Russian), but the search input doesn't contain it. Also, when you try to type search terms directly into the URL in languages other than English, you get some strange errors (wrong encoding and so on). This is because of a bug (feature?) in the ASP.NET stack with Request.QueryString.
June 26, 2009 17:47
That's really shocking - YSOD on a public Microsoft website? Users should never see that.

Bing looks like it was technically "ready", meaning it served up results to a search phrase, but it looks like they missed a lot of the polish that's required to really impress.

I've done searches before from Google Chrome that don't get escaped correctly (which I think is Chrome's fault actually), but Bing just displays a blank page -- no errors, no suggestions, no fixes...

June 29, 2009 18:30
It wasn't a YSOD on a Microsoft website, it was a YSOD on a site linked to from the Bing search results. There's a big difference!


Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.