Scott Hanselman

Using ISAPI_Rewrite to canonicalize ASP.NET URLs and remove default.aspx

February 22, 2007. Posted in ASP.NET | HttpModule | Musings | Tools

In the comments of my post on Google PageRanks, Jeff Atwood says:

[The existence of] Default.aspx is another reason to consider URL rewriting. A few of my rewrite rules relative to PR:
- I don't allow links to come in as codinghorror.com, I add the www. if it is not there.
- I remove index.html if it is present

This got me thinking, as it appears that there are quite a few ways to get to my home page.

You get the idea...Heck, probably just by mentioning them I'm getting in trouble, right? The URI that dare not speak its name.

Anyway, let's start by assuming my home page is http://www.hanselman.com/blog/, and that includes the trailing slash. We know that if my browser requests http://www.hanselman.com/blog without the slash, the web server will tell it to try again with the slash, which costs an extra round trip and is just wasteful.

Apache folks have mod_rewrite and love to remind ASP.NET/IIS folks about their awesomeness. Many sites rely on mod_rewrite for certain behaviors. It's really a fundamental part of the Apache experience. The IIS story becomes better in newer versions of IIS, but the easiest and most flexible way to handle these kinds of things is ISAPI_Rewrite.

Sure, one could create an HTTP Module for ASP.NET for some of this, but at some point you'll realize that you need to catch these requests WAY earlier. Now, ISAPI_Rewrite uses Regular Expressions, and now it's time for my oft-repeated favorite RegEx joke - get ready for it:

"So you've got a problem, and you want to use Regular Expressions to solve it. Now you've got two problems."

Thanks for indulging me. Yes, writing ISAPI_Rewrite stuff is freaking voodoo and I hate it. Once you've written them, they're done. Here's mine:

[ISAPI_Rewrite]
RewriteRule /blog/default\.aspx http\://www.hanselman.com/blog/ [I,RP]

RewriteCond Host: ^hanselman\.com
RewriteRule (.*) http\://www.hanselman.com$1 [I,RP]

RewriteCond Host: ^computerzen\.com
RewriteRule (.*) http\://www.hanselman.com$1 [I,RP]

RewriteCond Host: ^www\.computerzen\.com
RewriteRule (.*) http\://www.hanselman.com/blog/ [I,RP]

These rules normalize (canonicalize), to the best of my ability, all the not-really-good URLs above. They'll send everyone to http://www.hanselman.com/blog/ and even take totally lame links like http://computerzen.com/blog/GooglePageRanksConsideredSubtle.aspx and make them "correct." The "I" means "case insensitive" and the "RP" means "Redirect Permanently" - an HTTP 301. If it were just "R" it'd be a 302. When you're testing with ISAPI_Rewrite, always start with "R" to do temporary redirects, because browsers and search engines can cache a 301, so you don't get a second chance.
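To see what the rules above do to a given request, here is a minimal Python sketch that mimics them. It is not ISAPI_Rewrite itself: the function name canonicalize and the RULES table are made up for illustration, and the sketch assumes each pattern has to match the whole host or the whole URL, which is an assumption about the engine rather than something stated here.

```python
import re

CANONICAL = "http://www.hanselman.com"

# (host pattern or None, URL pattern, redirect template), in rule order.
# Assumption: patterns must match the entire string, so no ^ or $ is written.
RULES = [
    (None, r"/blog/default\.aspx", CANONICAL + "/blog/"),
    (r"hanselman\.com", r"(.*)", CANONICAL + r"\1"),
    (r"computerzen\.com", r"(.*)", CANONICAL + r"\1"),
    (r"www\.computerzen\.com", r"(.*)", CANONICAL + "/blog/"),
]


def canonicalize(host, url):
    """Return the redirect target for a request, or None if no rule fires."""
    for host_pattern, url_pattern, template in RULES:
        # A rule with a host condition only applies when the Host header matches.
        if host_pattern and not re.fullmatch(host_pattern, host, re.IGNORECASE):
            continue
        match = re.fullmatch(url_pattern, url, re.IGNORECASE)
        if match:
            # The first matching rule wins, since a redirect ends processing.
            return match.expand(template)
    return None


if __name__ == "__main__":
    requests = [
        ("www.hanselman.com", "/blog/default.aspx"),
        ("computerzen.com", "/blog/GooglePageRanksConsideredSubtle.aspx"),
        ("www.hanselman.com", "/blog/"),
    ]
    for host, url in requests:
        print(host + url, "->", canonicalize(host, url))
```

A result of None means the request is already canonical and is served without a redirect.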

So now, even if someone asks for http://www.hanselman.com/blog, they'll be told where to go (here's an HTTP conversation):

  • GET /blog HTTP/1.1
    • Heh, uh, get me /blog, m'kay?
  • HTTP/1.1 301 Moved Permanently
    Location: http://www.hanselman.com/blog/
  • GET /blog/ HTTP/1.1
    • Gosh, sorrey (my browser is Canadian) get me /blog/ then.

And it was Good™.
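Browsers follow redirects silently, so a conversation like the one above is easier to watch from a script. Here's a minimal sketch using Python's standard http.client, which does not follow redirects on its own. The helper names fetch_redirect and redirect_kind are hypothetical, and only 301 and 302 are classified because those are the two codes discussed here.

```python
import http.client


def fetch_redirect(host, path, port=80):
    """Send one GET without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirect_kind(status):
    """Classify the two redirect codes discussed in the post."""
    if status == 301:
        return "permanent"
    if status == 302:
        return "temporary"
    return "not a redirect"
```

Calling fetch_redirect("www.hanselman.com", "/blog") and passing the status to redirect_kind is one way to confirm a rule is still on "R" (temporary) before switching it to "RP".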

This kind of control is useful in any public-facing application or web site, and one should take an hour or so and really think about their website's "public face." ISAPI_Rewrite can be a powerful component as part of a larger ASP.NET solution, especially one where Google Ranks do matter and hackable or "pretty" URLs are highly valued.

For us, in the banking industry, having nice URLs like http://www.foobank.com/banking/ or http://mobile.foobank.com makes everyone happy.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

February 22, 2007 6:12
I am using ISAPI Rewrite on a project I recently started. I must say initially it was confusing from a debugging standpoint when the URL in front of you tells you nothing of the page you want. Once you get used to moving HTTPD files around, and the regex involved, it just works.

PS. You put the ™ initial on "And it was Good™" but I didn't get the reference. Please do elaborate :-)
February 22, 2007 6:20
Gosh, sorrey (my browser is Canadian) get me /blog/ then.

LOL, love that!
February 22, 2007 6:25
The ™ is just a joke...as if "And It was Good" (what G*d supposedly said after Creation) was now Trademarked...
February 22, 2007 9:10
Here's a free ISAPI module that we use for CBC Radio 3.

It works really well. I love the fact that with the new version you can edit the .ini file on the fly and not have to restart the application pool.

February 22, 2007 10:34
looks like foobank is down.
i'll have to wait till tomorrow to withdraw all my foo.
February 22, 2007 11:09
Why keep the "www." in front? That's so 1997.
February 22, 2007 12:15
Call me old fashioned, but I like my URLs with subdomains, like ftp.hanselman.com, mail.hanselman.com, and gopher.hanselman.com.
February 22, 2007 18:25
Shouldn't the first rule include all QueryString parameters so that you're able to switch to dasBlog's admin view with "/blog/default.aspx?page=admin"? I use IIRF where I needed to apply the rule like this: RewriteRule /blog/default\.aspx(.*) /blog/$1 [I,R=301]

February 22, 2007 20:58
Alex - not sure...it works now: http://www.hanselman.com/blog/default.aspx?date=2007-02-01

Are you saying it should work like this?

http://www.hanselman.com/blog/?date=2007-02-01
February 22, 2007 21:10
gopher. ahh, the good old days!
Ian
February 22, 2007 21:18
Scott - IMHO it should work both ways. http://www.hanselman.com/blog/default.aspx?date=2007-02-01 would be shortened to http://www.hanselman.com/blog/?date=2007-02-01.

I noticed this little subtlety after logging in to my blog. The header link was /foo/default.aspx?page=admin but would be redirected to /foo/default.aspx resulting in the admin bar not being shown.
February 22, 2007 22:15
Alexander - I'm confused...I'm able to get to my admin page with my RegEx (likely due to my ignorance) - what would yours gain me?
February 22, 2007 22:38
> Why keep the "www." in front? That's so 1997.

Gee, I dunno, why don't you ask http://google.com ?

Oh wait, you can't. Because it redirects to http://www.google.com .

WORLD WIDE WEB BABY! DUBYA DUBYA DUBYA!
February 22, 2007 22:51
Scott - I suspect it's an IIRF RegEx-specific issue.

RewriteRule /(.*)/default\.aspx /$1 [I,R=301] takes me straight to http://foo/bar/ when clicking a link with href http://foo/bar/default.aspx?page=admin. One way to resolve this is either to use RewriteRule ^/(.*)/default\.aspx$ /$1 [I,R=301], which matches only URLs that consist of nothing but a vdir followed by default.aspx, or to append the query string as in the rule above.

That said, it will gain you nothing but awareness about the differences between ISAPI_Rewrite's and IIRF's RegEx engines.
February 22, 2007 23:14
Whoops, the rule should read RewriteRule ^/(.*)/default\.aspx$ /$1/ [I,R=301]
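
The matching difference discussed in this thread can be sketched with Python's re module. This is neither ISAPI_Rewrite's nor IIRF's engine, so treat it only as an illustration of unanchored versus anchored matching; the helper redirect_target and the pattern labels are made up for the example.

```python
import re

URL = "/bar/default.aspx?page=admin"

PATTERNS = {
    # Matches anywhere in the URL, so the query string is silently dropped.
    "unanchored": (re.compile(r"/(.*)/default\.aspx", re.IGNORECASE), r"/\1/"),
    # Matches only when nothing follows default.aspx.
    "anchored": (re.compile(r"^/(.*)/default\.aspx$", re.IGNORECASE), r"/\1/"),
    # Captures whatever follows default.aspx and appends it to the target.
    "keep_query": (re.compile(r"^/(.*)/default\.aspx(.*)$", re.IGNORECASE), r"/\1/\2"),
}


def redirect_target(kind, url):
    """Return the redirect target for the named pattern, or None if it does not match."""
    pattern, template = PATTERNS[kind]
    match = pattern.search(url)
    return match.expand(template) if match else None


if __name__ == "__main__":
    for kind in PATTERNS:
        print(kind, "->", redirect_target(kind, URL))
```

With the admin link as input, the unanchored pattern redirects to /bar/ and loses page=admin, the anchored pattern does not match and so leaves the request alone, and the third pattern redirects to /bar/?page=admin.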
February 23, 2007 3:30
Can all these things be performed without the ISAPI but with a normal httpmodule on an asp.net app?
Or with the default rewriter of asp.net 2.0?

I've the same problem, but hosted on wh4l, so cannot access IIS to install ISAPI
February 23, 2007 3:39
You *can* but on IIS6 you'd have to do a metabase hack to get ASP.NET associated with all extensions including NO extension...then you've got managed code "in the way" for all requests of all kinds...so, the short answer is no, not unless IIS7.
February 23, 2007 14:15
Oops... that's true... folders are not handled by ASP.NET :)


Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.