I had an interesting e-conversation with Rob Howard and Scott Watermasysk today. I had noticed recently that a number of blogs I'd visited had things like _2D00_ and similar codes in their URLs. There was a forum post a while back (July of 2006) that asked about things like hyphens getting encoded in some builds of CS2.1.
There were a bunch of posts on http://www.asp.net that had URLs like: .../2006/09/07/Startup-doesn_2700_t-always-mean-venture-capital... where the 2700 was a single quote or .../2006/09/05/Should-tags-be-moderated_3F00_... where 3F00 was a "?". The non-alphanumeric characters in these cases were being encoded in the URL as the hex bytes of their UTF-16 encoding (a single quote is U+0027, stored as the bytes 27 00, hence 2700). This was a bug in a beta of CS that was quickly fixed, but it got me thinking about URLs, both in blog engines and more generally. These particular URLs and their untidiness really irked me.
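To see where a code like _2700_ comes from, here's a minimal sketch. It is not the Community Server code. The function name `escape_title` and the rules "spaces become hyphens, ASCII letters and digits pass through" are assumptions made for illustration. Everything else is written as the hex of its UTF-16 little-endian bytes between underscores, which happens to reproduce the codes seen above.

```python
def escape_title(title: str) -> str:
    """Hypothetical reconstruction of the escaping seen in the URLs above."""
    out = []
    for ch in title:
        if ch == " ":
            out.append("-")  # assumption: spaces become hyphens
        elif ch.isascii() and ch.isalnum():
            out.append(ch)   # plain letters and digits pass through
        else:
            # hex of the UTF-16 little-endian bytes, wrapped in underscores
            out.append("_" + ch.encode("utf-16-le").hex().upper() + "_")
    return "".join(out)


print(escape_title("Startup doesn't always mean venture capital"))
print(escape_title("Should tags be moderated?"))
```

Under these assumptions a literal hyphen in a title comes out as _2D00_, which would explain why that code shows up so often.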
Different Ways to Get to the Same Place
Personally, I like URLs that use Pascal Casing, like the one for this post:
Although URLs are technically supposed to be case-sensitive, and you used to see that a lot when URLs betrayed the underlying case-sensitivity of the file system, they aren't in our case. The only thing that would make it better, IMHO, is the removal of the .aspx extension. More on that later.
Years ago DasBlog had really lame URLs and Jeff Atwood picked on us. ;) Unfortunately, these URLs still live on in some comments pages within DasBlog.
We started using the blog title to generate the URL. This, of course, has problems when you change a title after a pile of folks link to the original URL, but unless you want the engine to keep track of every title a post has ever had and 301 to the "final URL," you've got a nasty problem. Anyway...
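For the curious, here's roughly what "keep track of every title and 301 to the final URL" could look like. This is a sketch, not DasBlog's implementation: `compact_title` and `SlugHistory` are hypothetical names, and the rule "keep only letters and digits, capitalize each word" is an assumption about how a title becomes a Pascal Cased URL.

```python
import re


def compact_title(title: str) -> str:
    """Keep only letters and digits, capitalizing the first letter of each word."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    return "".join(w[:1].upper() + w[1:] for w in words)


class SlugHistory:
    """Remembers every slug a post has ever had (hypothetical helper)."""

    def __init__(self):
        self._current = {}  # post id -> current slug
        self._owner = {}    # lowercased slug ever used -> post id

    def set_title(self, post_id, title):
        slug = compact_title(title)
        self._current[post_id] = slug
        self._owner[slug.lower()] = post_id
        return slug

    def resolve(self, slug):
        """Return (200, slug) for the current slug, (301, slug) for an old one, else (404, None)."""
        post_id = self._owner.get(slug.lower())
        if post_id is None:
            return 404, None
        current = self._current[post_id]
        if slug.lower() == current.lower():
            return 200, current
        return 301, current


history = SlugHistory()
history.set_title(1, "String Formatting Fun")
history.set_title(1, "More String Formatting Fun")
print(history.resolve("StringFormattingFun"))
```

The old slug answers with a 301 pointing at the new one, so links to the original title keep working. The cost is that the engine has to store, and never reuse, every slug it has ever handed out.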
There are a number of options in DasBlog that affect your URLs, although DasBlog canonicalizes URLs internally and will always accept any of these formats without breaking your URLs. That is, you can change your URL scheme and you won't be penalized.
There's options to use a + for a space, as well as including the date, so any of these are potentially valid:
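These variants differ only in the + separators, the date folders, and casing, so one way to accept them all is to boil every incoming path down to the same lookup key. The sketch below illustrates that idea only. It is not DasBlog's actual canonicalization, and `canonical_key` is a hypothetical name.

```python
import re


def canonical_key(path: str) -> str:
    """Reduce any of the URL variants to one lookup key (illustrative only)."""
    name = path.rsplit("/", 1)[-1]  # drop folders, including any date part
    name = re.sub(r"\.aspx$", "", name, flags=re.IGNORECASE)
    name = name.replace("+", "")    # the + only stands in for a space
    return name.lower()             # treat the URL as case-insensitive


print(canonical_key("/2007/03/27/Abschlussbericht+Zum+NET+Wintercamp+2007.aspx"))
print(canonical_key("/weblog/AbschlussberichtZumNETWintercamp2007.aspx"))
```

Both calls print the same key, so either URL would find the same post.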
Is One URL Format more Search Engine Friendly?
A number of folks have said they preferred hyphens over pluses, specifically that it helps Google. Rob mentioned during our email discussion:
The hyphens, however, are something you guys should investigate using for DasBlog. Search engines actually look at the URL for keywords. The hyphen is considered a word-break indicator, i.e. HelloWorld to Google appears as "HelloWorld" whereas Hello-World is "Hello" and "World". The underscore is also considered a word-break, but given less points.
I wasn't sure about this, and initially was skeptical, but he's right - mostly. However, it seems to matter less and less, as Google seems to have added some smarts.
If you Google for "happybirthdaytomiiwiireview" all-one-word, you'll get my post on the Nintendo Wii with the URL highlighted. You'll also get that post if you Google for "Happy Birthday to Mii" as a list of words, or as a phrase with surrounding quotes because it also happens to be the title.
ASIDE: Oddly, if you Google for the phrase with hyphens (which is odd, in itself) as in happy-birthday-to-mii you'll get fewer results than if you do it with quotes. Not that there's any reason to do that.
Notice in the screenshot below how the word "Mii" appears bold in the URL. Not in the title, in the URL. That implies to me that Google either cares about casing, in this case the Pascal Casing of my blog's URL, or that it picked "Mii" up as a fragment and really cares about fragments of things in URLs.
Let's see which it is. If we search for "Happy Birthday to Mi" with just one i in "Mii" - where "Mi" is a fragment of "Mii" - we don't see my post anywhere at all. That implies, to me at least, that Google is matching whole words rather than fragments, and that Pascal Casing in a blog post's URL is likely as effective as a hyphen at delimiting words from Google's perspective. So from a Search Engine Optimization (SEO) perspective, hyphens versus Pascal Casing versus whatever is pretty much a moo point.
Not moot, rather, "moo" like a cow's opinion. It just doesn't matter. It's moo.
So, pick the URL style that makes you feel good, I say.
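To make the "word break" idea concrete, here's a small sketch that pulls the same words out of a hyphenated URL and a Pascal Cased one. It says nothing about how Google actually tokenizes URLs. The function name `words_from_slug` and the splitting rules are assumptions for illustration.

```python
import re


def words_from_slug(slug: str) -> list:
    """Split a URL slug into words on hyphens, underscores, pluses and case changes."""
    words = []
    for chunk in re.split(r"[-_+]", slug):
        # an all-caps run, a Capitalized word, a lowercase run, or a run of digits
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|[0-9]+", chunk)
    return [w.lower() for w in words]


print(words_from_slug("HappyBirthdayToMii"))
print(words_from_slug("happy-birthday-to-mii"))
```

Both calls produce the same four words, which is the sense in which the two styles carry the same information.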
Many Options for URLs
Scott Water has used ISAPI_Rewrite to completely remove the .aspx extension from his site, and he has nice clean URLs like http://scottwater.com/blog/archive/url-rewriting-via-isapi-rewrite/. He also has nice "hackable" URLs like http://scottwater.com/blog/search/hanselman/ which is pretty sweet. You too can remove the .ASPX extension from your ASP.NET site using ISAPI_Rewrite.
Here are some example URL styles I've seen out there in Blog Land:
- Subtext - .../blog/archive/2007/02/11/Subtext_v1.9.4_quotWindwardquot_Edition_Released.aspx
- CS with ISAPI_Rewrite - .../blog/archive/twitter-for-windows/
- Typo - .../articles/2007/03/27/microsoft-technical-summit
- DasBlog - .../weblog/StringFormattingFun.aspx
- DasBlog with Dates - .../2007/03/27/Abschlussbericht+Zum+NET+Wintercamp+2007.aspx
- Radio Userland - .../2007/04/05/itsNotTheCoverOfRollingSto.html
- MovableType - .../blog/archives/000093.html
- Blogger - .../2007/04/mulan.html
- Drupal - .../node/133257
- Blogware - .../blog/_archives/2006/8/18/2242665.html
Yes, there's 1,000 blogging engines out there, each with its own URL style, and yes, this is not an exhaustive list.
The Trailing Slash in a/an URL and removing Technology from your URL
Note that in ScottWater's case, the URLs are lower-case and include the trailing /.
There's a lot of controversy about the Trailing Slash. I've always felt that the trailing slash implied we were visiting a directory, while no slash implied we were visiting a page. Simon Willison seems to advocate for the trailing slash as in his comment at http://jessey.net/archive/2004/05/31/rewritten/.
Personally, I like the trailing slash only for the home page of this blog and set it up that way earlier this year. At least I picked one, as these things matter.
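Whichever side you pick, the important part is enforcing it consistently. Here's a minimal sketch of the "slash on the home page only" policy. The function name `trailing_slash_redirect` and the `home` parameter are hypothetical, and the real home path of a given blog is an assumption you'd fill in yourself.

```python
def trailing_slash_redirect(path: str, home: str = "/"):
    """Return the path to 301 to, or None if the path already follows the policy."""
    if path == home:
        return None                       # home page keeps its trailing slash
    if path.rstrip("/") == home.rstrip("/"):
        return home                       # e.g. "/blog" -> "/blog/"
    if path.endswith("/"):
        return path.rstrip("/")           # pages lose the trailing slash
    return None


print(trailing_slash_redirect("/blog", home="/blog/"))
print(trailing_slash_redirect("/blog/SomePost/", home="/blog/"))
```

A site that prefers trailing slashes everywhere would flip the last rule around. The point is simply that each page ends up with exactly one address.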
What I'd really like to do is remove the Technology from my URLs. I could remove the .aspx extension from my blog's URLs by:
- Making it output Permalink URLs without .aspx
- Adding an ISAPI_Rewrite rule to add the .aspx before the request gets to ASP.NET
- Adding some magic dust in ASP.NET 1.1, or a Form Control Adapter in ASP.NET 2.0, to change the HTML FORM Action in the case of a Post Back.
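The second step above, the rewrite, boils down to a tiny piece of logic. It's sketched here in Python rather than in ISAPI_Rewrite's own rule syntax, so treat it as a description of the behavior and not as a working rule file. `rewrite_for_aspnet` is a hypothetical name.

```python
import posixpath


def rewrite_for_aspnet(path: str) -> str:
    """Map a public, extensionless path to the .aspx path the app expects."""
    if path.endswith("/"):
        return path            # directory-style URLs (like the home page) pass through
    _, ext = posixpath.splitext(path)
    if ext:
        return path            # .aspx, .css, .jpg and friends already have an extension
    return path + ".aspx"      # the clean URL gets its technology back, privately


print(rewrite_for_aspnet("/weblog/StringFormattingFun"))
print(rewrite_for_aspnet("/weblog/StringFormattingFun.aspx"))
```

Because paths that already end in .aspx pass through untouched, old permalinks would keep working alongside the new ones. One caveat: a title that itself contains a dot would be mistaken for a path with an extension, so a real rule would need more care there.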
Of course, I'd need to do this without invalidating all the existing permalinks out there. The idea being that once you've put a permalink out there, it's out there. Forever. Only Feed Readers and Search Bots will respect a 301 and update their record of those links. All that static HTML out there cares not about your pretty URLs.
It's probably too late for me, Dear Reader, but perhaps not for you and your URLs. Pick a scheme and be excited about it, for these are religious issues that will never be solved.
I don't think ScottWater will mind me quoting him directly from a private email, in this case, to end this blog post:
What I meant is that if the goal is SEO, nice URLs are well…nice, but there are way better things you can do, such as writing relevant content. - Scott Watermasysk
It's true! I should stop now.