Scott Hanselman

Target: Referral Spam in dasBlog

December 4, '04 Comments [1] Posted in ASP.NET | DasBlog | XML

I've pretty much solved the comment-spam problem (only one person has voiced their distaste so far), but a recent perusal of my logs and older posts turned up a ridiculous amount of referral spam.

This is when someone hits a post on your site after changing/hacking the HTTP Referer header to claim they came from a site of their choosing. If your blog adds this referrer to the page, as most do, you've just linked to Hot Gay Sex (not that there's anything wrong with Hot Sex between consenting adults : ) ) or whatever, all by their actions.

The story goes that when Google comes around, it sees that you've linked to them, and they get Google Juice via the PageRank system.

Not only is this potentially offensive to my readers, it also obscures the posts and comments when they are filled with referrals.

Potential Solutions:

  • Stop printing out referrals on my pages.
    • Personally, I like to see them, and I think they provide value to the reader so they can see other places with information of interest. It also promotes cross-linking between my peer blogs.
  • Modify dasBlog to NOT add icky referrals.
    • This would be ideal. However, it will likely be in version 1.7 in some way, either via James Snape's whitelist solution (I think a whitelist removes the point of referrals, and I'd greatly prefer a keyword-based blacklist) or some other technique.
    • I've avoided running a "private build" of dasBlog so far (as evidenced by my care in creating the CAPTCHA solution without recompiling) and I'd like to continue as such.
  • Clean the .xml files occasionally with a process
    • This is quick, easy, can be automated, and will work in the short term for me as I await dasBlog 1.7.

So, here was an opportunity to use the only dev stuff I have on my home machine, Visual C# 2005 Express.

Here's what I did. Use at your own risk, back up your /content directory, and know that this only needs to run against your "*.dayextra.xml" files from dasBlog. No error handling, no warranty, but it worked for me. Enjoy.

Usage: TrackingFilter "c:\yourdasblogcontentdirectory"

File Attachment: TrackingFilter.zip (9 KB) (for VS.NET 2005, I don't know if it works in 2003)
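
For anyone who would rather see the idea than open the zip, here is a rough sketch of that kind of cleanup pass in Python. It is not the TrackingFilter source (that is a C# project). The element names `Tracking` and `RefererUrl`, and the function names `clean_file` and `clean_directory`, are assumptions made up for illustration, so check them against your own dayextra files before trusting any of it.

```python
import glob
import os
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def clean_file(path, is_spam, entry_tag="Tracking", url_tag="RefererUrl"):
    """Remove spammy entries from one file and return how many were removed.

    entry_tag and url_tag are assumed names, not a confirmed dasBlog schema.
    """
    tree = ET.parse(path)
    removed = 0
    # Treat every element as a possible parent so matching children can be removed.
    for parent in list(tree.iter()):
        for child in list(parent):
            if local_name(child.tag) != entry_tag:
                continue
            urls = [el.text or "" for el in child.iter()
                    if local_name(el.tag) == url_tag]
            if any(is_spam(url) for url in urls):
                parent.remove(child)
                removed += 1
    if removed:
        # Only rewrite files that actually changed.
        tree.write(path, encoding="utf-8", xml_declaration=True)
    return removed


def clean_directory(content_dir, is_spam):
    """Run clean_file over every *.dayextra.xml file directly inside content_dir."""
    total = 0
    for path in glob.glob(os.path.join(content_dir, "*.dayextra.xml")):
        total += clean_file(path, is_spam)
    return total
```

Here `is_spam` is any function that takes a referrer URL and returns True or False, so the matching rule can change without touching the file handling. One caveat: ElementTree may rename namespace prefixes when it saves a file. The XML stays equivalent, but it is one more reason to back up the directory first.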

WARNING: The words I put in the .config file are semicolon-delimited and are unquestionably offensive. Not only do they include most of George Carlin's words, but they also include "bloglines" and "artima" because they don't provide any value in my referral list.
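
To make the keyword idea concrete, here is a small Python sketch of how a semicolon-delimited list could be turned into a blacklist check. The function names `parse_keywords` and `is_spam` are hypothetical, and "poker" is a stand-in keyword; only "bloglines" and "artima" come from the list described above. The actual matching rule in TrackingFilter may well differ.

```python
def parse_keywords(setting):
    """Split a ';'-delimited setting into lowercase keywords, dropping blanks."""
    return [word.strip().lower() for word in setting.split(";") if word.strip()]


def is_spam(referrer, keywords):
    """True if any keyword appears anywhere in the referrer URL, ignoring case."""
    url = referrer.lower()
    return any(word in url for word in keywords)


# "poker" is a made-up example keyword, not taken from the real .config file.
keywords = parse_keywords("bloglines; artima;;poker")

print(is_spam("http://www.bloglines.com/myblogs", keywords))  # True
print(is_spam("http://example.org/some-post", keywords))      # False
```

A plain substring match is about as blunt as it gets: it catches a keyword anywhere in the host, path, or query string, so it will occasionally flag an innocent URL, and it does nothing against a spammer who picks a domain name the list has never seen.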

Saturday, December 04, 2004 11:14:29 PM UTC
I just wanted to comment on your preference for a keyword blacklist. I originally implemented a blacklist as you described, but I found that I had to log in every day and add a new set of keywords to the list because they get very clever with their domain names.

The whitelist has required much less management. In fact the only domain I've added since its implementation has been yours. Also the implementation doesn't delete the disallowed URLs, it only removes them from the display. This means that when I get a message from dasBlog about the new referral I can just log in and add to the list (if you check the linked post then you'll see the earlier referral show up now). Your Tracking Filter will dovetail nicely by cleaning up the logs at a later stage.

James.
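
A minimal Python sketch of the display-only whitelist described in this comment, assuming referrals are kept as a plain list of URLs. It is not James's implementation; the function names and the example domains are hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase host of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_allowed(url, whitelist):
    """True if the URL's host is a whitelisted domain or a subdomain of one."""
    host = host_of(url)
    return any(host == domain or host.endswith("." + domain)
               for domain in whitelist)


def visible_referrals(stored, whitelist):
    """Filter for display only; the stored list itself is left untouched."""
    return [url for url in stored if is_allowed(url, whitelist)]


# Example domains are placeholders, not real referrers.
stored = ["http://www.friend.example/blog/post", "http://later.example/page"]
whitelist = ["friend.example"]

print(visible_referrals(stored, whitelist))
whitelist.append("later.example")            # approve a domain after the fact
print(visible_referrals(stored, whitelist))  # the earlier referral now shows up
```

Because nothing is deleted, approving a domain later makes its older referrals appear retroactively, which matches the behavior the comment describes. Matching on the host rather than on the whole URL keeps a look-alike such as `evilfriend.example` from slipping through.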
Comments are closed.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.