Scott Hanselman

RFC: How FeedReaders and MacGyver report blog subscribers - Tunneled User-Agent Data

February 17, '07 Comments [3] Posted in ASP.NET

Sometimes I get ever so slightly depressed that the Web is so fantastically hacked together. The way we revel in AJAX sites but forget how dizzyingly high up we are, floating in layer after layer of abstraction. IP, TCP, HTTP, UTF8/ASCII Text Encoding, HTML, XML, XHTML, CSS, ECMAScript, DOM, the list goes on...there's a lot of moving parts. I wonder how the next generation will learn all the plumbing?

They can happily drag a button from the Toolbox onto their Form and Start Programming™. I think this means I'm officially old and crusty because I'm finding myself, internally, thinking "these young punks with their Ajax and their MacGyver techniques! Assembling websites with CSS Box Hacks and Paper Clips! Feh!"

At any rate, Google Reader, an online feed aggregator whose interface I'm still slightly not digging, is now reporting their subscribers.

There are different classes of Feed Readers/Aggregators, split along two axes. There are, of course, desktop and web readers, and each can retrieve content either directly or centrally. (That gives my four classifications.)

RSS Bandit and SharpReader and NetNewsWire are actual applications that you install and run locally. They reach out from your computer directly to the feed and download it directly. FeedDemon can do this too, but is a kind of hybrid, in that if you have a NewsGator subscription it's actually getting the feed content from NewsGator, not the publisher, so in that "hybrid" (my word) mode, FeedDemon looks like an online reader.

Here's a very incomplete, but you'll-get-the-idea-it-is-just-trying-to-make-a-point table:

Reader | Desktop/Web | Direct/Centralized
Google Reader | Web | Centralized
Bloglines | Web | Centralized
FeedDemon | Desktop | Direct (can talk to NewsGator also)
NewsGator Online | Web | Centralized
SharpReader | Desktop | Direct
RSS Bandit | Desktop | Direct (can talk to NewsGator also)
IE7 (RSS Platform) | Desktop | Direct (but shared and centralized to the OS)

FeedBurner hosts my Feed for this site, and they have a wonderful Feed-specific Special Sauce that figures out, approximately, how many folks are subscribing/reading my site. They use lots of metrics like IP address and what not to figure out Desktop readers, and they have some algorithms to recognize that IPs change and what not.
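To show why counting Desktop readers is only ever approximate, here is a minimal Python sketch of the most naive approach: treat every distinct (IP address, User-Agent) pair as one reader. This is not FeedBurner's algorithm, which isn't public; the `Hit` record, the function name, and the sample User-Agent strings are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical log record; the field names are assumptions for this sketch.
Hit = namedtuple("Hit", ["ip", "user_agent"])

def estimate_desktop_readers(hits):
    """Naively count each distinct (IP, User-Agent) pair as one reader."""
    return len({(h.ip, h.user_agent) for h in hits})

# Made-up sample hits using documentation-reserved IP ranges.
hits = [
    Hit("203.0.113.5", "ExampleDesktopReader/1.5"),
    Hit("203.0.113.5", "ExampleDesktopReader/1.5"),  # same reader polling again
    Hit("203.0.113.5", "OtherDesktopReader/0.9"),    # same IP, different app
    Hit("198.51.100.7", "ExampleDesktopReader/1.5"),
]

print(estimate_desktop_readers(hits))
```

A reader whose IP changes between polls gets counted twice here, and two people behind one proxy running the same app get counted once, which is exactly why a real service needs smarter heuristics than this.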

What's interesting is how these Web and/or Centralized readers report statistics. When a bot for one of these readers retrieves your feed, it includes (or tunnels, ala MacGyver) the number of subscribers in its database within the User-Agent string. (I talked about this some two years ago.)
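To make the tunneling concrete, here is a small Python sketch of what a publisher has to do to dig the count back out. The User-Agent strings are illustrative stand-ins modeled on the general "N subscribers" shape, not captured traffic from any real reader, and the function name is my own.

```python
import re

# Matches "<digits> subscriber" or "<digits> subscribers" anywhere in the string.
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?\b", re.IGNORECASE)

def tunneled_subscribers(user_agent):
    """Return the subscriber count embedded in a User-Agent, or None if absent."""
    match = SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else None

# Hypothetical User-Agent values, for illustration only.
examples = [
    "ExampleWebReader/1.0 (http://reader.example.com; 45 subscribers)",
    "ExampleFetcher; (+http://fetcher.example.com/about.html; 1 subscriber; feed-id=123)",
    "ExampleDesktopReader/2.3",
]

for ua in examples:
    print(tunneled_subscribers(ua), "<-", ua)
```

One pattern happens to cover both made-up formats here. In practice each reader is free to pick its own wording, so the publisher ends up maintaining a pattern per reader.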

Doesn't this seem ever so slightly distasteful to you? This data is totally non-standard, and living in the HTTP Headers for User-Agent. Oleg thinks it's OK per the HTTP spec, but I say bleh.

Why hunt for older headers to stuff this data into? Four years ago Tim Bray brainstormed some ideas and thought about the URL itself, then the Referer: header, or the From: header. Why not a new one?

HTTP Headers themselves are name/value pairs, fairly well structured, but it seems that rather than forcing FeedBurner to keep tables of the various formats of the various readers and make them Regular Expression their way through it...

Old Joke but still a Good One: So you've got a problem, and you've decided to use Regular Expressions to solve it....so, now you've got two problems...

...why not add a new HTTP Header? I mean, the blogosphere has long abandoned many of the slower standards bodies in favor of a de facto standard-building process. If enough people do it, it's standard.

RFC: Why don't the bots and online aggregators start requesting feeds like this:

GET / HTTP/1.1
Host: www.hanselman.com
User-Agent: MyFeedReadingBot
Feed-Subscriber-Count: 45

It worked for SOAPAction back in the day, why not standardize a new header now? Let's let MacGyver rest.
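For comparison, here is a rough Python sketch of both ends of the proposal. Feed-Subscriber-Count is only the header name suggested in this post, not a registered or standardized header, and the two helper functions are hypothetical. The request is built but never sent, so nothing here touches the network.

```python
from urllib.request import Request

def build_feed_request(url, subscriber_count):
    """Build (but do not send) a feed request carrying the proposed header."""
    return Request(url, headers={
        "User-Agent": "MyFeedReadingBot",
        # Proposed in this post; not a standard HTTP header.
        "Feed-Subscriber-Count": str(subscriber_count),
    })

def read_subscriber_count(headers):
    """Return the count from a mapping of request headers, or None.

    Header names are matched case-insensitively; values that are not
    non-negative integers are ignored.
    """
    for name, value in headers.items():
        if name.lower() == "feed-subscriber-count":
            try:
                count = int(value)
            except ValueError:
                return None
            return count if count >= 0 else None
    return None

req = build_feed_request("http://www.hanselman.com/", 45)
print(read_subscriber_count(dict(req.header_items())))
```

No per-reader pattern table, no guessing at formats: the publisher does one header lookup and one integer parse, and the User-Agent goes back to just identifying the agent.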

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.

Sunday, February 18, 2007 4:11:28 AM UTC
Out of curiosity, what don't you like about Google Reader? I'm digging it myself.
Sunday, February 18, 2007 4:39:51 AM UTC
Yeah, the useragent mess struck me as a hack as well. I like the http header idea, and have implemented it (but have not pushed it to production) on my rss aggregator, www.blorq.com.
Sunday, February 18, 2007 8:59:58 AM UTC
It's a complete hack to overload fields like that (whether or not it's technically allowed), and the effect on the user agent field is to make it unusable (try being a security guy and having to deal with that mess in a forensic investigation).

Your new header idea is a good one, and in the wonderful world of structured data that's the way to go. Let's shoot some bullets at it and see what we end up with. If it holds up unscathed, then push the idea.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.