RFC: How FeedReaders and MacGyver report blog subscribers - Tunneled User-Agent Data

February 17, 2007. Posted in ASP.NET


Sometimes I get ever so slightly depressed that the Web is so fantastically hacked together. We revel in AJAX sites but forget how dizzyingly high up we are, floating in layer after layer of abstraction: IP, TCP, HTTP, UTF-8/ASCII Text Encoding, HTML, XML, XHTML, CSS, ECMAScript, DOM, the list goes on... there are a lot of moving parts. I wonder how the next generation will learn all the plumbing.

They can happily drag a button from the Toolbox onto their Form and Start Programming™. I think this means I'm officially old and crusty because I'm finding myself, internally, thinking "these young punks with their Ajax and their MacGyver techniques! Assembling websites with CSS Box Hacks and Paper Clips! Feh!"

At any rate, Google Reader, an online feed aggregator whose interface I'm still not quite digging, is now reporting its subscriber counts.

There are different classes of Feed Readers/Aggregators, and they can retrieve content in two ways. There are, of course, desktop and web readers, and each kind can retrieve content either directly or centrally. (The resulting four combinations are my own classifications.)

RSS Bandit, SharpReader and NetNewsWire are actual applications that you install and run locally. They reach out from your computer to the feed and download it directly. FeedDemon can do this too, but it's a kind of hybrid: if you have a NewsGator subscription, it actually gets the feed content from NewsGator, not the publisher, so in that "hybrid" (my word) mode, FeedDemon looks like an online reader.

Here's a very incomplete, but you'll-get-the-idea-it-is-just-trying-to-make-a-point table:

Reader	Desktop/Web	Direct/Centralized
Google Reader	Web	Centralized
Bloglines	Web	Centralized
FeedDemon	Desktop	Direct (can talk to NewsGator also)
NewsGator Online	Web	Centralized
SharpReader	Desktop	Direct
RSS Bandit	Desktop	Direct (can talk to NewsGator also)
IE7 (RSS Platform)	Desktop	Direct (but shared and centralized to the OS)

FeedBurner hosts my feed for this site, and they have a wonderful feed-specific Special Sauce that figures out, approximately, how many folks are subscribing to or reading my site. They use lots of metrics like IP address and what not to estimate desktop readers, and they have some algorithms to account for IP addresses changing over time.
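For direct (desktop) readers there is no count to read, so a stats service has to infer one from its request logs. Here is a deliberately naive Python sketch of that idea, assuming a hypothetical `Fetch` log record and a one-day window: count distinct IP/User-Agent pairs. This is not FeedBurner's actual method; it only illustrates why shared proxies and changing IPs make the problem hard. The reader names in the sample log are made up.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical log record: one feed fetch by some client.
Fetch = namedtuple("Fetch", ["ip", "user_agent", "when"])


def estimate_direct_readers(fetches, window=timedelta(days=1), now=None):
    """Naive estimate: distinct (IP, User-Agent) pairs seen in the last `window`.

    A desktop reader polling many times from one address counts once, but
    readers behind one shared proxy collapse into a single "subscriber", and
    a reader whose IP changes inside the window is counted twice.
    """
    if now is None:
        # Default to the newest fetch in the log (or the wall clock if empty).
        now = max((f.when for f in fetches), default=datetime.now())
    cutoff = now - window
    clients = {(f.ip, f.user_agent) for f in fetches if f.when >= cutoff}
    return len(clients)


# Made-up log entries for illustration.
t = datetime(2007, 2, 17, 12, 0)
log = [
    Fetch("10.0.0.1", "SharpReader/0.9", t),
    Fetch("10.0.0.1", "SharpReader/0.9", t + timedelta(hours=1)),  # same reader polling again
    Fetch("10.0.0.2", "RssBandit/1.5", t),
    Fetch("10.0.0.3", "SharpReader/0.9", t - timedelta(days=3)),   # outside the window
]
print(estimate_direct_readers(log))  # 2
```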

What's interesting is how these Web and/or Centralized readers report statistics. When a bot for one of these readers retrieves your feed, it includes (or tunnels, à la MacGyver) the number of subscribers in its database within the User-Agent header, like this (I talked about this some two years ago):

Bloglines/2.0 (http://www.bloglines.com; 32 subscribers)
NewsGatorOnline/2.0+(http://www.newsgator.com;+250+subscribers)
ThePort_Web/1.0;_subscribers_1
LiveJournal.com_(webmaster@livejournal.com;_for_your_url/;_1_readers)
Mozilla/5.0+(X11;+U;+Linux+i686;+en-US;+rv:1.2.1;+Rojo+1.0;+http://www.rojo.com/corporate/help/agg/;+Aggregating+on+behalf+of+423+subscriber(s)+online+at+http://www.rojo.com/?feed-id=xxx)+Gecko/20021130
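To make the pain concrete, here is a Python sketch of the kind of parsing these strings force on anyone counting subscribers. It is not FeedBurner's code: the two patterns are assumptions fitted to the five examples above (words separated by spaces, `+`, or `_`; the number sometimes before the word and sometimes after), and the next reader that invents its own format will break them.

```python
import re
from typing import Optional

# Separators seen in the examples above: a real space, "+", or "_".
_SEP = r"[\s+_]+"
# The count word, not glued to a preceding letter (so "FeedReader 3.14" won't match).
_WORD = r"(?<![A-Za-z])(?:subscribers?|readers?)"

# Most bots put the number first ("32 subscribers", "423+subscriber(s)"),
# but at least one puts it after ("_subscribers_1"), hence two patterns.
_NUMBER_FIRST = re.compile(r"(\d+)" + _SEP + _WORD, re.IGNORECASE)
_WORD_FIRST = re.compile(_WORD + _SEP + r"(\d+)", re.IGNORECASE)


def subscriber_count(user_agent: str) -> Optional[int]:
    """Best-effort extraction of a tunneled subscriber count, or None."""
    for pattern in (_NUMBER_FIRST, _WORD_FIRST):
        match = pattern.search(user_agent)
        if match:
            return int(match.group(1))
    return None


examples = [
    "Bloglines/2.0 (http://www.bloglines.com; 32 subscribers)",
    "NewsGatorOnline/2.0+(http://www.newsgator.com;+250+subscribers)",
    "ThePort_Web/1.0;_subscribers_1",
    "LiveJournal.com_(webmaster@livejournal.com;_for_your_url/;_1_readers)",
    "Mozilla/5.0+(X11;+U;+Linux+i686;+en-US;+rv:1.2.1;+Rojo+1.0;"
    "+http://www.rojo.com/corporate/help/agg/;+Aggregating+on+behalf+of+423+"
    "subscriber(s)+online+at+http://www.rojo.com/?feed-id=xxx)+Gecko/20021130",
]
for ua in examples:
    print(subscriber_count(ua))  # 32, 250, 1, 1, 423
```

Two patterns, a lookbehind, and a hand-maintained list of separators, all for one integer.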

Doesn't this seem ever so slightly distasteful to you? This data is totally non-standard, and it's living in the User-Agent HTTP header. Oleg thinks it's OK per the HTTP spec, but I say bleh.

Why hunt for existing headers to stuff this data into? Four years ago Tim Bray brainstormed some ideas and considered the URL itself, then the Referer: header, or the From: header. Why not a new one?

HTTP headers themselves are name/value pairs, fairly well structured, but it seems that rather than forcing FeedBurner to keep tables of the various formats of the various readers and make them Regular Expression their way through it...

Old joke, but still a good one: So you've got a problem, and you've decided to use Regular Expressions to solve it... so now you've got two problems...

...why not add a new HTTP Header? I mean, the blogosphere has long abandoned many of the slower standards bodies in favor of a de facto standard-building process. If enough people do it, it's standard.

RFC: Why don't the bots and online aggregators start requesting feeds like this:

GET / HTTP/1.1
Host: www.hanselman.com
User-Agent: MyFeedReadingBot
Feed-Subscriber-Count: 45

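On the publisher's side, a dedicated header needs no format tables at all. Here is a minimal Python WSGI sketch: `Feed-Subscriber-Count` is the header proposed above, while `feed_app`, `SUBSCRIBERS_BY_BOT`, and the stub feed body are hypothetical names for illustration. It relies on WSGI (like CGI) exposing each request header as `HTTP_<NAME>`, upper-cased with dashes turned into underscores.

```python
import re
from typing import Dict, Optional

# Latest reported count per bot, keyed by User-Agent (in memory, illustration only).
SUBSCRIBERS_BY_BOT: Dict[str, int] = {}


def proposed_subscriber_count(environ: dict) -> Optional[int]:
    """Return the value of the proposed Feed-Subscriber-Count header, or None."""
    raw = environ.get("HTTP_FEED_SUBSCRIBER_COUNT")
    if raw is None or not re.fullmatch(r"[0-9]+", raw.strip()):
        return None  # header absent, or not a plain non-negative integer
    return int(raw.strip())


def feed_app(environ, start_response):
    """Minimal WSGI app: record the count (if any), then serve a stub feed."""
    count = proposed_subscriber_count(environ)
    if count is not None:
        bot = environ.get("HTTP_USER_AGENT", "unknown")
        SUBSCRIBERS_BY_BOT[bot] = count
    start_response("200 OK", [("Content-Type", "application/rss+xml")])
    return [b'<rss version="2.0"><channel></channel></rss>']


# Simulate the request above without opening a socket.
environ = {
    "REQUEST_METHOD": "GET",
    "PATH_INFO": "/",
    "HTTP_HOST": "www.hanselman.com",
    "HTTP_USER_AGENT": "MyFeedReadingBot",
    "HTTP_FEED_SUBSCRIBER_COUNT": "45",
}
feed_app(environ, lambda status, headers: None)
print(SUBSCRIBERS_BY_BOT)  # {'MyFeedReadingBot': 45}
```

The bot side is just one more line in its request headers. Rejecting anything that isn't a plain integer keeps the value as structured as the header name promises.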
It worked for the SOAPAction header back in the day, so why not standardize a new header now? Let's let MacGyver rest.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Comments

February 17, 2007 20:11

Out of curiosity, what don't you like about Google Reader? I'm digging it myself.

Kevin Dente

February 17, 2007 20:39

Yeah, the User-Agent mess struck me as a hack as well. I like the HTTP header idea, and have implemented it (but have not pushed it to production) on my RSS aggregator, www.blorq.com.

brad

February 18, 2007 0:59

It's a complete hack to overload fields like that (whether or not it's technically allowed), and the effect on the user agent field is to make it unusable (try being a security guy and having to deal with that mess in a forensic investigation).

Your new header idea is a good one, and in the wonderful world of structured data that's the way to go. Let's shoot some bullets at it and see what we end up with. If it holds up unscathed, then push the idea.

Greg Hughes
