RFC: How FeedReaders and MacGyver report blog subscribers - Tunneled User-Agent Data
Sometimes I get ever so slightly depressed that the Web is so fantastically hacked together. The way we revel in AJAX sites but forget how dizzyingly high up we are, floating in layer after layer of abstraction. IP, TCP, HTTP, UTF8/ASCII Text Encoding, HTML, XML, XHTML, CSS, ECMAScript, DOM, the list goes on...there's a lot of moving parts. I wonder how the next generation will learning all the plumbing?
They can happily drag a button from the Toolbox onto their Form and Start Programming™. I think this means I'm officially old and crusty because I'm finding myself, internally, thinking "these young punks with their Ajax and their MacGyver techniques! Assembling websites with CSS Box Hacks and Paper Clips! Feh!"
At any rate, Google Reader, an online feed aggregator whose interface I'm still slightly not digging, is now reporting their subscribers.
There's different classes of Feed Readers/Aggregators that can retrieve content two ways. There's, of course, desktop and web readers who can retrieve content directly or centrally. (These are my four classifications.)
RSS Bandit and SharpReader and NetNewsWire are actual applications that you install and run locally. They reach out from your computer directly to the feed and download it directly. FeedDemon can do this too, but is a kind of hybrid, in that if you have a NewsGator subscription it's actually getting the feed content from NewsGator, not the publisher, so in that "hybrid" (my word) mode, FeedDemon looks like an online reader.
Here's a very incomplete, but you'll-get-the-idea-it-is-just-trying-to-make-a-point table:
(can talk to NewsGator also)
(can talk to NewsGator also)
|IE7 (RSS Platform)
(but shared and
centralized to the OS)
FeedBurner hosts my Feed for this site, and they have a wonderful Feed-specific Special Sauce that figures out, approximately, how many folks are subscribing/reading my site. They use lots of metrics like IP address and what not to figure out Desktop readers, and they have some algorithms to recognize that IPs change and what not.
What's interesting is how these Web and/or Centralized readers reports statistics. When a bot for one of these readers retrieves your feed, they include (or tunnel ala MacGyver), the number of subscribers in their database within the User-Agent like this. (I talked about this some two years ago):
- Bloglines/2.0 (http://www.bloglines.com; 32 subscribers)
Doesn't seem ever so slightly distasteful to you? This data is totally non-standard, and living in the HTTP Headers for User-Agent. Oleg thinks it's OK per the HTTP spec, but I say bleh.
Why hunt for older headers to stuff this data into? Four years ago Tim Bray brainstormed some ideas and thought about the URL itself, then the referrer: header, or the from: header. Why not a new one?
HTTP Headers themselves are name/value pairs, fairly well structured, but it seems that rather than forcing FeedBurner to keep tables of the various formats of the various readers and make them Regular Expression their way thought it...
Old Joke but still a Good One: So you've got a problem, and you've decided to use Regular Expressions to solve it....so, now you've got two problems...
...why not add a new HTTP Header? I mean, the blogosphere has long abandoned many of the slower standards bodies in favor of a de facto standard-building process. If enough people do it, it's standard.
RFC: Why don't the bots and online aggregators start requesting feeds like this:
GET / HTTP/1.1
It worked for SOAP Action back in the day, why not standardize a new header now? Let's let MacGyver rest.