Blog Stats are Confusing - GETs, Views, User-Agents, Readers, Eyeballs
There's some discussion going on on an internal MSFT mailing list about blog statistics. I don't check my web statistics more than once a month, as I'm more interested in blog comments or what's going on in the forum. If I get a lot of comments on a post I feel good. I like to get discussions going and bounce ideas back and forth.
That said, some blogs at Microsoft track their statistics and need to know if a particular post or new theme brings in more readers. One particular blog (not mine) recently saw a 16x increase in "hits" which is probably a good thing. A discussion started, and here's part of an email I wrote with my ideas that I thought you might find interesting, Dear Reader. I've made a few [edits] to make things clearer.
I think it's killer, to be clear, so in no way do I want to take away from [that blog's] most excellent work, but the web stats [in this case] specifically "smell" wrong. Possibly a bot, spammer, something, but still, a 16x increase in web traffic [in a single] month feels exceptional. It's the ratios [of GETs to projected humans] that are confusing to me.
It'd be interesting to use some heuristics to turn the RSS Feed HTTP GETs into Unique users. For example, most RSS Readers poll, so one individual will hit your feed (in my experience) between 8 and 16 times a day, depending on their reader and how long their computer is on. Online readers are smarter than Smart Client readers like Outlook and FeedDemon. This usually means one has fewer readers than they think, if they are looking at GETs.
Additionally, online readers [usually] only hit once (here's how that works) [and rather] "tunnel" your subscriber numbers in the HTTP User Agent like "NewsGatorOnline/2.0+(http://www.newsgator.com;+250+subscribers)". Meaning, you might get one hit or 10 hits, but regardless they are representative of 250 individuals. This usually means one has more readers than they think, if they are looking at GETs.
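To make the "tunneling" idea concrete, here's a minimal sketch of pulling the subscriber count out of a User-Agent string like the NewsGatorOnline one above. The function name and the regular expression are my own assumptions for illustration; other readers may format their counts differently, so treat the pattern as a starting point and not a spec.

```python
import re
from typing import Optional

# Assumed pattern: a number followed by "subscribers", separated by "+" or
# whitespace (web server logs often write spaces as "+").
SUBSCRIBERS_RE = re.compile(r"(\d+)[+\s]+subscribers?", re.IGNORECASE)


def tunneled_subscribers(user_agent: str) -> Optional[int]:
    """Return the subscriber count reported in a User-Agent, or None if absent."""
    match = SUBSCRIBERS_RE.search(user_agent)
    return int(match.group(1)) if match else None


ua = "NewsGatorOnline/2.0+(http://www.newsgator.com;+250+subscribers)"
print(tunneled_subscribers(ua))
print(tunneled_subscribers("Mozilla/5.0"))
```

The point is that one log line from an online reader can stand in for hundreds of people, so each GET from such a reader should be weighted by its reported count, not counted as one reader.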
Why do I mention this? I mention it because looking at HTTP GETs isn't representative of people, but of GETs. It took me a few years to figure this out, and I've been thrilled with the analysis work done by FeedBurner (my RSS Feed is hosted there, saving me over 400 gigs of bandwidth a month) to turn GETs into Humans.
Here's a real world example. FeedBurner says I have around 22,000 regular readers [as of today...it varies based on weekday/weekend]. That's aggregated across all News Readers:

My stats package shows about 50,000 page views a day or about 1.6 million a month. This varies, confirming [an earlier] comment about folks hanging around [a site] and reading stories, which is cool. However, if I look at "hits" I see 16.5 million. Of course, that's not [a useful stat], because that includes images, CSS, etc. Visits, on the other hand, are one individual hanging around for a period of time and reading. For example (these stats don't include RSS anywhere, including bandwidth):
Page Views - 1,596,548
Visits - 806,251
Hits - 16,500,422
Bandwidth (KB) - 209,759,564
For me, these stats make sense, because I have a readership of about 20,000 that show up every few days and hang out, representing [roughly] 50% of my traffic. The other 50% comes from Search Engines and [incoming] links from other blogs. So it's important that one distinguishes between hits, page views, and visitors, and tries to correlate those back to readership, IMHO.
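As a quick sanity check on how those numbers relate, here's a small sketch that computes the ratios between them. The figures are the monthly ones listed above; the 30-day month and the variable names are my assumptions.

```python
# Monthly figures quoted above (RSS traffic excluded).
page_views = 1_596_548
visits = 806_251
hits = 16_500_422
bandwidth_kb = 209_759_564
days_in_month = 30  # assumption: a 30-day month

pages_per_visit = page_views / visits
hits_per_page_view = hits / page_views
page_views_per_day = page_views / days_in_month
kb_per_hit = bandwidth_kb / hits

print(f"Pages per visit:    {pages_per_visit:.2f}")
print(f"Hits per page view: {hits_per_page_view:.1f}")
print(f"Page views per day: {page_views_per_day:,.0f}")
print(f"KB per hit:         {kb_per_hit:.1f}")
```

That works out to roughly two pages per visit and about ten hits for every page view, which is why "hits" wildly overstates how much reading is actually going on.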
The question that we need Blog Stats to answer is that of readership. What does [a] 600,000 RSS hits number mean? 600k/30days is about 20k hits a day, so how often are these readers hitting the feed per day? Once we come up with a standard-ish formula, blogs could get a rough +/-30% idea of how many human eyeballs [are actually reading].
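Here's a rough sketch of what such a formula might look like, using the 8 to 16 polls a day from earlier as the assumed range for desktop readers. The function name and parameters are hypothetical, and it deliberately ignores online readers that tunnel subscriber counts, so it's a back-of-the-envelope estimate at best.

```python
def estimate_readers(monthly_gets, days=30, polls_per_day=(8, 16)):
    """Turn monthly feed GETs into a (low, high) estimate of polling readers.

    Assumes every GET comes from a desktop reader that polls somewhere
    between polls_per_day[0] and polls_per_day[1] times a day.
    """
    fewest_polls, most_polls = polls_per_day
    gets_per_day = monthly_gets / days
    # More polls per reader means fewer readers behind the same GET count.
    return gets_per_day / most_polls, gets_per_day / fewest_polls


low, high = estimate_readers(600_000)
print(f"600,000 GETs a month is roughly {low:,.0f} to {high:,.0f} polling readers")
```

Under those assumptions, 600,000 GETs a month is 20,000 a day, which is only about 1,250 to 2,500 people if they're all polling desktop readers. If a chunk of those GETs come from online readers reporting hundreds of subscribers each, the real number could be much higher, which is exactly why the raw count alone doesn't tell you much.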
Just my two cents, thoughts?
Related Posts
- RFC: How FeedReaders and MacGyver report blog subscribers - Tunneled User-Agent Data
- Parsing my IIS Log Files with LogParser 2.2 to learn more about Blogs stats from NewsGator and NewsGatorOnline
- Adding FeedBurner FeedFlare to DasBlog
- Syndicating ComputerZen
- Permanent Redirects with HTTP 301
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.