Is the Library at Alexandria burning every day? How do we cluster the cluster?
Stuart discovered splogs today and Jeff learned to lower his blog's bandwidth. Hard-learned lessons both, but both got me thinking.
Splogs: If you look at SplogSpot, their weekly splog dump XML file is 56 megs this week. I guess if you filled a library with 90% pron and 10% content (or 99% and 1%) you'd have a pretty interesting library. Does it make things hard to find? Sure, especially if the goo is mixed in alongside the good stuff.
Distribution of Responsibility: Jeff's starting to distribute his content. Images here, feed there, markup here. Ideally his images would be referenced relatively in the markup and stored locally, and he'd rewrite the URLs of those images on the way out, be they hosted on S3 or Akamai, or Flickr.
Aside: The Rails guys are definitely ahead of the .NET folks on this stuff, with things like asset_host, and gems that support hosting of files at S3 and elsewhere. Distribution of content and load is a good thing, but only if you can turn it off at any time, and easily. Every external dependency you add is a potential weak point in your content delivery - and content permanence - strategy.
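To make the "rewrite on the way out, and be able to turn it off" idea concrete, here's a minimal sketch in Python. It isn't how Jeff or Rails actually does it; the `ASSET_HOST` setting, the `rewrite_image_urls` function and the `assets.example.com` host are all made up for illustration, and a regex stands in for a real HTML parser.

```python
import re
from urllib.parse import urljoin

# Hypothetical setting: set to None to serve images locally again.
ASSET_HOST = "https://assets.example.com/"

# Matches src="..." inside <img> tags. A regex is enough for a sketch;
# a real implementation would probably use an HTML parser.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)

# Anything starting with a scheme (http:, data:, ...) or // is already absolute.
ABSOLUTE = re.compile(r'^(?:[a-z][a-z0-9+.-]*:|//)', re.IGNORECASE)


def rewrite_image_urls(markup, asset_host=ASSET_HOST):
    """Point relative <img> URLs at asset_host; leave absolute URLs alone."""
    if not asset_host:
        return markup  # distribution switched off: markup goes out untouched

    def repl(match):
        prefix, src, suffix = match.groups()
        if ABSOLUTE.match(src):
            return match.group(0)  # already hosted somewhere else
        return prefix + urljoin(asset_host, src.lstrip("/")) + suffix

    return IMG_SRC.sub(repl, markup)


if __name__ == "__main__":
    post = '<p><img alt="x" src="images/cat.png"></p>'
    print(rewrite_image_urls(post))                   # rewritten to the asset host
    print(rewrite_image_urls(post, asset_host=None))  # unchanged, served locally
```

The stored markup never changes, so the relative paths stay the source of truth. If the external host goes away, flipping the one setting off brings the images home again.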
I went looking for something yesterday and found it, I thought, on an old broken-down Tripod.com site. When I got there, however, it was just the text; the links to CSS, some JavaScript and, more importantly, the images were long gone.
Broken images on a web site are the equivalent of broken windows in a building; fix them, or they mark the beginning of the end. - Me.
(Call back to old partially-related-but-not-really-but-he'll-tell-me-it-is Atwood post :P )
Which leads us to the Day's Questions:
- Is the addition of splogs to the Global Index representative of a watering-down of content? Does the proliferation of content-free MySpace pages increase the internet's usefulness, or decrease it?
- Does the breaking apart of "atoms" of content - like this post, for example - into "quarks" of text, images, styles, etc, all hosted at different locations, affect its permanence and, forgive me, historical relevance?
I would propose that in both cases, there are emerging problems. Spam and Splogs must exist because there are eyeballs looking at them. Otherwise they (the evil-doers) would stop, right?
Breaking apart content into multiple delivery channels at different hosts helps to offset the cost to host the content. Right now the bandwidth costs for hosting this blog are covered by advertising because I update the blog regularly.
But, if I stopped adding new content, I'd stop getting advertisers, then I'd stop paying the bandwidth bill and the blog would rot. Folks might stumble upon the rotting carcass of this blog in some far-flung theoretical future (like two years from now...WAY out there in Internet Time, people), find only text, no images and broken JavaScript, and wonder if a library burned. How is content permanence possible? If I don't pay my DNS bill, the site disappears. If my ISP goes out of business, the site disappears. If Flickr goes out of business, many photo links on this site disappear. Is it reasonable to depend on these external services?
When the Library at Alexandria was at its peak, apparently 100 scholars lived and worked there. In the time it took to read this sentence, I'm sure 100 MySpacers have joined up. Not exactly scholars, but you get the idea. Things are moving fast, and they aren't lasting long. Some might argue that Wikipedia itself isn't "scholarly" and lowers the bar as well, although I find it useful more often than not. Either way, there's a crapload of information out there with 20% of the planet adding new content every day.
Alexandria failed because it had no geo-located redundancy. Like the vast majority of human knowledge, it wasn't clustered. The internet, on the other hand, is a cluster in more ways than one. But is it useful and is its usefulness permanent?
If I may mix my metaphors, is the future of the Internet a worldwide library like Alexandria at its peak, or are we doomed to collectively search a Bathroom Wall for the wisdom of the ages?
I don't know if the flash-mob that is Digg qualifies as a good filter.
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.