Scott Hanselman

The Weekly Source Code 35 - Zip Compressing ASP.NET Session and Cache State

October 22, '08 Comments [26] Posted in ASP.NET | ASP.NET MVC | Open Source | Source Code

Recently, while talking to Jeff Atwood and his development team about StackOverflow, he mentioned that they compress their Cache or Session data in ASP.NET, which lets them store about 5-10x more data. They do it with some helper methods, but I thought it'd be interesting to try it myself.

There are a lot of options, thoughts, and quasi-requirements on how to pull this off:

  • I could create my own SessionStateModule, basically replacing the default Session state mechanism completely.
  • I could create a number of extension methods to HttpSessionState.
  • I could just use helper methods, but that means I have to remember to use them on the in and the out and it doesn't feel like the way I'd use it day to day. However, the benefit to this approach is that it's very VERY simple and I can zip up whatever I want, whenever, and put it wherever.
  • I didn't want to accidentally put something in zipped and take it out unzipped. I want to avoid collisions.
  • I'm primarily concerned about storing strings (read: angle brackets), rather than binary serialization and compression of objects.
  • I want to be able to put zipped stuff in the Session, Application and Cache. I realized that this was the primary requirement. I didn't realize it until I started writing code from the outside. Basically, TDD, using the non-existent library in real websites.

My False Start

I initially thought I wanted it to look and work like this:

Session.ZippedItems["foo"] = someLargeThing;
someLargeThing = Session.ZippedItems["foo"]; //string is implied

But you can't create extension properties (only extension methods), and operator overloading won't get you there either.

Then I thought I'd do it like this:

public static class ZipSessionExtension
{
    public static object GetZipItem(this HttpSessionState s, string key)
    {
        //go go go
    }
}

And have GetThis and SetThat all over...but that didn't feel right either.
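
For context, day-to-day usage of that extension-method approach would have looked roughly like this (GetZipItem and SetZipItem are hypothetical names I'm sketching for illustration, not part of what I ended up with):

//Hypothetical usage of the extension-method approach - the names are illustrative only
Session.SetZipItem("foo", someLargeChunkOfHtml);
string backOut = (string)Session.GetZipItem("foo");

Workable, but it reads like plumbing instead of the Session["foo"] and Cache["foo"] indexers I reach for everywhere else.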

How It Should Work

So I ended up with this once I re-thought the requirements. I realized I'd want it to work like this:

Session["foo"] = "Session: this is a test of the emergency broadcast system.";
Zip.Session["foo"] = "ZipSession this is a test of the emergency broadcast system.";
string zipsession = Zip.Session["foo"];
Cache["foo"] = "Cache: this is a test of the emergency broadcast system.";
Zip.Cache["foo"] = "ZipCache: this is a test of the emergency broadcast system.";
string zipfoo = Zip.Cache["foo"];

Once I realized how it SHOULD work, I wrote it. There are a few interesting things I used and re-learned.

I initially wanted the properties to be indexed properties, and I wanted to be able to type "Zip." and get IntelliSense. I named the class Zip and made it static. There are two static properties named Session and Cache, respectively. They each have an indexer, which makes Zip.Session[""] and Zip.Cache[""] work. I prepended the word "zip" to the front of the key in order to avoid collisions with uncompressed content, and to create the illusion that there were two different places.

using System.IO;
using System.IO.Compression;
using System.Diagnostics;
using System.Web;

namespace HanselZip
{
    public static class Zip
    {
        public static readonly ZipSessionInternal Session = new ZipSessionInternal();
        public static readonly ZipCacheInternal Cache = new ZipCacheInternal();

        public class ZipSessionInternal
        {
            public string this[string index]
            {
                get
                {
                    return GZipHelpers.DeCompress(HttpContext.Current.Session["zip" + index] as byte[]);
                }
                set
                {
                    HttpContext.Current.Session["zip" + index] = GZipHelpers.Compress(value);
                }
            }
        }

        public class ZipCacheInternal
        {
            public string this[string index]
            {
                get
                {
                    return GZipHelpers.DeCompress(HttpContext.Current.Cache["zip" + index] as byte[]);
                }
                set
                {
                    HttpContext.Current.Cache["zip" + index] = GZipHelpers.Compress(value);
                }
            }
        }

        public static class GZipHelpers
        {
            public static string DeCompress(byte[] unsquishMe)
            {
                if (unsquishMe == null)
                    return null; //nothing was stored under that key

                using (MemoryStream mem = new MemoryStream(unsquishMe))
                using (GZipStream gz = new GZipStream(mem, CompressionMode.Decompress))
                using (StreamReader sr = new StreamReader(gz))
                {
                    return sr.ReadToEnd();
                }
            }

            public static byte[] Compress(string squishMe)
            {
                Trace.WriteLine("GZipHelper: Size In: " + squishMe.Length);
                byte[] compressedBuffer = null;
                using (MemoryStream stream = new MemoryStream())
                {
                    using (GZipStream zip = new GZipStream(stream, CompressionMode.Compress))
                    using (StreamWriter sw = new StreamWriter(zip))
                    {
                        sw.Write(squishMe);
                    }
                    //Don't read the MemoryStream before the GZipStream is closed, since it doesn't yet contain complete compressed data.
                    //GZipStream writes additional data, including footer information, when it's been disposed.
                    compressedBuffer = stream.ToArray();
                    Trace.WriteLine("GZipHelper: Size Out: " + compressedBuffer.Length);
                }
                return compressedBuffer;
            }
        }
    }
}

Note that if the strings you put in are shorter than about 300 bytes, they will probably get LARGER. So, you'll probably only want to use these if you have strings of more than half a K. More likely you'll use these if you have a few K or more. I figure these would be used for caching large chunks of HTML.
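
If you wanted to guard against that automatically, here's a rough sketch, not part of the library above, of a helper that only keeps the compressed bytes when compression actually pays off. The 512-byte threshold is an arbitrary number for illustration, not a tuned value:

public static class ZipGuard
{
    //Sketch only: return the compressed bytes if they're actually smaller,
    //otherwise return the original string so small values don't grow.
    public static object CompressIfWorthwhile(string value)
    {
        const int threshold = 512; //arbitrary illustration; measure for your own data
        if (value == null || value.Length < threshold)
            return value; //too small to bother (char count roughly approximates bytes for HTML)

        byte[] squished = HanselZip.Zip.GZipHelpers.Compress(value);
        return squished.Length < value.Length ? (object)squished : value;
    }
}

The catch is that whoever reads the value back out has to check whether they got a byte[] or a string, which is exactly the kind of "remember to do it on the in and the out" problem I was trying to avoid.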

As an aside, I used Trace.WriteLine to show the size in and the size out. Then, in the web.config I added this trace listener to make sure my Trace output from my external assembly showed up in the ASP.NET Tracing:

<system.diagnostics>
  <trace>
    <listeners>
      <add name="WebPageTraceListener"
           type="System.Web.WebPageTraceListener, System.Web,
                 Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a"/>
    </listeners>
  </trace>
</system.diagnostics>
<system.web>
  <trace pageOutput="true" writeToDiagnosticsTrace="true" enabled="true"/>
...

The resulting trace output is:

[Screenshot: ASP.NET trace output showing the GZipHelper size-in and size-out values for each compressed string]

When To Compress

See how the first zipped string got bigger? I shouldn't have put it in there; it was too small to begin with. The second one went from 1137 bytes to 186, so that was reasonably useful. Again, IMHO, this probably won't matter unless you're storing thousands of strings in cache that are greater than 1k, but as you get toward 10k or more, I suspect you'd get more value.

For example, if I put 17k of basic HTML in the cache, it squishes it to 3756 bytes, a 78% savings. It all depends on how repetitive the markup is and how many visitors you have. If you had a thousand visitors on a machine simultaneously, and you were caching, say, 20 chunks of 100k each per user, you'd be looking at roughly 2 gigs for your cache. When I was working in banking, we might have tens of thousands of users online at a time, and we'd be caching historical checking or savings data, and that data might get to 500k or more of XML. Four savings accounts' worth of history across 70,000 users adds up fast: call it 16 gigs of RAM or more (spread across many servers, so maybe a gig or half a gig per server). Squishing that 500k down to around 70k would cut the hit to roughly 2 gigs total. It all depends on how much you're storing, how much it'll compress (how repetitive it is), and how often it's accessed.

Memcached 2 includes a setCompressThreshold that lets you adjust the minimum savings you want before it'll compress. I suspect Velocity will have some similar setting. 

Ultimately, however, this all means nothing unless you measure. For example, all this memory savings might be useless if the data is being read and written constantly. You'd be trading any savings from squishing for other potential problems, like the memory you need to hold the decompressed values, as well as memory fragmentation. Point is, DON'T just turn this stuff on without measuring.
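
A quick way to do that measuring, at least for the size and CPU side of it, is a throwaway console harness like this. The file name is just a stand-in for whatever representative chunk of HTML you'd actually be caching:

using System;
using System.Diagnostics;
using System.IO;
using HanselZip;

class CompressionCheck
{
    static void Main()
    {
        //Any representative payload you'd actually cache; the path here is only an example.
        string sample = File.ReadAllText("representative-fragment.html");

        Stopwatch sw = Stopwatch.StartNew();
        byte[] squished = Zip.GZipHelpers.Compress(sample);
        string roundTripped = Zip.GZipHelpers.DeCompress(squished);
        sw.Stop();

        Console.WriteLine("In: {0} chars, Out: {1} bytes, Round trip: {2} ms",
            sample.Length, squished.Length, sw.ElapsedMilliseconds);
        Console.WriteLine("Round trip intact: " + (roundTripped == sample));
    }
}

That only tells you about compression ratio and raw cost, of course; whether it's a win in production still depends on how hot those cache entries are.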

Nate Davis had a great comment on the StackOverflow show I wanted to share here:

If they are caching final page output in this way and they are already sending HTTP responses across the wire as gzip,
then gzipping the output cache would make great sense. But only if they are NOT first unzipping when taking the page out of cache and then re-zipping it up again before sending the response.
If they check the 'Accept-Encoding' header to make sure gzip is supported, then they can just send the gzipped output cache directly into the response stream and set the Encoding header with 'gzip'. If Accept-Encoding doesn't include gzip, then the cache would have to be unzipped, but this is a very small percentage of browsers.

Nate is pointing out that one should think about HOW this data will be used. Are you caching an entire page or iframe? If so, you might be able to send it right back out to the browser, still compressed.
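
To make Nate's idea concrete, here's a rough sketch of my own (not StackOverflow's actual code; imagine it living in an HttpHandler with using System.Web in scope) that serves an already-gzipped cached fragment straight to clients that advertise gzip support, and only unzips for the few that don't:

//Sketch: assumes the full response body was previously stored via Zip.Cache["homepage"] = html;
public static void WriteCachedPage(HttpContext context, string key)
{
    byte[] squished = context.Cache["zip" + key] as byte[]; //same "zip" prefix the helpers use
    if (squished == null)
        return; //cache miss - regenerate the page instead

    context.Response.ContentType = "text/html";
    context.Response.AppendHeader("Vary", "Accept-Encoding");

    string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? "";
    if (acceptEncoding.Contains("gzip"))
    {
        //The client understands gzip, so send the cached bytes exactly as stored.
        context.Response.AppendHeader("Content-Encoding", "gzip");
        context.Response.BinaryWrite(squished);
    }
    else
    {
        //The small minority of clients that don't accept gzip get it unzipped.
        context.Response.Write(HanselZip.Zip.GZipHelpers.DeCompress(squished));
    }
}

You'd also want to be sure nothing downstream (IIS compression or an HttpModule) gzips the response a second time.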

However, if you've got IIS7 and you want to cache a whole page rather than just a per-user fragment, consider using IIS7's Dynamic Compression. You've already got this feature, and along with ASP.NET's OutputCaching, the system already knows about gzip compression. It'll store the gzipped version and serve it directly.
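
If you go that route, the IIS7 side is roughly a one-line web.config change, assuming the dynamic compression module is installed on the box. This goes in the integrated-pipeline <system.webServer> section, not the <system.web> section shown earlier:

<system.webServer>
  <urlCompression doStaticCompression="true" doDynamicCompression="true" />
</system.webServer>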

My Conclusion?

  • I might use something like this if I was storing large string fragments in a memory constrained situation.
  • I always turn compression on at the Web Server level. It's solving a different problem, but there's really NO reason for your web server to be serving uncompressed content. It's wasteful not to turn it on.

Thoughts? What is this missing? Useful? Not useful?


Survey RESULTS: What .NET Framework features do you use?

October 22, '08 Comments [47] Posted in ASP.NET | ASP.NET Dynamic Data | ASP.NET MVC | Learning .NET | Programming | Web Services | Windows Client | WPF

Here are the results, as promised, of the .NET survey I posted last week.

Also, here's the disclaimer. I did this on a whim, it's not scientific, so the margin of error is +/-101%. That said, the results feel intuitively right to me, personally.

It was a single question with 14 checkboxes. You were asked to "check all the .NET Framework features that you use in your projects." The results are here after 4,899 responses:

[Chart: survey results showing how many of the 4,899 respondents checked each .NET Framework feature]

There were lots of good responses on Twitter and comments on the original blog post.

Folks wanted choices like "Other," "None" and "I don't use .NET." Of course, not answering the survey is a good way of reporting that. ;) In fact, 29 people looked at the survey, checked nothing and then just clicked Finish.

Here's some choice feedback:

  • A raw survey like this will be biased towards "new and cool stuff".
  • Sure you ain't shopping for an answer?...I use datareaders a lot. And datarepeaters. Everything else goes clunk.
  • Where's Silverlight?
  • No Linq-to-Objects?

I organized the survey in terms of what I called "subsystems." You could also say "product teams," I suppose. It was more trying to get a sense of the tools people reach for when they start a project.

I probably could have included IIS6, IIS7, etc., but I'm sure it would have gotten unruly. I could have included languages as well, but that feels like another survey. The one thing I regret not including was Silverlight. I wanted to add it, but by the time I realized it, I already had 1000+ responses, and I decided adding it at that point would skew the results.

Of course, this could have been a multi-question, multi-page survey, but then you probably wouldn't have filled it out, right? I wanted to find a balance between getting a LARGE enough number of responses and getting data that's useful in some way to both you, Dear Reader, as well as the bosses I'll show it to.

As with all online surveys, this one is worth exactly the paper it's printed on. :) If you think it is useful, cool; if not, it only wasted 5 seconds of your time when you filled it out. Thanks for filling it out.

What is your analysis of the results?


Guide to Freeing up Disk Space under Windows Vista

October 20, '08 Comments [55] Posted in Tools

I've got a smallish C: drive, about 140G, but noticed that in the last week or so I had only 200 megs free. Not cool. A few hours later, I have 84.4G free. Here's how:

  • vsp1cln.exe - After you install Vista SP1, it leaves the original files around so you can uninstall the Service Pack if you want. After a few months with the Service Pack, I've decided it's a good thing and that I don't need the uninstall option.
    Open up an administrative command prompt. That means, click the Start Menu, type cmd.exe, then right-click on it and click "Run as Administrator." Alternatively, you can press Ctrl-Shift-Enter to run something as Administrator.
    Next, type "vsp1cln" at the command prompt. If you select yes, you'll get back around 2 to 3 gigs. The only thing, again is that you can't uninstall SP1.

  • Disk Cleanup - It's amazing to me the number of people who DON'T run Disk Cleanup. It's even better in Vista. Just run it. Often.
  • Disable Hibernate - I have a desktop, and I prefer just three power states: sleeping, on, or off. I don't use Hibernate. Plus, I have 8 gigs of RAM, and hibernation uses as much disk space as you have RAM. From an administrative command prompt, type "powercfg -h off" to get that space back. Got me back 8 gigs.
  • %TEMP% Files - Even though Disk Cleanup is great, sometimes for whatever reason it doesn't get stuff out of the TEMP folder. I recommend you try to delete the TEMP folder. I do this from the command line. Open up an administrative console and type "cd /d %TEMP%" (without the quotes, of course). Then, go up one folder with "cd .." and type "rd /s temp"
    Do be warned: this command tries to delete the whole folder and everything underneath it. It's very unambiguous. If you don't feel comfortable, don't do it. If you feel in over your head, don't do it. If it screws up your computer, don't email me. Next, I do a "dir temp" to see if the folder really got deleted. It usually doesn't, because almost always some other program has a temp file open and the command can't remove everything. If it DOES remove the folder, just "md temp" to get it back fresh and empty. This got me back 2.5 gigs. I'm sure you'll be surprised and get lots back.
  • Clean up System Restore - Vista keeps backups of lots of system files every time something major happens (driver installation, some software installations, etc.), and after a while this can take up lots of space. It uses a service/subsystem called Shadow Copies, which can be administered with a tool called vssadmin.
    Now, the EASIEST way to handle this is just to run Disk Cleanup, then click More Options and "Clean up…", which will delete all but the most recent System Restore data. That's what I did. That got me back lots of space on my C: drive.
    Alternatively, you can use the vssadmin tool from an admin command prompt to do two important things. One, you can set a max size for System Restore to use. Two, you can set an alternative drive. For example, you could have the D: drive be responsible for System Restore for the C: drive.
    You can use the command like this. Note that you can put whatever drive letters you have in there. I ran it for each of my three drives. Note that this isn't just used for System Restore; it's also used for the "Previous Versions" feature of Vista that keeps some number of shadow backups in case you delete something and didn't mean to. Kind of a mini, local time machine. Point is, this probably isn't a feature you want off, just one you want kept to a max.
    vssadmin Resize ShadowStorage /On=C: /For=C: /MaxSize=15GB
  • Check Folder Sizes with WinDirStat - I've used a large number of Windows Folder Size checkers, and the one I keep coming back to is WinDirStat. It used to be OverDisk, but OverDisk isn't smart about NTFS Junction Points and tends to get confused in Vista generally. Plus, it's been on version 0.11 for something like 4 years. WinDirStat is actively developed, it's Open Source, and it works great in Vista. It's wonderfully multi-threaded and is generally fabulous. It'll help you find those crazy large log files you've forgotten about deep in %APPDATA%. It saved me 5gigs of random goo.
  • NTFS Compression - That's right, baby, Stacker (kidding). This is a great feature of NTFS that more people should use. If you've got a bunch of folders with old crap in them, but you don't want to delete them, compress them. If you've got a folder that fills up with text files or other easily compressed and frequently accessed stuff, compress 'em. I typically compress any and all folders that are infrequently accessed but that I'm not ready to toss. That's about 30-40% of my hard drive. Why bother to compress when disk space is so cheap? Well, C: drive space usually isn't. I've got a 10,000 RPM drive, and it's small. I'd like to get as much out of it as I can without the hassle of moving my Program Files to D:. More importantly, why the heck not? Why shouldn't I compress? It's utterly painless. Just right-click a folder, hit Properties, then Advanced, then Compress. Then forget about it. As long as you're not compressing a bunch of ZIP files (won't do much), you're all set. You might consider defragging when you're done, just to tidy up.
  • Remove Old Stuff - Just go into Add/Remove Programs or Programs and Features and tidy up. There's likely a pile of old crap in there that's taking up space. I removed some Games and Game Demos and got back 5 gigs.
  • Wasteful TempFiles/ScratchFiles Settings in Popular Programs - Most programs that need scratch space have a way to set a ceiling on that Max Space. Go into Internet Explorer or Firefox, into the options and delete the Temporary Internet Files. Set a reasonable size like 250 megs or 500 megs. I've seen those cache sizes set to gigs. If you've got a speedy connection to the internet, that's just overkill.
  • Find Fat Temp File Apps and squash them - Google Earth and Microsoft Virtual Earth 3D are really fast and loose with the disk space. You can poke around for a while and next thing you know you're down 2 gigs or more. If you don't use the app a lot, delete the caches when you exit, or better yet, make the cache size for each app small.
  • ADVANCED: Use Junction Points/Hard Links/Reparse Points to move temp file folders - This is an advanced technique. If this technique kills your beloved pet cat, don't email me. You have been warned. Also, note that I'm only saying it works for me.
    I use my Zune all the time, and like many portable media players, it transcodes (squishes) video that it downloads from the web to its preferred size and codec. For example, if I download an ABC News podcast, it might be 600 megs, but then Zune automatically squishes it to, say, 300 megs. It puts that in %LocalAppData%\Microsoft\Zune\Transcoded Files Cache. I'm not sure how to move that folder, and I've looked all over the Zune app. I know I can set the max size, but I want it off my drive. So, I make a symlink. This is a way to fake out apps (Unix people know this) by making a folder POINT to another place.
    From an admin command prompt, I went into the Zune temp folder and deleted that Transcoded Files Cache directory. Then I typed:
    mklink /d "Transcoded Files Cache" "d:\Zune Transcoded Files Cache"
    So that directory really points to one on my D: drive. I can confirm it with dir:
    Directory of C:\Users\Scott\AppData\Local\Microsoft\Zune

    10/19/2008  02:24 PM    <DIR>          .
    10/19/2008  02:24 PM    <DIR>          ..
    10/19/2008  02:25 PM    <DIR>          Art Cache
    07/15/2008  08:56 AM    <DIR>          DeviceInbox
    10/19/2008  02:24 PM    <SYMLINKD>     Transcoded Files Cache [d:\Zune Transcoded Files Cache]


    Again, this is really dangerous, especially if you forget you did it. Also, not every application understands these and older disk management or directory management apps can get confused. You have been warned. I like it for me. This got me back 8 gigs of space.

What did I miss, Dear Reader? I'm sure I missed something.


Hanselminutes Podcast 134 - StackOverflow uses ASP.NET MVC - Jeff Atwood and his technical team

October 19, '08 Comments [34] Posted in ASP.NET | ASP.NET MVC | Podcast

My one-hundred-and-thirty-fourth podcast is up.

Well, actually it went up a few weeks ago, but I totally forgot to update my website with the details. You'd think somewhere around 100 shows I'd have automated this somehow. Hm. If only I knew a programmer and the data were available in some kind of universal structured syndication format… ;)

Scott chats with Jeff Atwood of CodingHorror.com and most recently, StackOverflow.com. Jeff and Joel Spolsky and their technical team have created a new class of application using ASP.NET MVC. What works, what doesn't, and how did it all go down?

Subscribe: Subscribe to Hanselminutes | Subscribe to my Podcast in iTunes

Do also remember the complete archives are always up and they have PDF Transcripts, a little-known feature that shows up a few weeks after each show.

Telerik is our sponsor for this show!

Building quality software is never easy. It requires skills and imagination. We cannot promise to improve your skills, but when it comes to User Interface, we can provide the building blocks to take your application a step closer to your imagination. Explore the leading UI suites for ASP.NET and Windows Forms. Enjoy the versatility of our new-generation Reporting Tool. Dive into our online community. Visit www.telerik.com.

As I've said before, this show comes to you with the audio expertise and stewardship of Carl Franklin. The name comes from Travis Illig, but the goal of the show is simple: avoid wasting the listener's time (and make the commute less boring).

Enjoy. Who knows what'll happen in the next show?


Hanselminutes Podcast 133 - Windows Live Agents and the Machine Translation Bot from MS Research

October 19, '08 Comments [1] Posted in Podcast

My one-hundred-and-thirty-third podcast is up.

Well, actually it went up a few weeks ago, but I totally forgot to update my website with the details. You'd think somewhere around 100 shows I'd have automated this somehow. Hm. If only I knew a programmer and the data were available in some kind of universal structured syndication format… ;)

Scott visits Microsoft Research and talks to Helvecio Ribeiro, the Test Lead for Machine Translation about T-Bot, his translation bot for Windows Live Messenger.

Subscribe: Subscribe to Hanselminutes | Subscribe to my Podcast in iTunes

Do also remember the complete archives are always up and they have PDF Transcripts, a little-known feature that shows up a few weeks after each show.

Telerik is our sponsor for this show!

Building quality software is never easy. It requires skills and imagination. We cannot promise to improve your skills, but when it comes to User Interface, we can provide the building blocks to take your application a step closer to your imagination. Explore the leading UI suites for ASP.NET and Windows Forms. Enjoy the versatility of our new-generation Reporting Tool. Dive into our online community. Visit www.telerik.com.

As I've said before, this show comes to you with the audio expertise and stewardship of Carl Franklin. The name comes from Travis Illig, but the goal of the show is simple: avoid wasting the listener's time (and make the commute less boring).

Enjoy. Who knows what'll happen in the next show?


Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.