Adventures in Debugging - Expensive Semicolons and Invalid GIFs
Ah, yes, crazy bugs, they are my life. Here's today's saga. We did this from 9:30am until lunch, so we were able to figure it out in about two and a half hours.
One of our systems retrieves Check Images (pictures of cleared checks). The checks move through the system as Base64'ed strings, and eventually the separate front and back images are displayed in the user's browser as a single image using a dynamic compositing technique I mentioned a while back.
However, it seemed that when we took the decoded-from-Base64 schmutz and did basically this to convert the GIF to a JPEG:
using (MemoryStream m = new MemoryStream(bytes))
using (System.Drawing.Image image = System.Drawing.Image.FromStream(m))
{
    Response.ContentType = "image/jpeg";
    image.Save(Response.OutputStream, System.Drawing.Imaging.ImageFormat.Jpeg);
}
We'd get an error from the bowels of System.Drawing that there was an "invalid parameter." Reflectoring showed that Image.FromStream is managed spackle over a GDI+ method.
[DllImport("gdiplus.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
internal static extern int GdipLoadImageFromStreamICM(UnsafeNativeMethods.IStream stream, out IntPtr image);
I tried loading it into a number of picture viewers, most of which said nope. Surprisingly, IE didn't have a problem with it. This is odd to me because I thought the GDI+ security fixes would apply to IE, but not so.
To review - I've got a weird GIF that shows up in IE, but that .NET and GDI+ refuse to recognize. I could look for other image libraries that would "clean" the GIFs but that's reaching. The mainframe/host system that generates and holds these GIFs isn't likely to change, and even if it did, the change wouldn't come fast enough for this implementation.
We could just pass it all the way through the system unmolested as the GIF that it is. This would WORK but only until browsers like IE became more security aware and started slapping down invalid GIFs like this one.
So, these GIFs are invalid. But how? As with all things for me, I begin with Notepad2. I opened a bad example check image into Notepad2:
First I notice that it's a GIF87a. Noteworthy only like an old piece of gray paper from Kindergarten is noteworthy. Then we (Patrick and I - at this point I've drafted him) notice that the alphabet and numbers appear a hundred bytes in. We figured that's the color table, as they are triplets and this is a grayscale GIF of 128 colors. But, without getting all 0xHex-y this early on, what else can I do to determine if this is a valid GIF or not? Well, I got it to display in IE before. I'll copy it (now a bitmap) to the clipboard and save it as a GIF. It'll likely save as a GIF89 because, hey, it's like 2 better, right?
Here's the same graphic saved again. Ya, it looks totally different, so you assume my copy/paste was an invalid thing to do (in the scientific method sense). Well, hang in there, it gets worse. It's clearly a GIF89a and it clearly has a different color table. Otherwise, nothing here jumps out when comparing them with our eyes.
At this point, it's time to bite the bullet and decode the GIF header. We figure a GIF can be corrupt in two ways: either the header is bogus or the image data is. We'll do the easy one first. Time to pull out the June 15th, 1987 GIF spec from CompuServe.
Working structure by structure, we produced this little nugget of uselessness:
using (FileStream f = File.Open(@"C:\Documents and Settings\shanselm\Desktop\bad.gif", FileMode.Open))
//using (FileStream f = File.Open(@"C:\Documents and Settings\shanselm\Desktop\good.gif", FileMode.Open))
using (BinaryReader reader = new BinaryReader(f))
{
    // Signature and version, e.g. "GIF87a"
    string sigversion = new string(reader.ReadChars(6));

    // Screen descriptor (multi-byte values are little-endian, which is what BinaryReader reads)
    ushort width = reader.ReadUInt16();
    ushort height = reader.ReadUInt16();
    byte someshit = reader.ReadByte();      // packed flags byte
    int colortable = someshit & 0x7;        // low 3 bits: color table size exponent, minus one
    byte bgcolor = reader.ReadByte();
    byte aspectratio = reader.ReadByte();   // reserved (zero) in GIF87a
    int logicallength = (int)Math.Pow(2, colortable + 1);
    int colortablelength = (int)(3 * logicallength);

    //Color table, yuck. RGB is a struct elsewhere in our file.
    // It's a value type, that's why we poke it back in at the bottom of the loop.
    // (This assumes the global color table is present, which it was for our files.)
    RGB[] RGBs = new RGB[logicallength];
    for (int i = 0; i < logicallength; i++)
    {
        RGB rgb = RGBs[i];
        rgb.R = reader.ReadByte();
        rgb.G = reader.ReadByte();
        rgb.B = reader.ReadByte();
        RGBs[i] = rgb;
    }

    // Image descriptor
    byte imageseparator = reader.ReadByte();    // should be 0x2C, ','
    uint leftpos = reader.ReadUInt16();
    uint toppos = reader.ReadUInt16();
    uint widthagain = reader.ReadUInt16();
    uint heightagain = reader.ReadUInt16();
    byte localcolortableflags = reader.ReadByte();
    int localcolortablepresent = localcolortableflags & 0x80;
    int interlace = localcolortableflags & 0x40;
    int sortbit = localcolortableflags & 0x20;
    int localcolortable = localcolortableflags & 0x07;
}

//We figured if the header was bad we'd mess with it in this process somewhere
// then if we fixed it in the byte, we'd fall through to the code below
// that previously hadn't worked. If Image.FromStream did work, we'd have fixed the bug
// Of course, we got all the way here and there wasn't anything wrong with the GIF header!
using (MemoryStream m = new MemoryStream(bytes))
using (Image image = Image.FromStream(m))
{
    // the conversion from the first snippet would go here
}
Well, crap. We made it all this way and there didn't appear to be ANYTHING (per spec) wrong with the GIF header. We checked everything out in the Watch Window line by line. Nothing.
Ok, back to differences. How about checking them out in Beyond Compare?
Zoom in on that baby. Look real close. Notice in the upper left corner, there's not many differences. Remember that the old GIF87 is on the left, and the new one that I made via COPY/PASTE is on the right. The basic image data is the same, cool. So, really the only differences are the header, a byte or two in the middle, and what? What's that at the VERY BOTTOM RIGHT CORNER? A semicolon? In the valid image? WTF is that?
Hm...back to the spec. Since we've just decoded the header, perhaps there's a footer/trailer/terminator.
June 15, 1987
(c) CompuServe Incorporated, 1987
In order to provide a synchronization for the termination of a GIF
image file, a GIF decoder will process the end of GIF mode when the
character 0x3B hex or ';' is found after an image has been processed.
By convention the decoding software will pause and wait for an action
Lovely. Do we have an off-by-one? Are we dropping the last byte as we go through the system?
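One way to chase that hunch without squinting at a hex diff is to walk the file block by block, the way the spec lays it out, and see whether the data ends cleanly right where the ';' ought to be. This is only a sketch and not the code we ran that morning. The names GifTrailerCheck, WalkBlocks and SkipSubBlocks are made up for illustration, and it assumes a plain well-formed layout: header, screen descriptor, optional global color table, then image and extension blocks.

```csharp
using System;

public static class GifTrailerCheck
{
    // Hypothetical helper. Returns the offset just past the last complete block,
    // or -1 if the data is truncated or malformed before a block finishes.
    // sawTrailer is true only when a 0x3B (';') terminator was actually found.
    public static int WalkBlocks(byte[] gif, out bool sawTrailer)
    {
        sawTrailer = false;
        // 6-byte signature + 7-byte screen descriptor
        if (gif == null || gif.Length < 13) return -1;

        int pos = 13;
        byte packed = gif[10];
        if ((packed & 0x80) != 0)                        // global color table present
            pos += 3 * (1 << ((packed & 0x07) + 1));

        while (pos < gif.Length)
        {
            byte marker = gif[pos];
            if (marker == 0x3B)                          // ';' terminator
            {
                sawTrailer = true;
                return pos + 1;
            }
            if (marker == 0x2C)                          // ',' image descriptor
            {
                if (pos + 10 > gif.Length) return -1;
                byte flags = gif[pos + 9];
                pos += 10;
                if ((flags & 0x80) != 0)                 // local color table present
                    pos += 3 * (1 << ((flags & 0x07) + 1));
                pos += 1;                                // LZW minimum code size byte
                pos = SkipSubBlocks(gif, pos);
                if (pos < 0) return -1;
            }
            else if (marker == 0x21)                     // '!' extension block
            {
                pos += 2;                                // introducer + function code
                pos = SkipSubBlocks(gif, pos);
                if (pos < 0) return -1;
            }
            else
            {
                return -1;                               // not a block we recognize
            }
        }
        // Ran out of bytes exactly on a block boundary: intact data, no trailer.
        return pos == gif.Length ? pos : -1;
    }

    // Skips length-prefixed data sub-blocks up to and including the zero-length one.
    private static int SkipSubBlocks(byte[] gif, int pos)
    {
        while (pos < gif.Length)
        {
            int size = gif[pos];
            pos += 1;
            if (size == 0) return pos;
            pos += size;
        }
        return -1;                                       // data ended mid-block
    }
}
```

If WalkBlocks comes back with the full length of the array and sawTrailer is false, the image data is intact and the only thing missing is the final ';'. A return of -1 would instead point at damage somewhere earlier in the file.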
We go back to the system that sits just ahead of the mainframe check imager and request an image. We look at the byte array returned, and notice that the LAST BYTE IS MISSING. The images are transmitted on a secure internal network using HTTP. The Content-Type is image/gif and the Content-Length HTTP Header, in this case, was 20814. That was exactly how many bytes were received.
So here's the question (that hasn't been answered):
Is it more likely that the host system has or is generating bogus/bad/invalid GIFs or that the Content-Length HTTP Header being returned by their unknown kind of Web Server is off by one and System.Net.HttpWebRequest is trusting what it's being told? I vote bogus GIFs, Patrick thinks bad Content-Length. Not sure if we'll ever know.
The fix, of course, was to check if the byte array representing this kind of GIF is terminated with 0x3b or not, and if not, append it. Once 0x3b was appended, System.Drawing and GDI+ had NO problem with the bytes.
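For the curious, a fix along those lines might look roughly like this. It's a minimal sketch rather than our production code: GifFixup and EnsureTrailer are hypothetical names, and it assumes the only thing wrong with the bytes is the missing final 0x3B.

```csharp
using System;

public static class GifFixup
{
    private const byte GifTrailer = 0x3B;   // ';' per the GIF spec

    // Hypothetical helper: returns the same array if it is not a GIF or already
    // ends with the trailer, otherwise a copy with 0x3B appended.
    public static byte[] EnsureTrailer(byte[] gif)
    {
        if (gif == null) throw new ArgumentNullException("gif");

        // Only touch data that starts with the "GIF" signature.
        bool looksLikeGif = gif.Length >= 6
            && gif[0] == 0x47 && gif[1] == 0x49 && gif[2] == 0x46;
        if (!looksLikeGif) return gif;

        if (gif[gif.Length - 1] == GifTrailer) return gif;

        byte[] patched = new byte[gif.Length + 1];
        Buffer.BlockCopy(gif, 0, patched, 0, gif.Length);
        patched[gif.Length] = GifTrailer;
        return patched;
    }
}
```

You'd call it on the decoded bytes just before handing them to Image.FromStream, something like bytes = GifFixup.EnsureTrailer(bytes). Checking only the last byte is enough here because intact image data ends with a zero-length block (0x00), so a file that is merely missing its terminator can't already end in 0x3B.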
Crisis averted. Chaos continues.