Scott Hanselman

The Weekly Source Code 41 - Searching Code, Sharing Code, and Reading Code (and Comments)

April 28, '09 Comments [10] Posted in Source Code

I really advocate folks reading as much source as they can because you become a better writer by reading as much as writing. That's the whole point of the Weekly Source Code - reading code to be a better developer.

Reading code in Open Source projects is a good way to learn, especially if the project has been around a while and been successful, or if you already respect the team of people working on it. Less reliably, you can find snippets of code by searching and sharing code.

Searching Code

There are basically three players in the "Code Search Engine" space. These are the ones I bump into all the time. There could be others.

  • Koders.com (where Phil Haack used to work)
  • Krugle - MSDN is using Krugle to power the MSDN Code Search Preview that searches MSDN Library, MSDN Code Gallery, and CodePlex.
    • (Note, the Krugle MSDN Code Search is going up and down, as it's in preview and they are actively developing it. If you get a login dialog, that's why. Wait a while.)
  • Google Code Search - If you have lots of code on your site you can make its existence explicit to Google by extending your Sitemap to include the CodeSearch extensions. You can also point them to your Subversion repo if it's publicly available.
  • Codase - Been around forever, and still up and running, but nothing new on their homepage for 4 years? What's up? Weird. I don't count them as a player, but they were first, as far as I can see.

Searching the web is 99% free text search, as you know. Every once in a while, I'll use an advanced technique like searching with "filetype:ppt" or with a numeric range like "$1500..$3000" but seriously, that's like once in 100 searches or less. I challenge you if you say you use it more. Free text searching almost always gets it right, unless you're searching for homonyms or something general.

Because free text is so good, while I like the idea of a code search engine and I do use them, I'm not sure if I need anything more than a "code-specific free text search." I personally believe that being really specific in your search query is a really good way to filter your way out of the ONE result you need.

Sometimes, however, you will want the advanced techniques when searching code.

Here are some examples:

If you're looking for a specific implementation of something, like an MD5 hash or a BTree, these search engines can be really useful. They can't tell you anything about quality though.
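
For code you already have on disk, a "code-specific free text search" can be approximated in a few lines. This is only a sketch in Python, not how any of the engines above work: `search_code`, its parameters, and the example pattern are hypothetical names chosen for illustration. It restricts the search to certain file extensions and then applies an ordinary regular expression line by line.

```python
import re
from pathlib import Path


def search_code(root, pattern, extensions=(".py",)):
    """Yield (path, line_number, line) for every line under root matching pattern.

    Only files whose suffix is in `extensions` are read. That filter is the
    "code-specific" part; the match itself is still a plain regex search.
    """
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        # errors="replace" keeps the search going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line.strip()
```

Calling `search_code(".", r"\bmd5\b", (".cs", ".java"))` would walk the current directory and yield every line in a C# or Java file that mentions `md5` as a whole word. Like the hosted engines, it says nothing about the quality of what it finds.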

Sharing Code

On your blog...

I started using SyntaxHighlighter on my blog for all my code snippets and I'm bummed I didn't start earlier. The best aspect of it is that all my snippets are inside of <pre> tags. That means they're easily indexed and not littered with markup. The syntax highlighting is added on the client side by JavaScript. I use the PreCode plugin in Windows Live Writer and wrote up how I do it on my blog. It's fast becoming THE way to post code inline. I even convinced ScottGu (by doing it for him without asking) to use SyntaxHighlighter when I converted the Nerddinner PDF to HTML.
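
To make the <pre> idea concrete, here is a minimal sketch in Python of what a posting tool has to do: escape the snippet and wrap it in a single tag, leaving the coloring to JavaScript in the browser. The function name `to_pre_block` is hypothetical, and the `brush: <name>` class is an assumption based on SyntaxHighlighter 2.x markup; check it against the version you actually run.

```python
import html


def to_pre_block(code, brush="csharp"):
    """Wrap a code snippet in a plain <pre> tag for client-side highlighting.

    The "brush: <name>" class is an assumption based on SyntaxHighlighter 2.x
    markup; adjust it to whatever your highlighter version expects.
    """
    # Escape &, < and > so the browser shows the code instead of parsing it.
    escaped = html.escape(code, quote=False)
    return '<pre class="brush: {}">\n{}\n</pre>'.format(brush, escaped)


print(to_pre_block('if (a < b && b > 0) { Console.WriteLine("ok"); }'))
```

The output contains the escaped code and nothing else, with no span-per-token markup, which is why snippets posted this way stay easy to index and to copy.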

On the web or via IM...

You know by now that pasting code in an IM window is a recipe for pain and misplaced emoticons. I use code-pasting services for this instead.

I really like using Josh Goebel's Pastie for sharing code, although he doesn't formally support C# or VB. I'm slowly moving to Gist.Github.com though.

The best social-code-sharing snippet sites are:

  • Pastie.org - Favorite of Rubyists and the "original." Couldn't be simpler.
  • Pastebin.com - Favorite of IRC users and supports C# and dozens of other languages. Also nice because it supports "expiration" of your code in a day or a month. Nice for email or IM.
  • Gist.GitHub.com - Winner for best name, as "Gist" (pronounced "Jist") meaning "Essence" is a great way to express what you're trying to, ahem, express. Just the gist. Supports virtually all languages as well as a "private" option. It also supports versioning, which is unique. This makes sense since Github is a "social source control system." Here's a screencast explaining how this concept can be taken to the next level.

I've had an interesting conversation or two about making sharing code easier with Jeff. We'll see where that goes.

Code Comments

Kind of unrelated, but still fun...I think that every developer should have a blog, or at least an outlet for writing. Those that don't, often use Code Comments to express themselves.

There was a great post at StackOverflow asking for the "best comment in source code you have ever encountered." This, of course, turned into a list of the worst comments ever found in source code, because that's how programmers work, right? Best == Worst. ;) A lot like the Daily WTF.

You can find some interesting stuff if you use the code search engines to search for stuff that shouldn't ordinarily be in code. This guy searched the Linux Source Code for swear words and graphed them over time.

Some other non-explicit examples...

  • Found on Koders searching for Poop
    ; Poor-man's Object-Oriented Programming
    ;                   or
    ;                  POOP
    (module POOP
       (import Utility)
  • Found on Krugle searching for Mind-Numbing

    // mind numbing: let caller use sane calling convention (as per javadoc, 3 params),
    // OR the 2.0 calling convention (no ptions) - we really love backward compat, don't we?

  • Found on Google Search searching for Hate
     # God, I hate DTDs.  I really do.  Why this idiot standard still
     # plagues us is beyond me.
  • Found on Google Search searching for Horrible
     case 'H':
          horrible++;
          break;
  • Found on Koders searching for "God Himself"...not really a comment, but interesting.
     (c.query_gender().equals("male") ? "He" : (c.query_gender().equals("female") ? "She" : "It"))
              + " is " +
              ((c.query_level() == client.WIZ_GOD) ?
                    "the Almighty God himself\n\rBeware of his wrath if you don't follow his laws!" :
                    ((c.query_level() > client.MORTAL) ? "a powerful immortal" : "a puny mortal")))+ "\n\r"
  • Found on Google Search searching for "profoundly bad"
    if isinstance(real_child, SilentMock):
       raise TypeError("Replacing a mock with another mock is a profoundly bad idea.\n" +
        "Try re-using mock \"%s\" instead" % (name,))
  • Found on Google Search looking for "Pure Evil"
     my $db = delete $access->{db};
              # This is pure evil.
              $db->DESTROY;
  • Found on Google Search searching for Poop, but only in Ruby files.
    "Stimpy-drool",
    "poopy",
    "poop",
    "craptacular carpet droppings",

I'm sure if you search, you'll find lots of great stuff in comments, much more colorful than this. For example, the Greatest Code Comment Ever (Line 107) hat tip to Cam Soper:

uint32 sign=[fh readUInt32BE];
uint32 marker=[fh readUInt32BE];
uint32 chunklen=[fh readUInt32BE];
off_t nextchunk=[fh offsetInFile]+((chunklen+3)&~3);
// At this point, I'd like to take a moment to speak to you about the Adobe PSD format.
// PSD is not a good format. PSD is not even a bad format. Calling it such would be an
// insult to other bad formats, such as PCX or JPEG. No, PSD is an abysmal format. Having
// worked on this code for several weeks now, my hate for PSD has grown to a raging fire
// that burns with the fierce passion of a million suns.
// If there are two different ways of doing something, PSD will do both, in different
// places. It will then make up three more ways no sane human would think of, and do those
// too. PSD makes inconsistency an art form. Why, for instance, did it suddenly decide
// that *these* particular chunks should be aligned to four bytes, and that this alignement
// should *not* be included in the size? Other chunks in other places are either unaligned,
// or aligned with the alignment included in the size. Here, though, it is not included.
// Either one of these three behaviours would be fine. A sane format would pick one. PSD,
// of course, uses all three, and more.
// Trying to get data out of a PSD file is like trying to find something in the attic of
// your eccentric old uncle who died in a freak freshwater shark attack on his 58th
// birthday. That last detail may not be important for the purposes of the simile, but
// at this point I am spending a lot of time imagining amusing fates for the people
// responsible for this Rube Goldberg of a file format.
// Earlier, I tried to get a hold of the latest specs for the PSD file format. To do this,
// I had to apply to them for permission to apply to them to have them consider sending
// me this sacred tome. This would have involved faxing them a copy of some document or
// other, probably signed in blood. I can only imagine that they make this process so
// difficult because they are intensely ashamed of having created this abomination. I
// was naturally not gullible enough to go through with this procedure, but if I had done
// so, I would have printed out every single page of the spec, and set them all on fire.
// Were it within my power, I would gather every single copy of those specs, and launch
// them on a spaceship directly into the sun.
//
// PSD is not my favourite file format.
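
The complaint about alignment is easier to follow with numbers. The expression `(chunklen+3)&~3` in the snippet rounds a length up to the next multiple of four, and the comment says that this padding is not counted in the stored size. Here is a small Python sketch of that arithmetic; the function names are hypothetical and this is not a PSD parser.

```python
def padded_length(chunk_len, alignment=4):
    """Round chunk_len up to the next multiple of `alignment` (a power of two)."""
    # Same bit trick as (chunklen+3)&~3 in the snippet above when alignment is 4.
    return (chunk_len + alignment - 1) & ~(alignment - 1)


def next_chunk_offset(current_offset, chunk_len):
    """Offset of the next chunk when the stored length excludes the padding."""
    return current_offset + padded_length(chunk_len)


for length in (0, 1, 4, 5, 10):
    print(length, "->", padded_length(length))
# 0 -> 0
# 1 -> 4
# 4 -> 4
# 5 -> 8
# 10 -> 12
```

So a chunk that claims to be 5 bytes long actually occupies 8, and a reader that trusts the stored size alone lands 3 bytes short of the next chunk.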

Enjoy your search and read more code!

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.

Wednesday, April 29, 2009 3:03:56 AM UTC
For Code Sharing, you should also check out Snipplr.

Another thing to consider is where to put all these pieces of code. I created a product for managing code snippets with this in mind. Details about the second version, which is in beta and integrated with Snipplr for sharing code online are on my blog, here (includes a download link). Any feedback that anyone has would be greatly appreciated.
Wednesday, April 29, 2009 4:02:27 AM UTC
What, no, "thanks to Cam for tipping me to the PSD comment?" I'm hurt. :)
Wednesday, April 29, 2009 7:00:21 AM UTC
I inherited a CMS with some really odd comments. The worst was after a huge chunk of very complex code with no comments and no naming convention. It simply said:
//This will break… hopefully I won’t work here when it does.
Wednesday, April 29, 2009 7:01:08 AM UTC
Re. code formatting: there's various scenarios you need to consider. If people mostly read your posts from an aggregator, inline formatting is best. If people read your post from a web page, then certainly SyntaxHighlighter is best (IMO). If you want to strike a balance, and if the source snippet is small enough, then a bitmap screenshot of the code is best.

My WLW plugin does all three. You can download it here. Please let me know if there's anything you'd like see added or changed.

Cheers,

Steve Dunn
http://blog.dunnhq.com
Wednesday, April 29, 2009 4:28:46 PM UTC

- I am more into looking at source of complete apps than code snippets. It's important for me to learn how objects are interacting together and how a good app is built by design, use of design patterns, using the latest technologies.. etc.. So for this reason, I have downloaded source for several apps and I use the free Yahoo desktop Search (not available by Yahoo anymore. It's a free version of X1) to index all the source. This way I can quickly know how other coders used some method or class. Kinda my version of help. BTW, I find X1 the best desk search engine because I like the real time preview and it's fast. Better than Google desktop (hate the web interface), Copernic ..etc.
However I truly wish these tools can differentiate between something like "method()" & "method(..)". Meaning they can filter punctuations. It helps me filter out most of the irrelevant results.

- Need to train myself more to differentiate between code that looks cool and code that's over engineered! Sometimes I am impressed by some code but then I wonder if the coder ever heard of "keep things as simple as possible but not simpler".

- If anyone can suggest very good open source apps, I appreciate it. Very much prefer code with comments. I also prefer if you start with the most obscured ones. Cause I already have the popular ones many are recommending. :)
Abdu
Wednesday, April 29, 2009 7:36:00 PM UTC
Best comment I heard of when talking to my friend. He found some code in production that he apparently wrote when having a bad breakup with the girlfriend. I've paraphrased it a little bit but you'll get the idea

//This function takes one parameter and returns an array of objects
//The parameter can be defaulted to True just like my Ex Girlfriend
Function MyWhoreExGirlfriend(bool Variable)
Aaron
Friday, May 01, 2009 6:27:50 PM UTC
As a code pasting service, <a href="http://www.codepad.org>Codepad</a> works great. It even can run the code for you and provide the output, which can be a great way for the receiver of the code (say in IRC) to debug it and show that it works now.

Friday, May 01, 2009 6:33:30 PM UTC
Yikes, Codepad.
Thursday, May 14, 2009 9:15:59 PM UTC
PSD is (1) a binary format (2) designed in the early days of personal computing (3) for a Macintosh with a big-endian 68k processor (4) with about 20 years of features agglomerated onto it (5) by programmers who've long since cashed in their options and left.

So, let's see, what would be a more productive use of time: (a) Reverse-engineering PSD and then complaining about how hard it was (b) Sending in the damn form to get a copy of the spec.
Tom
Friday, May 15, 2009 9:11:25 AM UTC
I'm not sure if anyone has mentioned this before: Git is a *great* tool for looking at other people's code. The beauty of using Git as your source control is that you get the entire history of a project on your local machine. This makes it very easy to track the evolution of a system or component's design as you can step back in time without a lot of hassle.
If the code you're accessing is not in Git but in a CVS or Subversion repository you can get the same results by using git-svn or git-cvsimport.

Cheers, Chris Olivier
Comments are closed.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.