Scott Hanselman

Google Code Search - Now you can search the Bathroom Wall of Code

October 6, '06 Comments [8] Posted in Programming

Everyone is agog about Google searching code.

I find the language detection stuff to be really interesting. Are they using heuristics or just the file extension to figure out what language the code is written in? Probably extensions, but it'd be clever if they also used code keywords to guess.
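To make the "extensions first, keywords as a fallback" idea concrete, here is a minimal sketch. It is not how Google does it (the post only speculates); the extension table, the keyword patterns, and the `guess_language` name are all hypothetical and chosen purely for illustration.

```python
import os
import re

# Hypothetical extension table; a real service would cover far more languages.
EXTENSION_MAP = {
    ".c": "c",
    ".cs": "c#",
    ".cpp": "c++",
    ".py": "python",
    ".java": "java",
}

# Hypothetical keyword hints, consulted only when the extension is unknown.
KEYWORD_HINTS = {
    "c#": [r"\busing\s+System\b", r"\bnamespace\s+\w+"],
    "c": [r"#include\s*<\w+\.h>", r"\bprintf\s*\("],
    "python": [r"^\s*def\s+\w+\s*\(", r"^\s*import\s+\w+"],
    "java": [r"\bpublic\s+class\s+\w+", r"\bimport\s+java\."],
}


def guess_language(filename, source):
    """Guess a language from the file extension, then from keyword hits."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXTENSION_MAP:
        return EXTENSION_MAP[ext]

    # Count how many hint patterns of each language appear in the source.
    scores = {
        lang: sum(1 for p in patterns if re.search(p, source, re.MULTILINE))
        for lang, patterns in KEYWORD_HINTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A file named `AssemblyInfo.cs` is classified by extension alone, while an extensionless file is scored against the keyword hints and comes back as `None` if nothing matches.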

One point that I think should be addressed by a future version is tuning of relevance data. If you search for DasBlog (not really a valid code search), you'll find folks that reference DasBlog libraries, and source inside ZIP files, but not the ACTUAL source at the ACTUAL location. It'd be nice to see them understand where the authoritative source of source is.

A few advanced tricks are:

  • Restrict search to "C"-based languages
    • lang:^(c|c#|c\+\+)$
  • Avoid GPLed code
    • -license:gpl
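The `lang:` filter above is an anchored regular expression, so it matches whole language names rather than substrings. The sketch below shows what that pattern accepts, assuming the service compares it against lower-case language names with semantics close to Python's `re` module (an assumption; the helper name `matches_c_family` is made up for this example).

```python
import re

# The pattern from the lang: filter above. ^ and $ anchor the whole name,
# and \+ escapes the plus signs in "c++".
LANG_FILTER = re.compile(r"^(c|c#|c\+\+)$")


def matches_c_family(lang):
    """Return True if the language name passes the lang: filter."""
    return LANG_FILTER.search(lang) is not None


for lang in ["c", "c#", "c++", "objective-c", "javascript", "cobol"]:
    print(lang, matches_c_family(lang))
```

Without the anchors, the bare `c` alternative would also match names such as `objective-c` or `cobol`, which is why the `^` and `$` matter.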

You can also include Google Code Search on your own site as a GDATA (~GoogleRSS) feed. However, you can't restrict code searches by site using site:, which is a bummer and limits its usefulness. I'd like to be able to have folks search for source on my blog.

Be sure to read their Google Code Search FAQ if you want to block them (robots.txt) from crawling your code.
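As a rough illustration of the robots.txt approach, the sketch below checks a rule set with Python's standard `urllib.robotparser`. The `/source/` path, the `example.com` URLs, and the `ExampleCrawler` user agent are placeholders, not values from the FAQ; check the FAQ for the rules Google actually honors.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules: ask every crawler to stay out of a /source/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /source/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "ExampleCrawler", "http://example.com/source/app.cs"))
print(is_allowed(ROBOTS_TXT, "ExampleCrawler", "http://example.com/blog/"))
```

Running a check like this before publishing a robots.txt is a cheap way to confirm the rules block what you intend and nothing more.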

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Friday, October 06, 2006 5:38:35 PM UTC
Hey Scott, I thought you might be interested in some of the more nefarious uses of this service that are now easily available to the wrong hands. I was doing some looking around, and it appears that a lot of usernames and passwords can be found with the service.

See here for more:
http://midspot.wordpress.com/2006/10/06/google-code-search-reveals-too-much/

Cheers
Friday, October 06, 2006 5:43:19 PM UTC
I wonder if sourceforge has a robots.txt or has told Google in some other way not to crawl their repositories. I tried the search: file:assemblyinfo.cs dasblog

It found the code zipped up in the nightly builds on dasblog.info - but nothing from SF.

Not including sourceforge results seems like a major shortcoming.
Friday, October 06, 2006 5:48:58 PM UTC
Jon - is that nefarious, or just a way to search for poor coding practices? It'd be interesting to have someone do an analysis of the world's source to see how much "sucks." Basically static analysis of the planet for best practices. I suspect we'd not do well.
Friday, October 06, 2006 6:00:52 PM UTC
Hey, a new kind of ego search. Check this out:
http://www.google.com/codesearch?hl=en&lr=&q=hanselman

I like the reference to your blog entry on P3P about using an HttpModule.

Neat, now you can find your name in other people's code.
Friday, October 06, 2006 8:10:40 PM UTC
http://koders.com/

The above link takes you to another code search engine that was around long before the Google offering, so it's not exactly a new thing.
David Morris
Friday, October 06, 2006 11:51:08 PM UTC
Hi Scott,

"you can't restrict code searches by site using site: which is a bummer and limits its usefulness"
I'm wondering if the API to the code search has a workaround ?

BR,
~A
Saturday, October 07, 2006 3:34:17 AM UTC

www.krugle.com is another code search engine.
mustard76
Saturday, October 07, 2006 7:43:29 PM UTC
http://www.codefetch.com searches code from books.

http://www.codase.com. I tried that, but the server gives error messages. The home page doesn't indicate whether it searches C# or VB.NET. The news dates go back to 1995, so I am not sure what the deal is with it.

Abdu

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.