Scott Hanselman

Back to Basics: Assert your assumptions and diff your source code

March 13, '14 Comments [20] Posted in ASP.NET | Back to Basics

I've done a whole series of "Back to Basics" posts that I encourage you to check out. Sometimes I'll do a post as a result of a reader emailing me for help and this is one such post.

A person emailed me about an ASP.NET app that was behaving differently on his computer vs. another developer's computer.

On his machine when he hit a protected page foo.aspx?returnurl=http://mymachine.domain.com

he would get a FORM element like this:

<form action="foo.aspx?returnurl=http://mymachine.domain.com">

but on everyone else's machines their HTML was:

<form action="foo.aspx">

They spent time debugging, became frustrated, and eventually reached out. They said:

1. there's nothing going on in the aspx of login.aspx that would append the querystring.

2. there's nothing going on in the code-behind of the aspx that manipulates Form.Action or messes with the Page.Render in any way.

So, I'm stumped, because the querystring is included on my machine, but not on others. I've tried comparing IIS components, web.config differences, application pool runtime type, machine.config differences, possible differences in Modules for IIS (IISrewrite), but nothing is giving me love. 

I suggested that they assert assumptions and start diffing everything. You can see in the last paragraph that they're comparing stuff but I think you really have to diff everything.

When something "works here but not there" my answer is always, what has changed? What's different? If the answer is "nothing is different" I'm just gonna say it again:

"What's different?"

What are some things we can check?

  • Code
    • Do you know what's on disk?
    • Do you know what ended up in memory? These are different things.
  • Configuration
    • There's local and machine-wide config to check
  • Network Traffic
    • This is often overlooked. The Internet is not a black box, but you'd be surprised how few people hook up a packet sniffer or even just Fiddler to look at HTTP traffic.
    • I've talked to developers who have said "well, that's under SSL so I can't see it." Oh, my friend, if you only knew.

I had them do a sniff and see if there was a difference in HTTP traffic. My assumption was that the HTTP_REFERER HTTP header was different and there was some code that was causing the page to render differently.

We went back and forth over a few days and my reader became frustrated and just added this line in their app's Page_Load:

this.Form.Action = Request.Url.ToString();

Here they are basically reasserting the Form action by pulling it from the URL. It's gross and it's a hack. It's a Band-Aid on Cancer.

They then started looking at the source for ASP.NET proper and then decided to disassemble the code that was running on the other person's machine. They then started to assert their assumptions.

Is the code running what's on disk? For a compiled language, do the binaries reflect the source?

They looked in Temporary ASP.NET files at the compiled ASPX markup pages and found this.

//ExternalSource("D:\WebApplications\Foo\login.aspx",27)
__ctrl.Method = "post";

//ExternalSource("D:\WebApplications\Foo\login.aspx",27)
__ctrl.Action = "login.aspx";

What? Why is someone setting the FORM Action manually? And there's a line number.

They had diffed all the source code but not the markup/views/html.

Their markup:

<form id="Form1" method="post" runat="server">

Other person's markup:

<form id="Form1" method="post" runat="server" action="Login.aspx">

The other person had hard-coded the action in their source markup. They'd been diffing everything but the markup.

When you are comparing two code-bases, make sure to compare everything or you might just lose a day or two like this person.
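One way to "compare everything" is to hash every file in both trees, markup and config included, rather than only the files you think of as source. The following is a minimal sketch, not the reader's actual tooling: `TreeDiff`, `HashTree`, and `Compare` are hypothetical names, and it assumes both copies of the application are reachable as directories on one machine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: reports every file that differs between two directory trees.
public static class TreeDiff
{
    // Maps each file's path (relative to root) to a hash of its bytes.
    // No extension filter is applied, so .aspx, .config, and .cs files are all included.
    public static Dictionary<string, string> HashTree(string root)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        using (var sha = SHA256.Create())
        {
            foreach (var path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
            {
                var relative = path.Substring(root.Length)
                    .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
                result[relative] = Convert.ToBase64String(sha.ComputeHash(File.ReadAllBytes(path)));
            }
        }
        return result;
    }

    // Returns one line per file that is missing on either side or whose content differs.
    public static List<string> Compare(string leftRoot, string rightRoot)
    {
        var left = HashTree(leftRoot);
        var right = HashTree(rightRoot);
        var report = new List<string>();
        var names = left.Keys
            .Union(right.Keys, StringComparer.OrdinalIgnoreCase)
            .OrderBy(n => n, StringComparer.OrdinalIgnoreCase);

        foreach (var name in names)
        {
            string l, r;
            bool inLeft = left.TryGetValue(name, out l);
            bool inRight = right.TryGetValue(name, out r);
            if (!inLeft) report.Add("only in right: " + name);
            else if (!inRight) report.Add("only in left: " + name);
            else if (l != r) report.Add("different: " + name);
        }
        return report;
    }
}
```

Calling `TreeDiff.Compare` on the two developers' application folders would have listed login.aspx as different, which is where a line-by-line diff tool should be pointed next.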

Thanks to my reader for sharing this and letting me in on this debugging adventure.

Sponsor: Big thanks to Red Gate for sponsoring the blog feed this week. Check out the Free Starter Edition of their release management tool! Deploy your SQL Server databases, .NET apps and services in a single, repeatable process with Red Gate’s Deployment Manager. Get started now with the free Starter Edition.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.


NuGet Package(s) of the Week #12 - Accessing Google Spreadsheets with GData from C# *and* hosting Razor Templates to generate HTML from a Console App

January 10, '13 Comments [22] Posted in ASP.NET MVC | Back to Basics | NuGet | NuGetPOW | Open Source

The Red Pump Project

Sometimes I write apps for charities on the side. Recently I've been doing some charity coding on the side for The Red Pump Project. They are a non-profit focused on raising awareness about the impact of HIV/AIDS on women and girls. I encourage you to check them out, donate some money, or join their mailing list.

Side Note: Folks often ask me how they can get more experience and wonder if Open Source is a good way. It is! But, charities often need code too! You may be able to have the best of both worlds. Donate your time and your code...and work with them to open source the result. Everyone wins. You get knowledge, the charity gets results, the world gets code.

Anyway, this charity has a Google Spreadsheet that holds the results of a survey of users they take. You can create a Form from a Google Spreadsheet; it's a very common thing.

In the past, they've manually gone into the spreadsheet and copied the data out, then painstakingly - manually - wrapped the data with HTML tags and posted the names of donors (who have opted in) to their site.

It got to the point where this tedium was the #1 most hated job at The Red Pump Project. They wanted to recognize donors but they aren't large enough yet to have a whole donation platform CRM, instead opting to use Google Apps and free tools.

I figured I could fix this and quickly. Over a one hour Skype last night with Luvvie, one of The Red Pump Founders, we 'paired' (in that I wrote code and she validated the results as I typed) and made a little app that would loop through a Google Spreadsheet and make some HTML that was then uploaded to a webserver and used as a resource within their larger blogging platform.

Yes, there are lots of simpler and better ways to do this, but keep in mind that this is the result of a single hour, paired directly with the "on site customer," and they are thrilled. It also gives me something to build on. It could later be moved into the cloud, automated, moved to the server side, etc. One has to prioritize, and this solution will save them dozens of hours of tedious work this fundraising season.

Here's our hour.

Step 1 - Access Google Spreadsheet via GDATA and C#

I was not familiar with the Google GData API but I knew there was one. I made a console app and downloaded the Google Data API installer. You can also get the libraries via NuGet.

I added references to Google.GData.Client, .Extensions, and .Spreadsheets. Per their documentation, you have to walk an object model, traversing first to find the Spreadsheet within your Google Drive, then the Worksheet within a single Spreadsheet, and then the Rows and Columns as Cells within the Worksheet. Sounds like moving around a DOM. Get a reference, save it, dig down, continue.

So, that's Drive -> Spreadsheet -> Worksheet -> Cells (Rows, Cols)

The supporters of the Red Pump Project call themselves "Red Pump Rockers" so I have a class to hold them. I want their site, url and twitter. I have a "strippedSite" property which will be the name of their site with only valid alphanumerics so I can make an alphabet navigator later and put some simple navigation in a sorted list.

public class Rocker
{
public string site { get; set; }
public string strippedSite { get; set; }
public string url { get; set; }
public string twitter { get; set; }
}

Again, this is not my finest code but it works well given constraints.

var rockers = new List<Rocker>();

SpreadsheetsService myService = new SpreadsheetsService("auniquename");
myService.setUserCredentials("gmaillogin@email.com", "password");

// GET THE SPREADSHEET from all the docs
SpreadsheetQuery query = new SpreadsheetQuery();
SpreadsheetFeed feed = myService.Query(query);

var campaign = (from x in feed.Entries where x.Title.Text.Contains("thetitleofthesheetineed") select x).First();

// GET THE first WORKSHEET from that sheet
AtomLink link = campaign.Links.FindService(GDataSpreadsheetsNameTable.WorksheetRel, null);
WorksheetQuery query2 = new WorksheetQuery(link.HRef.ToString());
WorksheetFeed feed2 = myService.Query(query2);

var campaignSheet = feed2.Entries.First();

// GET THE CELLS

AtomLink cellFeedLink = campaignSheet.Links.FindService(GDataSpreadsheetsNameTable.CellRel, null);
CellQuery query3 = new CellQuery(cellFeedLink.HRef.ToString());
CellFeed feed3 = myService.Query(query3);

uint lastRow = 1;
Rocker rocker = new Rocker();

foreach (CellEntry curCell in feed3.Entries) {

if (curCell.Cell.Row > lastRow && lastRow != 1) { //When we've moved to a new row, save our Rocker
rockers.Add(rocker);
rocker = new Rocker();
}

//Console.WriteLine("Row {0} Column {1}: {2}", curCell.Cell.Row, curCell.Cell.Column, curCell.Cell.Value);

switch (curCell.Cell.Column) {
case 4: //site
rocker.site = curCell.Cell.Value;
Regex rgx = new Regex("[^a-zA-Z0-9]"); //Save an alphanumeric only version
rocker.strippedSite = rgx.Replace(rocker.site, "");
break;
case 5: //url
rocker.url = curCell.Cell.Value;
break;
case 6: //twitter
rocker.twitter = curCell.Cell.Value;
break;
}
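lastRow = curCell.Cell.Row;
}

// The loop only saves a Rocker once it sees the next row, so save the final one here
if (lastRow != 1) rockers.Add(rocker);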

var sortedRockers = rockers.OrderBy(x => x.strippedSite).ToList();

At this point I have thousands of folks who "Rock The Red Pump" in a list called sortedRockers, sorted by site A-Z. I'm ready to do something with them.
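The row tracking above (remember the last row, save when it changes) can also be expressed as a grouping. This is a sketch of that alternative, not the code from the hour with Luvvie: `Cell` is a hypothetical stand-in for the row, column, and value carried by each GData `CellEntry`, `RockerReader` is an invented name, and the column numbers 4, 5, and 6 are taken from the switch statement above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the row/column/value of a GData cell.
public class Cell
{
    public uint Row;
    public uint Column;
    public string Value;
}

public class Rocker
{
    public string site { get; set; }
    public string strippedSite { get; set; }
    public string url { get; set; }
    public string twitter { get; set; }
}

public static class RockerReader
{
    // Groups cells by row, skips the header row, and builds one Rocker per row.
    public static List<Rocker> FromCells(IEnumerable<Cell> cells)
    {
        return cells
            .Where(c => c.Row > 1)          // row 1 holds the column headers
            .GroupBy(c => c.Row)
            .OrderBy(g => g.Key)
            .Select(g =>
            {
                var site = ValueAt(g, 4);
                return new Rocker
                {
                    site = site,
                    // Alphanumeric-only version of the site name, used for sorting
                    strippedSite = Regex.Replace(site ?? "", "[^a-zA-Z0-9]", ""),
                    url = ValueAt(g, 5),
                    twitter = ValueAt(g, 6)
                };
            })
            .ToList();
    }

    // Returns the value in the given column of a row, or null if that cell is empty.
    private static string ValueAt(IEnumerable<Cell> row, uint column)
    {
        return row.Where(c => c.Column == column).Select(c => c.Value).FirstOrDefault();
    }
}
```

Because every row becomes its own group, there is no "last row" special case to remember, at the cost of buffering all the cells before producing any output.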

Step 2 - Generate HTML (first wrong, then later right with Razor Templates)

They wanted a list of website names linked to their sites with an optional twitter name like:

Scott's Blog - @shanselman

I started (poorly) with a StringBuilder. *Gasp*

This is a learning moment, because it was hard and it was silly and I wasted 20 minutes of Luvvie's time. Still, it gets better, keep reading.

Here's what I wrote, quickly, and first. Don't judge, I'm being honest here.

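// Declarations the snippet relies on (the opening tags are assumed from the closing tags appended below)
var sb = new StringBuilder("<html><body>");
char lastCharacter = '0';

foreach (Rocker r in sortedRockers){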
string strippedName = r.strippedSite;

if (char.ToUpperInvariant(lastCharacter) != char.ToUpperInvariant(strippedName[0])) {
sb.AppendFormat("<h2><a name=\"{0}\">{0}</a></h2>", char.ToUpperInvariant(strippedName[0]));
}

sb.AppendFormat("<a href=\"{1}\" target=\"_blank\">{0}</a>", r.site, r.url);

if (!string.IsNullOrWhiteSpace(r.twitter)){
r.twitter = r.twitter.Replace("@", "");
sb.AppendFormat(" &mdash; <a href=\"http://twitter.com/{0}\">@{0}</a>", r.twitter);
}
sb.AppendFormat("<br>");

lastCharacter = strippedName[0];
}
sb.AppendFormat("</body></html>");

This works fine. It's also nuts and hard to read. Impossible to debug and generally confusing. Luvvie was patient but I clearly lost her here.

I realized that I should probably have used Razor Templating from within my Console App for this. I asked on StackOverflow as well.

UPDATE: There's a great answer by Demis from ServiceStack on StackOverflow showing how to use ServiceStack and Razor to generate HTML from Razor templates.

I ended up using RazorEngine, largely because of the promise of their first two lines of code on their original home page.  There is also RazorMachine, Nancy, and a post by Andrew Nurse (author of much of Razor itself) as other options.

RazorEngine in NuGet

But, these two lines right at their top of their site were too enticing to ignore.

string template = "Hello @Name.Name! Welcome to Razor!";
string result = Razor.Parse(template, new { Name = "World" });

(Open Source Developers Take Heed: Where's the easy quickstart code sample on your home page?)

This allowed me to change all that StringBuilder goo above into a nice clear Razor template in a string. I also added the alphabet navigation and the letter headers easily.

<html><head><link rel="stylesheet" href="style.css" type="text/css" media="screen"/></head><body>

@* Navigation - A B C D, etc. *@
@foreach(char x in "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToList()) {
<a href="#@x">@x</a>
}

@functions {
//need @functions because I need this variable in a wider scope
char lastCharacter = '0';
}

@foreach(var r in Model) {
var theUpperChar = char.ToUpperInvariant(r.strippedSite[0]);

//Make a capital letter "heading" when letters change
if (lastCharacter != theUpperChar) {
<h2><a name="@theUpperChar">@theUpperChar</a></h2>
}

<a href="@r.url" target="_blank">@r.site</a>

if (!string.IsNullOrWhiteSpace(r.twitter)) {
var twitter = r.twitter.Replace("@", String.Empty);
<text>&mdash;</text> <a href="http://twitter.com/@twitter">@twitter</a>
}
<br/>
lastCharacter = theUpperChar;
}
</body></html>

And the "do it" code ended up being:

string result = Razor.Parse(template, sortedRockers);
File.WriteAllText("2013list.html", result);

StringBuilders are fine, to a point. When it gets hairy, consider a templating engine of some kind.

Step 3 - Upload a File with FTP with C#

Now what? They want the little app to upload the result. Mads Kristensen to the rescue 7 years ago!

private static void Upload(string ftpServer, string userName, string password, string filename)
{
using (System.Net.WebClient client = new System.Net.WebClient())
{
client.Credentials = new System.Net.NetworkCredential(userName, password);
client.UploadFile(ftpServer + "/" + new FileInfo(filename).Name, "STOR", filename);
}
}

Then it's just

Upload("ftp://192.168.1.1", "UserName", "Password", @"2013list.html");

Conclusion

You can see that this is largely a spike, but it's a spike that solves a problem, today. Later we can build on it, move it to a server process, build a front end on it, and come up with more ways for them to keep using tools like Google Spreadsheet while better integrating with their existing websites.

Consider donating your coding time to your favorite local charity today!


Comparing two techniques in .NET Asynchronous Coordination Primitives

December 11, '12 Comments [13] Posted in Back to Basics | Learning .NET

Last week in my post on updating my Windows Phone 7 application to Windows 8 I shared some code from Michael L. Perry using a concept whereby one protects access to a shared resource using a critical section in a way that works comfortably with the new await/async keywords. Protecting shared resources like files is a little more subtle now that asynchronous is so easy. We'll see this more and more as Windows 8 and Windows Phone 8 promote the idea that apps shouldn't block for anything.

After that post, my friend and mentor (he doesn't know he's my mentor but I just decided that he is just now) Stephen Toub, expert on all things asynchronous, sent me an email with some excellent thoughts and feedback on this technique. I include some of that email here with permission as it will help us all learn!

I hadn’t seen the Awaitable Critical Section helper you mention below before, but I just took a look at it, and while it’s functional, it’s not ideal.  For a client-side solution like this, it’s probably fine.  If this were a server-side solution, though, I’d be concerned about the overhead associated with this particular implementation.

I love Stephen Toub's feedback in all things. Always firm but kind. Stephen Cleary makes a similar observation in the comments and also points out that immediately disabling the button works too. ;) It's also worth noting that Cleary's excellent AsyncEx library has lots of async-ready primitives and supports both Windows Phone 8 and 7.5.

The SemaphoreSlim class was updated on .NET 4.5 (and Windows Phone 8) to support async waits. You would have to build your own IDisposable Release, though. (In the situation you describe, I usually just disable the button at the beginning of the async handler and re-enable it at the end; but async synchronization would work too).
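The plainest form of what Cleary describes is to await the semaphore directly and release it in a finally block, with no IDisposable wrapper at all. Here is a minimal sketch under that assumption; `SettingsStore`, `SaveAsync`, and the `Task.Delay` standing in for file I/O are all hypothetical, and only `SemaphoreSlim.WaitAsync` and `Release` come from the framework.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical class guarding a shared resource with an async-friendly semaphore.
public sealed class SettingsStore
{
    // A semaphore with a count of 1 behaves like a mutex that can be awaited.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private int _writes;

    public int Writes { get { return _writes; } }

    public async Task SaveAsync()
    {
        await _gate.WaitAsync();      // waits without blocking a thread
        try
        {
            // Read-modify-write with an await in the middle: without the gate,
            // concurrent callers could overwrite each other's update.
            int seen = _writes;
            await Task.Delay(1);      // stand-in for asynchronous file I/O
            _writes = seen + 1;
        }
        finally
        {
            _gate.Release();          // always release, even if the body throws
        }
    }
}
```

The try/finally is what a `using`-based helper hides: the lock is released on every exit path, and the caller never has to remember to do it.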

Ultimately what we're trying to do is create "Async Coordination Primitives" and Toub talked about this in February.

Here, in layman's terms, is what we're trying to do, why it's interesting, and a definition of a Coordination Primitive (stolen from MSDN):

Asynchronous programming is hard because there is no simple method to coordinate between multiple operations, deal with partial failure (one of many operations fail but others succeed) and also define execution behavior of asynchronous callbacks, so they don't violate some concurrency constraint. For example, they don't attempt to do something in parallel. [Coordination Primitives] enable and promote concurrency by providing ways to express what coordination should happen.

In this case, we're trying to handle locking when using async, which is just one kind of coordination primitive. From Stephen Toub's blog:

Here, we’ll look at building support for an async mutual exclusion mechanism that supports scoping via ‘using.’

I previously blogged about a similar solution (http://blogs.msdn.com/b/pfxteam/archive/2012/02/12/10266988.aspx), which would result in a helper class like this:

Here Toub uses the new lightweight SemaphoreSlim class and indulges our love of the "using" pattern to create something very lightweight.

public sealed class AsyncLock
{
private readonly SemaphoreSlim m_semaphore = new SemaphoreSlim(1, 1);
private readonly Task<IDisposable> m_releaser;

public AsyncLock()
{
m_releaser = Task.FromResult((IDisposable)new Releaser(this));
}

public Task<IDisposable> LockAsync()
{
var wait = m_semaphore.WaitAsync();
return wait.IsCompleted ?
m_releaser :
wait.ContinueWith((_, state) => (IDisposable)state,
m_releaser.Result, CancellationToken.None,
TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
}

private sealed class Releaser : IDisposable
{
private readonly AsyncLock m_toRelease;
internal Releaser(AsyncLock toRelease) { m_toRelease = toRelease; }
public void Dispose() { m_toRelease.m_semaphore.Release(); }
}
}

How lightweight and how is this different from the previous solution? Here's Stephen Toub, emphasis mine.

There are a few reasons I’m not enamored with the referenced AwaitableCriticalSection solution. 

First, it has unnecessary allocations; again, not a big deal for a client library, but potentially more impactful for a server-side solution.  An example of this is that often with locks, when you access them they’re uncontended, and in such cases you really want acquiring and releasing the lock to be as low-overhead as possible; in other words, accessing uncontended locks should involve a fast path.  With AsyncLock above, you can see that on the fast path where the task we get back from WaitAsync is already completed, we’re just returning a cached already-completed task, so there’s no allocation (for the uncontended path where there’s still count left in the semaphore, WaitAsync will use a similar trick and will not incur any allocations).

Lots here to parse. One of the interesting meta-points is that a simple client-side app with a user interacting (like my app) has VERY different behaviors than a high-throughput server-side application. Translation? I can get away with a lot more on the client side...but should I when I don't have to?

His solution requires fewer allocations and zero garbage collections.

Overall, it’s also just much more unnecessary overhead.  A basic microbenchmark shows that in the uncontended case, AsyncLock above is about 30x faster with 0 GCs (versus a bunch of GCs in the AwaitableCriticalSection example).  And in the contended case, it looks to be about 10-15x faster.

Here's the microbenchmark comparing the two...remembering of course there's, "lies, damned lies, and microbenchmarks," but this one is pretty useful. ;)

class Program
{
static void Main()
{
const int ITERS = 100000;
while (true)
{
Run("Uncontended AL ", () => TestAsyncLockAsync(ITERS, false));
Run("Uncontended ACS", () => TestAwaitableCriticalSectionAsync(ITERS, false));
Run("Contended AL ", () => TestAsyncLockAsync(ITERS, true));
Run("Contended ACS", () => TestAwaitableCriticalSectionAsync(ITERS, true));
Console.WriteLine();
}
}

static void Run(string name, Func<Task> test)
{
var sw = Stopwatch.StartNew();
test().Wait();
sw.Stop();
Console.WriteLine("{0}: {1}", name, sw.ElapsedMilliseconds);
}

static async Task TestAsyncLockAsync(int iters, bool contended)
{
var mutex = new AsyncLock();
if (contended)
{
var waits = new Task<IDisposable>[iters];
using (await mutex.LockAsync())
for (int i = 0; i < iters; i++)
waits[i] = mutex.LockAsync();
for (int i = 0; i < iters; i++)
using (await waits[i]) { }
}
else
{
for (int i = 0; i < iters; i++)
using (await mutex.LockAsync()) { }
}
}

static async Task TestAwaitableCriticalSectionAsync(int iters, bool contended)
{
var mutex = new AwaitableCriticalSection();
if (contended)
{
var waits = new Task<IDisposable>[iters];
using (await mutex.EnterAsync())
for (int i = 0; i < iters; i++)
waits[i] = mutex.EnterAsync();
for (int i = 0; i < iters; i++)
using (await waits[i]) { }
}
else
{
for (int i = 0; i < iters; i++)
using (await mutex.EnterAsync()) { }
}
}
}

Stephen Toub is using SemaphoreSlim, the "lightest weight" option available, rather than RegisterWaitForSingleObject:

Second, and more importantly, the AwaitableCriticalSection is using a fairly heavy synchronization mechanism to provide the mutual exclusion.  The solution is using Task.Factory.FromAsync(IAsyncResult, …), which is just a wrapper around ThreadPool.RegisterWaitForSingleObject (see http://blogs.msdn.com/b/pfxteam/archive/2012/02/06/10264610.aspx).  Each call to this is asking the ThreadPool to have a thread block waiting on the supplied ManualResetEvent, and then to complete the returned Task when the event is set.  Thankfully, the ThreadPool doesn’t burn one thread per event, and rather groups multiple events together per thread, but still, you end up wasting some number of threads (IIRC, it’s 63 events per thread), so in a server-side environment, this could result in degraded behavior.

All in all, an education for me - and I hope you, Dear Reader - as well as a few important lessons.

  • Know what's happening underneath if you can.
  • Code Reviews are always a good thing.
  • Ask someone smarter.
  • Performance may not matter in one context but it can in another.
  • You can likely get away with this or that, until you totally can't. (Client vs. Server)

Thanks Stephen Toub and Stephen Cleary!


The Internet is not a black box. Look inside.

November 12, '12 Comments [31] Posted in Back to Basics | Musings

All too often I see programmers trying to solve their problems on the internet by blindly "flipping switches."

Change something, hit refresh in the browser. "Why is that cached? What's going on?" Change something else, hit refresh in the browser. "What's the deal?"

You may have heard the term "cargo cult programming," named for islanders who, after World War II, would wave sticks hoping that planes full of supplies would fly over. They had concluded that waving the sticks caused the planes to come.

Think about abstractions. This is a good reminder for the beginner and the long-time expert. This applies not just to computers but to cars, light bulbs, refrigerators and more.

What are you not seeing? Look underneath.

When coding on the web, remember that effectively NOTHING is hidden from you.

A friend emailed with a question about some CSS files not caching. This is a smart guy with a long question about a confusing behavior in the browser. I asked - as I often ask - what's happening underneath? Did you look inside?

Are you using Fiddler? Did you press F12 in your browser of choice and explore their network tools? Are you using WireShark?

Literally this moment, as I am writing this post, I just noticed that the Twitter box on my blog here doesn't have my latest tweet embedded.

Where's my tweet?

I could hit refresh a bunch of times, google around for vague terms, email a friend, or I could look inside.

I hit F12 in my browser. I look at the Network tab, and sort by Status.

Remember to use the Network Tools in your browser

Hey, suddenly my Twitter API call is a 404. First, that's lame of them. They should have redirected me, but alas, no one respects the permalink anymore. #getoffmylawn

With this single insight I am now armed with googleable terms. I do a single search for "twitter user timeline json api" and see at the Twitter Developer Center that they've changed the format to include "api." and a version number.

I change my template to call this changed URL https://api.twitter.com/1/statuses/user_timeline/shanselman.json?callback=twitterCallback2&count=10 instead, and hit Refresh in my browser, once.

There's my tweet?

There's my tweet. No joke, this just happened. Good timing, I think.

You decide how deep you want to go down the rabbit hole. I am not expecting everyone to be a neurosurgeon or a professional network engineer but I firmly believe that digging just one layer deeper in all things will enhance your life and your work.

Learn basic HTTP debugging and ALWAYS check your result codes. Even if you are a non-technical blogger, learn how to check for 404s and 301s and 500s and assert your assumptions.
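If you would rather check result codes from code than from the F12 tools, a few lines are enough. This is a sketch only: `StatusCheck`, `Describe`, and `CheckAsync` are hypothetical names, and the labels are simply the standard HTTP status classes. Automatic redirects are turned off so a 301 shows up as a 301 instead of being silently followed.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class StatusCheck
{
    // Maps a numeric HTTP status code to its general class.
    public static string Describe(int status)
    {
        if (status >= 200 && status < 300) return "ok";
        if (status >= 300 && status < 400) return "redirect";
        if (status >= 400 && status < 500) return "client error";
        if (status >= 500 && status < 600) return "server error";
        return "other";
    }

    // Requests the URL once and reports the status code the server actually sent.
    public static async Task<string> CheckAsync(string url)
    {
        // Do not follow redirects, so 301 and 302 responses stay visible.
        var handler = new HttpClientHandler { AllowAutoRedirect = false };
        using (var client = new HttpClient(handler))
        using (var response = await client.GetAsync(url))
        {
            int status = (int)response.StatusCode;
            return status + " " + Describe(status) + " " + url;
        }
    }
}
```

Awaiting `StatusCheck.CheckAsync` for each URL a page depends on gives the same information as sorting the Network tab by Status.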

The world - and the internet - is not a black box. Look inside.


Back To Basics: You aren't smarter than the compiler. (plus fun with Microbenchmarks)

June 26, '12 Comments [36] Posted in Back to Basics

Microbenchmarks are evil. Ya, I said it. Folks spend hours in tight loops measuring things trying to find out the "best way" to do something and forget that while they are changing 5ms between two techniques they've missed the 300ms Database Call or the looming N+1 selects issue that has their ORM quietly making even more database calls.

My friend Sam Saffron says we should "take global approach to optimizations." Sam cautions us to avoid trying to be too clever.

// You think you're slick. You're not.
// faster than .Count()? Stop being clever.
var count = (stuff as ICollection<int>).Count;

All that said, let's argue microbenchmark, shall we? ;)

I did a blog post a few months back called "Back to Basics: Moving beyond for, if and switch" and as with all blog posts where one makes a few declarative statements (or shows ANY code at all, for that matter) it inspired some spirited comments. The best of them was from Chris Rigter in defense of LINQ.

I started the post by showing this little bit of counting code:

var biggerThan10 = new List<int>();
for (int i = 0; i < array.Length; i++){
if (array [i] > 10)
biggerThan10.Add (array[i]);
}

and then changed it into LINQ which can be either of these one liners

var a = from x in array where x > 10 select x; 
var b = array.Where(x => x > 10);

and a few questions came up like this one from Teusje:

"does rewriting code to one line make your code faster or slower or is it not worth talking about these nanoseconds?"

The short answer is, measure it. The longer answer is measure it yourself. You have the power to profile your code. If you don't know what's happening, profile it. There's some interesting discussion on benchmarking small code samples over on this StackOverflow question.
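"Measure it" can be as small as a Stopwatch around each candidate. The sketch below is one way to do that, not a rigorous benchmark harness: `Bench`, `BestOfMs`, `CountWithLoop`, and `CountWithLinq` are hypothetical names. It runs the work once untimed so JIT compilation is not charged to the first measurement, then keeps the best of several timed runs.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class Bench
{
    // Runs the work once as a warm-up, then returns the fastest of `runs` timed runs in milliseconds.
    public static double BestOfMs(Action work, int runs)
    {
        work(); // warm-up: pays the JIT cost outside the timed region
        double best = double.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            work();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }

    // The explicit for/if version of "how many values are bigger than 10?"
    public static int CountWithLoop(int[] array)
    {
        int count = 0;
        for (int i = 0; i < array.Length; i++)
        {
            if (array[i] > 10)
                count++;
        }
        return count;
    }

    // The LINQ one-liner doing the same job.
    public static int CountWithLinq(int[] array)
    {
        return array.Count(x => x > 10);
    }
}
```

To compare the two, build the data once outside the timed region (for example `var data = Enumerable.Range(0, 1000000).ToArray();`) and pass `() => Bench.CountWithLoop(data)` and `() => Bench.CountWithLinq(data)` to `Bench.BestOfMs`. The numbers will vary by machine and runtime, which is exactly why you measure on your own.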

Now, with all kinds of code like this folks go and do microbenchmarks. This usually means doing something trivial a million times in a tight loop. That's lots of fun and I'm going to do JUST that very thing right now with Chris's good work, but it's important to remember that your code is usually NOT doing something trivial a million times in a tight loop. Unless it is.

Knaģis says:

"Unfortunately LINQ has now created a whole generation of coders who completely ignores any perception of writing performant code. for/if are compiled into nice machine code, whereas .Where() creates instances of enumerator class and then iterates through that instance using MoveNext method...Please, please do not advocate for using LINQ to produce shorter, nicer to read etc. code unless it is accompanied by warning that it affects performance"

I think that LINQ above could probably be replaced with "datagrids" or "pants" or "google" or any number of conveniences but I get the point. Some code is shown in the comments where LINQ appears to be 10x slower. I can't reproduce his result.

Let's take Chris's comment and deconstruct it. First, taking an enumerable Range as an array and spinning through it.

var enumerable = Enumerable.Range(0, 9999999);
var sw = new Stopwatch();
int c = 0;

// approach 1

sw.Start();
var array = enumerable.ToArray();
for (int i = 0; i < array.Length; i++)
{
if (array[i] > 10)
c++;
}
sw.Stop();
Console.WriteLine("Enumerable.ToArray()");
Console.WriteLine(c.Dump());
Console.WriteLine(sw.ElapsedMilliseconds.Dump());

The "ToArray()" part takes 123ms and the for loop takes 9ms on my system. Arrays are super fast.

Starting from the enumerable itself (not the array!) we can try the Count() one liner:

// approach 2
Console.WriteLine("Enumerable.Count()");
sw.Restart();
c = enumerable.Count(x => x > 10);
sw.Stop();
Console.WriteLine(c.Dump());
Console.WriteLine(sw.ElapsedMilliseconds.Dump());

It takes 86ms.

I can try it easily in Parallel over 12 processors but it's not a large enough sample nor is it doing enough work to justify the overhead.

// approach 3
Console.WriteLine("Enumerable.AsParallel() (12 procs)");
sw.Restart();
c = enumerable.AsParallel().Where(x => x > 10).Count();
sw.Stop();
Console.WriteLine(c.Dump());
Console.WriteLine(sw.ElapsedMilliseconds.Dump());

It adds overhead and takes 129ms. However, you see how easy it was to try a naïve parallel loop in this case. Now you know how to try it (and measure it!) in your own tests.

Next, let's do something stupid and tell LINQ that everything is an object so we are forced to do a bunch of extra work. You'd be surprised (or maybe you wouldn't) how often you find code like this in production. This is an example of coercing types back and forth and as you can see, you'll pay the price if you're not paying attention. It always seems like a good idea at the time, doesn't it?

//Approach 4 - Type Checking?
Console.WriteLine("Enumerable.OfType(object) ");
var objectEnum = enumerable.OfType<object>().Concat(new[] { "Hello" });
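c = 0; // reset the count left over from approach 3
sw.Restart(); // Restart rather than Start, so earlier elapsed time is not included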
var objectArray = objectEnum.ToArray();
for (int i = 0; i < objectArray.Length; i++)
{
int outVal;
var isInt = int.TryParse(objectArray[i].ToString(), out outVal);
if (isInt && Convert.ToInt32(objectArray[i]) > 10)
c++;
}
sw.Stop();
Console.WriteLine(c.Dump());
Console.WriteLine(sw.ElapsedMilliseconds.Dump());

That whole thing cost over 4 seconds. 4146ms in fact. Avoid conversions. Tell the compiler as much as you can up front so it can be more efficient, right?

What if we enumerate over the types with a little hint of extra information?

// approach 5
Console.WriteLine("Enumerable.OfType(int) ");
sw.Restart();
c = enumerable.OfType<int>().Count(x => x > 10);
sw.Stop();
Console.WriteLine(c.Dump());
Console.WriteLine(sw.ElapsedMilliseconds.Dump());

Nope, the type check wasn't necessary in this case. It took 230ms and added overhead. What if this was parallel?

// approach 6
Console.WriteLine("Enumerable.AsParallel().OfType(int) ");
sw.Restart();
c = enumerable.AsParallel().OfType<int>().Where(x => x > 10).Count();
sw.Stop();
Console.WriteLine(c.Dump());
Console.WriteLine(sw.ElapsedMilliseconds.Dump());

That's 208ms, consistently. Slightly faster, but ultimately I shouldn't be doing unnecessary work.

In this simple example of looping over something simple, my best bet turned out to be either the Array (super fast if it was an Array to start) or a simple Count() with LINQ. I measured, so I would know what was happening, but in this case the simplest thing also performed the best.

What's the moral of this story? Measure and profile and make a good judgment. Microbenchmarks are fun and ALWAYS good for an argument but ultimately they exist only so you can know your options, try a few, and pick the one that does the least work. More often than not (not always, but usually) the compiler creators aren't idiots and more often than not the simplest syntax will be the best one for you.

Network access, database access, unnecessary serializations, unneeded marshaling, boxing and unboxing, type coercion - these things all take up time. Avoid doing them, and when you do do them, don't just know why you're doing them, but also that you are doing them.

Is it fair to say "LINQ is evil and makes things slow?" No, it's fair to say that code in general can be unintuitive if you don't know what's going on. There can be subtle side-effects whose time can get multiplied inside of a loop. This includes type checking, type conversion, boxing, threads and more.
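Boxing is a good example of a cost that is invisible in the source but gets multiplied inside a loop. The following sketch makes it visible; `BoxingDemo` and its methods are hypothetical names. Both sums produce the same answer, but the second one walks an array of heap-allocated objects and unboxes each element.

```csharp
using System;
using System.Linq;

public static class BoxingDemo
{
    // Works directly on ints: no allocation and no conversion per element.
    public static long SumTyped(int[] values)
    {
        long total = 0;
        for (int i = 0; i < values.Length; i++)
            total += values[i];
        return total;
    }

    // Same arithmetic, but every element is a boxed int that must be unboxed first.
    public static long SumBoxed(object[] values)
    {
        long total = 0;
        for (int i = 0; i < values.Length; i++)
            total += (int)values[i]; // unboxing cast on every iteration
        return total;
    }

    // Boxes each int into its own object, one heap allocation per element.
    public static object[] Box(int[] values)
    {
        return values.Cast<object>().ToArray();
    }
}
```

Timing `SumTyped` against `SumBoxed` on a few million elements, with whatever Stopwatch approach you prefer, is a quick way to see what "telling the compiler as much as you can up front" is worth on your machine.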

The Rule of Scale: The less you do, the more you can do of it.


Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.