Scott Hanselman

The Weekly Source Code 52 - You keep using that LINQ, I dunna think it means what you think it means.

June 18, '10 Comments [25] Posted in ASP.NET | Data | Learning .NET | LINQ | Source Code
Sponsored By

Remember good developers don't just write source code, they also READ it. You don't just become a great poet by writing lots of poems. Read and absorb as well. Do check out the Source Code category of my blog here, there is (as of today) 15 pages of posts on Source Code you can check out.

Recently my friend Jonathan Carter (OData Dude, my name for him) was working with a partner on some really weird stuff that was happening with a LINQ to SQL query. Remember that every abstraction sometimes leaks and that the whole port of an abstraction is "raise the level" so you don't have to worry about something.

Plumbing is great because it abstracts away water delivery. For all I know, there's a dude with a bucket who runs to my house when I turn on the tap. Doesn't matter to me, as long as I get water. However, sometimes something goes wrong with that dude, and I don't understand what's up with my water. This happened to JC and this partner.

In this example, we're using the AdventureWorks Sample Database to make this point. Here's some sample code the partner sent us to reproduce the weirdness.

protected virtual Customer GetByPrimaryKey(Func<customer, bool> keySelection)
{
AdventureWorksDataContext context = new AdventureWorksDataContext();

return (from r in context.Customers select r).SingleOrDefault(keySelection);
}

[TestMethod]
public void CustomerQuery_Test_01()
{
Customer customer = GetByPrimaryKey(c => c.CustomerID == 2);
}

[TestMethod]
public void CustomerQuery_Test_02()
{
AdventureWorksDataContext context = new AdventureWorksDataContext();
Customer customer = (from r in context.Customers select r).SingleOrDefault(c => c.CustomerID == 2);
}

CustomerQuery_Test_01 calls the GetByPrimaryKey method. That method takes a Func as a parameter. He's actually passing in a lamdba expression into the GetByPrimaryKey function. That makes the method reusable and is the beginning of some nice helper functions for his DAL (Data Access Layer). He's split up the query into two places. Seems reasonable, right?

Well, if you run this in Visual Studio - and in this example, I'll use the Intellitrace feature to see the actual SQL that was executed, although you can also use SQL Profiler - we see:

Wrong SQL in the Watch Window

Here's the query in text:

SELECT [t0].[CustomerID], [t0].[NameStyle], [t0].[Title], 
[t0].[FirstName], [t0].[MiddleName], [t0].[LastName],
[t0].[Suffix], [t0].[CompanyName], [t0].[SalesPerson],
[t0].[EmailAddress], [t0].[Phone], [t0].[PasswordHash],
[t0].[PasswordSalt], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Customer] AS [t0]

Um, where's the WHERE clause? Will LINQ to SQL kill my pets and cause me to lose my job? Does Microsoft suck? Let's take a look at the second query, called in CustomerQuery_Test_02():

SELECT [t0].[CustomerID], [t0].[NameStyle], [t0].[Title], 
[t0].[FirstName], [t0].[MiddleName], [t0].[LastName],
[t0].[Suffix], [t0].[CompanyName], [t0].[SalesPerson],
[t0].[EmailAddress], [t0].[Phone], [t0].[PasswordHash],
[t0].[PasswordSalt], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Customer] AS [t0]
WHERE [t0].[CustomerID] = @p0

OK, there it is, but why does the second LINQ query cause a WHERE clause to be emitted but the first doesn't? They look like basically the same code path, just one is broken up.

The first query is clearly returning ALL rows to the caller, which then has to apply the LINQ operators to do the WHERE in memory, on the caller. The second query is using the SQL Server (as it should) to do the filter, then returns WAY less data.

Here's the deal. Remember that LINQ cares about two things, IEnumerable stuff and IQueryable. The first lets you foreach over a collection, and the other includes all sorts of fun stuff that lets you query that stuff. Folks build on top of those with LINQ to SQL, LINQ to XML, LINQ to YoMomma, etc.

When you are working with something that is IQueryable; that is, the source is IQueryable, you need to make sure you are actually usually the operators for an IQueruable, otherwise you might fall back onto an undesirable result, as in this database case with IEnumerable. You don't want to return more data from the database to a caller than is absolutely necessary.

From JC, with emphasis mine:

The IQueryable version of SingleOrDefault, that takes a lambda, actually takes an Expression>, whereas the IEnumerable version, takes a Func. Hence, in the below code, the call to SingleOrDefault, is treating the query as if it was LINQ To Objects, which executes the query via L2S, then performs the SingleOrDefault on the in memory collection. If they changed the signature of GetByPrimaryKey to take an Expression>, it would work as expected.

What's a Func and what's an Expression? A Func<> (pronounced "Funk") represents a generic delegate. Like:

Func<int,int,double> divide=(x,y)=>(double)x/(double)y;
Console.WriteLine(divide(2,3));

And an Expression<> is a function definition that can be compiled and invoked at runtime. Example"

Expression<Func<int,int,double>> divideBody=(x,y)=>(double)x/(double)y;
Func<int,int,double> divide2=divideBody.Compile();
write(divide2(2,3));

So, the partner doesn't want a Func (a Func that takes a customer and returns a bool, they want a compliable Expression with a Func that takes a Customer and returns a bool. I'll have to add "using System.Linq.Expressions;" as well.

protected virtual Customer GetByPrimaryKey(Expression<Func<customer,bool>> keySelection)
{
AdventureWorksDataContext context = new AdventureWorksDataContext();

return (from r in context.Customers select r).SingleOrDefault(keySelection);

}

[TestMethod]
public void CustomerQuery_Test_01()
{
Customer customer = GetByPrimaryKey(c => c.CustomerID == 2);
}

[TestMethod]
public void CustomerQuery_Test_02()
{
AdventureWorksDataContext context = new AdventureWorksDataContext();
Customer customer = (from r in context.Customers select r).SingleOrDefault(c => c.CustomerID == 2);
}

Just changed that one line, so that GetByPrimaryKey takes a Expression> and I get the SQL I expected:

Corrected SQL in the Watch Window

Someone famous once said, "My code has no bugs, it runs exactly as I wrote it."

Layers of Abstraction are tricky, and you should always assert your assumptions and always look at the SQL that gets generated/created/executed by your DAL before you put something into production. Trust no one, except the profiler.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.

facebook twitter subscribe
About   Newsletter
Sponsored By
Hosting By
Dedicated Windows Server Hosting by ORCS Web
Friday, June 18, 2010 4:36:51 AM UTC
Wow, very informative post. I have to to check all my linq statements now [:(]
Friday, June 18, 2010 5:03:35 AM UTC
I think we're soulmates. I wrote a similar article about the perils of the ol' IQueryable a few weeks back.

Since IQueryable inherits from IEnumerable, and since an Expression<Func<>> and a Func<> seem to behave exactly the same way, it's pretty easy to inadvertently retrieve a whopping helping of records from your database without even realizing it--especially if you're new to L2S.

A sidenote which perhaps further illustrates the point: "from r in context.Customers select r" is equivalent to "context.Customers". So, if I'm not mistaken, you could just as easily write (assuming you're referencing System.Linq and System.Data.Linq):

context.Customers.SingleOrDefault(keySelection);




Friday, June 18, 2010 5:07:41 AM UTC
Very very good information about Linq. Need to rethink now onwards when writing linq statements. :)
Friday, June 18, 2010 7:23:25 AM UTC
Nice post! However, you seem to have a few typos here and there. The most common ones are

<Expression<func><int, int, double>> should be <Expression<Func<int, int, double>>

and

</int, int, double> and some other weird closing tags showing up in a couple of places.

Most of the typos are in the three code examples before the last screenshot from SQL Profiler, and in the text right before them.

Keep up the good job!
Friday, June 18, 2010 7:24:19 AM UTC
Very nice posting. If I may, I would just like to give you the hint of checking your spelling a little more. It was kind of hard for me to read the first time. Especially if english is not your native language. But anyways...nice and very informative posting! Thank you
Marcel
Friday, June 18, 2010 8:48:05 AM UTC
This could be the verry core of our perf issue we're having right now!
Love the timing :)
boriscallens
Friday, June 18, 2010 9:04:10 AM UTC
"a fync that takes a Customer...". What a very French thing to say.

Kidding aside, thanks for a very informative post. Again. Isn't it at least slightly annoying being so awesome all the time? Knowing that it would might alleviate the pain I feel every day over not being as awesome as some of you guys...
Friday, June 18, 2010 9:48:15 AM UTC
Who can tell me what's this : Func<Customer, bool="">
A optional parameter with default value? A empty string for a bool? I replace it with Func<Customer, bool>, and get the different result in my machine! Everything works well, I get "where" statement without using Expression!
Danny
Friday, June 18, 2010 11:40:04 AM UTC
I submitted SyntaxHighlighter feedback to Alex Gorbatchev on his forum regarding LINQ syntax bugs. Now I'm curious if those extra attributes mentioned above by lycken are yours or his. I gotta check out SyntaxHighlighter again to see if there has been progress. Otherwise, nice trick. Heh. ;-)

M Thomas
Friday, June 18, 2010 11:44:19 AM UTC
Ya, sorry, my blog keeps trying to close my LINQ expressions like they are HTML! I've made some changes. We'll see if they stick. Ctrl-F5 to refresh!
Friday, June 18, 2010 11:46:16 AM UTC
@Danny,

I'm curious as well. Again, my un-awesomness is shining through here, but I don't quite get that part.
Friday, June 18, 2010 11:58:27 AM UTC
Like you said reading code is very important. I immediately saw the issue of not having the Expression wrapping, because I have been reading the code for LINQ through Reflector. Which I recommend to everybody, reading the code of the .NET Framework is amazing, and gives good insight into what the code of a production framework with long life looks like.

Getting back to LINQ... To give a little more background on why the Func-only approach was not working is because the Expression wrapper allows the LINQ-provider (in this case LINQ to SQL0 to walk through your Func delegate and generate a SQL string. And without the ability to walk through the delegate and figure out that you have a Customer object with a property of CustomerId that should be equal to 2, it cannot generate the necessary SQL.

The key to understanding LINQ, more specifically IQueryable, is that LINQ is not compiled to SQL it's interpreted to SQL. And it's done so by creating a binary tree, you remember those from COMP SCI 101, of expressions that is walked by the provider. And each of these nodes can be any expression in this namespace.

http://msdn.microsoft.com/en-us/library/system.linq.expressions.aspx

It's very interesting stuff, defintily worth checking out.
Friday, June 18, 2010 12:45:00 PM UTC
I am not familiar with the 'YoMomma' data store.
Friday, June 18, 2010 1:29:50 PM UTC
Second what Nick Berardi says. Also don't be fooled by debuggers that force evaluations.
theCoach
Friday, June 18, 2010 1:41:48 PM UTC
I dunnow but it seems that LINQ has far more leaks than other abstractions. I've seen these kind of "bad LINQ leads to really bad SQL" posts quite a lot.
Friday, June 18, 2010 2:05:39 PM UTC
And THAT'S why it's so important to learn how things work, not just how to use it. I think that one trait is the difference between an average developer and a great one.
Friday, June 18, 2010 2:53:44 PM UTC
Will definitely add this article to my list of "performance enhancers/pitfalls" to look at.
Thank you Scott!
Friday, June 18, 2010 3:53:58 PM UTC
Great post! I hadn't thought of this before, but it makes perfect sense now that you mention it. This very well could have just saved me a day or two of debugging in the future.
Friday, June 18, 2010 4:09:46 PM UTC
Nice post Scott. What's funny is that I predicted that mistake from the first paragraph ;). As a self taught parser/compiler/interpreter enthusiast, the mistake was rather obvious; linq Expressions are simply ASTs. They can be walked using the Visitor Pattern to generate whatever you'd like: SQL, sexps, bytecode, etc. Funcs/delagates, on the other hand, are essentially pointers to compiled code. A linq provider couldn't make sense of that, so to expect a Func<T,bool> to work is a pretty big mistake.

Ultimately, as you noted, the func argument that he passed in resulted in the c# compiler using the IEnumerable extension method. Internally, that method will call IEnumerable.GetEnumerator, resulting in the query without the where clause.

I'd say that it was a misunderstanding of the core concepts that make linq work that caused this issue for Mr. Carter's partner, as opposed to a leaky abstraction.
Friday, June 18, 2010 7:55:57 PM UTC
As of this posting, the sentence "Just changed that one line, so that GetByPrimaryKey takes a Expression and I get the SQL I expected" still lacks balanced angle brackets, and you probably want to say "an Expression" rather than "a".

Also, I'm guessing that the whole "point" of an expression is to raise the level; you currently say that it's the whole port.

Apart from copyediting details, there are more problems here than your coders' grasp of L2S. Me, I had a sinking feeling from the first line, when they'd written a function called GetByPrimaryKey() that does not actually get by primary key, but merely applies whatever the heck function you give it. Then, when their second statement initialized an IDisposable that was not explicitly disposed, I knew we weren't exactly in for a tour de force of meticulousness.

But yeah, that's how L2S works. Yep. [Hock, spit.] Expressions, not strictly parsed until runtime. Is there any other way to understand it? (There are clearly some ways of programming it without understanding; I'm not asking about those.)

I'm with Charles Strahan. This doesn't seem like an example of two coders with a minor misunderstanding of the framework; the lesson here isn't that L2S is creaky and full of gotchas (though it is). This particular example seems like two coders with only a superficial understanding of what L2S is and does.
Friday, June 18, 2010 8:07:03 PM UTC
I have written test programs using Linq to Sql. While I understand the motivation for it, in practice, it seems too limiting. Once you start really using it, you run into quirks and limitations everywhere. Firstly, the SQL code that Linq to Sql generates is often bizarrely verbose and complicated, and hence of questionable efficiency. I have written very simply Linq queries, whose equivalent T-SQL is elementary, and the T-SQL generated by the query was so nested and complicated, that I literally could not understand a line of it, although it worked (somehow). Then, there is the limitation that Table objects must represent real objects in the database (they cannot represent the result of an arbitrary join across many tables, for example, the way a DataTable can. This forces retrieval of entire database objects where one or two fields are all that are needed. Finally, there are pages and pages of specific Linq methods and expressions that Linq to Sql either cannot handle, or that behave differently from Linq to Objects, because of limitations in the ability to translate Linq to T-SQL. What you described in your blog is just one more such gotcha. In the end, Linq to SQL is a means to write Vb/C# code rather than T-SQL, but with so many additional complicated limitations and learning curve that it is not worth it. Therefore, my recommendation is to write the best T-SQL you know how, fill the results into DataSets and DataTables, and use DataTableExtensions to execute Linq against the DataTables. That way you have all the flexibility, all the ability of Linq, and there are no peculiarities in the behavior of the Linq queries, and you always know exactly what you are getting, and it is much easier to debug.
Claudio
Friday, June 18, 2010 8:31:34 PM UTC
@Danny, I was curious about your question, so I posed it on StackOverflow here.

As far as Jon Skeet is concerned, you're talking about a piece of code that is not valid C#. So, to prove your case, do you have an example of a Func with an equals sign in it that does actually compile?

I feel like I'm asking for a tape of the Abominable Snowman here...
Friday, June 18, 2010 9:16:40 PM UTC
Good tip
Venkat
Monday, June 21, 2010 9:20:38 PM UTC
Very nice stuff Scott!
Sunday, July 11, 2010 5:11:31 PM UTC
Nice post, very useful.

I also have a post talking about the detail of how does the IQueryable<T> query methods' source code like, and how does the above whole things work:

Understanding LINQ to SQL (10) Implementing LINQ to SQL Provider

http://weblogs.asp.net/dixin/archive/2010/05/12/understanding-linq-to-sql-10-implementing-linq-to-sql-provider.aspx
Comments are closed.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.