Scott Hanselman

Reactive Extensions (Rx) is now Open Source

November 6, '12 Comments [10] Posted in LINQ | Open Source

A few years back I did a podcast with Erik Meijer about Reactive Extensions for .NET (Rx). Since then thousands of people have enjoyed using Rx in their projects, and a number of open source projects like ReactiveUI (also on the podcast) have popped up around it. Even GitHub for Windows uses Reactive Extensions. In fact, GitHub uses Rx a LOT in their Windows product. My friend Paul at GitHub says they liked the model so much they made a Mac version!

“GitHub for Windows uses the Reactive Extensions for almost everything it does, including network requests, UI events, managing child processes (git.exe). Using Rx and ReactiveUI, we've written a fast, nearly 100% asynchronous, responsive application, while still having 100% deterministic, reliable unit tests. The desktop developers at GitHub loved Rx so much, that the Mac team created their own version of Rx and ReactiveUI, called ReactiveCocoa, and are now using it on the Mac to obtain similar benefits.” – Paul Betts, GitHub

Today, Microsoft Open Technologies announced the open sourcing of Reactive Extensions! You can get the code with git up on Codeplex at https://rx.codeplex.com. You can’t stop the open source train! Congrats to the team!

There’s a LOT included, so be stoked. It’s not just Rx.NET, but also the C++ library as well as RxJS for JavaScript! Now everyone gets to play with IObservable<T> and IObserver<T>.

  • Reactive Extensions:
    • Rx.NET: The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators.
    • RxJS: The Reactive Extensions for JavaScript (RxJS) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators in JavaScript which can target both the browser and Node.js.
    • Rx++: The Reactive Extensions for Native (RxC) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators in both C and C++.
  • Interactive Extensions
    • Ix: The Interactive Extensions (Ix) is a .NET library which extends LINQ to Objects to provide many of the operators available in Rx but targeted for IEnumerable<T>.
    • IxJS: An implementation of LINQ to Objects and the Interactive Extensions (Ix) in JavaScript.
    • Ix++: An implementation of LINQ for Native Developers in C++

A great way to learn about why Rx is useful is to check out the Rx Koans project or to read the IntroToRx online e-book.

Why do I think Rx matters? It’s a way to do asynchronous operations on event streams. Rather than hooking up click events and managing state with event handlers all over, you effectively “query” an infinite stream of events with LINQ. You can declaratively sequence events…no flags, no state machine.
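To make the "push" model a little more concrete, here's a minimal sketch that uses only the IObservable<T> and IObserver<T> interfaces from the BCL, with no Rx library at all. EventSource<T> and FilteringObserver<T> are hypothetical names made up for this illustration. Real Rx hands you operators like Where so you don't have to write this plumbing yourself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal event source built only on the BCL interfaces.
public sealed class EventSource<T> : IObservable<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();

    public IDisposable Subscribe(IObserver<T> observer)
    {
        observers.Add(observer);
        return new Subscription(() => observers.Remove(observer));
    }

    // Push a value to every current subscriber.
    public void Publish(T value)
    {
        foreach (var observer in observers.ToArray())
        {
            observer.OnNext(value);
        }
    }

    private sealed class Subscription : IDisposable
    {
        private readonly Action unsubscribe;
        public Subscription(Action unsubscribe) { this.unsubscribe = unsubscribe; }
        public void Dispose() { unsubscribe(); }
    }
}

// Hypothetical observer that forwards only the values matching a predicate,
// which is roughly the job a Where operator does on an observable sequence.
public sealed class FilteringObserver<T> : IObserver<T>
{
    private readonly Func<T, bool> predicate;
    private readonly Action<T> onMatch;

    public FilteringObserver(Func<T, bool> predicate, Action<T> onMatch)
    {
        this.predicate = predicate;
        this.onMatch = onMatch;
    }

    public void OnNext(T value) { if (predicate(value)) onMatch(value); }
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

public static class ObservableDemo
{
    public static void Main()
    {
        var clicks = new EventSource<int>(); // pretend these are X coordinates of clicks
        var seen = new List<int>();

        using (clicks.Subscribe(new FilteringObserver<int>(x => x > 100, seen.Add)))
        {
            clicks.Publish(50);
            clicks.Publish(150);
            clicks.Publish(300);
        }
        clicks.Publish(500); // the subscription was disposed, so nobody is listening

        Console.WriteLine(string.Join(", ", seen)); // 150, 300
    }
}
```

The consumer never polls and never keeps a flag around. It just describes which values it cares about and the source pushes them.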

For example, here’s a dragging event created (composed) via Mouse button and Mouse move events:

IObservable<Event<MouseEventArgs>> draggingEvent =
    from mouseLeftDownEvent in control.GetMouseLeftDown()
    from mouseMoveEvent in control.GetMouseMove().Until(control.GetMouseLeftUp())
    select mouseMoveEvent;

Even better, Rx makes it easier (or possible!) to create event-based tests that are asynchronous, like this example from Jafar Husain:

Rating rating = new Rating();
IObservable<Unit> test = // Unit is an object that represents null.
    ObservableExtensions
        .DoAsync(() => TestPanel.Children.Add(rating))
        .WaitFor(TestPanel.GetLayoutUpdated()) // Extension method GetLayoutUpdated converts the event to observable
        .DoAsync(() => rating.Value = 1.0) // Calls the Ignite EnqueueCallback method
        .WaitFor( // waits for an observable to raise before going on
            // listen to all the actual value change events and filters them until ActualValue reaches Value
            rating
                .GetActualValueChanged() // extension method that converts ActualValueChanged event to IObservable
                .SkipWhile(actualValueChangedEvent => actualValueChangedEvent.EventArgs.NewValue != rating.Value))
        // check to make sure the actual value of the rating item is set appropriately now that the animation has completed
        .Assert(() => rating.GetRatingItems().Last().ActualValue == 1.0); // crawls the expression tree and makes a call to the appropriate Assert method

test.Subscribe(() => TestPanel.Children.Remove(rating)); // run the test and clean up at the end.

There are amazing time-related operators that let you simulate events over time. Note the Buffer and Subscribe calls.

var myInbox = EndlessBarrageOfEmail().ToObservable();

// Instead of making you wait 5 minutes, we will just check every three seconds instead. :)
var getMailEveryThreeSeconds = myInbox.Buffer(TimeSpan.FromSeconds(3)); // Was .BufferWithTime(...

getMailEveryThreeSeconds.Subscribe(emails =>
{
    Console.WriteLine("You've got {0} new messages! Here they are!", emails.Count());
    foreach (var email in emails)
    {
        Console.WriteLine("> {0}", email);
    }
    Console.WriteLine();
});

You can use await and async, like in this example returning the number 42 after 5 seconds:

static async void button_Click()
{
    int x = await Observable.Return(42).Delay(TimeSpan.FromSeconds(5));
    // x with value 42 is returned after 5 seconds
    label.Text = x.ToString();
}

I’m just showing you the parts that tickle me, but one could easily teach a 10 week university course on Rx, and I’m still a beginner myself!

Here’s some more resources to check out about Rx. Congrats to the team for their contribution to Open Source!

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.

Back to Basics: Big O notation issues with older .NET code and improving for loops with LINQ deferred execution

September 15, '11 Comments [58] Posted in Back to Basics | Learning .NET | LINQ

Earlier today Brad Wilson and I were discussing a G+ post by Jostein Kjønigsen where he says, "see if you can spot the O(n^2) bug in this code."

public IEnumerable<Process> GetProcessesForSession(string processName, int sessionId)
{
    var processes = Process.GetProcessesByName(processName);
    var filtered = from p in processes
                   where p.SessionId == sessionId
                   select p;
    return filtered;
}

This is a pretty straightforward method that calls a .NET BCL (Base Class Library) method and filters the result with LINQ. Of course, when any function calls another one that you can't see inside (which is basically always) you've lost control. We have no idea what's going on in GetProcessesByName.

Let's look at the source to the .NET Framework method in Reflector. Our method calls Process.GetProcessesByName(string).

public static Process[] GetProcessesByName(string processName)
{
    return GetProcessesByName(processName, ".");
}

Looks like this one is an overload that passes "." into the next method Process.GetProcessesByName(string, string) where the second parameter is the machineName.

This next one gets all the processes for a machine (in our case, the local machine) then spins through them doing a compare on each one in order to build a result array to return up the chain.

public static Process[] GetProcessesByName(string processName, string machineName)
{
    if (processName == null)
    {
        processName = string.Empty;
    }
    Process[] processes = GetProcesses(machineName);
    ArrayList list = new ArrayList();
    for (int i = 0; i < processes.Length; i++)
    {
        if (string.Equals(processName, processes[i].ProcessName, StringComparison.OrdinalIgnoreCase))
        {
            list.Add(processes[i]);
        }
    }
    Process[] array = new Process[list.Count];
    list.CopyTo(array, 0);
    return array;
}

If we look inside GetProcesses(string), it's another loop. This is getting close to where .NET calls Win32 and as these classes are internal there's not much I can do to fix this function other than totally rewrite the internal implementation. However, I think I've illustrated that we've got at least two loops here, and more likely three or four.

public static Process[] GetProcesses(string machineName)
{
    bool isRemoteMachine = ProcessManager.IsRemoteMachine(machineName);
    ProcessInfo[] processInfos = ProcessManager.GetProcessInfos(machineName);
    Process[] processArray = new Process[processInfos.Length];
    for (int i = 0; i < processInfos.Length; i++)
    {
        ProcessInfo processInfo = processInfos[i];
        processArray[i] = new Process(machineName, isRemoteMachine, processInfo.processId, processInfo);
    }
    return processArray;
}

This code is really typical of .NET circa 2002-2003 (not to mention Java, C++ and Pascal). Functions return arrays of stuff and other functions higher up filter and sort.

When I use this .NET API and then loop over the results several times, I'm going for(), for(), for() in a chain, like O(4n) here.

Note: To be clear, it can be argued that O(4n) is just O(n), because it is. A constant factor like the 4 I'm adding isn't part of Big O notation. I'm just saying we want to avoid O(cn) situations where c is a large enough number to affect perf.

Sometimes you'll see nested for()s like this, so O(n^3) here where things get messy fast.

Squares inside squares inside squares representing nested fors
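Here's a small sketch that just counts loop iterations, to show the difference between chaining passes and nesting them. LoopCostDemo and its method names are made up for this illustration and aren't from the BCL.

```csharp
using System;

public static class LoopCostDemo
{
    // One pass over n items costs n steps.
    private static long OnePass(int n)
    {
        long steps = 0;
        for (int i = 0; i < n; i++) steps++;
        return steps;
    }

    // Four passes chained one after another: 4 * n steps, which is still O(n).
    public static long ChainedPasses(int n)
    {
        return OnePass(n) + OnePass(n) + OnePass(n) + OnePass(n);
    }

    // Three loops nested inside each other: n * n * n steps, which is O(n^3).
    public static long NestedPasses(int n)
    {
        long steps = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    steps++;
        return steps;
    }

    public static void Main()
    {
        Console.WriteLine(ChainedPasses(100)); // 400
        Console.WriteLine(NestedPasses(100));  // 1000000
    }
}
```

The chained version only costs a constant multiple of n, which is why it's annoying rather than catastrophic. The nested version is the one that gets messy fast.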

LINQ is more significant than people realize, I think. When it first came out some folks said "is that all?" I think that's unfortunate. LINQ and the concept of "deferred execution" are just so powerful, but I think a number of .NET programmers just haven't taken the time to get their heads around the concept.

Here's a simple example juxtaposing spinning through a list vs. using yield. The array version does all the work up front, while the yield version can calculate each value only when the caller asks for it. Imagine a GetFibonacci() method. A yield version could calculate values "just in time" and yield them, while an array version would have to pre-calculate and pre-allocate.

public void Consumer()
{
    foreach (int i in IntegersList()) {
        Console.WriteLine(i.ToString());
    }

    foreach (int i in IntegersYield()) {
        Console.WriteLine(i.ToString());
    }
}

public IEnumerable<int> IntegersYield()
{
    yield return 1;
    yield return 2;
    yield return 4;
    yield return 8;
    yield return 16;
    yield return 16777216;
}

public IEnumerable<int> IntegersList()
{
    return new int[] { 1, 2, 4, 8, 16, 16777216 };
}
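The Fibonacci idea mentioned above could look something like this. It's a sketch, and the names FibonacciYield and FibonacciArray are mine for illustration. The yield version has no fixed length, so the caller decides how many values to pull (here with Take). The array version has to be told the count up front and fills everything before returning.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FibonacciDemo
{
    // Lazy: each value is computed only when the consumer asks for the next one.
    // The sequence never ends, so callers must limit it (for example with Take).
    public static IEnumerable<long> FibonacciYield()
    {
        long previous = 0, current = 1;
        while (true)
        {
            yield return current;
            long next = previous + current;
            previous = current;
            current = next;
        }
    }

    // Eager: allocates the whole array and calculates every value before returning.
    public static long[] FibonacciArray(int count)
    {
        var result = new long[count];
        long previous = 0, current = 1;
        for (int i = 0; i < count; i++)
        {
            result[i] = current;
            long next = previous + current;
            previous = current;
            current = next;
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FibonacciYield().Take(10)));
        // 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
        Console.WriteLine(string.Join(", ", FibonacciArray(10)));
        // 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
    }
}
```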

Back to our GetProcess example. There are two issues at play here.

First, the underlying implementation where GetProcessInfos eventually gets called is a bummer, but it's that way because of how P/Invoke works and how the underlying Win32 API returns the data we need. It would certainly be nice if the underlying API was more granular. But that's less interesting to me than the larger meta-issue of having (or in this case, not having) a LINQ-friendly API.

The second and more interesting issue (in my opinion) is the idea that the 2002-era .NET Base Class Library isn't really set up for LINQ-friendliness. None of the APIs return LINQ-friendly stuff or IEnumerable<anything>, so that when you chain together filters and filters of filters of arrays you end up with O(cn) issues as opposed to nice deferred LINQ chains.

When you find yourself returning arrays of arrays of arrays of other stuff while looping and filtering and sorting, you'll want to be aware of what's going on and consider that you might be looping inefficiently and it might be time for LINQ and deferred execution.

Here's a simple conversion attempt to change the first implementation from this classic "Array/List" style:

ArrayList list = new ArrayList();
for (int i = 0; i < processes.Length; i++)
{
    if (string.Equals(processName, processes[i].ProcessName, StringComparison.OrdinalIgnoreCase))
    {
        list.Add(processes[i]);
    }
}
Process[] array = new Process[list.Count];
list.CopyTo(array, 0);
return array;

To this more LINQy way. Note that returning from a LINQ query defers execution as LINQ is chainable. We want to assemble a chain of sorting and filtering operations and execute them ONCE rather than for()ing over many lists many times.

if (processName == null) { processName = string.Empty; }

Process[] processes = Process.GetProcesses(machineName); //stop here...can't go farther?

return from p in processes
       where String.Equals(p.ProcessName, processName, StringComparison.OrdinalIgnoreCase)
       select p; //the value of the LINQ expression being returned is an IEnumerable<Process> object that uses "yield return" under the hood

Here's the whole thing in a sample program.

static void Main(string[] args)
{
    var myList = GetProcessesForSession("chrome", 1); //ProcessName doesn't include the .exe extension
}

public static IEnumerable<Process> GetProcessesForSession(string processName, int sessionID)
{
    //var processes = Process.GetProcessesByName(processName);
    var processes = HanselGetProcessesByName(processName); //my LINQy implementation
    var filtered = from p in processes
                   where p.SessionId == sessionID
                   select p;
    return filtered;
}

private static IEnumerable<Process> HanselGetProcessesByName(string processName)
{
    return HanselGetProcessesByName(processName, ".");
}

private static IEnumerable<Process> HanselGetProcessesByName(string processName, string machineName)
{
    if (processName == null)
    {
        processName = string.Empty;
    }
    Process[] processes = Process.GetProcesses(machineName); //can't refactor farther because of internals.

    //"the value of the LINQ expression being returned is an IEnumerable<Process> object that uses "yield return" under the hood" (thanks Mehrdad!)

    return from p in processes where String.Equals(p.ProcessName, processName, StringComparison.OrdinalIgnoreCase) select p;

    /* the stuff above replaces the stuff below */
    //ArrayList list = new ArrayList();
    //for (int i = 0; i < processes.Length; i++)
    //{
    //    if (string.Equals(processName, processes[i].ProcessName, StringComparison.OrdinalIgnoreCase))
    //    {
    //        list.Add(processes[i]);
    //    }
    //}
    //Process[] array = new Process[list.Count];
    //list.CopyTo(array, 0);
    //return array;
}
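If deferred execution still feels abstract, here's a tiny sketch that makes it visible. It counts how many times a filter's predicate actually runs. The names DeferredExecutionDemo, PredicateCalls and EvenNumbersOver are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how often the first filter's predicate is invoked.
    public static int PredicateCalls;

    public static IEnumerable<int> EvenNumbersOver(IEnumerable<int> source, int minimum)
    {
        // Two chained filters. Neither one runs until someone enumerates the result.
        return source
            .Where(n => { PredicateCalls++; return n % 2 == 0; })
            .Where(n => n > minimum);
    }

    public static void Main()
    {
        var query = EvenNumbersOver(Enumerable.Range(1, 10), 4);
        Console.WriteLine(PredicateCalls); // 0, because building the query does no work

        List<int> results = query.ToList(); // enumeration happens here, in a single pass
        Console.WriteLine(PredicateCalls); // 10, one call per source item
        Console.WriteLine(string.Join(", ", results)); // 6, 8, 10
    }
}
```

One thing to keep in mind: the query runs again every time it's enumerated, so call ToList() or ToArray() once if you need to walk the results more than once.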

This is a really interesting topic to me and I'm interested in your opinion as well, Dear Reader. As parts of the .NET Framework are being extended to include support for asynchronous operations, I'm wondering if there are other places in the BCL that should be updated to be more LINQ friendly. Or, perhaps it's not an issue at all.

Your thoughts?

The Weekly Source Code 56 - Visual Studio 2010 and .NET Framework 4 Training Kit - Code Contracts, Parallel Framework and COM Interop

August 12, '10 Comments [11] Posted in ASP.NET | ASP.NET Ajax | ASP.NET Dynamic Data | ASP.NET MVC | BCL | Learning .NET | LINQ | OData | Open Source | Programming | Source Code | VB | Web Services | Win7 | Windows Client | WPF

Do you like a big pile of source code? Well, there is an imperial buttload of source in the Visual Studio 2010 and .NET Framework 4 Training Kit. It's actually a 178 meg download, which is insane. Perhaps start your download now and get it in the morning when you get up. It's extremely well put together and I say Kudos to the folks that did it. They are better people than I.

I like to explore it while watching TV myself and found myself looking through it tonight. I checked my blog and while I thought I'd shared this with you before, Dear Reader, I hadn't. My bad, because it's pure gold. With C# and VB, natch.

Here's an outline of what's inside. I've heard of folks setting up lunch-time study groups and going through each section.

  • C# 4
  • Visual Basic 10
  • F#
  • Parallel Extensions
  • Windows Communication Foundation
  • Windows Workflow
  • Windows Presentation Foundation
  • ASP.NET 4
  • Windows 7
  • Entity Framework
  • ADO.NET Data Services (OData)
  • Managed Extensibility Framework
  • Visual Studio Team System
  • RIA Services
  • Office Development

I love using this kit in my talks, and used it a lot in my Lap Around .NET 4 talk.

There are Labs, Presentations, Demos, and links to online Videos. It'll walk you step by step through loads of content and is a great starter if you're getting into what's new in .NET 4.

Here are a few of my favorite bits, and they aren't the parts you hear the marketing folks gabbing about.

Code Contracts

Remember the old coding adage to "Assert Your Expectations?" Well, sometimes Debug.Assert is either inappropriate or cumbersome and what you really need is a method contract. Methods have names and parameters, and those are contracts. Now they can have conditions like "don't even bother calling this method unless userId is greater than or equal to 0 and make sure the result isn't null!"

Code Contracts continues to be revised, with a new version out just last month for both 2008 and 2010. The core types that you need are included in mscorlib with .NET 4.0, but you do need to download the tools to see them inside Visual Studio. If you have VS Pro, you'll get runtime checking and VS Ultimate gets that plus static checking. If I have static checking and the tools I'll see a nice new tab in Project Properties:

Code Contracts Properties Tab in Visual Studio

I can even get Blue Squigglies for Contract Violations as seen below.

A blue squigglie showing that a contract isn't satisfied

As a nice coincidence, you can go and download Chapter 15 of Jon Skeet's C# in Depth for free which happens to be on Code Contracts.

Here's a basic idea of what it looks like. If you have static analysis, you'll get squiggles on the lines I've highlighted as they are points where the Contract isn't being fulfilled. Otherwise you'll get a runtime ContractException. Code Contracts are a great tool when used in conjunction with Test Driven Development.

using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics.Contracts;

namespace ContractsDemo
{
    [ContractVerification(true)]
    class Program
    {
        static void Main(string[] args)
        {
            var password = GetPassword(-1);
            Console.WriteLine(password.Length);
            Console.ReadKey();
        }

        #region Header
        /// <param name="userId">Should be greater than 0</param>
        /// <returns>non-null string</returns>
        #endregion
        static string GetPassword(int userId)
        {
            Contract.Requires(userId >= 0, "UserId must be");
            Contract.Ensures(Contract.Result<string>() != null);

            if (userId == 0)
            {
                // Made some code to log behavior

                // User doesn't exist
                return null;
            }
            else if (userId > 0)
            {
                return "Password";
            }

            return null;
        }
    }
}

COM Interop sucks WAY less in .NET 4

I did a lot of COM Interop back in the day and it sucked. It wasn't fun and you always felt when you were leaving managed code and entering COM. You'd have to use Primary Interop Assemblies or PIAs and they were, well, PIAs. I talked about this a little bit last year in Beta 1, but it changed and got simpler in the .NET 4 release.

Here's a nice little sample I use from the kit that gets the Processes on your system and then makes a list with LINQ of the big ones, makes a chart in Excel, then pastes the chart into Word.

If you've used Office Automation from managed code before, notice that you can say Range[] now, and not get_Range(). You can call COM methods like ChartWizard with named parameters, and without including Type.Missing fifteen times. As an aside, notice also the default parameter value on the method.

static void GenerateChart(bool copyToWord = false)
{
    var excel = new Excel.Application();
    excel.Visible = true;
    excel.Workbooks.Add();

    excel.Range["A1"].Value2 = "Process Name";
    excel.Range["B1"].Value2 = "Memory Usage";

    var processes = Process.GetProcesses()
        .OrderByDescending(p => p.WorkingSet64)
        .Take(10);
    int i = 2;
    foreach (var p in processes)
    {
        excel.Range["A" + i].Value2 = p.ProcessName;
        excel.Range["B" + i].Value2 = p.WorkingSet64;
        i++;
    }

    Excel.Range range = excel.Range["A1"];
    Excel.Chart chart = (Excel.Chart)excel.ActiveWorkbook.Charts.Add(
        After: excel.ActiveSheet);

    chart.ChartWizard(Source: range.CurrentRegion,
        Title: "Memory Usage in " + Environment.MachineName);

    chart.ChartStyle = 45;
    chart.CopyPicture(Excel.XlPictureAppearance.xlScreen,
        Excel.XlCopyPictureFormat.xlBitmap,
        Excel.XlPictureAppearance.xlScreen);

    if (copyToWord)
    {
        var word = new Word.Application();
        word.Visible = true;
        word.Documents.Add();

        word.Selection.Paste();
    }
}

You can also embed your PIAs in your assemblies rather than carrying them around and the runtime will use Type Equivalence to figure out that your embedded types are the same types it needs and it'll just work. One less thing to deploy.

Parallel Extensions

The #1 reason, IMHO, to look at .NET 4 is the parallelism. I say this not as a Microsoft Shill, but rather as a dude who owns a 6-core (12 with hyper-threading) processor. My most favorite app in the Training Kit is ContosoAutomotive. It's a little WPF app that loads a few hundred thousand cars into a grid. There's an interface, ICarQuery, that a bunch of plugins implement, and the app foreach's over the CarQueries.

This snippet here uses the new System.Threading.Tasks stuff and makes a background task. That's all one line there, from StartNew() all the way to the bottom. It says, "do this chunk in the background." and it's a wonderfully natural and fluent interface. It also keeps your UI thread painting so your app doesn't freeze up with that "curtain of not responding" that one sees all the time.

private void RunQueries()
{
    this.DisableSearch();
    Task.Factory.StartNew(() =>
    {
        this.BeginTiming();
        foreach (var query in this.CarQueries)
        {
            if (this.searchOperation.Token.IsCancellationRequested)
            {
                return;
            }

            query.Run(this.cars, true);
        }
        this.EndSequentialTiming();
    }, this.searchOperation.Token).ContinueWith(_ => this.EnableSearch());
}

StartNew() also has a cancellation token that we check, in case someone clicked Cancel midway through, and there's a ContinueWith at the end that re-enables the Search button.

Here's my system with the queries running. This is all in memory, generating and querying random cars.

12% CPU across 12 processors single threaded

And the app says it took 2.3 seconds. OK, what if I do this in parallel, using all the processors?

2.389 seconds serially

Here's the changed code. Now we have a Parallel.ForEach instead. Mostly looks the same.

private void RunQueriesInParallel()
{
    this.DisableSearch();
    Task.Factory.StartNew(() =>
    {
        try
        {
            this.BeginTiming();
            var options = new ParallelOptions() { CancellationToken = this.searchOperation.Token };
            Parallel.ForEach(this.CarQueries, options, (query) =>
            {
                query.Run(this.cars, true);
            });
            this.EndParallelTiming();
        }
        catch (OperationCanceledException) { /* Do nothing as we cancelled it */ }
    }, this.searchOperation.Token).ContinueWith(_ => this.EnableSearch());
}

This code says "go do this in a background thread, and while you're there, parallelize this as you like." This loop is "embarrassingly parallel." It's a big for loop over 2 million cars in memory. No reason it can't be broken apart and made faster.
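If you don't have the Training Kit handy, here's a stripped-down console sketch of the same shape: a background task that runs a Parallel.ForEach with a cancellation token. It isn't the ContosoAutomotive code. ParallelDemo, CountMatches and the batches of integers are stand-ins I made up so the example has no UI or plugin dependencies.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDemo
{
    // Counts the matching items across all batches, one batch per parallel iteration.
    public static long CountMatches(int[][] batches, Func<int, bool> predicate, CancellationToken token)
    {
        long total = 0;
        var options = new ParallelOptions { CancellationToken = token };

        Parallel.ForEach(batches, options, batch =>
        {
            int local = batch.Count(predicate);
            // Several iterations may finish at once, so update the shared total atomically.
            Interlocked.Add(ref total, local);
        });

        return total;
    }

    public static void Main()
    {
        // Eight batches of 1000 consecutive integers: 0..7999.
        int[][] batches = Enumerable.Range(0, 8)
            .Select(i => Enumerable.Range(i * 1000, 1000).ToArray())
            .ToArray();

        using (var cancellation = new CancellationTokenSource())
        {
            // Run the whole thing off the calling thread, as the app above does.
            Task<long> work = Task.Factory.StartNew(
                () => CountMatches(batches, n => n % 2 == 0, cancellation.Token),
                cancellation.Token);

            Console.WriteLine(work.Result); // 4000
        }
    }
}
```

Each iteration only touches its own batch and the one shared counter, which is what makes this kind of loop safe to split up. If the iterations had to share more state than that, you'd need to think harder before reaching for Parallel.ForEach.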

Here's the deal, though. It was SO fast, that Task Manager didn't update fast enough to show the work. The work was too easy. You can see it used more CPU and that there was a spike of load across 10 of the 12, but the work wasn't enough to peg the processors.

19% load across 12 processors 

Did it even make a difference? Seems it was 5x faster and went from 2.389s to 0.4699 seconds. That's embarrassingly parallel. The team likes to call that "delightfully parallel" but I prefer "you're-an-idiot-for-not-doing-this-in-parallel parallel," but that was rejected.

0.4699 seconds when run in parallel. A 5x speedup.

Let's try something harder. How about a large analysis of Baby Names. How many Roberts born in the state of Washington over a 40 year period from a 500MB database?

Here's the normal single-threaded foreach version in Task Manager:

One processor chilling.

Here's the parallel version using 96% CPU.

6 processes working hard!

And here's the timing. Looks like the difference between 20 seconds and under 4 seconds.

PLINQ Demo

You can try this yourself. Notice the processor slider bar there at the bottom.

ProcessorsToUse.Minimum = 1;
ProcessorsToUse.Maximum = Environment.ProcessorCount;
ProcessorsToUse.Value = Environment.ProcessorCount; // Use all processors.

This sample uses "Parallel LINQ" and here are the two queries. Notice the "WithDegreeOfParallelism."

seqQuery = from n in names
           where n.Name.Equals(queryInfo.Name, StringComparison.InvariantCultureIgnoreCase) &&
                 n.State == queryInfo.State &&
                 n.Year >= yearStart && n.Year <= yearEnd
           orderby n.Year ascending
           select n;

parQuery = from n in names.AsParallel().WithDegreeOfParallelism(ProcessorsToUse.Value)
           where n.Name.Equals(queryInfo.Name, StringComparison.InvariantCultureIgnoreCase) &&
                 n.State == queryInfo.State &&
                 n.Year >= yearStart && n.Year <= yearEnd
           orderby n.Year ascending
           select n;
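Here's a self-contained sketch of the same PLINQ pattern that you can paste into a console app. The BabyName class and the four sample rows are invented for illustration. The kit's real demo reads a much larger data set and takes the processor count from the slider.

```csharp
using System;
using System.Linq;

// Hypothetical record type standing in for the kit's baby-name rows.
public sealed class BabyName
{
    public string Name { get; set; }
    public string State { get; set; }
    public int Year { get; set; }
}

public static class PlinqDemo
{
    // Returns the years with a matching name and state, oldest first.
    public static int[] MatchingYears(BabyName[] names, string name, string state, int processors)
    {
        return names
            .AsParallel()
            .WithDegreeOfParallelism(processors) // upper bound on concurrent workers
            .Where(n => n.Name.Equals(name, StringComparison.OrdinalIgnoreCase) && n.State == state)
            .OrderBy(n => n.Year)                // ordering is kept through Select and ToArray
            .Select(n => n.Year)
            .ToArray();
    }

    public static void Main()
    {
        var names = new[]
        {
            new BabyName { Name = "Robert", State = "WA", Year = 1975 },
            new BabyName { Name = "robert", State = "WA", Year = 1961 },
            new BabyName { Name = "Robert", State = "OR", Year = 1980 },
            new BabyName { Name = "Alice",  State = "WA", Year = 1990 },
        };

        int[] years = MatchingYears(names, "Robert", "WA", Environment.ProcessorCount);
        Console.WriteLine(string.Join(", ", years)); // 1961, 1975
    }
}
```

With four rows the parallel version will be slower than a plain foreach, of course. The overhead only pays for itself when there's real work per item or a lot of items.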

The .NET 4 Training Kit has Extensibility demos, and Office Demos and SharePoint Demos and Data Access Demos and on and on. It's great fun and it's a classroom in a box. I encourage you to go download it and use it as a teaching tool at your company or school. You could do brown bags, study groups, presentations (there's lots of PPTs), labs and more.

Hope you enjoy it as much as I do.

The Weekly Source Code 52 - You keep using that LINQ, I dunna think it means what you think it means.

June 18, '10 Comments [25] Posted in ASP.NET | Data | Learning .NET | LINQ | Source Code

Remember good developers don't just write source code, they also READ it. You don't just become a great poet by writing lots of poems. Read and absorb as well. Do check out the Source Code category of my blog here, there are (as of today) 15 pages of posts on Source Code you can check out.

Recently my friend Jonathan Carter (OData Dude, my name for him) was working with a partner on some really weird stuff that was happening with a LINQ to SQL query. Remember that every abstraction sometimes leaks and that the whole point of an abstraction is to "raise the level" so you don't have to worry about something.

Plumbing is great because it abstracts away water delivery. For all I know, there's a dude with a bucket who runs to my house when I turn on the tap. Doesn't matter to me, as long as I get water. However, sometimes something goes wrong with that dude, and I don't understand what's up with my water. This happened to JC and this partner.

In this example, we're using the AdventureWorks Sample Database to make this point. Here's some sample code the partner sent us to reproduce the weirdness.

protected virtual Customer GetByPrimaryKey(Func<Customer, bool> keySelection)
{
    AdventureWorksDataContext context = new AdventureWorksDataContext();

    return (from r in context.Customers select r).SingleOrDefault(keySelection);
}

[TestMethod]
public void CustomerQuery_Test_01()
{
    Customer customer = GetByPrimaryKey(c => c.CustomerID == 2);
}

[TestMethod]
public void CustomerQuery_Test_02()
{
    AdventureWorksDataContext context = new AdventureWorksDataContext();
    Customer customer = (from r in context.Customers select r).SingleOrDefault(c => c.CustomerID == 2);
}

CustomerQuery_Test_01 calls the GetByPrimaryKey method. That method takes a Func as a parameter. He's actually passing in a lambda expression into the GetByPrimaryKey function. That makes the method reusable and is the beginning of some nice helper functions for his DAL (Data Access Layer). He's split up the query into two places. Seems reasonable, right?

Well, if you run this in Visual Studio - and in this example, I'll use the IntelliTrace feature to see the actual SQL that was executed, although you can also use SQL Profiler - we see:

Wrong SQL in the Watch Window

Here's the query in text:

SELECT [t0].[CustomerID], [t0].[NameStyle], [t0].[Title], 
[t0].[FirstName], [t0].[MiddleName], [t0].[LastName],
[t0].[Suffix], [t0].[CompanyName], [t0].[SalesPerson],
[t0].[EmailAddress], [t0].[Phone], [t0].[PasswordHash],
[t0].[PasswordSalt], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Customer] AS [t0]

Um, where's the WHERE clause? Will LINQ to SQL kill my pets and cause me to lose my job? Does Microsoft suck? Let's take a look at the second query, called in CustomerQuery_Test_02():

SELECT [t0].[CustomerID], [t0].[NameStyle], [t0].[Title], 
[t0].[FirstName], [t0].[MiddleName], [t0].[LastName],
[t0].[Suffix], [t0].[CompanyName], [t0].[SalesPerson],
[t0].[EmailAddress], [t0].[Phone], [t0].[PasswordHash],
[t0].[PasswordSalt], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Customer] AS [t0]
WHERE [t0].[CustomerID] = @p0

OK, there it is, but why does the second LINQ query cause a WHERE clause to be emitted but the first doesn't? They look like basically the same code path, just one is broken up.

The first query is clearly returning ALL rows to the caller, which then has to apply the LINQ operators to do the WHERE in memory, on the caller. The second query is using the SQL Server (as it should) to do the filter, then returns WAY less data.

Here's the deal. Remember that LINQ cares about two things, IEnumerable stuff and IQueryable. The first lets you foreach over a collection, and the other includes all sorts of fun stuff that lets you query that stuff. Folks build on top of those with LINQ to SQL, LINQ to XML, LINQ to YoMomma, etc.

When you are working with something that is IQueryable; that is, the source is IQueryable, you need to make sure you are actually using the operators for an IQueryable, otherwise you might fall back onto an undesirable result, as in this database case with IEnumerable. You don't want to return more data from the database to a caller than is absolutely necessary.

From JC, with emphasis mine:

The IQueryable version of SingleOrDefault, that takes a lambda, actually takes an Expression<Func<T, bool>>, whereas the IEnumerable version, takes a Func<T, bool>. Hence, in the below code, the call to SingleOrDefault, is treating the query as if it was LINQ To Objects, which executes the query via L2S, then performs the SingleOrDefault on the in memory collection. If they changed the signature of GetByPrimaryKey to take an Expression<Func<Customer, bool>>, it would work as expected.

What's a Func and what's an Expression? A Func<> (pronounced "Funk") represents a generic delegate. Like:

Func<int,int,double> divide=(x,y)=>(double)x/(double)y;
Console.WriteLine(divide(2,3));

And an Expression<> is a function definition that can be compiled and invoked at runtime. Example:

Expression<Func<int,int,double>> divideBody=(x,y)=>(double)x/(double)y;
Func<int,int,double> divide2=divideBody.Compile();
Console.WriteLine(divide2(2,3));

So, the partner doesn't want a Func (a Func that takes a Customer and returns a bool); they want a compilable Expression with a Func that takes a Customer and returns a bool. I'll have to add "using System.Linq.Expressions;" as well.

protected virtual Customer GetByPrimaryKey(Expression<Func<Customer,bool>> keySelection)
{
    AdventureWorksDataContext context = new AdventureWorksDataContext();

    return (from r in context.Customers select r).SingleOrDefault(keySelection);
}

[TestMethod]
public void CustomerQuery_Test_01()
{
    Customer customer = GetByPrimaryKey(c => c.CustomerID == 2);
}

[TestMethod]
public void CustomerQuery_Test_02()
{
    AdventureWorksDataContext context = new AdventureWorksDataContext();
    Customer customer = (from r in context.Customers select r).SingleOrDefault(c => c.CustomerID == 2);
}

Just changed that one line, so that GetByPrimaryKey takes an Expression<Func<Customer, bool>> and I get the SQL I expected:

Corrected SQL in the Watch Window
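If you don't have a database handy, you can still watch the overload switch happen. This is a sketch under assumptions: it uses an in-memory array with AsQueryable() as a stand-in for the LINQ to SQL table, and the class and helper names are hypothetical. Passing a Func binds to the Enumerable operator and you silently leave IQueryable land; passing an Expression binds to the Queryable operator and the query stays composable by the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class OverloadBindingDemo
{
    // True when the sequence is still an IQueryable<int>, i.e. the provider
    // still owns the query instead of LINQ to Objects.
    public static bool StaysQueryable(IEnumerable<int> sequence)
    {
        return sequence is IQueryable<int>;
    }

    public static void Main()
    {
        // In-memory stand-in for something like context.Customers.
        IQueryable<int> source = new[] { 1, 2, 3 }.AsQueryable();

        Func<int, bool> asDelegate = n => n == 2;
        Expression<Func<int, bool>> asTree = n => n == 2;

        // Binds to Enumerable.Where: the predicate runs in memory.
        IEnumerable<int> viaDelegate = source.Where(asDelegate);

        // Binds to Queryable.Where: the predicate is handed to the provider.
        IQueryable<int> viaTree = source.Where(asTree);

        Console.WriteLine(StaysQueryable(viaDelegate)); // False
        Console.WriteLine(StaysQueryable(viaTree));     // True

        // Both overloads of SingleOrDefault return the same value here;
        // against a database the difference is where the filtering happens.
        Console.WriteLine(source.SingleOrDefault(asDelegate)); // 2
        Console.WriteLine(source.SingleOrDefault(asTree));     // 2
    }
}
```

Same answer both ways, which is exactly why this is easy to miss without looking at the generated SQL.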

Someone famous once said, "My code has no bugs, it runs exactly as I wrote it."

Layers of Abstraction are tricky, and you should always assert your assumptions and always look at the SQL that gets generated/created/executed by your DAL before you put something into production. Trust no one, except the profiler.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.


The Weekly Source Code 51 - Asynchronous Database Access and LINQ to SQL Fun

March 2, '10 Comments [25] Posted in ASP.NET | ASP.NET MVC | LINQ | Open Source | Source Code

You can learn a lot by reading other people's source code. That's the idea behind this series, "The Weekly Source Code." You can certainly become a better programmer by writing code but I think good writers become better by reading as much as they can.

I was poking around in the WebFormsMVP project's code and noticed an interesting pattern.

You've seen code to get data from a database and retrieve it as an object, like this:

public Widget Find(int id)
{
Widget widget = null;
widget = (from w in _db.Widgets
where w.Id == id
select w).SingleOrDefault();
return widget;
}

This code is synchronous, meaning basically that it'll happen on the same thread and we'll wait around until it's finished. Now, here's an asynchronous version of the same code. It's a nice combination of the new (LINQ, in this case LINQ to SQL) and the older (DataReaders, etc). The LINQ (to SQL) query is held in the variable named query, then they call GetCommand to get the underlying SqlCommand for that query. Then, they call BeginExecuteReader on the SqlCommand, which starts asynchronous execution of that command.

SqlCommand _beginFindCmd = null;

public IAsyncResult BeginFind(int id, AsyncCallback callback, Object asyncState)
{
var query = from w in _db.Widgets
where w.Id == id
select w;
_beginFindCmd = _db.GetCommand(query) as SqlCommand;
_db.Connection.Open();
return _beginFindCmd.BeginExecuteReader(callback, asyncState, System.Data.CommandBehavior.CloseConnection);
}

public Widget EndFind(IAsyncResult result)
{
var rdr = _beginFindCmd.EndExecuteReader(result);
var widget = (from w in _db.Translate<Widget>(rdr)
select w).SingleOrDefault();
rdr.Close();
return widget;
}

When it's done, in this example, EndFind gets called and they call DataContext.Translate<T> passing in the type they want (Widget) and the source, the DataReader retrieved from EndExecuteReader. It's an asynchronous LINQ to SQL call.

I found it clever so I emailed my parallelism friend and expert Stephen Toub and asked him if this was any or all of the following:

a. clever

b. necessary

c. better done with PFX/TPL (Parallel Extensions to the .NET Framework/Task Parallel Library)

Stephen said, in his own get-down-to-business fashion:

a) It's a standard approach to converting a LINQ query to a command to be executed with more control over how it's executed.  That said, I don't see it done all that much, so in that capacity it's clever.

b) It's necessary to run the query asynchronously; otherwise, the call to MoveNext on the enumerator will block. And if ADO.NET's MARS support is used (Multiple Active Result Sets), you could have multiple outstanding operations in play.

c) TPL can't improve upon the interactions with SQL Server, i.e. BeginExecuteReader will still need to be called.  However, TPL can be used to wrap the call such that you get a Task<Widget> back, which might be a nicer API to consume.  Once you have it as a Task, you can do useful things like wait for it, schedule work for when it's done, wait for multiple operations or schedule work when multiple operations are done, etc.

One other thing that's interesting is the WebFormsMVP project's PageAsyncTaskManagerWrapper:

namespace WebFormsMvp.Web
{
/// <summary>
/// Represents a class that wraps the page's async task methods
/// </summary>
public class PageAsyncTaskManagerWrapper : IAsyncTaskManager
{
readonly Page page;

/// <summary />
public PageAsyncTaskManagerWrapper(Page page)
{
this.page = page;
}

/// <summary>
/// Starts the execution of an asynchronous task.
/// </summary>
public void ExecuteRegisteredAsyncTasks()
{
page.ExecuteRegisteredAsyncTasks();
}

/// <summary>
/// Registers a new asynchronous task with the page.
/// </summary>
/// <param name="beginHandler">The handler to call when beginning an asynchronous task.</param>
/// <param name="endHandler">The handler to call when the task is completed successfully within the time-out period.</param>
/// <param name="timeout">The handler to call when the task is not completed successfully within the time-out period.</param>
/// <param name="state">The object that represents the state of the task.</param>
/// <param name="executeInParallel">The value that indicates whether the task can be executed in parallel with other tasks.</param>
public void RegisterAsyncTask(BeginEventHandler beginHandler, EndEventHandler endHandler, EndEventHandler timeout, object state, bool executeInParallel)
{
page.RegisterAsyncTask(new PageAsyncTask(beginHandler, endHandler, timeout, state, executeInParallel));
}
}
}

They made a nice wrapper for these existing System.Web.UI.Page methods and they use it like this, combined with the asynchronous LINQ to SQL from earlier:

AsyncManager.RegisterAsyncTask(
(asyncSender, ea, callback, state) => // Begin
{
return widgetRepository.BeginFindByName(e.Name, callback, state);
},
result => // End
{
var widget = widgetRepository.EndFindByName(result);
if (widget != null)
{
View.Model.Widgets.Add(widget);
}
},
result => { } // Timeout
, null, false);
AsyncManager.ExecuteRegisteredAsyncTasks();

They fire off their task, which then does its database work asynchronously, and then it all comes together.

I'll leave (for now) the wrapping of the APIs to return a Task<TResult> as an exercise for the reader, but it'd be nice to see if this pattern can benefit from the Task Parallel Library or not.
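For anyone who wants a head start on that exercise, here's one possible sketch, not the WebFormsMVP project's code. It assumes a hypothetical FakeWidgetRepository that imitates the BeginFind/EndFind pair in memory (no database, no SqlCommand) so the example runs on its own, and a FindAsync method I made up. The wrapping itself is done with TaskFactory<TResult>.FromAsync, which is the piece Stephen's point (c) is describing.

```csharp
using System;
using System.Threading.Tasks;

public class Widget
{
    public int Id { get; set; }
}

// Hypothetical stand-in for the real repository: same Begin/End shape,
// but the "query" is just a background task that builds a Widget.
public class FakeWidgetRepository
{
    public IAsyncResult BeginFind(int id, AsyncCallback callback, object state)
    {
        var completion = new TaskCompletionSource<Widget>(state);
        Task.Run(() =>
        {
            completion.SetResult(new Widget { Id = id });
            if (callback != null) callback(completion.Task); // signal completion
        });
        return completion.Task; // Task implements IAsyncResult
    }

    public Widget EndFind(IAsyncResult result)
    {
        return ((Task<Widget>)result).Result;
    }

    // The wrapper: hand the Begin/End pair to FromAsync and get a Task back.
    public Task<Widget> FindAsync(int id)
    {
        return Task<Widget>.Factory.FromAsync<int>(BeginFind, EndFind, id, null);
    }
}

public static class Program
{
    public static void Main()
    {
        var repository = new FakeWidgetRepository();
        Task<Widget> pending = repository.FindAsync(42);

        // Schedule work for when the lookup is done, then wait for it.
        Task done = pending.ContinueWith(
            t => Console.WriteLine("Found widget " + t.Result.Id));
        done.Wait(); // prints "Found widget 42"
    }
}
```

With the real repository you'd pass its own BeginFind and EndFind to FromAsync in the same way; whether that plays nicely with the page's async task registration is something you'd want to test rather than assume.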


Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.