Scott Hanselman

The Weekly Source Code 51 - Asynchronous Database Access and LINQ to SQL Fun

March 03, 2010 Comment on this post [25] Posted in ASP.NET | ASP.NET MVC | LINQ | Open Source | Source Code
Sponsored By

You can learn a lot by reading other people's source code. That's the idea behind this series, "The Weekly Source Code." You can certainly become a better programmer by writing code but I think good writers become better by reading as much as they can.

I was poking around in the WebFormsMVP project's code and noticed an interesting pattern.

You've seen code to get data from a database and retrieve it as an object, like this:

public Widget Find(int id)
{
Widget widget = null;
widget = (from w in _db.Widgets
where w.Id == id
select w).SingleOrDefault();
return widget;
}

This code is synchronous, meaning basically that it'll happen on the same thread and we'll wait around until it's finished. Now, here's an asynchronous version of the same code. It's a nice combination of the the new (LINQ, in this case, LINQ to SQL) and the older (DataReaders, etc). The LINQ (to SQL) query is in query, then they call GetCommand to get the underlying SqlCommand for that query. Then, they call BeginExecuteReader on the SqlCommand which starts asynchronous execution of that command.

SqlCommand _beginFindCmd = null;

public IAsyncResult BeginFind(int id, AsyncCallback callback, Object asyncState)
{
var query = from w in _db.Widgets
where w.Id == id
select w;
_beginFindCmd = _db.GetCommand(query) as SqlCommand;
_db.Connection.Open();
return _beginFindCmd.BeginExecuteReader(callback, asyncState, System.Data.CommandBehavior.CloseConnection);
}

public Widget EndFind(IAsyncResult result)
{
var rdr = _beginFindCmd.EndExecuteReader(result);
var widget = (from w in _db.Translate<Widget>(rdr)
select w).SingleOrDefault();
rdr.Close();
return widget;
}

When it's done, in this example, EndFind gets called and they call DataContext.Translate<T> passing in the type they want (Widget) and the source, the DataReader retrieved from EndExecuteReader. It's an asynchronous LINQ to SQL call.

I found it clever so I emailed my parallelism friend and expert Stephen Toub and asked him if this was any or all of the following:

a. clever

b. necessary

c. better done with PFX/TPL (Parallel Extensions to the .NET Framework/Task Parallel Library)

Stephen said, in his own get-down-to-business fashion:

a) It's a standard approach to converting a LINQ query to a command to be executed with more control over how it's executed.  That said, I don't see it done all that much, so in that capacity it's clever.

b) It's necessary to run the query asynchronously; otherwise, the call to MoveNext on the enumerator will block. And if ADO.NET's MARS support is used (multiple asynchronous result sets), you could have multiple outstanding operations in play.

c) TPL can't improve upon the interactions with SQL Server, i.e. BeginExecuteReader will still need to be called.  However, TPL can be used to wrap the call such that you get a Task<Widget> back, which might be a nicer API to consume.  Once you have it as a Task, you can do useful things like wait for it, schedule work for when its done, wait for multiple operations or schedule work when multiple operations are done, etc.

One other thing that's interesting, is the WebFormsMVP project's PageAsyncTaskManagerWrapper:

namespace WebFormsMvp.Web
{
/// <summary>
/// Represents a class that wraps the page's async task methods
/// </summary>
public class PageAsyncTaskManagerWrapper : IAsyncTaskManager
{
readonly Page page;

/// <summary />
public PageAsyncTaskManagerWrapper(Page page)
{
this.page = page;
}

/// <summary>
/// Starts the execution of an asynchronous task.
/// </summary>
public void ExecuteRegisteredAsyncTasks()
{
page.ExecuteRegisteredAsyncTasks();
}

/// <summary>
/// Registers a new asynchronous task with the page.
/// </summary>
/// <param name="beginHandler">The handler to call when beginning an asynchronous task.</param>
/// <param name="endHandler">The handler to call when the task is completed successfully within the time-out period.</param>
/// <param name="timeout">The handler to call when the task is not completed successfully within the time-out period.</param>
/// <param name="state">The object that represents the state of the task.</param>
/// <param name="executeInParallel">The vlaue that indicates whether the task can be executed in parallel with other tasks.</param>
public void RegisterAsyncTask(BeginEventHandler beginHandler, EndEventHandler endHandler, EndEventHandler timeout, object state, bool executeInParallel)
{
page.RegisterAsyncTask(new PageAsyncTask(beginHandler, endHandler, timeout, state, executeInParallel));
}
}
}

They made a nice wrapper for these existing System.Web.UI.Page methods and they use it like this, combined with the asynchronous LINQ to SQL from earlier:

AsyncManager.RegisterAsyncTask(
(asyncSender, ea, callback, state) => // Begin
{
return widgetRepository.BeginFindByName(e.Name, callback, state);
},
result => // End
{
var widget = widgetRepository.EndFindByName(result);
if (widget != null)
{
View.Model.Widgets.Add(widget);
}
},
result => { } // Timeout
, null, false);
AsyncManager.ExecuteRegisteredAsyncTasks();

They fire off their task, which then does its database work asynchronously, and then it all comes together.

I'll leave (for now) the wrapping of the APIs to return a Task<TResult> as an exercise for the reader, but it'd be nice to see if this pattern can benefit from the Task Parallel Library or not.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

facebook twitter subscribe
About   Newsletter
Hosting By
Hosted in an Azure App Service
March 03, 2010 1:17
Quite clever! Have to check this tomorrow at work ... will fit nicely in my Umbraco usercontrols
March 03, 2010 2:11
Clever? Yes. Useful? On that point I'm not convinced. Typically when I parallelize code I look for the largest independent quantum of work and start there. In a webserver scenario I would define that as everything between the beginning of the request through to the end of the response. On a modern server with fast context switching, infinite RAM, and an ever-increasing number of cores, if a thread blocks [for a reasonable amount of time, as is usually the case with DB queries such as the one in your example] what's the harm? Just grab another from the pool. I guess I shouldn't speak without being more familiar with the WebFormsMVP project and the specific problem they're trying to solve here, but I'm always wary of building out architecture just for architecture's sake.

That said, if you still want to go asynchronous then the Rx framework might be very useful in this scenario.
March 03, 2010 2:36
This isn't about parallel code as much as it is about not monopolizing a ThreadPool thread when all you're doing is waiting on IO from the database server. Using this technique, the current thread is returned to the pool and can be used to service other requests while we wait on the database. When we get a response from the database, we get a new thread out of the pool and continue processing. This may not make an individual request any faster, but it ought to help with scalability, as we're no longer using a thread to wait on IO.

The TPL is more about computation work than IO, so I'm not sure to what extent it would make this pattern easier to use - but it's a perfect fit for F#'s asyncronous computation expressions, and Async.FromBeginEnd.
March 03, 2010 2:37
I'm with Mike here... Clever, but where's the real benefit (in this scenario)? Am I missing something?
March 03, 2010 3:36
Can you tell us how do you format code (color scheme and line number) block on this blog? Thanks
March 03, 2010 3:52
@"Green Square" -- I agree with what you say given three assumptions: 1. Said thread pool is fixed in size, and 2. the database query in question has already been completely optimized and still requires a non-negligible amount of processing time, and 3. performance metrics show that the fixed-size thread pool is consistently running out of threads given your traffic load. Unless all these are true I think time could be better spent better optimizing the DB query and/or caching its results in RAM. Of course I need to be cautious of sweeping overgeneralizations--there might be some particular problem the WebFormsMVP guys are trying to solve here.
March 03, 2010 4:09
We do this in the demo project to demonstrate our support for async page tasks in a testable fashion. Of course, like any performance oriented optimization, this should only be done once you've profiled and determined that you can actually benefit from async IO operations. This is generally the case when making expensive database calls (which of course can sometimes benefit from caching), or calling out to external web services where the wait time can be measured in seconds. Moving those calls to async IO threads and async page tasks reduces the chance of you starving the worker thread pool that ASP.NET uses to respond to incoming requests, and thus having your app start to refuse connections.
March 03, 2010 4:30
The before code could have been a lot simpler:
public Widget Find(int id) {
return _db.Widgets.SingleOrDefault(w => w.Id == id);
}


The replacement could probably be simpler too - will take a look when I get a few minutes.

[)amien
March 03, 2010 4:57
@Damien:
Absolutely. I wanted to mention the same thing you mentioned :-) but I saw your comment.
I really hate to use the From ... In keywords in LINQ these days.
March 03, 2010 5:01
They do have a site up that is taking a million hits using this MVP project. My question is if your server was not under heavy load the connection should be made the work done and closed. If there was the need for it to go async. then it should just kick in if it's there therefor if that's the case it's clever unless I'm missing something
March 03, 2010 7:10
I think I missed something. That code seems to be quite complicated for what we want to achieve and in case in were we want to use the same code sync and async we would need to implementations.

I would rather go for something easier using the new Task class:

System.Threading.Tasks.Task<int> t1 = new System.Threading.Tasks.Task<int>(DoSomething);
t1.Start();
System.Threading.Tasks.Task<int> t2 = new System.Threading.Tasks.Task<int>(DoSomethingElse);
t2.Start();

System.Threading.Tasks.Task.WaitAll(new[] { t1, t2 });

t1.Result & t2.Result ready to use
March 03, 2010 10:00
I agree about reading other peoples code to learn more. I believe that I got where I am today by consuming/using/reading open source projects and so on.

It would be great if you could do this more often! This was very interesting.
March 03, 2010 14:07
I am still not able to wrap my heads around why would I need to do Async in Web scenario ? I will be able to serve up response to the browser only when all is done. So what will I gain here?

Example : I have an Action that gets all the Todo list and renders them in a view. Will I gain any benefit if I do it in parallel ?

P.S : Idea is definetly clever, but I am debating in which scenario should we use it in web environment ?
March 03, 2010 17:38
I wrote a fluent framework that makes calling Linq to Sql queries asynchronously very pretty:

void CustomersAsync()
{
Data.NwindDb db = new Data.NWindDb();
db.QueryAsync( db.Customers.Where(c=>c.PostalCode = "98053") )
.OnCompleted( Customers_Completed )
.Begin();
}
void Customers_Completed(object sender, AsyncQuery e)
{
if( e.Error != null ) return;
Data.Customer cust = e.Read<Data.Customer>().FirstOrDefault();
Debug.Print("Retrieved customer Id {0}", cust.Id);
}

It also adds extension methods to DataContext like MultiQueryAsync where you can pass an SqlCommand or a string to execute a stored procedure that returns multiple readers that get translated to Linq objects with the call to Read<T>(). In the completed method, you just called Read<T>() multiple times to read the multiple recordsets. The AsyncQuery disposes of the DataContext for you after it's Completed event is handled (if the AsyncQuery.AutoDispose property is left to it's default of true).
March 03, 2010 17:43
Scott,

Nice example. I extracted the first part of the sample and created an ExecuteAsync<T>(query, Action<IEnumerable<T>> onReady) extension method to the DataContext class to make life a little bit easier. You can find a sample and the source code over here.

Cheers,

Wesley
March 04, 2010 2:32
Any Suggesions using workflow with asp.net Mvc
March 04, 2010 9:48
@JeroenH

I think the point here is that it's very useful when you want to load up a lot of data without blocking the executing thread. In a web application, this has great potential when your application first starts up - you could load up all your lookup/cached data using this technique, without having to worry about a massive startup time.

Excellent post, thanks Scott - I actually didn't know you could do this!
March 05, 2010 6:52
Hi Scott
I wonder if the same works with Entity Framework instead of L2S ?
Thank you,
March 05, 2010 8:57
For those asking why you would to do this in a web scenario, it has nothing to do with the performance of a single request (although if you are making multiple IO calls then this could make it faster thanks to parallelism), and all to do with increasing the amount of simultaneous requests your app/server can handle. It only applies if you have CPU to spare and have a high amount of request threads waiting on responses from IO operations, e.g. calls to DBs, web-services or the file system.

For more information read this great article and watch this great session from Tech.Ed US a few years back.
March 09, 2010 17:17
Very useful in my current project.. Thanks for sharing.
April 23, 2010 11:00
Hi,

I've made a simple IQueryable<T> extension method Async() that easily allows you to run any query async on the threadpool. It is described here.
May 06, 2010 8:27
First, Scott - I'm a big fan - been listening to your podcast for a couple of years now.

Second, Damian - Thanks for your articulate response on the benefits/reasons for performing async tasks in the web scenario. I totally get the benefit of releasing the current thread back to the pool for other client request processing.

However, where I work, we're using web services as app servers behind the DMZ (4-tiered solution), and it seems to me that ws web methods would benefit from the same idea of a PageAsyncTask framework. However, since it's only a web "method," the synchronization is really left up to us, the web method developer, right? It looks like we just have to block the thread and wait for all tasks, parallel or not, to finish before continuing with returning the result to the caller. This doesn't really release any threads back to the pool on the ws side, does it? It just seems there would be benefit to releasing threads back to the pool in the ws tier, too, if we're going to be waiting on IO bound ops. After all, my ws's need to remain scalable to handle "client" requests from my asp.net web app, right?

Any thoughts on this?
May 07, 2010 0:54
Eric - this response from Damian:


ASP.NET supports this too, just uses a slightly different pattern: http://msdn.microsoft.com/en-us/library/aa480516.aspx.

Basically if you turn your single [WebMethod] declaration into separate BeginXXX and EndXXX [WebMethod] declarations, it all magically works. Of course to get the real benefit you need to ensure that the IAsyncResult you’re returning from the BeginXXX method is waiting on an IO thread (DB or web service call) and not some arbitrary worker thread you created yourself, otherwise you’re just stealing from the same worker pool of threads that ASP.NET uses to service requests.
May 07, 2010 8:34
Wow - that looks like it goes all the way back to 1.0 or 1.1. Awesome! Thanks for the response guys.
November 20, 2010 18:17
Some people are asking why.

Let's say you have several database calls that must be executed on PageLoad.

Call1()
Call2()
Call3()

This is going to execute Call1, wait till return, then Call2, wait for return, then Call3 wait for return.

If these are called Asynchronously then all 3 can be called at the same time.

Keep in mind, for the webpage, it still wait for all three to be returned in order to continue. (This is why above it's mentioned that you wouldn't do this for just one call)

Comments are closed.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.