Scott Hanselman

Solving the Shakespeare Million Monkeys Problem in Real-time with Parallelism and SignalR

November 12, '11 Comments [26] Posted in ASP.NET | Javascript | SignalR
Sponsored By

A monkey with a skull. Oh yes.A little over 18 months ago I was talking to Stephen Toub (he of the Parallel Computing fame) about parallelism and the kinds of problems it could solve.

I said, naively, "could we solve the million monkey's problem?"

He said, "the what?"

"You know, if you have an infinite number of monkeys and an infinite number of keyboards they will eventually write Shakespeare."

We brainstormed some ideas (since Stephen is a smarter than I, this consisted mostly of him gazing thoughtfully into the air while I sat on my hands) and eventually settled on an genetic algorithm. We would breed thousands of generations of (hypothetical) monkeys a second and then choose which ones would be allowed to perpetuate the species based solely on their ability to write Shakespeare.

We used the .NET 4 Task Parallel Library to make it easier for the algorithm to scale to available hardware. I mean, anyone can foreach over a million monkeys. But loops like that in parallel over 12 processors takes talent, right? Well, kinda. A lot of it is done for you by the Parallelism Features in .NET and that's the point. It's Parallel Processing for the Masses.

We created a WinForms version of this application and I've used it on and off to demonstrate parallel computing on .NET. Then Paul Batum and I went to the Keeping It Realtime conference to demonstrate SignalR this last week. I didn't want to do the same "here's a real-time chat app" or "here's a map that shows its results in real-time" demos that one always does at these kinds of things. I suggested that we port our WinForms Shakespeare Monkey demo to ASP.NET and SignalR and that's what Paul proceeded to do.

Looks like 80,000 generations of monkeys

When doing something that is crazy computationally intensive but also needs to return real-time results you might think to use node for the real-time notification part and perhaps spawn off another process and use C or something for the maths and then have them talk to each others. We like node and you can totally run node on IIS or even write node in WebMatrix. However, node is good at some things and .NET is good at some things.

For example, .NET is really good at CPU-bound computationally intensive stuff, like, I dunno, parallel matrix multiplication in F# or the like. ASP.NET is good at scaling web sites like Bing, or StackOverflow. You may not think IIS and ASP.NET when you think about real-time, but SignalR uses asynchronous handlers and smart techniques to get awesome scale when using long-polling and scales even more in our labs when using an efficient protocol like WebSockets with the new support for WebSockets in .NET 4.5.

So, we wanted to see if you combined asynchronous background work, use as many processors as you have, get real-time status updates via SignalR over long-polling or Web Sockets, using C#, .NET 4.5, ASP.NET and IIS.

It takes about 80,000 generations of monkeys at thousands of monkey generations a second (there's 200 monkeys per generation) to get the opening line of Hamlet. So that's ~16,000,000 monkeys just to get this much text. As they say, that's a lot of monkeys.

Here's the general idea of the app. The client is pretty lightweight and quite easy. There's two boxes, two buttons and a checkbox along side some text. There's some usual event wireup with started, cancelled, complete, and updateProgress, but see how those are on a monkey variable? That's from $.connection.monkeys. It could be $.connection.foo, of course, as long as it's hanging off $.connection.

Those functions are client side but we raise them from the server over the persistent connection then update some text.

<script src="Scripts/jquery-1.6.4.min.js"></script>    
<script src="Scripts/json2.min.js"></script>
<script src="Scripts/jquery.signalR.min.js"></script>
<script src="signalr/hubs"></script>
<script>
$(function () {
$('#targettext').val('To be or not to be, that is the question;\nWhether \'tis nobler in the mind to suffer\n\The slings and arrows of outrageous fortune,\n\Or to take arms against a sea of troubles,\n\And by opposing, end them.');

var monkeys = $.connection.monkeys,
currenttext = $('#currenttext'),
generationSpan = $('#generation'),
gpsSpan = $('#gps');

monkeys.updateProgress = function (text, generation, gps) {
currenttext.val(text);
generationSpan.text(generation);
gpsSpan.text(gps);
};

monkeys.started = function (target) {
$('#status').text('Working...');
$('#targettext').val(target);
$('#cancelbutton').removeAttr('disabled');
};

monkeys.cancelled = function () {
$('#status').text('Cancelled');
$('#cancelbutton').attr('disabled', 'disabled');
};

monkeys.complete = function () {
$('#status').text('Done');
$('#cancelbutton').attr('disabled', 'disabled');
};

$.connection.hub.start({}, function () {
$('#startbutton').click(function (event) {
$('#status').text('Queued...');
monkeys.startTyping($('#targettext').val(), $('#isparallel').is(':checked'));
});

$('#cancelbutton').click(function (event) {
monkeys.stopTyping();
});
});

});
</script>

The magic start with $.connection.hub.start. The hub client-side code is actually inside ~/signalr/hubs. See how that's include a the top? That client-side proxy is generated based on what hub or hubs are on the server side.

The server side is structured like this:

[HubName("monkeys")]
public class MonkeyHub : Hub
{
public void StartTyping(string targetText, bool parallel)
{
}

public void StopTyping()
{
}

}

The StartTyping and StopTyping .NET methods are callable from the client-side via the monkeys JavaScript object. So you can call server-side C# from the client-side JavaScript and from the C# server you can call methods in JavaScript on the client. It'll make the most sense if you debug it and watch the traffic on the wire. The point is that C# and Json objects can flow back and forth which blurs the line nicely between client and server. It's all convention over configuration. That's how we talk between client and server. Now, what about those monkeys?

You can check out the code in full, but StartTyping is the kick off point. Note how it's reporting back to the Hub (calling back to the client) constantly. Paul is using Hub.GetClients to talk to all connected clients as broadcast. This current implementation allows just one monkey job at a time. Other clients that connect will see the job in progress.

public void StartTyping(string targetText, bool parallel)
{
var settings = new GeneticAlgorithmSettings { PopulationSize = 200 };
var token = cancellation.Token;


currentTask = currentTask.ContinueWith((previous) =>
{
// Create the new genetic algorithm
var ga = new TextMatchGeneticAlgorithm(parallel, targetText, settings);
TextMatchGenome? bestGenome = null;
DateTime startedAt = DateTime.Now;

Hub.GetClients<MonkeyHub>().started(targetText);

// Iterate until a solution is found or until cancellation is requested
for (int generation = 1; ; generation++)
{
if (token.IsCancellationRequested)
{
Hub.GetClients<MonkeyHub>().cancelled();
break;
}

// Move to the next generation
ga.MoveNext();

// If we've found the best solution thus far, update the UI
if (bestGenome == null ||
ga.CurrentBest.Fitness < bestGenome.Value.Fitness)
{
bestGenome = ga.CurrentBest;

int generationsPerSecond = generation / Math.Max(1, (int)((DateTime.Now - startedAt).TotalSeconds));
Hub.GetClients<MonkeyHub>().updateProgress(bestGenome.Value.Text, generation, generationsPerSecond);

if (bestGenome.Value.Fitness == 0)
{
Hub.GetClients<MonkeyHub>().complete();
break;
}
}
}
}, TaskContinuationOptions.OnlyOnRanToCompletion);
}

If he wanted, he could use this.Caller to communicate with the specific client that called StartTyping. Inside ga.MoveNext we make the decision to go parallel or not based on that checkbox. This is where we pick two random high quality parent monkeys from our population for a potential future monkey. Hopefully one whose typing looks more like Shakespeare and less like a Regular Expression.

By simply changing from Enumerable.Range to ParallelEnumerable.Range we can start taking easily parallelizable things and using all the processors on our machine. Note the code is the same otherwise.

private TextMatchGenome[] CreateNextGeneration()
{
var maxFitness = _currentPopulation.Max(g => g.Fitness) + 1;
var sumOfMaxMinusFitness = _currentPopulation.Sum(g => (long)(maxFitness - g.Fitness));

if (_runParallel)
{
return (from i in ParallelEnumerable.Range(0, _settings.PopulationSize / 2)
from child in CreateChildren(
FindRandomHighQualityParent(sumOfMaxMinusFitness, maxFitness),
FindRandomHighQualityParent(sumOfMaxMinusFitness, maxFitness))
select child).
ToArray();
}
else
{
return (from i in Enumerable.Range(0, _settings.PopulationSize / 2)
from child in CreateChildren(
FindRandomHighQualityParent(sumOfMaxMinusFitness, maxFitness),
FindRandomHighQualityParent(sumOfMaxMinusFitness, maxFitness))
select child).
ToArray();
}
}

My 12 proc desktop does about 3800 generations a second in parallel.

Hexacore Super Computer!

Big thanks to Paul for the lovely port of this to SignalR and to Stephen Toub for the algorithm. 

The code for the SignalR monkeys demo is on my BitBucket. Right now it needs .NET 4.5 and the Visual Studio Developer Preview, but you could remove a few lines and get it working on .NET 4, no problem.

Note that that SignalR works on .NET 4 and up and you can play with it today. You can even chat with the developers in the SignalR chat app in the 'aspnet' room at http://chatapp.apphb.com. Just /nick yourself then /join aspnet.

No monkeys were hurt in the writing of this blog post.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

facebook twitter subscribe
About   Newsletter
Sponsored By
Hosting By
Dedicated Windows Server Hosting by ORCS Web
Saturday, November 12, 2011 1:22:20 AM UTC
I'm a mo mo mo MOkey times one Million in Parallel!!
Saturday, November 12, 2011 1:32:52 AM UTC
I am not broadly experienced. But this is an incredibly creative demo! Thanks for sharing. I am trying to catch up to speed on everything going on in web app development. As a relative latecomer without formal schooling, there is so damn much to absorb!
Saturday, November 12, 2011 2:01:40 AM UTC
Cool demo! Are you familiar with Richard Dawkins' (did you know he was a programmer?) similar example with "methinks it is like a weasel"? That's what turned me on to genetic programming lot these many years ago.
Saturday, November 12, 2011 3:03:50 AM UTC
Do I follow you way to much or I allready saw you showing this example ? Awsame exaple btw
Saturday, November 12, 2011 3:56:56 AM UTC
Couldn't open in VS 2010. I changed format # in the .sln file. Looks like that's not enough because problem also when opening the .csproj file. I am not spending any more time trying to figure where the incompatibility is coming from.
Saturday, November 12, 2011 4:06:01 AM UTC
Hi Scott,
It might not work on .NET 4.0. I am getting the following exception, using the .Net 4.0 Platform Update 1.


Could not load type 'System.Web.Security.AntiXss.AntiXssEncoder' from assembly 'System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a

@Tony
The initial hiccup of loading the solution was easily overcome by opening the csproj and changing v4.5 to v4.0

Also don't forget to run

Install-package SignalR

in your package manager to update SignalR reference.

Going to fire up my Win8 VM and try this on .NET 4.5 now
Saturday, November 12, 2011 4:08:15 AM UTC
i dont want to downplay your work, but i always thought that the million monkeys program was meant to find the Shakespeare text on its own, not by knowing what to find and then computing fitness based upon the desired solution. i went into reading this article sure that it was going to convince me to use my computer as a "distributed computing monkey", folding@home-style.

that being said, i think that this is a great example of signalR and AI. i especially appreciate the fact that you broached the "should we use node" architectural discussion-- very objective.

thanks!
Saturday, November 12, 2011 4:40:40 AM UTC
Worked on 4.5 like a charm. Did get my laptop hot 'under the collar' briefly. My aging dual core managed only 682 generations/sec.

Now to try some evil things with SignalR ;-).

Thanks once again Scott.

Note: If your project refuses to load because you don't have localhost configured, edit the project and remove that line. Then set it up to run on iis express (if you like).
Saturday, November 12, 2011 4:41:32 AM UTC
:) Sure, the standard monkeys problem is just a brute force problem done like foldingathome. But it's also a ridiculous problem and is akin to slapping the keyboard. And it still knows what success looks like (as it compares it existing Shakespeare.  

Not saying this is better, it's not. It's different. But I will say this is way more fun.

Also, folks, l will update for .NET 4, but check my G+ page. A guy named Dor did the work for us already.
Scott Hanselman
Saturday, November 12, 2011 11:12:13 AM UTC
Scott, check out the Infinite Monkey Cage on BBC Radio 4, not quite programming but interesting & current science.
Saturday, November 12, 2011 12:45:44 PM UTC
I agree with Micah Smith, this demo has nothing to with the Infinite monkey theorem at all. It's just a nonsense demo, but the technology you are using is cool.
Palesz
Saturday, November 12, 2011 1:06:23 PM UTC
Surely given an "infinite number of monkeys and an infinite number of keyboards" they will write everything that can ever be written?

And if there's an infinite number of them, they'll do it instantly. Or will they never finish? my brain hurts.

Saturday, November 12, 2011 3:34:44 PM UTC
Those that can't be bothered to figure out how to make this work in .NET 4 probably need the million monkeys to write their code as well.

Scott made a cool demo. If you want to learn something, sometimes you have to actually put in a little effort.

Just sayin'
Saturday, November 12, 2011 6:03:24 PM UTC
Dear scott,

I failed to read over the 4 javascript include line,with src=signalr/hubs.
Is this somekind of directory include, or is it just as js file without extention?

Greeting Rolf

P.s. When are you in holland again?
Rolf de Vries
Saturday, November 12, 2011 6:44:19 PM UTC
A cool related story about a local(to me) engineer interested in the same problem.

http://www.rgj.com/article/20111016/BIZ15/110160324/Reno-software-engineer-goes-ape-Shakespeare?odyssey=tab|topnews|text|Business

Thanks for the great article.

Phil
Phil
Sunday, November 13, 2011 1:55:02 AM UTC
Got it going on 4.0. Thanks to Dor's notes on Scott's G+ page.

Interestingly Win7 + .NET 4.0 clocked 701 Generations/Sec.
Sunday, November 13, 2011 4:29:23 PM UTC
But where's the use of IMPS (Infinite Monkey Protocol Suite: RFC 2795)?
Sunday, November 13, 2011 8:10:13 PM UTC
Spooky - I can't imagine, while reading Bob Reselman's post on How much data is big data that he has seen your post, but somehow he manages to finish up with the Hamlet thing. Maybe you two should arrange these things in advance.
Sunday, November 13, 2011 8:11:28 PM UTC
BTW - the mechanism for using OpenId here is very counter-intuitive. You don't get any feedback on the process until you save your comment.
Monday, November 14, 2011 5:38:12 AM UTC
I'm actually using a WCF duplex service on my silverlight oob application to send messages between clients but it do not scales well... SignalR has a client for silverlight (i did not find)? Would it be a better option?
Monday, November 14, 2011 10:39:56 AM UTC
Hmm, the virtual monkeys in my monkey web farm only process "Lorem ipsum"s, they must have inherited some web designers gen...
Monday, November 14, 2011 5:36:51 PM UTC
Does it work on mono? </joke> instantrimshot.com
Monday, November 14, 2011 7:38:21 PM UTC
I made a few changes to the genetic algorithm coding and improved the performance of the overall demo considerably.

Firstly I changed the algorithm to use tournament selection instead of roulette wheel selection and double point crossover instead of single point. Finally I found that for best performance a crossover rate of 1.0 and mutation rate of 0.3 was ideal for a population of 200.

Thanks Scott for this article, it works great on .NET 4 and was really great fun to code.
Jacques Coertze
Tuesday, November 15, 2011 2:29:11 AM UTC
How can you not supply a link to this Simpsons clip?

A thousand monkeys working on a thousand typewriters
JackAce
Tuesday, November 15, 2011 5:31:05 AM UTC
For anyone who doesn't want to put in the extra work, I took the time to fork Scott's project on BitBucket and make it compatible with .Net 4.0. I can verify that it runs in VS 2010 on Windows 7 (I just ran it several times on my machine). Should be sufficient to pitch it to your co-workers. I know my team is really excited about what we can do with this!

My fork on BitBucket
Wednesday, November 23, 2011 11:16:48 AM UTC
Great work. Also is there a place to get the original WinFroms program of monkeys?

On a side note, could you please tell me how to create a sign-in module like yours? I mean did you program all of it or is there a service for this? I want it so baaaaaaad...
Comments are closed.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.