Faster Builds with MSBuild using Parallel Builds and Multicore CPUs
UPDATE: I've written an UPDATE on how to get MSBuild building using multiple cores from within Visual Studio. You might check that out when you're done here.
Jeff asked the question "Should All Developers Have Manycore CPUs?" this week. There are number of things in his post I disagree with.
First, "dual-core CPUs protect you from badly written software" is just a specious statement as it's the OS's job to protect you, it shouldn't matter how many cores there are.
Second, "In my opinion, quad-core CPUs are still a waste of electricity unless you're putting them in a server" is also silly. Considering that modern CPUs slow down when not being used, and use minimal electricity when compared to your desk lamp and monitors, I can't see not buying the best (and most) processors) that I can afford. The same goes with memory. Buy as much as you can comfortably afford. No one ever regretted having more memory, a faster CPU and a large hard drive.
Third he says,"there are only a handful of applications that can truly benefit from more than 2 CPU cores" but of course, if you're running a handful of applications, you can benefit even if they are not multi-threaded. Just yesterday I was rendering a DVD, watching Arrested Development, compiling an application, reading email while my system was being backed up by Home Server. This isn't an unreasonable amount of multitasking, IMHO, and this is why I have a quad-proc machine.
That said, the limits to which a machine can multi-task are often limited to the bottleneck that sits between the chair and keyboard. Jeff, of course, must realize this, so I'm just taking issue with his phrasing more than anything.
He does add the disclaimer, which is totally valid: "All I wanted to do here is encourage people to make an informed decision in selecting a CPU" and that can't be a bad thing.
Now, enough picking on Jeff, let's talk about my reality as a .NET Developer and a concrete reason I care about multi-core CPUs. Jeff compiled SharpDevelop using 2 cores and said "I see nothing here that indicates any kind of possible managed code compilation time performance improvement from moving to more than 2 cores."
When I compiled SharpDevelop via "MSBuild SharpDevelop.sln" (which uses one core) it took 11 seconds:
TotalMilliseconds : 11207.7979
Adding the /m:2 parameter to MSBuild yielded a 35% speed up:
TotalMilliseconds : 7190.3041
And adding /m:4 yielded (from 1 core) a a 59% speed up:
TotalMilliseconds : 4581.4157
Certainly when doing a command line build, why WOULDN'T I want to use all my CPUs? I can detect how many there are using an Environment Variable that is set automatically:
But if I just say /m to MSBuild like
It will automatically use all the cores on the system to create that many MSBuild processes in a pool as seen in this Task Manager screenshot:
The MSBuild team calls these "nodes" because they are cooperating and act as a pool, building projects as fast as they can to the point of being I/O bound. You'll notice that their PIDs (Process IDs) won't change while they are living in memory. This means they are recycled, saving startup time over running MSBuild over and over (which you wouldn't want to do, but I've seen in the wild.)
You might wonder, why do we not just use one multithreaded process for MSBuild? Because each building project wants its own current directory (and potentially custom tasks expect this) and each PROCESS can only have one current directory, no matter how many threads exist.
When you run MSBuild on a SLN (Solution File) (which is NOT an MSBuild file) then MSBuild will create a "sln.cache" file that IS an MSBuild file.
Some folks like to custom craft their MSBuild files and others like to get the auto-generate one. Regardless, when you're calling an MSBuild task, one of the options that gets set is (from an auto-generated file):
<MSBuild Condition="@(BuildLevel1) != ''" Projects="@(BuildLevel1)" Properties="Configuration=%(Configuration); Platform=%(Platform); ...snip... BuildInParallel="true" UnloadProjectsOnCompletion="$(UnloadProjectsOnCompletion)" UseResultsCache="$(UseResultsCache)"> ...
When you indicate BuildInParallel you're asking for parallelism in building your Projects. It doesn't cause Task-level parallelism as that would require a task dependency tree and you could get some tricky problems as copies, etc, happened simultaneously.
However, Projects DO often have dependency on each other and the SLN file captures that. If you're using a Visual Studio Solution and you've used Project References, you've already given the system enough information to know which projects to build first, and which to wait on.
More Granularity (if needed)
If you are custom-crafting your MSBuild files, you could turn off parallelism on just certain MSBuild tasks by adding:
to specific MSBuild Tasks and then just those sub-projects wouldn't build in parallel if you passed in a property from the command line:
MSBuild /m:4 /p:BuildInParallel=false
But this an edge case as far as I'm concerned.
How does BuildInParallel relate to the MSBuild /m Switch?
Certainly, if you've got a lot of projects that are mostly independent of each other, you'll get more speed up than if your solution's dependency graph is just a queue of one project depending on another all the way down the line.
In conclusion, BuildInParallel allows the MSBuild task to process the list of projects which were passed to it in a parallel fashion, while /m tells MSBuild how many processes it is allowed to start.
If you have multiple cores, you should be using this feature on big builds from the command line and on your build servers.
Thanks to Chris Mann and Dan Mosley for their help and corrections.
- MSBuild Blog - The MSBuild team's blog.
- MSBuild Sidekick v2 - A visual editor, ala NantPad, for MSBuild files.
- MSBuild Visualizer - An old project of Mitch Denny's trying to make large MSBuild files easier to visualize.
- How to get MSBuild building using multiple cores from within Visual Studio