Assembly Fiefdoms: What's the Right Number of Assemblies/Libraries?
Deciding how to physically partition up your application isn't a .NET specific topic, but I'll use the word "Assemblies" and use .NET terminology since we work largely in .NET.
What's the right number of assemblies? When do you split up functionality into another assembly? Should you use ILMerge to make a single über-assembly? Perhaps one class or one namespace should be inside each assembly?
Here's my thinking...organized very simply.
- The number of assemblies should NOT relate or correlate in any way to the number of developers.
- Just as your logical design doesn't have to be paralleled in your physical deployment, your number of employees shouldn't affect the number of assemblies you have. If you are a single developer, or if you're 50, there's a "comfortable" number (a number that's right for that app) of assemblies your application should have , and that number doesn't change if you add or remove people.
- Your source control system shouldn't affect your assembly count.
- May karma help you if you're using Visual Source Safe or a source control system that requires exclusive checkouts, but try to avoid letting your source management system indirectly or directly put pressure on developer to move towards a certain number of assemblies. Avoid Assembly Fiefdoms.
- 1 Namespace != 1 Assembly
- Consider first organizing your code logically (namespaces) rather than physically (assemblies). Use namespace hierarchies that make sense. System.IO and System.Compression and System.IO.Ports come to mind.
Robert Martin has an excellent PDF on Granularity, while focused on C++, the concepts aren't C++ specific. He has a section called "Designing with Packages" where he asks these questions, that we should ask ourselves when we start organizing an application. Substitute "packages" for "assemblies" if you like. He defines a package as a "releasable entity.":
1. What are the best partitioning criteria?
2. What are the relationships that exist between packages, and what design principles govern their use?
3. Should packages be designed before classes (Top down)? Or should classes be
designed before packages (Bottom up)?
4. How are packages physically represented? In C++? In the development environment?
5. Once created, to what purpose will we put these packages?
[Robert Martin's "Granularity"]
He then goes on to answer these questions. Do read the PDF, but here's some terse quotes along with my own commentary:
"The granule of reuse is the granule of release."
If you're creating a reusable library or framework, think about the developer who is "downstream" from you and how they will reuse your library.
"The classes in a package are reused together. If you reuse one of the classes in a package, you reuse them all."
Not only this, but you reuse all their dependencies. Consider what the developer will be creating with your library and if he/she needs to include other assemblies of yours to accomplish the goal. If they have to add 4 assemblies to implement 1 plugin, you might consider rethinking the number of assemblies in your deployment.
Consider the effect of changes on the user of your library. When you make a change to a class, perhaps one that the user/developer doesn't care about, do they need to distribute that newly versioned library anyway, receiving a change in a class they didn't use?
"The classes in a package should be closed together against the same kinds of changes."
Developers don't usually use/reuse a single class from your assembly, rather, they'll use a series of related classes, either in the same namespace or in a parent namespace. Consider this, and the guideline above when you decide what classes/namespaces go in what assembly.
Classes that are related usually change for related reasons and should be kept together in a related assembly, so that when change happens, it's contained in a reasonably sized physical package.
"The dependant structure between packages must be a directed acyclic graph."
Basically, avoid circular dependencies. Avoid leapfrogging and having a (stable) Common assembly dependant on a downstream assembly. Remember that stability measures how easily an assembly can change without affecting everyone else. Ralf Westphal has a great article on Dependency Structure using Lattix, an NDepend competitor. Patrick Smacchia, the NDepend Lead Developer, has a fine article on the evils of Dependency Cycles.
Ralf has some very pragmatic advice that I agree with:
"Distinguish between implementation and deployment. Split code into as many assemblies as seems necessary to get high testability, and productivity, and retain a good structure." [Ralf Westphal]
This might initially sound like a punt or a cop-out, but it's not. The number of assemblies you release will likely asymptotically approach one. It'll definitely be closer to one than to one-million. However, if you ever reach one, worry, and think seriously about how you got there.
Udi Dahan commented on Patrick's Guiding Principles of Software Development and said, quoting himself from elsewhere (emphasis mine):
In the end, you should see bunches of projects/dlls which go together, while between the bunches there is almost no dependence whatsoever. The boundaries between bunches will almost always be an interface project/dll.
This will have the pleasant side-effect of enabling concurrent development of bunches with developers hardly ever stepping on each other’s toes. Actually, the number of projects in a given developer’s solution will probably decrease, since they no longer have to deal with all parts of the system.
I would respectfully disagree with the bolded point and direct back to the principle that team structure should never dictate deployment structure. A good source control system along with pervasive communication as well as good task assignment should prevent "stepping on...toes." Sure, it's perhaps a side effect, but definitively not one worth mentioning or focusing on.
What's the conclusion? Consider these things when designing your deployment structure (that is, when deciding if something should be a new assembly or not):
- Where will it be deployed?
- Remember that compile-time references aren't necessarily runtime references. Assemblies aren't loaded until the code that uses them is JIT'ted.
- Think about where your 3rd Party Libraries fit in. Are they hanging off your "root" assembly, or are they abstracted away behind a plugin assembly?
- Consider putting "undesirable dependencies" as far away from your main stuff as possible. Don't get them tangled in your main tree. "Things burn, Colonel."
- If you're making a library, think about how many dependencies "come along for the ride" when your downstream developer/user does an Add Reference.
- If they have to add 4 assemblies to make 1 plugin, that's weak. (DasBlog has this problem with custom macros, we force you to add 2. That's lame of us.)
- If you're big into plugins and you're *all about* interfaces, do consider putting them in their own assembly.
- Consider grouping assemblies based on shared functionality, where Assembly == Subsystem, while watching for cycles and in appropriate, ahem, coupling.
- If you notice that some assemblies seem to always go together, always versioning and changing together, like peas and carrots, perhaps they should be one.
- Notice "natural clumping" and consider if that clumping is happening for reasons of Good Design or Bad Design, and act accordingly.
- Personally, while I think ILMerge is a very cool tool, I have to find a good reason to use it in the Enterprise, except for when releasing small XCopy-able utilities to end-user's machines where I really want a single file.
What IS the right number of assemblies, Dear Reader?