In Domain Driven Design, there seems to be lots of agreement that Entities should not access Repositories directly.
Did this come from Eric Evans Domain Driven Design book, or did it come from elsewhere?
It's old stuff. Eric's book just made it buzz a bit more.
Where are there some good explanations for the reasoning behind it?
The reason is simple - the human mind gets weak when it faces multiple, vaguely related contexts. They lead to ambiguity (does "America" mean the United States or one of the continents?), ambiguity leads to constant re-mapping of information whenever the mind "touches" it, and that sums up as poor productivity and errors.
Business logic should be reflected as clearly as possible. Foreign keys, normalization, and object-relational mapping come from a completely different domain - those things are technical, computer-related.
By analogy: if you are learning how to handwrite, you shouldn't be burdened with understanding where the pen was made, why ink holds to paper, when paper was invented, or what the other famous Chinese inventions are.
edit: To clarify: I'm not talking about the classic OO practice of separating data access off into a separate layer from the business logic - I'm talking about the specific arrangement whereby in DDD, Entities are not supposed to talk to the data access layer at all (i.e. they are not supposed to hold references to Repository objects)
The reason is still the same one I mentioned above. Here it's just one step further. Why should entities be only partially persistence-ignorant if they can be (at least close to) totally ignorant? The fewer domain-unrelated concerns our model holds, the more breathing room our mind gets when it has to re-interpret it.
It's a very good question, and I look forward to some discussion about it. But I think it's mentioned in several DDD books, including Jimmy Nilsson's and Eric Evans's. I guess it's also visible through examples of how to use the repository pattern.
BUT let's discuss. I think a very valid question is: why should an entity know how to persist another entity? Important in DDD is that each entity has a responsibility to manage its own "knowledge sphere" and shouldn't know anything about how to read or write other entities. Sure, you could probably just add a repository interface to entity A for reading entities of type B. But the risk is that you expose knowledge of how to persist B. Will entity A also do validation on B before persisting B into the db?
As you can see, entity A can get more involved in entity B's lifecycle, and that can add more complexity to the model.
I guess (without any example at hand) that unit testing will also be more complex.
But I'm sure there will always be scenarios where you're tempted to use repositories via entities. You have to look at each scenario to make a valid judgement: pros and cons. But the repository-in-entity solution, in my opinion, starts with a lot of cons. It must be a very special scenario with pros that balance out the cons...
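To make the above concrete, here is a minimal Java sketch of the alternative (all names here are hypothetical, invented for illustration): instead of entity A holding a repository to load entity B, the caller loads B and passes it into A's behavior, so A never learns how B is read or written.

```java
import java.util.ArrayList;
import java.util.List;

// Entity B: has its own small knowledge sphere.
class Task {
    private final String id;
    private boolean done;
    Task(String id) { this.id = id; }
    String id() { return id; }
    void complete() { done = true; }
    boolean isDone() { return done; }
}

// Entity A: references B by identity only and holds no repository.
class TaskList {
    private final List<String> taskIds = new ArrayList<>();

    void add(Task task) { taskIds.add(task.id()); }

    // B is handed in by the caller (e.g. an application service),
    // already loaded; A never touches persistence.
    void completeTask(Task task) {
        if (!taskIds.contains(task.id())) {
            throw new IllegalArgumentException("task not on this list");
        }
        task.complete();
    }
}
```

In this arrangement, whatever loads the `Task` (an application service with a repository) stays outside the entities, so A cannot accidentally take over B's persistence or validation.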
At first, I was of the persuasion to allow some of my entities access to repositories (i.e. lazy loading without an ORM). Later I came to the conclusion that I shouldn't, and that I could find alternative ways:
Vaughn Vernon, in the red book Implementing Domain-Driven Design, refers to this issue in two places that I know of (note: this book is fully endorsed by Evans, as you can read in the foreword). In Chapter 7 on Services, he uses a domain service and a specification to work around the need for an aggregate to use a repository and another aggregate to determine whether a user is authenticated. He's quoted as saying:
As a rule of thumb, we should try to avoid the use of Repositories (12) from inside Aggregates, if at all possible.
Vernon, Vaughn (2013-02-06). Implementing Domain-Driven Design (Kindle Location 6089). Pearson Education. Kindle Edition.
And in Chapter 10 on Aggregates, in the section titled "Model Navigation" he says (just after he recommends the use of global unique IDs for referencing other aggregate roots):
Reference by identity doesn’t completely prevent navigation through the model. Some will use a Repository (12) from inside an Aggregate for lookup. This technique is called Disconnected Domain Model, and it’s actually a form of lazy loading. There’s a different recommended approach, however: Use a Repository or Domain Service (7) to look up dependent objects ahead of invoking the Aggregate behavior. A client Application Service may control this, then dispatch to the Aggregate:
He goes on to show an example of this in code:
public class ProductBacklogItemService ... {
    ...
    @Transactional
    public void assignTeamMemberToTask(
            String aTenantId,
            String aBacklogItemId,
            String aTaskId,
            String aTeamMemberId) {

        BacklogItem backlogItem = backlogItemRepository.backlogItemOfId(
                new TenantId(aTenantId),
                new BacklogItemId(aBacklogItemId));

        Team ofTeam = teamRepository.teamOfId(
                backlogItem.tenantId(),
                backlogItem.teamId());

        backlogItem.assignTeamMemberToTask(
                new TeamMemberId(aTeamMemberId),
                ofTeam,
                new TaskId(aTaskId));
    }
    ...
}
He also goes on to mention yet another solution: using a domain service in an Aggregate command method along with double-dispatch. (I can't recommend enough how beneficial it is to read his book. After you have tired of endlessly rummaging through the internet, fork over the well-deserved money and read the book.)
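Since the double-dispatch variant is only mentioned in passing above, here is a hedged Java sketch of its general shape (`Forum` and `TranslationService` are made-up names, not taken from the book): the application service passes a domain service into the aggregate's command method, and the aggregate calls back into it, staying free of infrastructure references.

```java
// Domain service abstraction; its implementation may live in infrastructure.
interface TranslationService {
    String translate(String text, String targetLanguage);
}

// Aggregate root. It holds no reference to the service; it only receives
// it for the duration of one command (double dispatch).
class Forum {
    private String title;

    Forum(String title) { this.title = title; }

    // The aggregate dispatches back into the service it was handed,
    // so it never needs a repository or a service field of its own.
    void translateTitle(TranslationService service, String targetLanguage) {
        this.title = service.translate(this.title, targetLanguage);
    }

    String title() { return title; }
}
```

A calling application service would resolve the concrete `TranslationService` and pass it in: `forum.translateTitle(translationService, "en");`.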
I then had some discussion with the always gracious Marco Pivetta @Ocramius who showed me a bit of code on pulling out a specification from the domain and using that:
1) This is not recommended:
$user->mountFriends(); // <-- has a repository call inside that loads friends?
2) In a domain service, this is good:
public function mountYourFriends(MountFriendsCommand $mount) { /* see http://store.steampowered.com/app/296470/ */
    $user = $this->users->get($mount->userId());
    $friends = $this->users->findBySpecification($user->getFriendsSpecification());
    array_map([$user, 'mount'], $friends);
}
I learnt object-oriented programming before all this separate-layer buzz appeared, and my first objects/classes DID map directly to the database.
Eventually, I added an intermediate layer because I had to migrate to another database server. I have seen / heard about the same scenario several times.
I think separating the data access (a.k.a. "Repository") from your business logic is one of those things that have been reinvented several times, although the Domain Driven Design book made a lot of "noise" about it.
I currently use 3 layers (GUI, Logic, Data Access), like many developers do, because it's a good technique.
Separating the data access into a Repository layer (a.k.a. Data Access layer) may be seen as a good programming technique, not just a rule to follow.
Like many methodologies, you may want to start by NOT implementing them, and eventually update your program once you understand them.
Quote: The Iliad wasn't totally invented by Homer, Carmina Burana wasn't totally invented by Carl Orff, and in both cases the person who put others' work all together got the credit ;-)
From the book, I think the first two pages of the chapter Model Driven Design give some justification for why you want to abstract technical implementation details out of the implementation of the domain model.
This seems to be all for the purpose of avoiding a separate "analysis model" that becomes divorced from the actual implementation of the system.
From what I understand of the book, it says this "analysis model" can end up being designed without considering software implementation. Once developers try to implement the model understood by the business side, they form their own abstractions out of necessity, causing a wall in communication and understanding.
In the other direction, developers introducing too many technical concerns into the domain model can cause this divide as well.
So you could consider that practicing separation of concerns, such as persistence ignorance, helps safeguard against these design and analysis models diverging. If it feels necessary to introduce things like persistence into the model, that is a red flag. Maybe the model is not practical for implementation.
Quoting:
"The single model reduces the chances of error, because the design is now a direct outgrowth of the carefully considered model. The design, and even the code itself, has the communicativeness of a model."
The way I interpret this: if you end up with more lines of code dealing with things like database access, you lose that communicativeness.
If the need for accessing a database is for things like checking uniqueness, have a look at:
Udi Dahan: the biggest mistakes teams make when applying DDD
http://gojko.net/2010/06/11/udi-dahan-the-biggest-mistakes-teams-make-when-applying-ddd/
under "All rules aren't created equal"
and
Employing the Domain Model Pattern
http://msdn.microsoft.com/en-us/magazine/ee236415.aspx#id0400119
under "Scenarios for Not Using the Domain Model", which touches on the same subject.
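Udi Dahan's uniqueness point can be sketched in Java as follows (all class and method names here are hypothetical): the set-wide rule lives in an application service that holds the repository, while the entity itself stays persistence-ignorant.

```java
// Entity: knows nothing about storage or the set of all users.
class User {
    private final String email;
    User(String email) { this.email = email; }
    String email() { return email; }
}

// Repository abstraction owned by the application layer.
interface UserRepository {
    boolean existsWithEmail(String email);
    void add(User user);
}

// Application service: the only place that checks the set-wide
// uniqueness rule, before the entity is ever created.
class RegistrationService {
    private final UserRepository users;

    RegistrationService(UserRepository users) { this.users = users; }

    User register(String email) {
        if (users.existsWithEmail(email)) {
            throw new IllegalStateException("email already taken");
        }
        User user = new User(email);
        users.add(user);
        return user;
    }
}
```

Note the trade-off Dahan discusses still applies: a check-then-insert like this can race under concurrency, so the database constraint remains the final arbiter.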
The "data access layer" has been abstracted through an interface, which you call in order to retrieve required data:
var orderLines = OrderRepository.GetOrderLines(orderId);

foreach (var line in orderLines)
{
    total += line.Price;
}
Pros: The interface separates out the "data access" plumbing code, allowing you to still write tests. Data access can be handled on a case by case basis allowing better performance than a generic strategy.
Cons: The calling code must assume what has been loaded and what hasn't.
Say GetOrderLines returns OrderLine objects with a null ProductInfo property for performance reasons. The developer must have intimate knowledge of the code behind the interface.
I've tried this method on real systems. You end up changing the scope of what is loaded all the time in an attempt to fix performance problems. You end up peeking behind the interface to look at the data access code to see what is and isn't being loaded.
Now, separation of concerns should allow the developer to focus on one aspect of the code at a time, as much as is possible. The interface technique hides HOW the data is loaded, but not HOW MUCH data is loaded, WHEN it is loaded, or WHERE it is loaded.
Conclusion: Fairly low separation!
Data is loaded on demand. Calls to load data are hidden within the object graph itself, where accessing a property can cause a SQL query to execute before returning the result.
foreach (var line in order.OrderLines)
{
    total += line.Price;
}
Pros: The 'WHEN, WHERE, and HOW' of data access is hidden from the developer focusing on domain logic. There is no code in the aggregate that deals with loading data. The amount of data loaded can be the exact amount required by the code.
Cons: When you are hit with a performance problem, it is hard to fix when you have a generic "one size fits all" solution. Lazy loading can cause worse performance overall, and implementing lazy loading may be tricky.
Each use case is made explicit via a Role Interface implemented by the aggregate class, allowing for data loading strategies to be handled per use case.
Fetching strategy may look like this:
public class BillOrderFetchingStrategy : ILoadDataFor<IBillOrder, Order>
{
    public Order Load(string aggregateId)
    {
        var order = new Order();
        order.Data = GetOrderLinesWithPrice(aggregateId);
        return order;
    }
}
Then your aggregate can look like:
public class Order : IBillOrder
{
    public void BillOrder(BillOrderCommand command)
    {
        foreach (var line in this.Data.OrderLines)
        {
            total += line.Price;
        }

        etc...
    }
}
The BillOrderFetchingStrategy is used to build the aggregate, and then the aggregate does its work.
Pros: Allows custom code per use case, allowing for optimal performance. It is in line with the Interface Segregation Principle. No complex code requirements. Aggregate unit tests do not have to mimic the loading strategy. A generic loading strategy can be used for the majority of cases (e.g. a "load all" strategy) and special loading strategies can be implemented when necessary.
Cons: Developer still has to adjust/review fetching strategy after changing domain code.
With the fetching strategy approach you might still find yourself changing custom fetching code after a change in business rules. It's not a perfect separation of concerns, but it ends up more maintainable and is better than the first option. The fetching strategy does encapsulate the HOW, WHEN and WHERE of data loading. It has better separation of concerns without losing flexibility the way the one-size-fits-all lazy loading approach does.
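For contrast with the C# fragments above, here is a self-contained Java sketch of the same per-use-case idea (all names are hypothetical): one narrow role interface per use case, and one strategy that decides how much data to load for it.

```java
import java.util.List;

// Role interface: names exactly one use case of the aggregate.
interface BillsOrder {
    int billTotal();
}

// Aggregate implementing the role; it holds only what billing needs.
class Order implements BillsOrder {
    private final List<Integer> linePrices;

    Order(List<Integer> linePrices) { this.linePrices = linePrices; }

    public int billTotal() {
        return linePrices.stream().mapToInt(Integer::intValue).sum();
    }
}

// One fetching strategy per use case: it decides HOW MUCH to load and
// returns the aggregate through the narrow role interface.
class BillOrderFetchingStrategy {
    BillsOrder load(String orderId) {
        // In a real system this would run a query tuned for billing;
        // here the rows are stubbed for illustration.
        return new Order(List.of(10, 20, 12));
    }
}
```

The caller only ever sees `BillsOrder`, so swapping in a differently tuned strategy (or a "load all" default) never touches the aggregate's billing logic.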