问题
Document oriented databases (particularly RavenDB) are really intriguing me, and I'm wanting to play around with them a bit. However as someone who is very used to relational mapping, I was trying to think of how to model data correctly in a document database.
Say I have a CRM with the following entities in my C# application (leaving out unneeded properties):
public class Company
{
public int Id { get; set; }
public IList<Contact> Contacts { get; set; }
public IList<Task> Tasks { get; set; }
}
public class Contact
{
public int Id { get; set; }
public Company Company { get; set; }
public IList<Task> Tasks { get; set; }
}
public class Task
{
public int Id { get; set; }
public Company Company { get; set; }
public Contact Contact { get; set; }
}
I was thinking of putting this all in a Company
document, as contacts and tasks do not have a purpose out side of companies, and most of the time query for a task or contacts will also show information about the associated company.
The issue comes with Task
entities. Say the business requires that a task is ALWAYS associated with a company but optionally also associated with a task.
In a relational model this is easy, as you just have a Tasks
table and have the Company.Tasks
relate to all tasks for the company, while Contact.Tasks
only show the tasks for the specific Task.
For modeling this in a document database, I thought of the following three ideas:
Model Tasks as a separate document. This seems kind of anti-document db as most of the time you look at a company or contact you will want to see the list of tasks, thus having to perform joins over documents a lot.
Keep tasks that are not associated with a contact in the
Company.Tasks
list and put tasks assocaited with a contact in the list for each individual contacts. This unfortunately means that if you want to see all tasks for a company (which will probably be a lot) you have to combine all tasks for the company with all tasks for each individual contact. I also see this being complicated when you want to disassociate a task from a contact, as you have to move it from the contact to the companyKeep all tasks in the
Company.Tasks
list, and each contact has a list of id values for tasks it is associated with. This seems like a good approach except for having to manually take id values and having to make a sub-list ofTask
entities for a contact.
What is the recommended way to model this data in a document oriented database?
回答1:
Use denormalized references:
http://ravendb.net/faq/denormalized-references
in essence you have a DenormalizedReference class:
public class DenormalizedReference<T> where T : INamedDocument
{
public string Id { get; set; }
public string Name { get; set; }
public static implicit operator DenormalizedReference<T> (T doc)
{
return new DenormalizedReference<T>
{
Id = doc.Id,
Name = doc.Name
}
}
}
your documents look like - i've implemented the INamedDocument interface - this can be whatever you need it to be though:
public class Company : INamedDocument
{
public string Name{get;set;}
public int Id { get; set; }
public IList<DenormalizedReference<Contact>> Contacts { get; set; }
public IList<DenormalizedReference<Task>> Tasks { get; set; }
}
public class Contact : INamedDocument
{
public string Name{get;set;}
public int Id { get; set; }
public DenormalizedReference<Company> Company { get; set; }
public IList<DenormalizedReference<Task>> Tasks { get; set; }
}
public class Task : INamedDocument
{
public string Name{get;set;}
public int Id { get; set; }
public DenormalizedReference<Company> Company { get; set; }
public DenormalizedReference<Contact> Contact { get; set; }
}
Now saving a Task works exactly as it did before:
var task = new Task{
Company = myCompany,
Contact = myContact
};
However pulling all this back will mean you're only going to get the denormalized reference for the child objects. To hydrate these I use an index:
public class Tasks_Hydrated : AbstractIndexCreationTask<Task>
{
public Tasks_Hydrated()
{
Map = docs => from doc in docs
select new
{
doc.Name
};
TransformResults = (db, docs) => from doc in docs
let Company = db.Load<Company>(doc.Company.Id)
let Contact = db.Load<Contact>(doc.Contact.Id)
select new
{
Contact,
Company,
doc.Id,
doc.Name
};
}
}
And using your index to retrieve the hydrated tasks is:
var tasks = from c in _session.Query<Projections.Task, Tasks_Hydrated>()
where c.Name == "taskmaster"
select c;
Which i think is quite clean :)
As a design conversation - the general rule is that if you ever need to load the child documents alone as in - not part of the parent document. Whether that be for editing or viewing - you should model it with it's own Id as it's own document. Using the method above makes this quite simple.
回答2:
I'm new to document dbs as well...so with a grain of salt...
As a contrasting example...if you are on Twitter and you have a list of the people you follow, which contains a list of their tweets...you would not move their tweets into your twitter account in order to read them, and if you re-tweet, you would only have a copy, not the original.
So, in the same way, my opinion is that if Tasks belong to a company, then they stay within the Company. The Company is the Aggregate Root for Tasks. The Contacts can only hold references (ids) or copies of the Tasks and cannot modify them directly. If you have your contact hold a "copy" of the task, that's fine, but in order to modify the task (e.g. mark it complete) you would modify the task through its Aggregate Root (Company). Since a copy could quickly become outdated, it seems like you would only want a copy to exist while in memory and when saving the Contact, you would only save references to the Tasks.
来源:https://stackoverflow.com/questions/6285711/how-would-i-model-data-that-is-heirarchal-and-relational-in-a-document-oriented