What is the best database schema to support values that are only appropriate to specific rows?

问题

I have a db table called Calendar with fields

Id (PK)
Name
Description
CalendarTypeId (FK into CalendarType table)

I have another table called CalendarType with fields

Id (PK)
Name
Description

The issue is that i need to store an additional field for every calendar where the calendar Type is 2. (but this field would be irrelevant for any other calendar type).

Should i just create a new field in the Calendar table and ignore that field for all other calendar that have a different calendarTypeid or is there a better way to organize this schema to support this need.

回答1:

Ok, this is the ER model of what you currently have (omitting cardinalities):

Now, let's focus on the Calendar and SubCalendar. Clearly, you have a hierarchy there. But how are hierarchies turned into tables? There are three common ways to do this:

1) Kill the parent and keep the children: In this case you remove the parent entity and send all the fields from that entity to each of the children. In your example you only have one child, so all the parent's attributes will go only to it.

Advantages: No null values as each table will have all it needs. No joins are required either. If you will be running queries searching just for one type of children this schema will be useful because you won't need to filter by type because each table will store only one type

Disadvantages: This schema is not appropriate for cases where you have overlapping children. In other words, if a parent row can have more than one children when sending the fields to each child the parent data will be duplicated in each children. Not good, so don't use this strategy if that is the case. Additionally, if you have many children and very few records in each, you'll have many tables with few records each, so it might become a little harder to manage

2) Kill children and keep the parent: In this case you remove all the children and send all of their attributes to the parent. As the parent is now a mix of itself and all of its children it needs a way to determine what row belongs to what type of children. This is accomplished by adding a new attribute to the parent entity that will determine the type of each row (no matter the data type).

Advantages: There will only be one table for all children, so that is easy to manage. No joins are required. Might be useful if the most queries that are run against this table require results from more than one type of children.

Disadvantages: Again, if a parent can have a row that relates to more than one children data will be duplicated as there will be one row per each of them, so there is a limitation in this solution. Additionally, a new column will have to be added as metadata. The volume of records in the table will be the larger. Null values will have to be assigned to the data that children have and the parent nor other children have.

3) Keep all: The least bloody solution is not to kill anything :) In this case the hierarchy is replaced by a relationship between the parent and each of the children. That way, a child will have to join to the parent table by means of a foreign key to reach the parent's data.

Advantages: There is no data duplication nor null values. Each entity has just the minimal amount of data and the rest can be obtained by joining to the parent table. In this case, a parent row can be linked to more than one children without duplicating data. If there will be run many queries that can be satisfied with only one table (usually the parent) this is a good option. One more thing is that it is easy to extend to more calendars, for example, if a new calendar is to be added that requires new fields, then a new table has to be added, without modifying the current ones

Disadvantages: Requires the most tables (actually one more than the first one). A join is needed per each children which will degrade performance the larger the dataset becomes. Additionally, foreign keys will be needed to join both tables. If the most queries will require data from parent and children this schema will be the worst in terms of performance

Now, you asked which is the best database schema. I think it is now clear it depends on the requirements, types of queries that will be run, the way data is structured, etc.

However, I can analyze this a little bit more. You said that you had a Calendar table and sometimes more data is needed for one of them. So we can say we have 2 types of calendars, the parent and the child. So we might think that going for solution 2 is a good possibility because you'll have 2 rows representing each type, but we would be wrong. This is because each child includes its parent in this case. Now, if we can assume that if SubAttribute will always be non-null for a child and null for a parent we can even remove CalendarType, which will actually lead to solution 1.

Finally, as a rule of thumb (and mainly because most queries have lots of joins in real life), if you want to focus on performance, you should go for solution 1, otherwise, if you want to focus on having normalized design you should go for solution 3.

I hope this has cleared some doubts and possibly generated others :)

回答2:

I would probably use Calendar. I call it overloading the Db table. When data storage was expensive, this was a crime. Now it's called solving the problem the easy way and moving on. Never over engineer until you really need to.

However, you didn't explicitly state whether the extra field value varied for each instance of Calendar with a typeID of 2. Sometimes my Type tables have sub-type fields etc, but I will assume it's the case where the Calendar instances of Type 2 WILL have different values in the required field.

回答3:

Maybe I'm looking at this too simply, but if you stick to the model of "use before reuse" then the right thing to do is simply add the null-able column to your calendar table and add a check constraint back to calendar type ensuring it's not null if calendar type = 2.

It's straight forward, and most of all it's easily testable.

I might get some slack for this answer (not the most efficient probably), but it completely depends on the scale of your solution. The reality is these constraints could very well change in the next couple months and you don't want to paint yourself into a corner by picking the "right" way when you don't know what that is yet. It's entirely possible that when you get to the 10th calendar type a pattern will emerge which really will tell you the best (or most normal) way to do it. For now, just keep it simple and make it easy to test and easy to change later.

回答4:

You can use the Single Table Inheritance Pattern, which is close to your suggestion,

http://martinfowler.com/eaaCatalog/singleTableInheritance.html

http://martinfowler.com/eaaCatalog/classTableInheritance.html

if you want to specialize some tables to match the types (Calendar and CalendarType2) you are trying to represent in your database

回答5:

Leora,

I would recommend that you use the calendar table, and null the extra fields that are not required for the other calendar types. As requirements change, you will be able to add more attributes to the calendar table this way.

I would also recommend having a base calendar class for your model, then creating sub-classes mapped using the calendartypeid field, and use the specific calendar subclasses within your application as required. Most ORMS will support this type of mapping and will also allow you to render each subclass different from the others if the need arises

Stephen

来源：https://stackoverflow.com/questions/8685621/what-is-the-best-database-schema-to-support-values-that-are-only-appropriate-to

标签

database-design

database-schema