Question
This is a follow-up question to "Table design about sets of data collection elements", as I am still trying to come up with a design.
What I would like to do is pre-define what each study/protocol pair requires as data collection, displayed as a to-do list or checklist that can be tracked at patients' clinic visits. Attached is what I have so far, with possible examples in each table, but I have never implemented a supertype/subtype relationship, so I am not sure if I am on the right track. Is it overly normalized? Or should I even bother going with supertype/subtype?
Any thoughts/feedback would help.
EDIT
@YoungBob First of all, thanks a lot for your input. FormId (PK) is also a foreign key to DataCollectionId, so I can query either table with the same ID by joining on DataCollection.DataCollectionId = Form.FormId to get the attributes of both levels, no?
I will not provide an interface to create these forms on the fly, which is why I didn't want to include sections or question types, but I liked the idea of including version control.
As you mentioned, I will load it with test data to check performance and see whether I should de-normalise any tables.
Since I posted the question I have added the link for DataCollectionIntervals as you suggested in this manner - is it looking much better?
http://imageshack.us/f/716/erd02.png/
Answer 1:
The schema design looks fine to me, at least based on the information you gave in this and the previous post. Best practice is to start with a normalised design and then de-normalise where you think query optimisation is needed. I'm guessing the database isn't going to be massive or have a high rate of transactions, so performance shouldn't be an issue and I would stick with the normalised design. As a rule of thumb, denormalisation may be worthwhile if you need to write queries that join more than 4 tables (in SQL Server at least), but I can't really see that happening with this schema design.
As you suggest in your question, the Form and Sample tables could be candidates for denormalisation by including the attributes of both within the DataCollection table, but this will depend on how many other attributes Form and Sample have and how many are common to both.
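For comparison, the normalised supertype/subtype layout could be sketched roughly as below. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and every column other than DataCollectionId and FormId (Name, CollectionType, FormTitle, SampleId, SampleKind) is a hypothetical placeholder, as is the sample data.

```python
import sqlite3

# SQLite is used here only as a convenient stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE DataCollection (              -- supertype: attributes common to all
    DataCollectionId INTEGER PRIMARY KEY,
    Name             TEXT NOT NULL,        -- hypothetical column
    CollectionType   TEXT NOT NULL CHECK (CollectionType IN ('Form', 'Sample'))
);
CREATE TABLE Form (                        -- subtype: PK is also an FK to the supertype
    FormId    INTEGER PRIMARY KEY REFERENCES DataCollection(DataCollectionId),
    FormTitle TEXT NOT NULL                -- hypothetical column
);
CREATE TABLE Sample (                      -- second subtype, same pattern
    SampleId   INTEGER PRIMARY KEY REFERENCES DataCollection(DataCollectionId),
    SampleKind TEXT NOT NULL               -- hypothetical column
);
""")

# Made-up example rows.
conn.execute("INSERT INTO DataCollection VALUES (1, 'Baseline questionnaire', 'Form')")
conn.execute("INSERT INTO DataCollection VALUES (2, 'Blood draw', 'Sample')")
conn.execute("INSERT INTO Form VALUES (1, 'Quality of life form')")
conn.execute("INSERT INTO Sample VALUES (2, 'Whole blood')")

# Joining on the shared key returns supertype and subtype attributes together.
rows = conn.execute("""
    SELECT dc.DataCollectionId, dc.Name, f.FormTitle
    FROM DataCollection AS dc
    JOIN Form AS f ON f.FormId = dc.DataCollectionId
""").fetchall()
print(rows)
```

Because the subtype key is a foreign key, a Form row cannot exist without its DataCollection row. Denormalising would fold FormTitle and SampleKind into DataCollection as nullable columns instead.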
One tip I would give is to consider giving the Form table a primary key that is a short character string, assuming you have fairly standard forms. I find this makes life a bit easier when browsing tables (a bit like HMRC forms P45, P60, etc. or airport codes LHR, JFK, etc.), as you don't have to keep joining with other tables to remember which form a particular int ID refers to. A CHAR(3) field also uses less storage than an INT. This may apply to other tables like DataCollectionType, but it is probably a matter of personal preference.
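A small sketch of the short-code idea, again using sqlite3 as a stand-in for SQL Server. The codes QOL and CON, the FormTitle column, and the VisitChecklist table are all invented for illustration. Note that SQLite does not enforce the CHAR(3) length by itself, so the sketch adds a CHECK constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Form (
    -- Short, human-readable code as the primary key (hypothetical values).
    FormCode  CHAR(3) PRIMARY KEY CHECK (length(FormCode) = 3),
    FormTitle TEXT NOT NULL
);
CREATE TABLE VisitChecklist (              -- hypothetical referencing table
    VisitId  INTEGER NOT NULL,
    FormCode CHAR(3) NOT NULL REFERENCES Form(FormCode),
    PRIMARY KEY (VisitId, FormCode)
);
""")
conn.executemany("INSERT INTO Form VALUES (?, ?)",
                 [("QOL", "Quality of life form"), ("CON", "Consent form")])
conn.executemany("INSERT INTO VisitChecklist VALUES (?, ?)",
                 [(1, "QOL"), (1, "CON")])

# The referencing table is readable on its own: no join is needed to see
# which forms are due at visit 1.
codes = conn.execute(
    "SELECT VisitId, FormCode FROM VisitChecklist ORDER BY FormCode"
).fetchall()
print(codes)
```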
From our discussion in the previous post, the DataFrequency table we talked about should probably be a many-to-one link to the DataCollection table. Perhaps DataCollectionIntervals would be a better name for it.
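The many-to-one link could look something like the sketch below (sqlite3 standing in for SQL Server). The IntervalId, Name and WeekNumber columns are assumptions, since the actual interval attributes aren't given here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DataCollection (
    DataCollectionId INTEGER PRIMARY KEY,
    Name             TEXT NOT NULL         -- hypothetical column
);
-- Many interval rows may point at one DataCollection row.
CREATE TABLE DataCollectionIntervals (
    IntervalId       INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    DataCollectionId INTEGER NOT NULL
        REFERENCES DataCollection(DataCollectionId),
    WeekNumber       INTEGER NOT NULL,     -- hypothetical interval attribute
    UNIQUE (DataCollectionId, WeekNumber)  -- no duplicate interval per item
);
""")
conn.execute("INSERT INTO DataCollection VALUES (1, 'Blood draw')")
conn.executemany(
    "INSERT INTO DataCollectionIntervals (DataCollectionId, WeekNumber) VALUES (?, ?)",
    [(1, 0), (1, 4), (1, 12)],
)

# All intervals at which data collection item 1 is due.
weeks = [w for (w,) in conn.execute(
    "SELECT WeekNumber FROM DataCollectionIntervals "
    "WHERE DataCollectionId = ? ORDER BY WeekNumber", (1,))]
print(weeks)
```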
One other thing to think about in the design is whether some frequently accessed tables would benefit from vertical splitting. By this I mean that if a table has wide rows, i.e. a lot of attributes or storage-hungry attributes like VARCHAR(MAX), the infrequently accessed ones can be split off into a separate table with a 1-1 link, which can significantly improve the performance of queries involving this table. But as I say, I don't really see performance being an issue with the size of database you're planning, assuming you'll be using something like SQL Server.
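As a rough sketch of vertical splitting (sqlite3 standing in for SQL Server), the wide, rarely read column moves into its own table that shares the primary key. The FormNotes table and the FormTitle and Instructions columns are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Form (                        -- narrow, frequently read columns
    FormId    INTEGER PRIMARY KEY,
    FormTitle TEXT NOT NULL
);
CREATE TABLE FormNotes (                   -- wide, rarely read columns (1-1 link)
    FormId       INTEGER PRIMARY KEY REFERENCES Form(FormId),
    Instructions TEXT                      -- would be VARCHAR(MAX) in SQL Server
);
""")
conn.execute("INSERT INTO Form VALUES (1, 'Quality of life form')")
conn.execute("INSERT INTO FormNotes VALUES (1, 'Long free-text instructions for clinic staff')")

# Frequent query: touches only the narrow table.
titles = conn.execute("SELECT FormId, FormTitle FROM Form").fetchall()

# Occasional query: joins in the wide column only when it is needed.
detail = conn.execute("""
    SELECT f.FormTitle, n.Instructions
    FROM Form AS f
    LEFT JOIN FormNotes AS n ON n.FormId = f.FormId
    WHERE f.FormId = ?
""", (1,)).fetchone()
print(titles, detail)
```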
And one final thing: the structure of Forms may be a bit more complex than the schema currently indicates. For example, forms are typically broken into several sections, and the question types needed can be quite complex, e.g. multiple choice, text, branching, conditional. Forms can also exist in different versions (use an Active flag to identify the currently active version in the Forms table). I've looked at using queXML myself to design a questionnaire in XML but decided it was a bit overkill for what I needed, so I decided on a simpler XML schema of my own which can be imported into the database.
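One possible way to model the Active flag is sketched below, with sqlite3 standing in for SQL Server. The FormCode and VersionNumber columns and the sample rows are assumptions. The partial unique index, which allows at most one active version per form, is an extra safeguard added for the example rather than something required by the design; SQL Server offers filtered indexes for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Form (
    FormCode      TEXT    NOT NULL,        -- hypothetical form identifier
    VersionNumber INTEGER NOT NULL,        -- hypothetical version counter
    Active        INTEGER NOT NULL DEFAULT 0 CHECK (Active IN (0, 1)),
    PRIMARY KEY (FormCode, VersionNumber)
);
-- At most one row per form may have Active = 1.
CREATE UNIQUE INDEX OneActiveVersionPerForm ON Form(FormCode) WHERE Active = 1;
""")
conn.executemany("INSERT INTO Form VALUES (?, ?, ?)",
                 [("QOL", 1, 0), ("QOL", 2, 1)])

# Look up the currently active version of a form.
active = conn.execute(
    "SELECT VersionNumber FROM Form WHERE FormCode = ? AND Active = 1", ("QOL",)
).fetchone()
print(active)
```

Publishing a new version would then mean clearing the flag on the old row and setting it on the new one within a single transaction.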
Source: https://stackoverflow.com/questions/15578997/erd-draft-for-creating-grouped-data-collection-list