问题
I've started a new project and they have a very normalized database. everything that can be a lookup is stored as the foreign key to the lookup table. this is normalized and fine, but I end up doing 5 table joins for the simplest queries.
from va in VehicleActions
join vat in VehicleActionTypes on va.VehicleActionTypeId equals vat.VehicleActionTypeId
join ai in ActivityInvolvements on va.VehicleActionId equals ai.VehicleActionId
join a in Agencies on va.AgencyId equals a.AgencyId
join vd in VehicleDescriptions on ai.VehicleDescriptionId equals vd.VehicleDescriptionId
join s in States on vd.LicensePlateStateId equals s.StateId
where va.CreatedDate > DateTime.Now.AddHours(-DateTime.Now.Hour)
select new {va.VehicleActionId,a.AgencyCode,vat.Description,vat.Code,
vd.LicensePlateNumber,LPNState = s.Code,va.LatestDateTime,va.CreatedDate}
I'd like to recommend that we denormaize some stuff. like the state code. I don't see the state codes changing in my lifetime. similar story with the 3-letter agency code. these are handed out by the agency of agencies and will never change.
When I approached the DBA with the state code issue and the 5 table joins. i get the response that "we are normalized" and that "joins are fast".
Is there a compelling argument to denormalize? I'd do it for sanity if nothing else.
the same query in T-SQL:
SELECT VehicleAction.VehicleActionID
, Agency.AgencyCode AS ActionAgency
, VehicleActionType.Description
, VehicleDescription.LicensePlateNumber
, State.Code AS LPNState
, VehicleAction.LatestDateTime AS ActionLatestDateTime
, VehicleAction.CreatedDate
FROM VehicleAction INNER JOIN
VehicleActionType ON VehicleAction.VehicleActionTypeId = VehicleActionType.VehicleActionTypeId INNER JOIN
ActivityInvolvement ON VehicleAction.VehicleActionId = ActivityInvolvement.VehicleActionId INNER JOIN
Agency ON VehicleAction.AgencyId = Agency.AgencyId INNER JOIN
VehicleDescription ON ActivityInvolvement.VehicleDescriptionId = VehicleDescription.VehicleDescriptionId INNER JOIN
State ON VehicleDescription.LicensePlateStateId = State.StateId
Where VehicleAction.CreatedDate >= floor(cast(getdate() as float))
回答1:
I don't know if I would even call what you want to do denormalization -- it looks more like you just want to replace artificial foreign keys (StateId, AgencyId) with natural foreign keys (State Abbreviation, Agency Code). Using varchar fields instead of integer fields will slow down join/query performance, but (a) if you don't even need to join the table most of the time because the natural FK is what you want anyway it's not a big deal and (b) your database would need to be pretty big/have a high load for it to be noticeable.
But djna is correct in that you need a complete understanding of current and future needs before making a change like this. Are you SURE the three letter agency codes will never change, even five years from now? Really, really sure?
回答2:
Some denormalization can be needed for performance (and sanity) reasons at some times. Hard to tell wihout seeing all your tables / needs etc...
But why not just build a few convenience views (to do a few joins) and then use these to be able to write simpler queries?
回答3:
Beware of wanting to shape things to your current idioms. Right now the unfamiliar code seems unweildy and obstructive to your understanding. In time it's possible that you will become acclimatised.
If current (or known future) requirements, such as performance are not being met then that's a whole different issue. But remember anything can be performance tuned, the objective is not to make things as fast as possible, but to make them fast enough.
回答4:
This previous post dealt with a similar issue to the one you're having. Hopefully it will be helpful to you.
Dealing with "hypernormalized" data
My own personal take on normalization is to normalize as much as possible, but denormalize only for performance. And evn the denormalization for performance is something to avoid. I'd go the route of profiling,setting correct indexes, etc before I'd denormalize.
Sanity... That's overrated. Especially in our profession.
回答5:
Well, what about the performance? If the performance is okay, just make the five table JOIN into a view and, for sanity, SELECT from the view when you need the data.
State abbreviations are one of the cases in which I think meaningful keys are okay. For very simple lookup tables with a limited number of rows and where I'm in complete control of the data (meaning it's not populated from some outside source) I'll sometimes create meaningful four or five character keys so that the key value can proxy for the fully descriptive lookup value in some queries.
回答6:
Create a view (or inline table-valued function to get parameterization). In any case, I usually put all my code into SPs (some code generated) whether they use views or not and that's that, you pretty much only ever write the join once.
回答7:
An argument (for this "normalization") that the three-letter codes might change isn't very compelling without a plan for what you will do if the codes do change, and how your artificial-key scenario will address this eventuality better than using the codes as keys. Unless you've implemented a fully temporal schema (which is horribly difficult to do and not suggested by your example), it's not obvious to me how your normalization benefits you at all. Now if you work with agencies from multiple sources and standards that might have colliding code names, or if "state" might eventually mean a two-letter code for state, province, department, canton, or estado, that's another matter. You then need your own keys or you need a two-column key with more information than that code.
来源:https://stackoverflow.com/questions/1575189/denormalizing-for-sanity-or-performance