Overnormalization

前端 未结 11 883
一整个雨季
一整个雨季 2020-12-24 01:48

When would a database design be described as overnormalized? Is this characterization an absolute one? Or is it dependent on the way it is used in the application? Thanks.

相关标签:
11条回答
  • 2020-12-24 02:27

    Normalize your OLTP databases, and denormalize your OLAP databases. Each has a mission that dictates its schema. Like normalized transaction databases, data warehouses exist for a reason. A complete system needs both.

    0 讨论(0)
  • 2020-12-24 02:29

    Normalization is absolute. A database follows Normal Forms or it does not. There are a half-dozen normal forms. Mostly, they have names like First through Fifth. Plus there's a Boyce-Codd Normal Form.

    Normalization exists for precisely one purpose -- to prevent "update anomalies".

    Normalization isn't subjective. It isn't a judgement. Each table and relationship among tables either does or does not follow a normal form.

    Consequently, you can't be "over-normalized" or "under-normalized".

    Having said that, normalization has a performance cost. Some people elect to denormalize in various ways to improve performance. The most common sensible denormalization is to break 3NF and include derived data.

    A common mistake is to break 2NF and have duplicate copies of a functional dependency between a key and non-key value. This requires extra updates or -- worse -- triggers to keep the copies in parallel.

    Denormalization of a transactional database should be a case-by-case situation.

    A data warehouse, also, rarely follows any of the transactional normalization rules because it's (essentially) never updated.

    "Over-normalization" could mean that a database is too slow because of a large number of joins. This may also mean that the database has outgrown the hardware. Or that the applications haven't been designed to scale.

    The most common issue here is that folks try to use a transactional database for reporting while transactions are going on. The locking for transactions interferes with reporting.

    "Under-normalization," however, means that there are NF violations and needless processing is being done to handle the replicated data and correct update anomalies.

    0 讨论(0)
  • In the general sense, I think that overnormalized is when you are doing so many JOINs to retrieve data that it is causing notable performance penalties and deadlocks on your database, even after you've tuned the heck out of your indexes. Obviously, for huge applications and sites like MySpace or eBay, de-normalization is a scaling requirement.

    As a developer for several small businesses, I tell you that in my experience it's always been easier to go from normalized -> denormalized than the other way around, and in fact going the other way around (to avoid duplication of data now that the business requirements have changed a year or so later) is much more difficult.

    When I read general statements such as "you should put the address in your customers table instead of a separate address table so you can avoid the join", I shudder, because you just know that a year from now somebody's going to ask you to do something with addresses that you totally didn't foresee, like maintaining an audit trail, or storing multiple per customer. If your database allows you to create an indexed view, you can sidestep that issue until you get to the point where your dataset is so large that it can't possibly exist or be served by a single server or set of servers in a 1-write, many-read environment. For most of us, I don't think that scenario happens very often.

    When in doubt, I aim for third normal form with some exceptions (for example, having a field contain a CSV-list of separated strings because I know I'll never ever look at the data from the other angle). When I need to consolidate, I'll look at my views or indexes first. Hope this helps.

    0 讨论(0)
  • 2020-12-24 02:33

    In my experience, I've never seen a normalized database that contains postal addresses, as it's usually acceptable to store the address as a string. Ideally, there would be tables for countries, counties / states, cities, districts and streets. I've not come across anyone who needs to report on street level, so it hasn't been necessary. The addresses have only be used for postal contact, so are treated as a single entity.

    0 讨论(0)
  • 2020-12-24 02:34

    When the performance cost exceeds the benefit towards the application's intended purpose.

    0 讨论(0)
  • 2020-12-24 02:39

    A lot of people are talking about performance. I think a key issue is flexibility. In general, the more normalized your database, the more flexible it is.

    We currently use an "over-normalized" database because, in our operating environment, client requirements change on a monthly basis. By "over-normalizing" we can adopt our software accordingly, without changing the database structure.

    0 讨论(0)
提交回复
热议问题