How do you determine how far to normalize a database?

醉梦人生 2020-11-29 06:36

When creating a database structure, what are good guidelines to follow or good ways to determine how far a database should be normalized? Should you create an un-normalized

13 answers
  • 2020-11-29 07:14

    Jeff has a pretty good overview of his philosophy on his blog: "Maybe Normalization Isn't Normal". The main thing is: don't overdo normalization. But I think an even bigger point to take away is that it probably doesn't matter too much. Unless you're running the next Google, you probably won't notice much of a difference until your application grows.

  • 2020-11-29 07:15

    Normalization means eliminating redundant data. In other words, an un-normalized or de-normalized database is one where the same information is repeated in multiple places. This means you have to write more complex update statements to ensure you update the same data everywhere; otherwise you get inconsistent data, which in turn means the output of queries is unreliable.

    This is a pretty huge problem, so I would say denormalization hurts, not the other way around.
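    To make the inconsistency concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (orders_flat, customers, orders) and the sample data are made up for illustration and are not from the question. It shows an update that misses one copy of a repeated value, next to the normalized equivalent where the value is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized (hypothetical schema): the email is repeated on every order row.
conn.execute(
    "CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, customer TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, "alice", "alice@old.example"), (2, "alice", "alice@old.example")],
)

# An update that touches only one copy leaves two emails for the same customer.
conn.execute("UPDATE orders_flat SET email = 'alice@new.example' WHERE order_id = 1")
flat_emails = conn.execute(
    "SELECT DISTINCT email FROM orders_flat WHERE customer = 'alice' ORDER BY email"
).fetchall()

# Normalized: the email lives in one row, so a single UPDATE cannot leave stale copies.
conn.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT REFERENCES customers(name))"
)
conn.execute("INSERT INTO customers VALUES ('alice', 'alice@old.example')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "alice"), (2, "alice")])
conn.execute("UPDATE customers SET email = 'alice@new.example' WHERE name = 'alice'")
normalized_emails = conn.execute(
    "SELECT DISTINCT c.email FROM orders o JOIN customers c ON c.name = o.customer"
).fetchall()

print(flat_emails)        # two different emails for one customer
print(normalized_emails)  # exactly one email
```

    In the flat table every writer has to remember to update every copy; in the normalized one there is only one place to update.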

    In some cases you may deliberately decide to denormalize specific parts of a database, if you judge that the benefit outweighs the extra work in updating data and the risk of data corruption. One example is data warehouses, where data is aggregated for performance reasons and is often not updated after the initial entry, which reduces the risk of inconsistencies.

    But in general be wary of denormalizing for performance. For example, the performance benefit of a denormalized join can typically be achieved by using a materialized view (called an indexed view in SQL Server), which can be as fast as querying a denormalized table while still protecting the consistency of the data.

  • 2020-11-29 07:15

    I agree that you should normalise as much as possible and only denormalise if absolutely necessary for performance. And with materialised views or caching schemes this is often not necessary.

    The thing to bear in mind is that by normalising your model you are giving the database more information on how to constrain your data, so that you can remove the risk of update anomalies that can occur in incompletely normalised models.

    If you denormalise then you either need to live with the fact that you may get update anomalies or you need to implement the constraint validation yourself in your application code. This takes away a lot of the benefit of using a DBMS which lets you define these constraints declaratively.
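    As a rough illustration of what "declaratively" means here, the sketch below uses Python's sqlite3 module with a made-up departments/employees schema (these names are assumptions, not from the thread). The foreign key is declared once in the table definition and the database rejects a bad row on its own, with no validation code in the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical normalized schema: each employee must reference an existing department.
conn.execute(
    "CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE employees ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES departments(dept_id))"
)
conn.execute("INSERT INTO departments VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employees VALUES (1, 'Ada', 1)")

# Department 99 does not exist, so the database itself refuses the insert.
try:
    conn.execute("INSERT INTO employees VALUES (2, 'Bob', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(rejected, employee_count)
```

    With a denormalised table that stores the department name directly on each employee row, the same check would have to be written and maintained in application code.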

    So assuming the same quality of code, denormalising may not actually give you better performance.

    Another thing to mention is that hardware is cheap these days, so throwing extra processing power at the problem is often more cost-effective than accepting the potential costs of cleaning up corrupted data.

  • 2020-11-29 07:16

    You want to start by designing a normalized database up to 3rd normal form. As you develop the business logic layer you may decide you have to denormalize a bit, but never, never go below the 3rd form. Always keep 1st and 2nd form compliant. You want to denormalize for simplicity of code, not for performance. Use indexes and stored procedures for that :)

    The reason not to "normalize as you go" is that you would have to modify the code you have already written almost every time you modify the database design.
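    On the "use indexes for that" point, here is a small sketch with Python's sqlite3 module. The orders table and the index name idx_orders_customer are hypothetical. It asks SQLite for its query plan before and after adding an index, leaving the normalized schema untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer_id = 42"


def plan(sql):
    # The human-readable plan step is the last column of each EXPLAIN QUERY PLAN row.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # no index yet, so the table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # the plan should now mention the new index

print(plan_before)
print(plan_after)
```

    The exact plan text differs between SQLite versions, so the sketch only looks for the index name in the output.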

    There are a couple of good articles:

    http://www.agiledata.org/essays/dataNormalization.html

  • 2020-11-29 07:20

    The truth is that "it depends." It depends on a lot of factors including:

    • Code (Hand-coded or Tool driven (like ETL packages))
    • Primary Application (Transaction Processing, Data Warehousing, Reporting)
    • Type of Database (MySQL, DB2, Oracle, Netezza, etc.)
    • Database Architecture (Tabular, Columnar)
    • DBA Quality (proactive, reactive, inactive)
    • Expected Data Quality (do you want to enforce data quality at the application level or the database level?)
  • 2020-11-29 07:23

    Database normalization, I feel, is an art form.

    You don't want to over-normalize your database, because you will have too many tables and queries for even simple objects will take longer than they should.

    A good rule of thumb I follow is to normalize information that is repeated over and over again.

    For example, if you are creating a contact management application, it would make sense to have Address (Street, City, State, Zip, etc.) as its own table.

    However, if you have only 2 types of contacts, business or personal, do you need a contact type table if you know you are only ever going to have 2? For me, no.
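    One way this trade-off could look in practice is sketched below with Python's sqlite3 module. The table and column names are invented, and the CHECK constraint is my own assumption about how to keep the two contact types valid without a lookup table; the answer itself only says it would skip the extra table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES

# Address data repeats across contacts, so it gets its own table.
conn.execute(
    "CREATE TABLE addresses ("
    " address_id INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT, zip TEXT)"
)
# Only two contact types are expected, so a plain column with a CHECK
# stands in for a separate contact_type table (an assumption of this sketch).
conn.execute(
    "CREATE TABLE contacts ("
    " contact_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " contact_type TEXT NOT NULL CHECK (contact_type IN ('business', 'personal')),"
    " address_id INTEGER REFERENCES addresses(address_id))"
)
conn.execute("INSERT INTO addresses VALUES (1, '1 Main St', 'Springfield', 'IL', '62701')")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?)",
    [(1, "Ann", "business", 1), (2, "Raj", "personal", 1)],
)

# A third, unexpected type is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO contacts VALUES (3, 'Eve', 'vendor', 1)")
    bad_type_rejected = False
except sqlite3.IntegrityError:
    bad_type_rejected = True

address_rows = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
print(bad_type_rejected, address_rows)
```

    Two contacts share one address row, while the contact type stays a simple column. If a third type ever appears, the column can still be promoted to its own table later.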

    I would start by figuring out the datatypes you need. Use a modeling program such as Visio to help. You don't want to start with a non-normalized database, because you will eventually normalize anyway. Start by putting objects in their logical groupings; as you see data repeated, move that data into a new table. I would keep up with that process until you feel you have the database designed.

    Let testing tell you if you need to combine tables. A well-written query can cover any over-normalization.
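    As one possible reading of "a well written query can cover it", the sketch below (Python's sqlite3 module, hypothetical contacts/addresses tables and a made-up view name contact_cards) wraps the join in a view, so callers read the data as if it were a single flat table while the storage stays normalized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT, city TEXT);
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        address_id INTEGER REFERENCES addresses(address_id)
    );
    INSERT INTO addresses VALUES (1, '1 Main St', 'Springfield');
    INSERT INTO addresses VALUES (2, '9 Elm Rd', 'Shelbyville');
    INSERT INTO contacts VALUES (1, 'Ann', 1);
    INSERT INTO contacts VALUES (2, 'Raj', 2);

    -- The view hides the join; the underlying tables stay normalized.
    CREATE VIEW contact_cards AS
    SELECT c.name, a.street, a.city
    FROM contacts c
    JOIN addresses a ON a.address_id = c.address_id;
    """
)

cards = conn.execute("SELECT name, city FROM contact_cards ORDER BY name").fetchall()
print(cards)
```

    If testing later shows the join really is too slow, the view gives you a single place to change when you combine the tables.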
