Normalization in plain English

太阳男子 2020-11-28 22:53

I understand the concept of database normalization, but always have a hard time explaining it in plain English - especially for a job interview. I have read the wikipedia p

11 Answers
  • 2020-11-28 23:16

    Well, if I had to explain it to my wife, it would be something like this:

    The main idea is to avoid storing the same large piece of data over and over.

    Let's take a look at a list of people and the countries they come from. Instead of storing the country name, which can be as long as "Bosnia & Herzegovina", for every person, we simply store a number that references a table of countries. So instead of storing "Bosnia & Herzegovina" 100 times, we store #45 100 times. If in the future, as often happens with Balkan countries, it splits into two countries, Bosnia and Herzegovina, I will have to change it in only one place. Well, sort of.
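    As a rough illustration of the lookup-table idea, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (countries, persons, country_id) and the generated person names are made up for the example; only the #45 and the country name come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE persons (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES countries(id)
    );
""")

# The long country name is stored exactly once, under id 45.
conn.execute("INSERT INTO countries (id, name) VALUES (45, 'Bosnia & Herzegovina')")

# 100 hypothetical people, each holding only the number 45.
conn.executemany(
    "INSERT INTO persons (name, country_id) VALUES (?, 45)",
    [(f"person{i}",) for i in range(100)],
)

# Changing the country name touches a single row in countries.
cur = conn.execute("UPDATE countries SET name = 'Bosnia and Herzegovina' WHERE id = 45")
rows_changed = cur.rowcount

# Every person sees the new name through the join.
names_seen = conn.execute(
    "SELECT DISTINCT c.name FROM persons p JOIN countries c ON c.id = p.country_id"
).fetchall()
person_count = conn.execute("SELECT COUNT(*) FROM persons").fetchone()[0]
```

    A real split into two countries would still mean reassigning each person to one of the two new rows, which is probably the "sort of" part.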

    Now, to explain 2NF, I would change the example. Let's assume we keep the list of countries every person has visited. Instead of keeping a table like:

    Person   CountryVisited   AnotherInformation   D.O.B.
    Faruz    USA              Blah Blah            1/1/2000
    Faruz    Canada           Blah Blah            1/1/2000
    

    I would create three tables: one with the list of countries, one with the list of persons, and another to connect the two. That gives me the most freedom to change a person's information or a country's information, and it lets me "remove duplicate rows", as normalization expects.
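    The three-table layout might look roughly like this in SQLite, driven from Python. The names persons, countries and visits, and the ISO date format, are my own choices for the sketch, not something prescribed by the normal forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        date_of_birth TEXT NOT NULL
    );
    CREATE TABLE countries (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Connecting table: one row per (person, country) pair.
    CREATE TABLE visits (
        person_id INTEGER NOT NULL REFERENCES persons(id),
        country_id INTEGER NOT NULL REFERENCES countries(id),
        PRIMARY KEY (person_id, country_id)
    );
    INSERT INTO persons VALUES (1, 'Faruz', '2000-01-01');
    INSERT INTO countries VALUES (1, 'USA'), (2, 'Canada');
    INSERT INTO visits VALUES (1, 1), (1, 2);
""")

# Rebuild the "countries visited" list for one person with a join.
visited = [
    row[0]
    for row in conn.execute("""
        SELECT c.name
        FROM visits v
        JOIN countries c ON c.id = v.country_id
        WHERE v.person_id = 1
        ORDER BY c.name
    """)
]

# The date of birth now lives in exactly one row, however many visits exist.
dob_copies = conn.execute(
    "SELECT COUNT(*) FROM persons WHERE name = 'Faruz'"
).fetchone()[0]
```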

  • 2020-11-28 23:17

    This is not a thorough explanation, but one goal of normalization is to allow for growth without awkwardness.

    For example, if you've got a user table, and every user is going to have one and only one phone number, it's fine to have a phonenumber column in that table.

    However, if each user is going to have a variable number of phone numbers, it would be awkward to have columns like phonenumber1, phonenumber2, etc. This is for two reasons:

    • If your columns go up to phonenumber3 and someone needs to add a fourth number, you have to add a column to the table.
    • For all the users with fewer than 3 phone numbers, there are empty columns on their rows.

    Instead, you'd want to have a phonenumber table, where each row contains a phone number and a foreign key reference to which row in the user table it belongs to. No blank columns are needed, and each user can have as few or many phone numbers as necessary.
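    A minimal sketch of that phone number table, assuming SQLite via Python's sqlite3 module. The table names, column names, user names and phone numbers are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phone_numbers (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        number TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
""")

# alice gets a fourth number without any schema change; bob has none
# and therefore no empty columns either.
conn.executemany(
    "INSERT INTO phone_numbers (user_id, number) VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101"), (1, "555-0102"), (1, "555-0103")],
)

# LEFT JOIN keeps users that have no phone numbers at all.
counts = dict(
    conn.execute("""
        SELECT u.name, COUNT(p.id)
        FROM users u
        LEFT JOIN phone_numbers p ON p.user_id = u.id
        GROUP BY u.id
    """)
)
```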

  • 2020-11-28 23:21

    I would say that normalization is like keeping notes to do things efficiently, so to speak:

    Without normalization, if you had a note saying you had to go shopping for ice cream, you would also have a second note saying the same thing, one in each pocket.

    Now, in real life, you would never do this, so why do it in a database?

    For the designing and implementing part, that's when you can move back to "the lingo" and away from layman's terms, though I suppose you could still simplify. You would say what you needed to at first, and then, when normalization comes into it, say you'll make sure of the following:

    1. There must be no repeating groups of information within a table
    2. No table should contain data that is not functionally dependent on that table's primary key
    3. For 3NF I like Bill Kent's take on it: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key.
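    To make "functionally dependent" a bit more concrete, here is a small Python sketch that tests whether one set of columns determines another in some sample rows. The function name dependency_holds and the order-line data are hypothetical. Note that sample data can only refute a dependency, never prove it for all future rows.

```python
def dependency_holds(rows, determinant, dependent):
    """Return True if, in these rows, equal determinant values
    always come with equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical order lines; the key is (order_id, product_id).
order_lines = [
    {"order_id": 1, "product_id": 10, "product_name": "widget", "quantity": 2},
    {"order_id": 1, "product_id": 11, "product_name": "gadget", "quantity": 1},
    {"order_id": 2, "product_id": 10, "product_name": "widget", "quantity": 5},
]

# product_name is a fact about product_id alone, not about the whole key,
# which is the kind of column you would move to its own products table.
name_from_part_of_key = dependency_holds(order_lines, ["product_id"], ["product_name"])

# quantity needs the whole key.
qty_from_part_of_key = dependency_holds(order_lines, ["product_id"], ["quantity"])
qty_from_whole_key = dependency_holds(order_lines, ["order_id", "product_id"], ["quantity"])
```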

    I think it may be more impressive if you speak of denormalization as well, and the fact that you cannot always have the best structure AND be in normal forms.

  • One side point to note about normalization: A fully normalized database is space efficient, but is not necessarily the most time efficient arrangement of data depending on use patterns.

    Skipping around to multiple tables to look up all the pieces of info from their normalized locations takes time. In high load situations (millions of rows per second flying around, thousands of concurrent clients, like say credit card transaction processing) where time is more valuable than storage space, appropriately denormalized tables can give better response times than fully normalized tables.
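    The difference in query shape can be sketched as follows, again with SQLite from Python. The merchants and transactions tables are hypothetical, and this toy data says nothing about actual speed; it only shows that the normalized read needs a join while the denormalized copy answers from a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: merchant name stored once.
    CREATE TABLE merchants (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        merchant_id INTEGER NOT NULL REFERENCES merchants(id),
        amount_cents INTEGER NOT NULL
    );
    -- Denormalized: merchant name repeated on every row.
    CREATE TABLE transactions_flat (
        id INTEGER PRIMARY KEY,
        merchant_name TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO merchants VALUES (1, 'Corner Shop');
    INSERT INTO transactions VALUES (1, 1, 499), (2, 1, 1250);
    INSERT INTO transactions_flat VALUES
        (1, 'Corner Shop', 499),
        (2, 'Corner Shop', 1250);
""")

# Normalized read: two tables, one join.
normalized = conn.execute("""
    SELECT t.id, m.name, t.amount_cents
    FROM transactions t
    JOIN merchants m ON m.id = t.merchant_id
    ORDER BY t.id
""").fetchall()

# Denormalized read: one table, no join, at the cost of repeated names.
flat = conn.execute(
    "SELECT id, merchant_name, amount_cents FROM transactions_flat ORDER BY id"
).fetchall()
```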

    For more info on this, look for SQL books written by Ken Henderson.

  • 2020-11-28 23:25

    This is what I ask interviewees:

    Why don't we use a single table for an application instead of using multiple tables?

    The answer is, of course, normalization. As already said, it's to avoid redundancy and thereby update anomalies.
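    An update anomaly in a single wide table can be reproduced in a few lines. This is only a sketch with SQLite via Python; the orders_single table, the customer and the email addresses are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Everything in one table: customer details repeat on every order.
    CREATE TABLE orders_single (
        order_id INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        customer_email TEXT NOT NULL,
        item TEXT NOT NULL
    );
    INSERT INTO orders_single VALUES
        (1, 'Ann', 'ann@old.example', 'book'),
        (2, 'Ann', 'ann@old.example', 'lamp');
""")

# Someone updates the email on one order and forgets the other.
conn.execute(
    "UPDATE orders_single SET customer_email = 'ann@new.example' WHERE order_id = 1"
)

# The same customer now has two conflicting emails: an update anomaly.
emails = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT customer_email FROM orders_single WHERE customer_name = 'Ann'"
    )
)
```

    With a separate customers table the email would live in one row, so a partial update like this could not leave the data contradicting itself.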

  • 2020-11-28 23:25

    Database normalization is a formal process of designing your database to eliminate redundant data. The design consists of:

    • planning what information the database will store
    • outlining what information users will request from it
    • documenting the assumptions for review

    Use a data dictionary or some other metadata representation to verify the design.

    The biggest problem with normalization is that you end up with multiple tables representing what is conceptually a single item, such as a user profile. Don't worry about normalizing data in tables that will have records inserted but not updated, such as history logs or financial transactions.

    References

    • When not to Normalize your SQL Database

    • Database Design Basics
