I understand the concept of database normalization, but always have a hard time explaining it in plain English - especially for a job interview. I have read the wikipedia p
Well, if I had to explain it to my wife it would have been something like that:
The main idea is to avoid duplication of large data.
Let's take a look at a list of people and the country they came from. Instead of holding the name of the country which can be as long as "Bosnia & Herzegovina" for every person, we simply hold a number that references a table of countries. So instead of holding 100 "Bosnia & Herzegovina"s, we hold 100 #45. Now in the future, as often happens with Balkan countries, they split to two countries: Bosnia and Herzegovina, I will have to change it only in one place. well, sort of.
Now, to explain 2NF, I would have changed the example, and let's assume that we hold the list of countries every person visited. Instead of holding a table like:
Person CountryVisited AnotherInformation D.O.B.
Faruz USA Blah Blah 1/1/2000
Faruz Canada Blah Blah 1/1/2000
I would have created three tables, one table with the list of countries, one table with the list of persons and another table to connect them both. That gives me the most freedom I can get changing person's information or country information. This enables me to "remove duplicate rows" as normalization expects.
This is not a thorough explanation, but one goal of normalization is to allow for growth without awkwardness.
For example, if you've got a user
table, and every user is going to have one and only one phone number, it's fine to have a phonenumber
column in that table.
However, if each user is going to have a variable number of phone numbers, it would be awkward to have columns like phonenumber1
, phonenumber2
, etc. This is for two reasons:
phonenumber3
and someone needs to add a fourth number, you have to add a column to the table.Instead, you'd want to have a phonenumber
table, where each row contains a phone number and a foreign key reference to which row in the user
table it belongs to. No blank columns are needed, and each user can have as few or many phone numbers as necessary.
I would say that normalization is like keeping notes to do things efficiently, so to speak:
If you had a note that said you had to go shopping for ice cream without normalization, you would then have another note, saying you have to go shopping for ice cream, just one in each pocket.
Now, In real life, you would never do this, so why do it in a database?
For the designing and implementing part, thats when you can move back to "the lingo" and keep it away from layman terms, but I suppose you could simplify. You would say what you needed to at first, and then when normalization comes into it, you say you'll make sure of the following:
I think it may be more impressive if you speak of denormalization as well, and the fact that you cannot always have the best structure AND be in normal forms.
One side point to note about normalization: A fully normalized database is space efficient, but is not necessarily the most time efficient arrangement of data depending on use patterns.
Skipping around to multiple tables to look up all the pieces of info from their denormalized locations takes time. In high load situations (millions of rows per second flying around, thousands of concurrent clients, like say credit card transaction processing) where time is more valuable than storage space, appropriately denormalized tables can give better response times than fully normalized tables.
For more info on this, look for SQL books written by Ken Henderson.
This is what I ask interviewees:
Why don't we use a single table for an application instead of using multiple tables ?
The answer is ofcourse normalization. As already said, its to avoid redundancy and there by update anomalies.
Database normalization is a formal process of designing your database to eliminate redundant data. The design consists of:
Use a data-dictionary or some other metadata representation to verify the design.
The biggest problem with normalization is that you end up with multiple tables representing what is conceptually a single item, such as a user profile. Don't worry about normalizing data in table that will have records inserted but not updated, such as history logs or financial transactions.
References
When not to Normalize your SQL Database
Database Design Basics