What is normalization in MySQL and in which case and how we need to use it?
It's a technique for ensuring that your data remains consistent, by eliminating duplication. So a database in which the same information is stored in more than one table is not normalized.
See the Wikipedia article on Database normalization.
(It's a general technique for relational databases, not specific to MySQL.)
While creating a database schema for your application, you need to make sure that you avoid any information being stored in more than one column across different tables.
As every table in your DB, identifies a significant entity in your application, a unique identifier is a must-have columns for them.
Now, while deciding the storage schema, various kinds of relationships are being identified between these entities (tables), viz-a-viz, one-to-one, one-to-many, many-to-many.
Once you attend to all these scenarios, your db-schema will be normalized to 4NF.
In the field of relational database design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics—insertion, update, and deletion anomalies—that could lead to a loss of data integrity.[1] E.F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the first normal form in 1970.[2] Codd went on to define the second and third normal forms in 1971,[3] and Codd and Raymond F. Boyce defined the Boyce-Codd normal form in 1974.[4] Higher normal forms were defined by other theorists in subsequent years, the most recent being the sixth normal form introduced by Chris Date, Hugh Darwen, and Nikos Lorentzos in 2002.[5]
Informally, a relational database table (the computerized representation of a relation) is often described as "normalized" if it is in the third normal form (3NF).[6] Most 3NF tables are free of insertion, update, and deletion anomalies, i.e. in most cases 3NF tables adhere to BCNF, 4NF, and 5NF (but typically not 6NF).
A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization can subsequently be performed for performance reasons.[7] However, some modeling disciplines, such as the dimensional modeling approach to data warehouse design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF.[8]
Edit: Source: http://en.wikipedia.org/wiki/Database_normalization
I try to attempt to explain normalization in layman terms here. First off, it is something that applies to relational database (Oracle, Access, MySQL) so it is not only for MySQL.
Normalisation is about making sure each table has the only minimal fields and to get rid of dependencies. Imagine you have an employee record, and each employee belongs to a department. If you store the department as a field along with the other data of the employee, you have a problem - what happens if a department is removed? You have to update all the department fields, and there's opportunity for error. And what if some employees does not have a department (newly assigned, perhaps?). Now there will be null values.
So the normalisation, in brief, is to avoid having fields that would be null, and making sure that the all the fields in the table only belong to one domain of data being described. For example, in the employee table, the fields could be id, name, social security number, but those three fields have nothing to do with the department. Only employee id describes which department the employee belongs to. So this implies that which department an employee is in should be in another table.
Here's a simple normalization process.
EMPLOYEE ( < employee_id >, name, social_security, department_name)
This is not normalized, as explained. A normalized form could look like
EMPLOYEE ( < employee_id >, name, social_security)
Here, the Employee table is only responsible for one set of data. So where do we store which department the employee belongs to? In another table
EMPLOYEE_DEPARTMENT ( < employee_id >, department_name )
This is not optimal. What if the department name changes? (it happens in the US government all the time). Hence it is better to do this
EMPLOYEE_DEPARTMENT ( < employee_id >, department_id )
DEPARTMENT ( < department_id >, department_name )
There are first normal form, second normal form and third normal form. But unless you are studying a DB course, I usually just go for the most normalized form I could understand.
Hope this helps.
Normalization is not for MYSql only. Its a general database concept.
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
Normal forms in SQL are given below.
First Normal form (1NF): A relation is said to be in 1NF if it has only single valued attributes, neither repeating nor arrays are permitted.
Second Normal Form (2NF): A relation is said to be in 2NF if it is in 1NF and every non key attribute is fully functional dependent on the primary key.
Third Normal Form (3NF): We say that a relation is in 3NF if it is in 2NF and has no transitive dependencies.
Boyce-Codd Normal Form (BCNF): A relation is said to be in BCNF if and only if every determinant in the relation is a candidate key.
Fourth Normal Form (4NF): A relation is said to be in 4NF if it is in BCNF and contains no multivalued dependency.
Fifth Normal Form (5NF): A relation is said to be in 5NF if and only if every join dependency in relation is implied by the candidate keys of relation.
Domain-Key Normal Form (DKNF): We say that a relation is in DKNF if it is free of all modification anomalies. Insertion, Deletion, and update anomalies come under modification anomalies
Seel also
Database Normalization Basics