Friday, 4 April 2014

Database Normalization

                     Normalization can be defined as the process of organization the data to reduce the redundant table data to the minimum. This process is carried out by dividing the database into two or more than two tables and defining relationship between them so that deletion, updation and insertion can be made to just one table and it can be propagated to other tables through defined relationships.
              Normalization is part of successful database design; without normalization, database systems can be inaccurate, slow, and inefficient, and they might not produce the data you expect.

Normalization can be done for the following reason: 
  1. To simplify the database structure so that it is easy to maintain.
  2. To retrieve the data quickly from the database.
  3. Minimizing the amount of duplicated data stored in a database.
  4. Building a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data storage
Normal Forms: - The normal form can be refers to the highest normal form condition that it meets and hence indicates the degree to which it has been modified. The normal forms are:-
    1. First Normal Form (INF)
    2. Second Normal Form (2NF)
    3. Third Normal Form (3NF)
    4. Boyce- Codd Normal Form
    5. Forth Normal Form (4NF)   
    6. Fifth Normal Form(5NF)

First Normal Form (INF):- A table is said to be in a First Normal Form (1NF) if it satisfy the below three conditions:
1)  If the columns of the table only contain atomic values (Single, indivisible).
2)  Primary key is defined for the table
3)  All the columns of the table are defined on the primary key.
Example:  suppose we have a table StudentDetails
StudentID               StudentName      StudentRegDate         expertisedomain
1                           Mahesh                 4/4/2014                      Maths   
2                             Suresh                  12/4/2014                    Maths/chemistry

                     Now, in the above table the column InterestedSubjects contains a set of values for the StudentID 2. Therefore this table is not in the 1NF form. To make this table in the 1NF, we need to break this table into two tables, one contains the student information StuInfo (StuId, StuName, and StuRegdate) and the other table contains the student expertisedomain information StuexpertDomain (StuId, ExpertiseDomain). In both the tables, StuId will be the primary key.
TableName: StuInfo
PrimaryKey: StuId                 
StuId                StuName                    StuRegdate
1                      Mahesh                      4/4/2014                       
2                         Suresh                     12/4/2014       

TableName: expertisedomain
PrimaryKey: StuId   
StuId                  expertisedomain
1                         Maths
2                         Maths
2                         Chemistry

                                          Now these tables is said to be in the 1NF since all the columns of these tables contains the atomic values and all the values of the columns are dependent on the primary keys.

Second Normal Form (2NF):- A table is said to be in its Second Normal Form if it satisfied the following conditions:
1) It satisfies the condition for the First Normal Form (1NF),
2) It do not includes partial dependencies where a column is dependent only a part of a primary key.
For example suppose we have a table StuProjDetail, which contains the employee details and its project details like projectid, project name and durations in terms of days on which he/she is allocated to the project.
Table Name:-StuProjDetail
Primary Key :- StuId + projected

StuID       ProjectID      StuName       Days
1                1                   Mahesh         20
2                2                   Suresh          30

           In the above  table, the primary key is composition of two columns StuId and ProjectId. Now this table is in 1NF but it is not in the 2NF since the column StuName can be depended on the column StuId and the column ProjectName can be depended on the column ProjectId which violates the second condition for the 2NF.
We can break this table into three different tables to convert it into the 2NF. These tables are given below

Table name:- EmpDetails
Primary Key: - EmpId


  StuID     StuName
1                          Mahesh
2                          Suresh

Table name:- ProjDetails
Primary Key: - ProjectId

ProjectID       ProjectName
1                                          ABC
2                                          xyz


Table name:- StuProjdetails
Primary Key: - StuId + ProjectId

StuID      ProjectID     Days
   1            1                     20
   2            2                     30

Now all the three tables are in 2 NF and all the columns of these tables are fully depended on their respective primary keys.


Third Normal Form (3NF):- A table is said to be in the Third Normal form (3NF) if it satisfy the following conditions:
1) It should be in the 2NF
2) It should not contain any transitive dependency which means that any non key column of the table should not be dependent on another non key column.
For example, let consider a table StuProjDetails which contains the details about the employee and its project.

Table Name:-  StuProjDetails
Primary Key: -  StuId

StuId      StuName       StuRegDate       Projectid        ProjectName
1                  Mahesh               01/01/2006            1               Bankingexpress
2                  Suresh                06/06/2007            2               BankingReport


Now, the table StuProjDetails is in Second Normal Form (2NF), but it is not in the Third Normal Form because the non key column projectName is dependent on another non key column ProjectId. So to convert this table in the Third Normal Form, we need to decompose this table into the two tables StuInfo and ProjInfo whose structure is given below:-


Table Name:- StuInfo
Primary Key: - StuId

StuId       StuName          StuRegDate         Projectid
1                  Mahesh                   01/01/2006              1
2                  Suresh                    06/06/2007               2


Table Name:- ProjInfo
Primary Key: - ProjectId

ProjectId          ProjectName
  1                       abc
  2                       xyz

Now the above tables are in the Third Normal Form since they satisfy all the conditions for the Third Normal Form.

Denormalization:- Denormalization can be defined as the process of moving from higher normal form to a lower normal forms in order to speed up the database access.

No comments:

Post a Comment