Database Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE) PDF Download

Introduction

Database normalization is the process of organizing the attributes of the database to reduce or eliminate data redundancy (having the same data but at different places) .

Problems because of data redundancy
Data redundancy unnecessarily increases the size of the database as the same data is repeated in many places. Inconsistency problems also arise during insert, delete and update operations.

Functional Dependency
Functional Dependency is a constraint between two sets of attributes in a relation from a database. A functional dependency is denoted by arrow (→). If an attributed A functionally determines B, then it is written as A → B.
For example, employee_id → name means employee_id functionally determines the name of the employee. As another example in a time table database, {student_id, time} → {lecture_room}, student ID and time determine the lecture room where the student should be.

What does functionally dependent mean?
A function dependency A → B means for all instances of a particular value of A, there is the same value of B.
For example in the below table A → B is true, but B → A is not true as there are different values of A for B = 3.

Database Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE)

Trivial Functional Dependency
X → Y is trivial only when Y is subset of X.
Examples
ABC → AB
ABC → A
ABC → ABC

Non Trivial Functional Dependencies
X → Y is a non trivial functional dependencies when Y is not a subset of X.
X → Y is called completely non-trivial when X intersect Y is NULL.
Examples:
Id → Name,
Name → DOB

Semi Non Trivial Functional Dependencies
X → Y is called semi non-trivial when X intersect Y is not NULL.
Examples:
AB → BC,
AD → DC

Normal Forms in DBMS

Normalization is the process of minimizing redundancy from a relation or set of relations. Redundancy in relation may cause insertion, deletion and updation anomalies. So, it helps to minimize the redundancy in relations. Normal forms are used to eliminate or reduce redundancy in database tables.

1. First Normal Form
If a relation contain composite or multi-valued attribute, it violates first normal form or a relation is in first normal form if it does not contain any composite or multi-valued attribute.
A relation is in first normal form if every attribute in that relation is singled valued attribute.

  • Example 1: Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute STUD_PHONE. Its decomposition into 1NF has been shown in table 2.
    Database Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE)
  • Example 2:Database Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE)In the above table Course is a multi valued attribute so it is not in 1NF.
    Below Table is in 1NF as there is no multi valued attributeDatabase Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE)

2. Second Normal Form
To be in second normal form, a relation must be in first normal form and relation must not contain any partial dependency. A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not part of any candidate key) is dependent on any proper subset of any candidate key of the table.

Partial Dependency: If the proper subset of candidate key determines non-prime attribute, it is called partial dependency.

  • Example 1: Consider table-3 as following below.Database Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE){Note that, there are many courses having the same course fee. }
    Here,
    COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO;
    COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
    COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
    Hence,
    COURSE_FEE would be a non-prime attribute, as it does not belong to the one only candidate key {STUD_NO, COURSE_NO} ;
    But, COURSE_NO -> COURSE_FEE , i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper subset of the candidate key, which is a partial dependency and so this relation is not in 2NF.
    To convert the above relation to 2NF,
    we need to split the table into two tables such as:
    Table 1: STUD_NO, COURSE_NO
    Table 2: COURSE_NO, COURSE_FEEDatabase Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE)2 C5
    Note: 2NF tries to reduce the redundant data getting stored in memory. For instance, if there are 100 students taking C1 course, we dont need to store its Fee as 1000 for all the 100 records, instead once we can store it in the second table as the course fee for C1 is 1000.
  • Example 2: Consider following functional dependencies in relation  R (A,  B, C,  D )
    AB → C  [A and B together determine C]
    BC → D  [B and C together determine D]
    In the above relation, AB is the only candidate key and there is no partial dependency, i.e., any proper subset of AB doesn’t determine any non-prime attribute.

3. Third Normal Form
A relation is in third normal form, if there is no transitive dependency for non-prime attributes as well as it is in second normal form.
A relation is in 3NF if at least one of the following condition holds in every non-trivial function dependency X → Y

  • X is a super key.
  • Y is a prime attribute (each element of Y is part of some candidate key).

Database Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE)

Transitive dependency: If A → B and B → C are two FDs then A → C is called transitive dependency.

  • Example 1: In relation STUDENT given in Table 4,
    FD set: {STUD_NO → STUD_NAME, STUD_NO → STUD_STATE, STUD_STATE → STUD_COUNTRY, STUD_NO → STUD_AGE}
    Candidate Key: {STUD_NO}
    For this relation in table 4, STUD_NO → STUD_STATE and STUD_STATE → STUD_COUNTRY are true. So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal form. To convert it in third normal form, we will decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
    STUD_COUNTRY_STUD_AGE) as:
    STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
    STATE_COUNTRY (STATE, COUNTRY)
  • Example 2: Consider relation R(A, B, C, D, E)
    A → BC,
    CD → E,
    B → D,
    E → A
    All possible candidate keys in above relation are {A, E, CD, BC} All attribute are on right sides of all functional dependencies are prime.

4. Boyce-Codd Normal Form (BCNF)
A relation R is in BCNF if R is in Third Normal Form and for every FD, LHS is super key. A relation is in BCNF iff in every non-trivial functional dependency X → Y, X is a super key.

  • Example 1: Find the highest normal form of a relation R(A, B, C, D, E) with FD set as {BC → D, AC → BE, B → E}
    Step 1. As we can see, (AC)+ ={A, C, B, E, D} but none of its subset can determine all attribute of relation, So AC will be candidate key. A or C can’t be derived from any other attribute of the relation, so there will be only 1 candidate key {AC}.
    Step 2. Prime attributes are those attribute which are part of candidate key {A, C} in this example and others will be non-prime {B, D, E} in this example.
    Step 3. The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or composite attribute.
    The relation is in 2nd normal form because BC → D is in 2nd normal form (BC is not a proper subset of candidate key AC) and AC → BE is in 2nd normal form (AC is candidate key) and B → E is in 2nd normal form (B is not a proper subset of candidate key AC).
    The relation is not in 3rd normal form because in BC → D (neither BC is a super key nor D is a prime attribute) and in B → E (neither B is a super key nor E is a prime attribute) but to satisfy 3rd normal for, either LHS of an FD should be super key or RHS should be prime attribute.
    So the highest normal form of relation will be 2nd Normal form.
  • Example 2: For example consider relation R(A, B, C)
    A → BC,
    B →
    A and B both are super keys so above relation is in BCNF.

Key Points

  1. BCNF is free from redundancy.
  2. If a relation is in BCNF, then 3NF is also also satisfied.
  3. If all attributes of relation are prime attribute, then the relation is always in 3NF.
  4. A relation in a Relational Database is always and at least in 1NF form.
  5. Every Binary Relation (a Relation with only 2 attributes) is always in BCNF.
  6. If a Relation has only singleton candidate keys( i.e. every candidate key consists of only 1 attribute), then the Relation is always in 2NF( because no Partial functional dependency possible).
  7. Sometimes going for BCNF form may not preserve functional dependency. In that case go for BCNF only if the lost FD(s) is not required, else normalize till 3NF only.
  8. There are many more Normal forms that exist after BCNF, like 4NF and more. But in real world database systems it’s generally not required to go beyond BCNF.

Exercise 1: Find the highest normal form in R (A, B, C, D, E) under following functional dependencies.

  ABC --> D

  CD --> AE 

Important Points for solving above type of question.

  1. It is always a good idea to start checking from BCNF, then 3 NF and so on.
  2. If any functional dependency satisfied a normal form then there is no need to check for lower normal form. For example, ABC → D is in BCNF (Note that ABC is a superkey), so no need to check this dependency for lower normal forms.

Candidate keys in the given relation are {ABC, BCD}
BCNF: ABC → D is in BCNF. Let us check CD → AE, CD is not a super key so this dependency is not in BCNF. So, R is not in BCNF.

3NF: ABC → D we don’t need to check for this dependency as it already satisfied BCNF. Let us consider CD → AE. Since E is not a prime attribute, so the relation is not in 3NF.

2NF: In 2NF, we need to check for partial dependency. CD which is a proper subset of a candidate key and it determine E, which is non-prime attribute. So, given relation is also not in 2 NF. So, the highest normal form is 1 NF.

The document Database Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE) is a part of the Computer Science Engineering (CSE) Course Database Management System (DBMS).
All you need of Computer Science Engineering (CSE) at this link: Computer Science Engineering (CSE)
62 videos|66 docs|35 tests

Top Courses for Computer Science Engineering (CSE)

FAQs on Database Normalization - Database Management System (DBMS) - Computer Science Engineering (CSE)

1. What is database normalization?
Ans. Database normalization is a process of organizing the data in a database to eliminate redundancy and improve data integrity. It involves breaking down a larger table into smaller tables and establishing relationships between them.
2. Why is database normalization important?
Ans. Database normalization is important because it helps in eliminating data redundancy and inconsistency. It ensures that each piece of data is stored only once and reduces the chances of data anomalies. It also improves the overall efficiency and performance of the database.
3. What are the main normal forms in database normalization?
Ans. The main normal forms in database normalization are: - First Normal Form (1NF): It eliminates duplicate data and ensures atomicity by organizing data into separate tables. - Second Normal Form (2NF): It eliminates partial dependencies by creating separate tables for subsets of data that depend on the primary key. - Third Normal Form (3NF): It eliminates transitive dependencies by creating separate tables for non-key attributes.
4. How does database normalization improve data integrity?
Ans. Database normalization improves data integrity by reducing data redundancy and inconsistency. When data is normalized, each piece of data is stored only once, reducing the chance of inconsistencies or contradictions. It also ensures that updates or modifications to the data are made in a controlled and consistent manner, maintaining the overall integrity of the database.
5. What are the potential drawbacks of excessive normalization?
Ans. Excessive normalization can lead to increased complexity and overhead in managing the database. It may require more complex queries and joins to retrieve information from multiple tables. Additionally, excessive normalization can sometimes result in poor performance, especially when dealing with large amounts of data. It is important to strike a balance between normalization and denormalization based on the specific requirements of the database and its usage patterns.
62 videos|66 docs|35 tests
Download as PDF
Explore Courses for Computer Science Engineering (CSE) exam

Top Courses for Computer Science Engineering (CSE)

Signup for Free!
Signup to see your scores go up within 7 days! Learn & Practice with 1000+ FREE Notes, Videos & Tests.
10M+ students study on EduRev
Related Searches

Objective type Questions

,

pdf

,

Extra Questions

,

MCQs

,

practice quizzes

,

ppt

,

Summary

,

Sample Paper

,

shortcuts and tricks

,

Free

,

mock tests for examination

,

Semester Notes

,

Important questions

,

Database Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE)

,

Viva Questions

,

video lectures

,

Database Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE)

,

Database Normalization | Database Management System (DBMS) - Computer Science Engineering (CSE)

,

Previous Year Questions with Solutions

,

Exam

,

past year papers

,

study material

;