03

Normalization

Chapter 3 • Advanced

70 min

Normalization

Normalization is the process of organizing data to reduce redundancy and eliminate anomalies.

Database Anomalies

1. Insertion Anomaly

Cannot insert data without other related data.

Example: Cannot insert student without course enrollment.

2. Deletion Anomaly

Deleting data causes loss of other related data.

Example: Deleting last enrollment deletes course information.

3. Update Anomaly

Updating data requires updating multiple rows.

Example: Updating course name requires updating all enrollment rows.

Functional Dependencies

Functional Dependency (FD): X → Y means Y is functionally dependent on X.

Meaning: For each value of X, there is exactly one value of Y.

Example: SID → Name (Student ID determines Name)

Types of Functional Dependencies

Trivial FD: X → Y where Y ⊆ X

Non-trivial FD: X → Y where Y ⊄ X

Full FD: X → Y where no proper subset of X determines Y

Partial FD: X → Y where proper subset of X determines Y

Transitive FD: X → Y and Y → Z, then X → Z (transitively)

Closure of Functional Dependencies

Closure (F+): Set of all FDs that can be inferred from given FDs.

Armstrong's Axioms:

  1. Reflexivity: If Y ⊆ X, then X → Y
  2. Augmentation: If X → Y, then XZ → YZ
  3. Transitivity: If X → Y and Y → Z, then X → Z

Normal Forms

First Normal Form (1NF)

Rule: All attributes must be atomic (single-valued).

Example:

Not 1NF:

code
Student(SID, Name, Courses)
Courses: "CS101, CS102"  // Not atomic

1NF:

code
Student(SID, Name)
Enrollment(SID, Course)

Second Normal Form (2NF)

Rule:

  • Must be in 1NF
  • No partial dependency (non-key attribute fully dependent on primary key)

Example:

Not 2NF:

code
Enrollment(SID, CID, Name, Grade)
Primary Key: (SID, CID)
FD: SID → Name  // Partial dependency

2NF:

code
Student(SID, Name)
Enrollment(SID, CID, Grade)

Third Normal Form (3NF)

Rule:

  • Must be in 2NF
  • No transitive dependency (non-key attribute depends on another non-key attribute)

Example:

Not 3NF:

code
Student(SID, Name, Dept, DeptHead)
FD: SID → Dept, Dept → DeptHead  // Transitive

3NF:

code
Student(SID, Name, Dept)
Department(Dept, DeptHead)

Boyce-Codd Normal Form (BCNF)

Rule:

  • Must be in 3NF
  • Every determinant is a candidate key

Example:

Not BCNF:

code
Enrollment(SID, CID, Instructor)
FD: (SID, CID) → Instructor, CID → Instructor
Candidate Key: (SID, CID)
Determinant: CID (not candidate key)

BCNF:

code
Enrollment(SID, CID)
Course(CID, Instructor)

Decomposition

Decomposition splits a relation into multiple relations.

Properties:

  • Lossless Join: No information loss
  • Dependency Preserving: All FDs preserved

Lossless Join Test:

  • R1 ∩ R2 → R1 or R1 ∩ R2 → R2

GATE CS Important Points

  1. Functional Dependencies: Understand and identify FDs
  2. Normal Forms: Know 1NF, 2NF, 3NF, BCNF rules
  3. Anomalies: Understand insertion, deletion, update anomalies
  4. Decomposition: Practice lossless join decomposition
  5. Closure: Calculate closure of attribute sets

Practice Tips

  1. Identify FDs: Practice finding functional dependencies
  2. Check Normal Forms: Determine which normal form a relation is in
  3. Normalize: Convert relations to higher normal forms
  4. Decompose: Practice lossless join decomposition
  5. Previous Year Questions: Solve GATE normalization questions