Normalization
Chapter 3 • Advanced
Normalization
Normalization is the process of organizing data to reduce redundancy and eliminate anomalies.
Database Anomalies
1. Insertion Anomaly
Cannot insert data without other related data.
Example: Cannot insert student without course enrollment.
2. Deletion Anomaly
Deleting data causes loss of other related data.
Example: Deleting last enrollment deletes course information.
3. Update Anomaly
Updating data requires updating multiple rows.
Example: Updating course name requires updating all enrollment rows.
Functional Dependencies
Functional Dependency (FD): X → Y means Y is functionally dependent on X.
Meaning: For each value of X, there is exactly one value of Y.
Example: SID → Name (Student ID determines Name)
Types of Functional Dependencies
Trivial FD: X → Y where Y ⊆ X
Non-trivial FD: X → Y where Y ⊄ X
Full FD: X → Y where no proper subset of X determines Y
Partial FD: X → Y where proper subset of X determines Y
Transitive FD: X → Y and Y → Z, then X → Z (transitively)
Closure of Functional Dependencies
Closure (F+): Set of all FDs that can be inferred from given FDs.
Armstrong's Axioms:
- Reflexivity: If Y ⊆ X, then X → Y
- Augmentation: If X → Y, then XZ → YZ
- Transitivity: If X → Y and Y → Z, then X → Z
Normal Forms
First Normal Form (1NF)
Rule: All attributes must be atomic (single-valued).
Example:
Not 1NF:
Student(SID, Name, Courses)
Courses: "CS101, CS102" // Not atomic
1NF:
Student(SID, Name)
Enrollment(SID, Course)
Second Normal Form (2NF)
Rule:
- Must be in 1NF
- No partial dependency (non-key attribute fully dependent on primary key)
Example:
Not 2NF:
Enrollment(SID, CID, Name, Grade)
Primary Key: (SID, CID)
FD: SID → Name // Partial dependency
2NF:
Student(SID, Name)
Enrollment(SID, CID, Grade)
Third Normal Form (3NF)
Rule:
- Must be in 2NF
- No transitive dependency (non-key attribute depends on another non-key attribute)
Example:
Not 3NF:
Student(SID, Name, Dept, DeptHead)
FD: SID → Dept, Dept → DeptHead // Transitive
3NF:
Student(SID, Name, Dept)
Department(Dept, DeptHead)
Boyce-Codd Normal Form (BCNF)
Rule:
- Must be in 3NF
- Every determinant is a candidate key
Example:
Not BCNF:
Enrollment(SID, CID, Instructor)
FD: (SID, CID) → Instructor, CID → Instructor
Candidate Key: (SID, CID)
Determinant: CID (not candidate key)
BCNF:
Enrollment(SID, CID)
Course(CID, Instructor)
Decomposition
Decomposition splits a relation into multiple relations.
Properties:
- Lossless Join: No information loss
- Dependency Preserving: All FDs preserved
Lossless Join Test:
- R1 ∩ R2 → R1 or R1 ∩ R2 → R2
GATE CS Important Points
- Functional Dependencies: Understand and identify FDs
- Normal Forms: Know 1NF, 2NF, 3NF, BCNF rules
- Anomalies: Understand insertion, deletion, update anomalies
- Decomposition: Practice lossless join decomposition
- Closure: Calculate closure of attribute sets
Practice Tips
- Identify FDs: Practice finding functional dependencies
- Check Normal Forms: Determine which normal form a relation is in
- Normalize: Convert relations to higher normal forms
- Decompose: Practice lossless join decomposition
- Previous Year Questions: Solve GATE normalization questions