GATE CS - DATABASES (DBMS):Normalization

Mastering normalization concepts and implementation.

Normalization

Normalization is the process of organizing data to reduce redundancy and eliminate anomalies.

Database Anomalies

1. Insertion Anomaly

Cannot insert data without other related data.

Example: Cannot insert student without course enrollment.

2. Deletion Anomaly

Deleting data causes loss of other related data.

Example: Deleting last enrollment deletes course information.

3. Update Anomaly

Updating data requires updating multiple rows.

Example: Updating course name requires updating all enrollment rows.

Functional Dependencies

Functional Dependency (FD): X → Y means Y is functionally dependent on X.

Meaning: For each value of X, there is exactly one value of Y.

Example: SID → Name (Student ID determines Name)

Types of Functional Dependencies

Trivial FD: X → Y where Y ⊆ X

Non-trivial FD: X → Y where Y ⊄ X

Full FD: X → Y where no proper subset of X determines Y

Partial FD: X → Y where proper subset of X determines Y

Transitive FD: X → Y and Y → Z, then X → Z (transitively)

Closure of Functional Dependencies

Closure (F+): Set of all FDs that can be inferred from given FDs.

Armstrong's Axioms:

  1. Reflexivity: If Y ⊆ X, then X → Y
  2. Augmentation: If X → Y, then XZ → YZ
  3. Transitivity: If X → Y and Y → Z, then X → Z

Normal Forms

First Normal Form (1NF)

Rule: All attributes must be atomic (single-valued).

Example:

Not 1NF:

Student(SID, Name, Courses)
Courses: "CS101, CS102"  // Not atomic

1NF:

Student(SID, Name)
Enrollment(SID, Course)

Second Normal Form (2NF)

Rule:

  • Must be in 1NF
  • No partial dependency (non-key attribute fully dependent on primary key)

Example:

Not 2NF:

Enrollment(SID, CID, Name, Grade)
Primary Key: (SID, CID)
FD: SID → Name  // Partial dependency

2NF:

Student(SID, Name)
Enrollment(SID, CID, Grade)

Third Normal Form (3NF)

Rule:

  • Must be in 2NF
  • No transitive dependency (non-key attribute depends on another non-key attribute)

Example:

Not 3NF:

Student(SID, Name, Dept, DeptHead)
FD: SID → Dept, Dept → DeptHead  // Transitive

3NF:

Student(SID, Name, Dept)
Department(Dept, DeptHead)

Boyce-Codd Normal Form (BCNF)

Rule:

  • Must be in 3NF
  • Every determinant is a candidate key

Example:

Not BCNF:

Enrollment(SID, CID, Instructor)
FD: (SID, CID) → Instructor, CID → Instructor
Candidate Key: (SID, CID)
Determinant: CID (not candidate key)

BCNF:

Enrollment(SID, CID)
Course(CID, Instructor)

Decomposition

Decomposition splits a relation into multiple relations.

Properties:

  • Lossless Join: No information loss
  • Dependency Preserving: All FDs preserved

Lossless Join Test:

  • R1 ∩ R2 → R1 or R1 ∩ R2 → R2

GATE CS Important Points

  1. Functional Dependencies: Understand and identify FDs
  2. Normal Forms: Know 1NF, 2NF, 3NF, BCNF rules
  3. Anomalies: Understand insertion, deletion, update anomalies
  4. Decomposition: Practice lossless join decomposition
  5. Closure: Calculate closure of attribute sets

Practice Tips

  1. Identify FDs: Practice finding functional dependencies
  2. Check Normal Forms: Determine which normal form a relation is in
  3. Normalize: Convert relations to higher normal forms
  4. Decompose: Practice lossless join decomposition
  5. Previous Year Questions: Solve GATE normalization questions