06

Indexing and File Organization

Chapter 6 • Intermediate

50 min

Indexing and File Organization

Indexing improves query performance by providing fast data access paths.

Index

Index is a data structure that provides fast lookup.

Purpose:

  • Speed up search operations
  • Reduce disk I/O
  • Improve query performance

Trade-off:

  • Extra storage space
  • Maintenance overhead on updates

Types of Indexes

1. Primary Index

Index on primary key:

  • Ordered file
  • One index entry per block
  • Dense or sparse

Dense Index:

  • One entry per record
  • Faster search
  • More space

Sparse Index:

  • One entry per block
  • Less space
  • Slower search

2. Secondary Index

Index on non-key attribute:

  • Can have duplicates
  • Always dense

3. Clustering Index

Index on ordering field:

  • Records physically ordered
  • One entry per distinct value

B-Tree Index

B-Tree is a balanced tree structure for indexing.

B-Tree Properties

  • All leaves at same level
  • Internal nodes have keys and pointers
  • Minimum keys: ⌈m/2⌉ - 1 (except root)
  • Maximum keys: m - 1
  • Minimum children: ⌈m/2⌉ (except root)
  • Maximum children: m

Where m = order of B-tree

B-Tree Operations

Search

Algorithm:

  1. Start at root
  2. Compare key with node keys
  3. Follow appropriate pointer
  4. Repeat until leaf
  5. Search in leaf node

Time Complexity: O(log_m n) where m is order, n is records

Insert

Algorithm:

  1. Search for position
  2. Insert in leaf
  3. If leaf full, split
  4. Propagate split upward if needed

Delete

Algorithm:

  1. Search for key
  2. Delete from leaf
  3. If underflow, merge or redistribute
  4. Propagate upward if needed

B+ Tree

B+ Tree is variation of B-tree.

Differences:

  • All keys in leaves
  • Internal nodes have copies of keys
  • Leaves linked for range queries

Advantages:

  • Better for range queries
  • All data in leaves

Hash Index

Hash Index uses hash function to map keys to buckets.

Static Hashing

Fixed number of buckets:

  • Hash function: h(k) = k mod m
  • Collision handling: Chaining or open addressing

Problems:

  • Fixed size
  • Performance degrades with load

Dynamic Hashing

Extendable Hashing:

  • Directory doubles when needed
  • Multiple buckets can share same hash value

Linear Hashing:

  • Grows incrementally
  • No directory needed

File Organization

1. Heap File

Unordered file:

  • Records in insertion order
  • Insert: O(1)
  • Search: O(n)

2. Sorted File

Ordered by key:

  • Binary search: O(log n)
  • Insert: O(n) - need to maintain order

3. Hash File

Hash-based organization:

  • Direct access: O(1) average
  • Good for equality search
  • Poor for range queries

4. Indexed File

File with index:

  • Fast search via index
  • Maintains index overhead

GATE CS Important Points

  1. B-Tree Properties: Know order, min/max keys, height
  2. B-Tree Operations: Search, insert, delete algorithms
  3. B+ Tree: Differences from B-tree, advantages
  4. Hash Index: Static vs dynamic hashing
  5. File Organization: Heap, sorted, hash, indexed files

Practice Tips

  1. B-Tree Operations: Practice insert, delete, search
  2. Calculate Height: Find B-tree height for given records
  3. Index Selection: Choose appropriate index type
  4. Performance Analysis: Compare indexing methods
  5. Previous Year Questions: Solve GATE indexing questions