06
Indexing and File Organization
Chapter 6 • Intermediate
50 min
Indexing and File Organization
Indexing improves query performance by providing fast data access paths.
Index
Index is a data structure that provides fast lookup.
Purpose:
- Speed up search operations
- Reduce disk I/O
- Improve query performance
Trade-off:
- Extra storage space
- Maintenance overhead on updates
Types of Indexes
1. Primary Index
Index on primary key:
- Ordered file
- One index entry per block
- Dense or sparse
Dense Index:
- One entry per record
- Faster search
- More space
Sparse Index:
- One entry per block
- Less space
- Slower search
2. Secondary Index
Index on non-key attribute:
- Can have duplicates
- Always dense
3. Clustering Index
Index on ordering field:
- Records physically ordered
- One entry per distinct value
B-Tree Index
B-Tree is a balanced tree structure for indexing.
B-Tree Properties
- All leaves at same level
- Internal nodes have keys and pointers
- Minimum keys: ⌈m/2⌉ - 1 (except root)
- Maximum keys: m - 1
- Minimum children: ⌈m/2⌉ (except root)
- Maximum children: m
Where m = order of B-tree
B-Tree Operations
Search
Algorithm:
- Start at root
- Compare key with node keys
- Follow appropriate pointer
- Repeat until leaf
- Search in leaf node
Time Complexity: O(log_m n) where m is order, n is records
Insert
Algorithm:
- Search for position
- Insert in leaf
- If leaf full, split
- Propagate split upward if needed
Delete
Algorithm:
- Search for key
- Delete from leaf
- If underflow, merge or redistribute
- Propagate upward if needed
B+ Tree
B+ Tree is variation of B-tree.
Differences:
- All keys in leaves
- Internal nodes have copies of keys
- Leaves linked for range queries
Advantages:
- Better for range queries
- All data in leaves
Hash Index
Hash Index uses hash function to map keys to buckets.
Static Hashing
Fixed number of buckets:
- Hash function: h(k) = k mod m
- Collision handling: Chaining or open addressing
Problems:
- Fixed size
- Performance degrades with load
Dynamic Hashing
Extendable Hashing:
- Directory doubles when needed
- Multiple buckets can share same hash value
Linear Hashing:
- Grows incrementally
- No directory needed
File Organization
1. Heap File
Unordered file:
- Records in insertion order
- Insert: O(1)
- Search: O(n)
2. Sorted File
Ordered by key:
- Binary search: O(log n)
- Insert: O(n) - need to maintain order
3. Hash File
Hash-based organization:
- Direct access: O(1) average
- Good for equality search
- Poor for range queries
4. Indexed File
File with index:
- Fast search via index
- Maintains index overhead
GATE CS Important Points
- B-Tree Properties: Know order, min/max keys, height
- B-Tree Operations: Search, insert, delete algorithms
- B+ Tree: Differences from B-tree, advantages
- Hash Index: Static vs dynamic hashing
- File Organization: Heap, sorted, hash, indexed files
Practice Tips
- B-Tree Operations: Practice insert, delete, search
- Calculate Height: Find B-tree height for given records
- Index Selection: Choose appropriate index type
- Performance Analysis: Compare indexing methods
- Previous Year Questions: Solve GATE indexing questions