Indexing and File Organization

Indexing improves query performance by providing fast data access paths.

Index

Index is a data structure that provides fast lookup.

Purpose:

Speed up search operations
Reduce disk I/O
Improve query performance

Trade-off:

Extra storage space
Maintenance overhead on updates

Types of Indexes

1. Primary Index

Index on primary key:

Ordered file
One index entry per block
Dense or sparse

Dense Index:

One entry per record
Faster search
More space

Sparse Index:

One entry per block
Less space
Slower search

2. Secondary Index

Index on non-key attribute:

Can have duplicates
Always dense

3. Clustering Index

Index on ordering field:

Records physically ordered
One entry per distinct value

B-Tree Index

B-Tree is a balanced tree structure for indexing.

B-Tree Properties

All leaves at same level
Internal nodes have keys and pointers
Minimum keys: ⌈m/2⌉ - 1 (except root)
Maximum keys: m - 1
Minimum children: ⌈m/2⌉ (except root)
Maximum children: m

Where m = order of B-tree

B-Tree Operations

Search

Algorithm:

Start at root
Compare key with node keys
Follow appropriate pointer
Repeat until leaf
Search in leaf node

Time Complexity: O(log_m n) where m is order, n is records

Insert

Algorithm:

Search for position
Insert in leaf
If leaf full, split
Propagate split upward if needed

Delete

Algorithm:

Search for key
Delete from leaf
If underflow, merge or redistribute
Propagate upward if needed

B+ Tree

B+ Tree is variation of B-tree.

Differences:

All keys in leaves
Internal nodes have copies of keys
Leaves linked for range queries

Advantages:

Better for range queries
All data in leaves

Hash Index

Hash Index uses hash function to map keys to buckets.

Static Hashing

Fixed number of buckets:

Hash function: h(k) = k mod m
Collision handling: Chaining or open addressing

Problems:

Fixed size
Performance degrades with load

Dynamic Hashing

Extendable Hashing:

Directory doubles when needed
Multiple buckets can share same hash value

Linear Hashing:

Grows incrementally
No directory needed

File Organization

1. Heap File

Unordered file:

Records in insertion order
Insert: O(1)
Search: O(n)

2. Sorted File

Ordered by key:

Binary search: O(log n)
Insert: O(n) - need to maintain order

3. Hash File

Hash-based organization:

Direct access: O(1) average
Good for equality search
Poor for range queries

4. Indexed File

File with index:

Fast search via index
Maintains index overhead

GATE CS Important Points

B-Tree Properties: Know order, min/max keys, height
B-Tree Operations: Search, insert, delete algorithms
B+ Tree: Differences from B-tree, advantages
Hash Index: Static vs dynamic hashing
File Organization: Heap, sorted, hash, indexed files

Practice Tips

B-Tree Operations: Practice insert, delete, search
Calculate Height: Find B-tree height for given records
Index Selection: Choose appropriate index type
Performance Analysis: Compare indexing methods
Previous Year Questions: Solve GATE indexing questions

Indexing and File Organization