Hashing
Chapter 6 • Intermediate
Hashing
Hashing is a technique for fast data retrieval. Understanding hash functions and collision resolution is important for GATE CS.
Hash Table
Hash Table is a data structure that maps keys to values using a hash function.
Hash Function
Hash Function maps a key to an index in the hash table.
Properties of Good Hash Function:
- Uniform Distribution: Keys distributed evenly
- Deterministic: Same key always maps to same index
- Fast Computation: O(1) computation time
- Minimize Collisions: Reduce key collisions
Common Hash Functions
1. Division Method
Formula: h(k) = k mod m
Where:
- k: key
- m: table size (preferably prime)
Example:
- Key = 42, m = 11
- h(42) = 42 mod 11 = 9
2. Multiplication Method
Formula: h(k) = ⌊m × (k × A mod 1)⌋
Where:
- A: constant (0 < A < 1), often (√5 - 1)/2
- k × A mod 1: fractional part
3. Mid-Square Method
Formula: Square key, take middle digits
Example:
- Key = 1234
- Square = 1522756
- Middle digits = 227
- h(1234) = 227 mod m
Collision Resolution
Collision occurs when two keys map to same index.
1. Chaining (Open Hashing)
Method: Store colliding keys in linked list at same index.
Structure:
struct Node {
int key;
int value;
struct Node* next;
};
struct Node* hashTable[SIZE];
Operations:
- Insert: Add to linked list - O(1) average
- Search: Search in linked list - O(1 + α) average
- Delete: Delete from linked list - O(1 + α) average
Where α = n/m (load factor)
Advantages:
- Simple implementation
- No clustering
- Can handle any number of keys
Disadvantages:
- Extra memory for pointers
- Cache performance not optimal
2. Open Addressing (Closed Hashing)
Method: Find next available slot when collision occurs.
Probing Methods:
a) Linear Probing:
- h(k, i) = (h(k) + i) mod m
- Check next slot sequentially
b) Quadratic Probing:
- h(k, i) = (h(k) + c₁i + c₂i²) mod m
- Check slots with quadratic offset
c) Double Hashing:
- h(k, i) = (h₁(k) + i × h₂(k)) mod m
- Use second hash function for step size
Operations:
- Insert: Find empty slot - O(1/(1-α)) average
- Search: Find key or empty slot - O(1/(1-α)) average
- Delete: Mark as deleted (lazy deletion)
Advantages:
- Better cache performance
- No extra memory for pointers
Disadvantages:
- Clustering (especially linear probing)
- Limited capacity
Load Factor
Load Factor α = n/m
Where:
- n: number of keys
- m: table size
Performance:
- α < 0.5: Good performance
- α > 0.7: Performance degrades
- α = 1: Table full (open addressing)
Rehashing
Rehashing is resizing hash table when load factor becomes high.
Process:
- Create new larger table
- Rehash all keys to new table
- Update table size
When to Rehash:
- Load factor exceeds threshold (e.g., 0.7)
- Performance degrades
Hash Table Operations
Insert
Chaining:
void insert(int key, int value) {
int index = hash(key);
struct Node* newNode = createNode(key, value);
newNode->next = hashTable[index];
hashTable[index] = newNode;
}
Open Addressing:
void insert(int key, int value) {
int index = hash(key);
int i = 0;
while (hashTable[index] != NULL && hashTable[index]->key != -1) {
index = (hash(key) + probe(i)) % SIZE;
i++;
}
hashTable[index] = createEntry(key, value);
}
Search
Chaining:
int search(int key) {
int index = hash(key);
struct Node* temp = hashTable[index];
while (temp != NULL) {
if (temp->key == key)
return temp->value;
temp = temp->next;
}
return -1; // Not found
}
Open Addressing:
int search(int key) {
int index = hash(key);
int i = 0;
while (hashTable[index] != NULL) {
if (hashTable[index]->key == key)
return hashTable[index]->value;
if (hashTable[index]->key == -1) // Deleted
break;
index = (hash(key) + probe(i)) % SIZE;
i++;
}
return -1; // Not found
}
Time Complexities
| Operation | Average | Worst Case |
|---|---|---|
| Insert | O(1) | O(n) |
| Search | O(1) | O(n) |
| Delete | O(1) | O(n) |
Note: With good hash function and load factor < 0.7, average case is O(1)
GATE CS Important Points
- Hash Functions: Know division, multiplication methods
- Collision Resolution: Understand chaining and open addressing
- Load Factor: Impact on performance
- Time Complexities: Average vs worst case
- Probing Methods: Linear, quadratic, double hashing
Practice Tips
- Implement Hash Table: Code both chaining and open addressing
- Hash Functions: Practice different hash functions
- Collision Handling: Understand all collision resolution methods
- Load Factor: Calculate and understand impact
- Previous Year Questions: Solve GATE hashing questions