06

Hashing

Chapter 6 • Intermediate

50 min

Hashing

Hashing is a technique for fast data retrieval. Understanding hash functions and collision resolution is important for GATE CS.

Hash Table

Hash Table is a data structure that maps keys to values using a hash function.

Hash Function

Hash Function maps a key to an index in the hash table.

Properties of Good Hash Function:

  • Uniform Distribution: Keys distributed evenly
  • Deterministic: Same key always maps to same index
  • Fast Computation: O(1) computation time
  • Minimize Collisions: Reduce key collisions

Common Hash Functions

1. Division Method

Formula: h(k) = k mod m

Where:

  • k: key
  • m: table size (preferably prime)

Example:

  • Key = 42, m = 11
  • h(42) = 42 mod 11 = 9

2. Multiplication Method

Formula: h(k) = ⌊m × (k × A mod 1)⌋

Where:

  • A: constant (0 < A < 1), often (√5 - 1)/2
  • k × A mod 1: fractional part

3. Mid-Square Method

Formula: Square key, take middle digits

Example:

  • Key = 1234
  • Square = 1522756
  • Middle digits = 227
  • h(1234) = 227 mod m

Collision Resolution

Collision occurs when two keys map to same index.

1. Chaining (Open Hashing)

Method: Store colliding keys in linked list at same index.

Structure:

c.js
struct Node {
    int key;
    int value;
    struct Node* next;
};

struct Node* hashTable[SIZE];

Operations:

  • Insert: Add to linked list - O(1) average
  • Search: Search in linked list - O(1 + α) average
  • Delete: Delete from linked list - O(1 + α) average

Where α = n/m (load factor)

Advantages:

  • Simple implementation
  • No clustering
  • Can handle any number of keys

Disadvantages:

  • Extra memory for pointers
  • Cache performance not optimal

2. Open Addressing (Closed Hashing)

Method: Find next available slot when collision occurs.

Probing Methods:

a) Linear Probing:

  • h(k, i) = (h(k) + i) mod m
  • Check next slot sequentially

b) Quadratic Probing:

  • h(k, i) = (h(k) + c₁i + c₂i²) mod m
  • Check slots with quadratic offset

c) Double Hashing:

  • h(k, i) = (h₁(k) + i × h₂(k)) mod m
  • Use second hash function for step size

Operations:

  • Insert: Find empty slot - O(1/(1-α)) average
  • Search: Find key or empty slot - O(1/(1-α)) average
  • Delete: Mark as deleted (lazy deletion)

Advantages:

  • Better cache performance
  • No extra memory for pointers

Disadvantages:

  • Clustering (especially linear probing)
  • Limited capacity

Load Factor

Load Factor α = n/m

Where:

  • n: number of keys
  • m: table size

Performance:

  • α < 0.5: Good performance
  • α > 0.7: Performance degrades
  • α = 1: Table full (open addressing)

Rehashing

Rehashing is resizing hash table when load factor becomes high.

Process:

  1. Create new larger table
  2. Rehash all keys to new table
  3. Update table size

When to Rehash:

  • Load factor exceeds threshold (e.g., 0.7)
  • Performance degrades

Hash Table Operations

Insert

Chaining:

c.js
void insert(int key, int value) {
    int index = hash(key);
    struct Node* newNode = createNode(key, value);
    newNode->next = hashTable[index];
    hashTable[index] = newNode;
}

Open Addressing:

c.js
void insert(int key, int value) {
    int index = hash(key);
    int i = 0;
    
    while (hashTable[index] != NULL && hashTable[index]->key != -1) {
        index = (hash(key) + probe(i)) % SIZE;
        i++;
    }
    
    hashTable[index] = createEntry(key, value);
}

Search

Chaining:

c.js
int search(int key) {
    int index = hash(key);
    struct Node* temp = hashTable[index];
    
    while (temp != NULL) {
        if (temp->key == key)
            return temp->value;
        temp = temp->next;
    }
    return -1;  // Not found
}

Open Addressing:

c.js
int search(int key) {
    int index = hash(key);
    int i = 0;
    
    while (hashTable[index] != NULL) {
        if (hashTable[index]->key == key)
            return hashTable[index]->value;
        if (hashTable[index]->key == -1)  // Deleted
            break;
        index = (hash(key) + probe(i)) % SIZE;
        i++;
    }
    return -1;  // Not found
}

Time Complexities

OperationAverageWorst Case
InsertO(1)O(n)
SearchO(1)O(n)
DeleteO(1)O(n)

Note: With good hash function and load factor < 0.7, average case is O(1)

GATE CS Important Points

  1. Hash Functions: Know division, multiplication methods
  2. Collision Resolution: Understand chaining and open addressing
  3. Load Factor: Impact on performance
  4. Time Complexities: Average vs worst case
  5. Probing Methods: Linear, quadratic, double hashing

Practice Tips

  1. Implement Hash Table: Code both chaining and open addressing
  2. Hash Functions: Practice different hash functions
  3. Collision Handling: Understand all collision resolution methods
  4. Load Factor: Calculate and understand impact
  5. Previous Year Questions: Solve GATE hashing questions