05

Huffman Coding Demo

Chapter 5 • Advanced

120 min

Huffman Coding Demo

Problem Statement

Given characters and their frequencies, assign binary codes to minimize total encoded length.

Huffman's Algorithm

Greedy Strategy

Repeatedly merge two nodes with minimum frequency.

Algorithm Steps

  1. Create leaf node for each character with frequency
  2. Build min-heap of nodes
  3. While heap has more than one node:
  • Extract two nodes with minimum frequency
  • Create internal node with these as children
  • Frequency = sum of children frequencies
  • Insert back into heap
  1. Root of tree is Huffman tree
  2. Assign codes: left = 0, right = 1

Implementation

python.js
import heapq

class HuffmanNode:
    def __init__(self, char=None, freq=0, left=None, right=None):
        self.char = char
        self.freq = freq
        self.left = left
        self.right = right
    
    def __lt__(self, other):
        return self.freq < other.freq

def build_huffman_tree(frequencies):
    """
    Build Huffman tree from character frequencies.
    frequencies: {char: freq}
    """
    # Create leaf nodes
    heap = []
    for char, freq in frequencies.items():
        heapq.heappush(heap, HuffmanNode(char, freq))
    
    # Build tree
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        
        # Create internal node
        merged = HuffmanNode(freq=left.freq + right.freq, left=left, right=right)
        heapq.heappush(heap, merged)
    
    return heap[0] if heap else None

def generate_codes(root, code="", codes={}):
    """Generate Huffman codes from tree"""
    if root is None:
        return
    
    if root.char is not None:
        codes[root.char] = code if code else "0"
        return
    
    generate_codes(root.left, code + "0", codes)
    generate_codes(root.right, code + "1", codes)
    
    return codes

def huffman_encode(text, frequencies):
    """Encode text using Huffman coding"""
    tree = build_huffman_tree(frequencies)
    codes = generate_codes(tree)
    
    encoded = ""
    for char in text:
        encoded += codes[char]
    
    return encoded, codes

def huffman_decode(encoded, tree):
    """Decode Huffman-encoded string"""
    decoded = ""
    current = tree
    
    for bit in encoded:
        if bit == "0":
            current = current.left
        else:
            current = current.right
        
        if current.char is not None:
            decoded += current.char
            current = tree
    
    return decoded

Example

python.js
text = "aabacdab"
frequencies = {'a': 4, 'b': 2, 'c': 1, 'd': 1}

encoded, codes = huffman_encode(text, frequencies)
print("Codes:", codes)
print("Encoded:", encoded)

# Output:
# Codes: {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
# Encoded: 00100101110010

Properties

  • Optimal: Produces minimum average code length
  • Prefix-free: No code is prefix of another
  • Uniquely decodable: Can decode without ambiguity

Compression Ratio

Original: 8 characters × 8 bits = 64 bits

Encoded: 16 bits

Compression: 64/16 = 4:1

Applications

  • File compression: ZIP, GZIP use Huffman coding
  • Image compression: JPEG uses Huffman coding
  • Data transmission: Reduce bandwidth
  • DNA compression: Compress genetic sequences

Practice Problems

  1. Build Huffman tree from frequencies
  2. Encode and decode text
  3. Calculate compression ratio
  4. Visualize Huffman tree
  5. Adaptive Huffman coding