05
Huffman Coding Demo
Chapter 5 • Advanced
120 min
Huffman Coding Demo
Problem Statement
Given characters and their frequencies, assign binary codes to minimize total encoded length.
Huffman's Algorithm
Greedy Strategy
Repeatedly merge two nodes with minimum frequency.
Algorithm Steps
- Create leaf node for each character with frequency
- Build min-heap of nodes
- While heap has more than one node:
- Extract two nodes with minimum frequency
- Create internal node with these as children
- Frequency = sum of children frequencies
- Insert back into heap
- Root of tree is Huffman tree
- Assign codes: left = 0, right = 1
Implementation
python.js
import heapq
class HuffmanNode:
def __init__(self, char=None, freq=0, left=None, right=None):
self.char = char
self.freq = freq
self.left = left
self.right = right
def __lt__(self, other):
return self.freq < other.freq
def build_huffman_tree(frequencies):
"""
Build Huffman tree from character frequencies.
frequencies: {char: freq}
"""
# Create leaf nodes
heap = []
for char, freq in frequencies.items():
heapq.heappush(heap, HuffmanNode(char, freq))
# Build tree
while len(heap) > 1:
left = heapq.heappop(heap)
right = heapq.heappop(heap)
# Create internal node
merged = HuffmanNode(freq=left.freq + right.freq, left=left, right=right)
heapq.heappush(heap, merged)
return heap[0] if heap else None
def generate_codes(root, code="", codes={}):
"""Generate Huffman codes from tree"""
if root is None:
return
if root.char is not None:
codes[root.char] = code if code else "0"
return
generate_codes(root.left, code + "0", codes)
generate_codes(root.right, code + "1", codes)
return codes
def huffman_encode(text, frequencies):
"""Encode text using Huffman coding"""
tree = build_huffman_tree(frequencies)
codes = generate_codes(tree)
encoded = ""
for char in text:
encoded += codes[char]
return encoded, codes
def huffman_decode(encoded, tree):
"""Decode Huffman-encoded string"""
decoded = ""
current = tree
for bit in encoded:
if bit == "0":
current = current.left
else:
current = current.right
if current.char is not None:
decoded += current.char
current = tree
return decoded
Example
python.js
text = "aabacdab"
frequencies = {'a': 4, 'b': 2, 'c': 1, 'd': 1}
encoded, codes = huffman_encode(text, frequencies)
print("Codes:", codes)
print("Encoded:", encoded)
# Output:
# Codes: {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
# Encoded: 00100101110010
Properties
- Optimal: Produces minimum average code length
- Prefix-free: No code is prefix of another
- Uniquely decodable: Can decode without ambiguity
Compression Ratio
Original: 8 characters × 8 bits = 64 bits
Encoded: 16 bits
Compression: 64/16 = 4:1
Applications
- File compression: ZIP, GZIP use Huffman coding
- Image compression: JPEG uses Huffman coding
- Data transmission: Reduce bandwidth
- DNA compression: Compress genetic sequences
Practice Problems
- Build Huffman tree from frequencies
- Encode and decode text
- Calculate compression ratio
- Visualize Huffman tree
- Adaptive Huffman coding