ADVANCED ALGORITHMS - ADVANCED TECHNIQUES:Approximation Algorithms — Practical NP-hard Solutions
Mastering approximation algorithms — practical np-hard solutions concepts and implementation.
Approximation Algorithms — Practical NP-hard Solutions
When a problem is NP-hard, we can't find an optimal solution in polynomial time (assuming P ≠ NP). Approximation algorithms find solutions within a provable ratio of optimal.
Approximation Ratio
An algorithm is an α-approximation if for all instances:
- Minimization: ALG(I) ≤ α · OPT(I)
- Maximization: ALG(I) ≥ (1/α) · OPT(I)
where α ≥ 1. Smaller α = better approximation.
Vertex Cover (2-approximation)
Problem: Find the smallest set of vertices that touches every edge.
Algorithm (Maximal Matching):
- Pick any uncovered edge (u, v)
- Add both u and v to the cover
- Remove all edges incident to u or v
- Repeat until all edges covered
Why it's 2-approximate: Every optimal solution must include at least one endpoint of each matched edge. Since we picked both endpoints, we have at most 2× as many vertices.
def vertex_cover_approx(graph):
"""2-approximation for Vertex Cover via maximal matching."""
cover = set()
edges_covered = set()
for u in graph:
for v in graph[u]:
edge = (min(u,v), max(u,v))
if edge not in edges_covered and u not in cover and v not in cover:
cover.add(u)
cover.add(v)
# Mark all edges incident to u and v as covered
for neighbor in graph[u]:
edges_covered.add((min(u, neighbor), max(u, neighbor)))
for neighbor in graph[v]:
edges_covered.add((min(v, neighbor), max(v, neighbor)))
return cover
# Example
graph = {0: [1,2], 1: [0,2,3], 2: [0,1,4], 3: [1,4], 4: [2,3]}
print(vertex_cover_approx(graph)) # Some 2-approximation
Travelling Salesman Problem (TSP)
Metric TSP — 2-approximation (Christofides-like)
For TSP where edge weights satisfy the triangle inequality (d(u,w) ≤ d(u,v) + d(v,w)):
Algorithm:
- Find a Minimum Spanning Tree T
- Do a DFS pre-order traversal of T to get a tour H
- Output H (shortcutting repeated vertices)
Why 2-approximate: The MST cost ≤ OPT (removing one edge from optimal tour gives a spanning tree). The DFS tour visits each edge at most twice, so tour cost ≤ 2 · MST cost ≤ 2 · OPT.
import heapq
def metric_tsp_approx(dist):
"""2-approximation for metric TSP using MST DFS tour."""
n = len(dist)
# Step 1: Prim's MST
in_mst = [False] * n
min_edge = [float('inf')] * n
parent = [-1] * n
min_edge[0] = 0
heap = [(0, 0)]
mst_adj = {i: [] for i in range(n)}
while heap:
cost, u = heapq.heappop(heap)
if in_mst[u]: continue
in_mst[u] = True
if parent[u] != -1:
mst_adj[parent[u]].append(u)
mst_adj[u].append(parent[u])
for v in range(n):
if not in_mst[v] and dist[u][v] < min_edge[v]:
min_edge[v] = dist[u][v]
parent[v] = u
heapq.heappush(heap, (dist[u][v], v))
# Step 2: DFS pre-order traversal
visited = [False] * n
tour = []
def dfs(node):
visited[node] = True
tour.append(node)
for neighbor in mst_adj[node]:
if not visited[neighbor]:
dfs(neighbor)
dfs(0)
tour.append(0) # Return to start
# Calculate tour cost
cost = sum(dist[tour[i]][tour[i+1]] for i in range(len(tour)-1))
return tour, cost
Set Cover (O(log n) approximation)
Problem: Given universe U of n elements and collection S of subsets, find minimum number of subsets covering all of U.
Greedy Algorithm: Repeatedly pick the subset covering the most uncovered elements.
Approximation ratio: H(n) = 1 + 1/2 + 1/3 + ... + 1/n ≈ ln(n) + 0.577
def set_cover_greedy(universe, subsets):
"""Greedy set cover — O(log n) approximation."""
covered = set()
chosen = []
remaining_subsets = [set(s) for s in subsets]
while covered != universe:
# Pick subset with maximum new coverage
best_idx = max(range(len(remaining_subsets)),
key=lambda i: len(remaining_subsets[i] - covered))
best = remaining_subsets[best_idx]
if not (best - covered):
break # No progress possible
covered |= best
chosen.append(subsets[best_idx])
return chosen
universe = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
subsets = [{1,2,3,8},{1,2,3,4,5},{4,5,7},{5,6,7},{6,7,8,9,10}]
print(set_cover_greedy(universe, subsets))
Knapsack — FPTAS
For problems with numerical objective values, we can often get a (1 + ε)-approximation in polynomial time via FPTAS (Fully Polynomial-Time Approximation Scheme):
- Running time is polynomial in both n and 1/ε
- As ε → 0, we approach the exact solution (but run slower)
Knapsack FPTAS:
- Scale item values: v'[i] = ⌊v[i] / (ε·V_max/n)⌋
- Solve exact DP on scaled values (now polynomial since max scaled value is n/ε)
- Solution is (1+ε)-optimal for original problem
def knapsack_fptas(weights, values, capacity, epsilon):
"""(1+epsilon)-approximation for 0/1 Knapsack."""
n = len(values)
if n == 0: return 0
V_max = max(values)
# Scale factor: we'll round values down to multiples of K
K = epsilon * V_max / n
if K == 0: # epsilon = 0, exact solution needed
K = 1
# Scale values (round down)
scaled_values = [int(v / K) for v in values]
max_scaled = sum(scaled_values)
# DP on scaled values: dp[v] = min weight to achieve scaled value v
dp = [float('inf')] * (max_scaled + 1)
dp[0] = 0
for i in range(n):
for v in range(max_scaled, scaled_values[i] - 1, -1):
if dp[v - scaled_values[i]] + weights[i] <= capacity:
dp[v] = min(dp[v], dp[v - scaled_values[i]] + weights[i])
# Find maximum achievable scaled value within capacity
best_scaled = max(v for v in range(max_scaled + 1) if dp[v] <= capacity)
# Convert back using original values approximation
return best_scaled * K
# Test
weights = [2, 3, 4, 5]
values = [3, 4, 5, 7]
capacity = 7
print(f"FPTAS result: {knapsack_fptas(weights, values, capacity, 0.1):.2f}")
Inapproximability — Lower Bounds
Not all NP-hard problems can be approximated to any constant ratio:
| Problem | Best approximation | Hardness |
|---|---|---|
| Vertex Cover | 2 (best known) | APX-hard; beating 2 is hard |
| TSP (general) | None possible | NP-hard to approximate within any ratio |
| Metric TSP | 1.5 (Christofides) | Best known |
| Set Cover | H(n) ≈ ln n | Cannot do better unless P=NP |
| Clique | n^{1-ε} | NP-hard to approximate |
| Independent Set | n^{1-ε} | NP-hard to approximate |
PCP Theorem (1992): Forms the theoretical basis for inapproximability — proves that certain problems cannot be approximated beyond a constant threshold unless P=NP.
Practice Problems
- Implement a 2-approximation for Vertex Cover and test on a random graph
- Code the MST-based TSP approximation and compare with brute-force optimal on n=8 cities
- Analyze the approximation ratio of: "always pick the most expensive item that fits" for Knapsack
- Prove that the greedy algorithm for Maximization Set Cover achieves (1 - 1/e) ≈ 63% approximation ratio
- Implement the Knapsack FPTAS and verify that for ε=0.5, the result is within 50% of optimal