ADVANCED ALGORITHMS - ADVANCED TECHNIQUES:Approximation Algorithms — Practical NP-hard Solutions

Mastering approximation algorithms — practical np-hard solutions concepts and implementation.

Approximation Algorithms — Practical NP-hard Solutions

When a problem is NP-hard, we can't find an optimal solution in polynomial time (assuming P ≠ NP). Approximation algorithms find solutions within a provable ratio of optimal.

Approximation Ratio

An algorithm is an α-approximation if for all instances:

  • Minimization: ALG(I) ≤ α · OPT(I)
  • Maximization: ALG(I) ≥ (1/α) · OPT(I)

where α ≥ 1. Smaller α = better approximation.

Vertex Cover (2-approximation)

Problem: Find the smallest set of vertices that touches every edge.

Algorithm (Maximal Matching):

  1. Pick any uncovered edge (u, v)
  2. Add both u and v to the cover
  3. Remove all edges incident to u or v
  4. Repeat until all edges covered

Why it's 2-approximate: Every optimal solution must include at least one endpoint of each matched edge. Since we picked both endpoints, we have at most 2× as many vertices.

def vertex_cover_approx(graph):
    """2-approximation for Vertex Cover via maximal matching."""
    cover = set()
    edges_covered = set()
    
    for u in graph:
        for v in graph[u]:
            edge = (min(u,v), max(u,v))
            if edge not in edges_covered and u not in cover and v not in cover:
                cover.add(u)
                cover.add(v)
                # Mark all edges incident to u and v as covered
                for neighbor in graph[u]:
                    edges_covered.add((min(u, neighbor), max(u, neighbor)))
                for neighbor in graph[v]:
                    edges_covered.add((min(v, neighbor), max(v, neighbor)))
    
    return cover

# Example
graph = {0: [1,2], 1: [0,2,3], 2: [0,1,4], 3: [1,4], 4: [2,3]}
print(vertex_cover_approx(graph))  # Some 2-approximation

Travelling Salesman Problem (TSP)

Metric TSP — 2-approximation (Christofides-like)

For TSP where edge weights satisfy the triangle inequality (d(u,w) ≤ d(u,v) + d(v,w)):

Algorithm:

  1. Find a Minimum Spanning Tree T
  2. Do a DFS pre-order traversal of T to get a tour H
  3. Output H (shortcutting repeated vertices)

Why 2-approximate: The MST cost ≤ OPT (removing one edge from optimal tour gives a spanning tree). The DFS tour visits each edge at most twice, so tour cost ≤ 2 · MST cost ≤ 2 · OPT.

import heapq

def metric_tsp_approx(dist):
    """2-approximation for metric TSP using MST DFS tour."""
    n = len(dist)
    
    # Step 1: Prim's MST
    in_mst = [False] * n
    min_edge = [float('inf')] * n
    parent = [-1] * n
    min_edge[0] = 0
    heap = [(0, 0)]
    mst_adj = {i: [] for i in range(n)}
    
    while heap:
        cost, u = heapq.heappop(heap)
        if in_mst[u]: continue
        in_mst[u] = True
        if parent[u] != -1:
            mst_adj[parent[u]].append(u)
            mst_adj[u].append(parent[u])
        for v in range(n):
            if not in_mst[v] and dist[u][v] < min_edge[v]:
                min_edge[v] = dist[u][v]
                parent[v] = u
                heapq.heappush(heap, (dist[u][v], v))
    
    # Step 2: DFS pre-order traversal
    visited = [False] * n
    tour = []
    
    def dfs(node):
        visited[node] = True
        tour.append(node)
        for neighbor in mst_adj[node]:
            if not visited[neighbor]:
                dfs(neighbor)
    
    dfs(0)
    tour.append(0)  # Return to start
    
    # Calculate tour cost
    cost = sum(dist[tour[i]][tour[i+1]] for i in range(len(tour)-1))
    return tour, cost

Set Cover (O(log n) approximation)

Problem: Given universe U of n elements and collection S of subsets, find minimum number of subsets covering all of U.

Greedy Algorithm: Repeatedly pick the subset covering the most uncovered elements.

Approximation ratio: H(n) = 1 + 1/2 + 1/3 + ... + 1/n ≈ ln(n) + 0.577

def set_cover_greedy(universe, subsets):
    """Greedy set cover — O(log n) approximation."""
    covered = set()
    chosen = []
    remaining_subsets = [set(s) for s in subsets]
    
    while covered != universe:
        # Pick subset with maximum new coverage
        best_idx = max(range(len(remaining_subsets)),
                      key=lambda i: len(remaining_subsets[i] - covered))
        best = remaining_subsets[best_idx]
        
        if not (best - covered):
            break  # No progress possible
        
        covered |= best
        chosen.append(subsets[best_idx])
    
    return chosen

universe = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
subsets = [{1,2,3,8},{1,2,3,4,5},{4,5,7},{5,6,7},{6,7,8,9,10}]
print(set_cover_greedy(universe, subsets))

Knapsack — FPTAS

For problems with numerical objective values, we can often get a (1 + ε)-approximation in polynomial time via FPTAS (Fully Polynomial-Time Approximation Scheme):

  • Running time is polynomial in both n and 1/ε
  • As ε → 0, we approach the exact solution (but run slower)

Knapsack FPTAS:

  1. Scale item values: v'[i] = ⌊v[i] / (ε·V_max/n)⌋
  2. Solve exact DP on scaled values (now polynomial since max scaled value is n/ε)
  3. Solution is (1+ε)-optimal for original problem
def knapsack_fptas(weights, values, capacity, epsilon):
    """(1+epsilon)-approximation for 0/1 Knapsack."""
    n = len(values)
    if n == 0: return 0
    
    V_max = max(values)
    # Scale factor: we'll round values down to multiples of K
    K = epsilon * V_max / n
    
    if K == 0:  # epsilon = 0, exact solution needed
        K = 1
    
    # Scale values (round down)
    scaled_values = [int(v / K) for v in values]
    max_scaled = sum(scaled_values)
    
    # DP on scaled values: dp[v] = min weight to achieve scaled value v
    dp = [float('inf')] * (max_scaled + 1)
    dp[0] = 0
    
    for i in range(n):
        for v in range(max_scaled, scaled_values[i] - 1, -1):
            if dp[v - scaled_values[i]] + weights[i] <= capacity:
                dp[v] = min(dp[v], dp[v - scaled_values[i]] + weights[i])
    
    # Find maximum achievable scaled value within capacity
    best_scaled = max(v for v in range(max_scaled + 1) if dp[v] <= capacity)
    
    # Convert back using original values approximation
    return best_scaled * K

# Test
weights = [2, 3, 4, 5]
values = [3, 4, 5, 7]
capacity = 7
print(f"FPTAS result: {knapsack_fptas(weights, values, capacity, 0.1):.2f}")

Inapproximability — Lower Bounds

Not all NP-hard problems can be approximated to any constant ratio:

ProblemBest approximationHardness
Vertex Cover2 (best known)APX-hard; beating 2 is hard
TSP (general)None possibleNP-hard to approximate within any ratio
Metric TSP1.5 (Christofides)Best known
Set CoverH(n) ≈ ln nCannot do better unless P=NP
Cliquen^{1-ε}NP-hard to approximate
Independent Setn^{1-ε}NP-hard to approximate

PCP Theorem (1992): Forms the theoretical basis for inapproximability — proves that certain problems cannot be approximated beyond a constant threshold unless P=NP.

Practice Problems

  1. Implement a 2-approximation for Vertex Cover and test on a random graph
  2. Code the MST-based TSP approximation and compare with brute-force optimal on n=8 cities
  3. Analyze the approximation ratio of: "always pick the most expensive item that fits" for Knapsack
  4. Prove that the greedy algorithm for Maximization Set Cover achieves (1 - 1/e) ≈ 63% approximation ratio
  5. Implement the Knapsack FPTAS and verify that for ε=0.5, the result is within 50% of optimal