ADVANCED ALGORITHMS - ADVANCED TECHNIQUES:Online Algorithms and Competitive Analysis
Mastering online algorithms and competitive analysis concepts and implementation.
Online Algorithms and Competitive Analysis
An online algorithm must make decisions as inputs arrive, without knowledge of future inputs. In contrast, an offline algorithm (like the algorithms we've studied so far) has access to the entire input upfront.
Competitive Ratio
An online algorithm A is c-competitive if for all input sequences σ:
A(σ) ≤ c · OPT(σ) + b
where OPT(σ) is the optimal offline cost and b is a constant.
The Ski Rental Problem
Setting: You need skis for an unknown number of days. Renting costs $1/day. Buying costs $B. The "buy-or-rent" decision must be made each day.
Optimal offline: If you knew in advance you'd ski ≥ B days, buy; otherwise rent.
Deterministic online strategy: Rent for B days, then buy on day B+1 if you're still skiing.
Analysis:
- If you ski < B days: you rented the whole time (same as optimal renting).
- If you ski ≥ B days: you paid B days rent + B for buying = 2B total. Optimal was B (buy on day 1). Ratio = 2.
Competitive ratio = 2 — this is optimal for deterministic algorithms!
Randomized improvement: Rent for B days with probability depending on a randomized strategy → competitive ratio ≈ e/(e-1) ≈ 1.58.
Paging / Caching
Setting: Fast cache holds k pages. On each request for a page:
- Cache hit: Free
- Cache miss: Load from slow memory (cost 1). Must evict a page if cache is full.
Common algorithms:
| Algorithm | Competitive Ratio | Strategy |
|---|---|---|
| LRU | k | Evict Least Recently Used |
| FIFO | k | Evict First In First Out |
| LFU | Not competitive | Evict Least Frequently Used |
| LFD (offline) | Optimal | Evict page used farthest in future |
from collections import OrderedDict
class LRUCache:
"""Least Recently Used cache — k-competitive online algorithm."""
def __init__(self, capacity):
self.capacity = capacity
self.cache = OrderedDict()
def access(self, page):
if page in self.cache:
self.cache.move_to_end(page) # Mark as recently used
return True # Cache hit
# Cache miss — evict LRU if full
if len(self.cache) >= self.capacity:
self.cache.popitem(last=False) # Remove least recently used
self.cache[page] = True
return False # Cache miss
def simulate_paging(requests, cache_size, algorithm='lru'):
cache = LRUCache(cache_size)
misses = 0
for page in requests:
if not cache.access(page):
misses += 1
return misses
requests = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(f"LRU misses (k=3): {simulate_paging(requests, 3)}") # k-competitive
Load Balancing — List Scheduling
Setting: m machines, n jobs with processing times p₁, p₂, ..., pₙ arriving online. Assign each job to a machine to minimize makespan (max completion time).
Algorithm: Assign each job to the machine with current minimum load (LPT if sorted, List if not).
Competitive ratio of List Scheduling: (2 - 1/m)
import heapq
def list_scheduling(jobs, m):
"""List scheduling: assign each job to least loaded machine."""
# Min-heap: (current_load, machine_id)
machines = [(0, i) for i in range(m)]
heapq.heapify(machines)
assignment = []
for job in jobs:
load, machine = heapq.heappop(machines)
assignment.append(machine)
heapq.heappush(machines, (load + job, machine))
return assignment, max(load for load, _ in machines)
jobs = [6, 5, 5, 4, 4, 3, 3, 1]
assignment, makespan = list_scheduling(jobs, 3)
print(f"Makespan: {makespan}") # Competitive approximation
The Secretary Problem
Setting: Interview n candidates one by one in random order. After each interview, immediately accept or reject. Goal: maximize probability of hiring the best candidate.
Optimal strategy (Optimal Stopping):
- Reject the first n/e ≈ 37% of candidates (observation phase)
- Hire the first candidate better than all seen so far
Result: This strategy succeeds with probability ≈ 1/e ≈ 37%, regardless of n.
import random
import math
def secretary_problem(candidates):
"""Optimal stopping: observe 1/e then hire first improvement."""
n = len(candidates)
observe = max(1, int(n / math.e))
# Observation phase — reject first observe candidates
best_so_far = max(candidates[:observe])
# Hiring phase — hire first candidate better than observation phase
for candidate in candidates[observe:]:
if candidate > best_so_far:
return candidate
# Fallback: hire last candidate
return candidates[-1]
# Simulation
trials = 100000
success = 0
for _ in range(trials):
n = 100
candidates = list(range(1, n+1))
random.shuffle(candidates)
hired = secretary_problem(candidates)
if hired == n: # Hired the best (rank n)
success += 1
print(f"Success rate: {success/trials:.3f} (expected ~0.368)")
Practice Problems
- Prove that no deterministic online algorithm for ski rental can be better than 2-competitive
- Simulate LRU vs FIFO on the request sequence [1,2,3,4,2,5,1,3,6,2,4,7] with k=3
- Show that Deterministic List Scheduling achieves ratio (2-1/m) with an explicit bad instance
- Implement the secretary problem simulation and verify the ~37% success rate
- What is the competitive ratio of "always serve nearest" for the online k-server problem on a line?