ADVANCED ALGORITHMS - ADVANCED TECHNIQUES:Online Algorithms and Competitive Analysis

Mastering online algorithms and competitive analysis concepts and implementation.

Online Algorithms and Competitive Analysis

An online algorithm must make decisions as inputs arrive, without knowledge of future inputs. In contrast, an offline algorithm (like the algorithms we've studied so far) has access to the entire input upfront.

Competitive Ratio

An online algorithm A is c-competitive if for all input sequences σ:

A(σ) ≤ c · OPT(σ) + b

where OPT(σ) is the optimal offline cost and b is a constant.

The Ski Rental Problem

Setting: You need skis for an unknown number of days. Renting costs $1/day. Buying costs $B. The "buy-or-rent" decision must be made each day.

Optimal offline: If you knew in advance you'd ski ≥ B days, buy; otherwise rent.

Deterministic online strategy: Rent for B days, then buy on day B+1 if you're still skiing.

Analysis:

  • If you ski < B days: you rented the whole time (same as optimal renting).
  • If you ski ≥ B days: you paid B days rent + B for buying = 2B total. Optimal was B (buy on day 1). Ratio = 2.

Competitive ratio = 2 — this is optimal for deterministic algorithms!

Randomized improvement: Rent for B days with probability depending on a randomized strategy → competitive ratio ≈ e/(e-1) ≈ 1.58.

Paging / Caching

Setting: Fast cache holds k pages. On each request for a page:

  • Cache hit: Free
  • Cache miss: Load from slow memory (cost 1). Must evict a page if cache is full.

Common algorithms:

AlgorithmCompetitive RatioStrategy
LRUkEvict Least Recently Used
FIFOkEvict First In First Out
LFUNot competitiveEvict Least Frequently Used
LFD (offline)OptimalEvict page used farthest in future
from collections import OrderedDict

class LRUCache:
    """Least Recently Used cache — k-competitive online algorithm."""
    
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
    
    def access(self, page):
        if page in self.cache:
            self.cache.move_to_end(page)  # Mark as recently used
            return True  # Cache hit
        
        # Cache miss — evict LRU if full
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # Remove least recently used
        
        self.cache[page] = True
        return False  # Cache miss

def simulate_paging(requests, cache_size, algorithm='lru'):
    cache = LRUCache(cache_size)
    misses = 0
    for page in requests:
        if not cache.access(page):
            misses += 1
    return misses

requests = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(f"LRU misses (k=3): {simulate_paging(requests, 3)}")  # k-competitive

Load Balancing — List Scheduling

Setting: m machines, n jobs with processing times p₁, p₂, ..., pₙ arriving online. Assign each job to a machine to minimize makespan (max completion time).

Algorithm: Assign each job to the machine with current minimum load (LPT if sorted, List if not).

Competitive ratio of List Scheduling: (2 - 1/m)

import heapq

def list_scheduling(jobs, m):
    """List scheduling: assign each job to least loaded machine."""
    # Min-heap: (current_load, machine_id)
    machines = [(0, i) for i in range(m)]
    heapq.heapify(machines)
    
    assignment = []
    for job in jobs:
        load, machine = heapq.heappop(machines)
        assignment.append(machine)
        heapq.heappush(machines, (load + job, machine))
    
    return assignment, max(load for load, _ in machines)

jobs = [6, 5, 5, 4, 4, 3, 3, 1]
assignment, makespan = list_scheduling(jobs, 3)
print(f"Makespan: {makespan}")  # Competitive approximation

The Secretary Problem

Setting: Interview n candidates one by one in random order. After each interview, immediately accept or reject. Goal: maximize probability of hiring the best candidate.

Optimal strategy (Optimal Stopping):

  1. Reject the first n/e ≈ 37% of candidates (observation phase)
  2. Hire the first candidate better than all seen so far

Result: This strategy succeeds with probability ≈ 1/e ≈ 37%, regardless of n.

import random
import math

def secretary_problem(candidates):
    """Optimal stopping: observe 1/e then hire first improvement."""
    n = len(candidates)
    observe = max(1, int(n / math.e))
    
    # Observation phase — reject first observe candidates
    best_so_far = max(candidates[:observe])
    
    # Hiring phase — hire first candidate better than observation phase
    for candidate in candidates[observe:]:
        if candidate > best_so_far:
            return candidate
    
    # Fallback: hire last candidate
    return candidates[-1]

# Simulation
trials = 100000
success = 0
for _ in range(trials):
    n = 100
    candidates = list(range(1, n+1))
    random.shuffle(candidates)
    hired = secretary_problem(candidates)
    if hired == n:  # Hired the best (rank n)
        success += 1

print(f"Success rate: {success/trials:.3f} (expected ~0.368)")

Practice Problems

  1. Prove that no deterministic online algorithm for ski rental can be better than 2-competitive
  2. Simulate LRU vs FIFO on the request sequence [1,2,3,4,2,5,1,3,6,2,4,7] with k=3
  3. Show that Deterministic List Scheduling achieves ratio (2-1/m) with an explicit bad instance
  4. Implement the secretary problem simulation and verify the ~37% success rate
  5. What is the competitive ratio of "always serve nearest" for the online k-server problem on a line?