Online Algorithms and Competitive Analysis

An online algorithm must make decisions as inputs arrive, without knowledge of future inputs. In contrast, an offline algorithm (like the algorithms we've studied so far) has access to the entire input upfront.

Competitive Ratio

An online algorithm A is c-competitive if for all input sequences σ:

A(σ) ≤ c · OPT(σ) + b

where OPT(σ) is the optimal offline cost and b is a constant.

The Ski Rental Problem

Setting: You need skis for an unknown number of days. Renting costs $1/day. Buying costs $B. The "buy-or-rent" decision must be made each day.

Optimal offline: If you knew in advance you'd ski ≥ B days, buy; otherwise rent.

Deterministic online strategy: Rent for B days, then buy on day B+1 if you're still skiing.

Analysis:

If you ski < B days: you rented the whole time (same as optimal renting).
If you ski ≥ B days: you paid B days rent + B for buying = 2B total. Optimal was B (buy on day 1). Ratio = 2.

Competitive ratio = 2 — this is optimal for deterministic algorithms!

Randomized improvement: Rent for B days with probability depending on a randomized strategy → competitive ratio ≈ e/(e-1) ≈ 1.58.

Paging / Caching

Setting: Fast cache holds k pages. On each request for a page:

Cache hit: Free
Cache miss: Load from slow memory (cost 1). Must evict a page if cache is full.

Common algorithms:

Algorithm	Competitive Ratio	Strategy
LRU	k	Evict Least Recently Used
FIFO	k	Evict First In First Out
LFU	Not competitive	Evict Least Frequently Used
LFD (offline)	Optimal	Evict page used farthest in future

python

from collections import OrderedDict

class LRUCache:
    """Least Recently Used cache — k-competitive online algorithm."""
    
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
    
    def access(self, page):
        if page in self.cache:
            self.cache.move_to_end(page)  # Mark as recently used
            return True  # Cache hit
        
        # Cache miss — evict LRU if full
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # Remove least recently used
        
        self.cache[page] = True
        return False  # Cache miss

def simulate_paging(requests, cache_size, algorithm='lru'):
    cache = LRUCache(cache_size)
    misses = 0
    for page in requests:
        if not cache.access(page):
            misses += 1
    return misses

requests = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(f"LRU misses (k=3): {simulate_paging(requests, 3)}")  # k-competitive

Load Balancing — List Scheduling

Setting: m machines, n jobs with processing times p₁, p₂, ..., pₙ arriving online. Assign each job to a machine to minimize makespan (max completion time).

Algorithm: Assign each job to the machine with current minimum load (LPT if sorted, List if not).

Competitive ratio of List Scheduling: (2 - 1/m)

python

import heapq

def list_scheduling(jobs, m):
    """List scheduling: assign each job to least loaded machine."""
    # Min-heap: (current_load, machine_id)
    machines = [(0, i) for i in range(m)]
    heapq.heapify(machines)
    
    assignment = []
    for job in jobs:
        load, machine = heapq.heappop(machines)
        assignment.append(machine)
        heapq.heappush(machines, (load + job, machine))
    
    return assignment, max(load for load, _ in machines)

jobs = [6, 5, 5, 4, 4, 3, 3, 1]
assignment, makespan = list_scheduling(jobs, 3)
print(f"Makespan: {makespan}")  # Competitive approximation

The Secretary Problem

Setting: Interview n candidates one by one in random order. After each interview, immediately accept or reject. Goal: maximize probability of hiring the best candidate.

Optimal strategy (Optimal Stopping):

Reject the first n/e ≈ 37% of candidates (observation phase)
Hire the first candidate better than all seen so far

Result: This strategy succeeds with probability ≈ 1/e ≈ 37%, regardless of n.

python

import random
import math

def secretary_problem(candidates):
    """Optimal stopping: observe 1/e then hire first improvement."""
    n = len(candidates)
    observe = max(1, int(n / math.e))
    
    # Observation phase — reject first observe candidates
    best_so_far = max(candidates[:observe])
    
    # Hiring phase — hire first candidate better than observation phase
    for candidate in candidates[observe:]:
        if candidate > best_so_far:
            return candidate
    
    # Fallback: hire last candidate
    return candidates[-1]

# Simulation
trials = 100000
success = 0
for _ in range(trials):
    n = 100
    candidates = list(range(1, n+1))
    random.shuffle(candidates)
    hired = secretary_problem(candidates)
    if hired == n:  # Hired the best (rank n)
        success += 1

print(f"Success rate: {success/trials:.3f} (expected ~0.368)")

Practice Problems

Prove that no deterministic online algorithm for ski rental can be better than 2-competitive
Simulate LRU vs FIFO on the request sequence [1,2,3,4,2,5,1,3,6,2,4,7] with k=3
Show that Deterministic List Scheduling achieves ratio (2-1/m) with an explicit bad instance
Implement the secretary problem simulation and verify the ~37% success rate
What is the competitive ratio of "always serve nearest" for the online k-server problem on a line?

ADVANCED ALGORITHMS - ADVANCED TECHNIQUES:Online Algorithms and Competitive Analysis

Online Algorithms and Competitive Analysis

Competitive Ratio

The Ski Rental Problem

Paging / Caching

Load Balancing — List Scheduling

The Secretary Problem

Practice Problems

Related Page

Approximation Algorithms — Practical NP-hard Solutions