Module 18: Failure Is the Default (Error & Resilience)

Chapter 18 • Advanced

50 min

The Real Problem

Scenario: It's 2 AM. A Products app deployment fails and the app goes down while users are trying to check out.

The Question: What happens to those users?

This is where most courses chicken out. Don't.

In production, failure is not an exception. Failure is the default.


The Production Reality

Failure Modes

  1. Remote Unavailable - Products app is down
  2. Partial Failure - Products fails, Checkout works
  3. Network Failure - Slow network, timeouts
  4. Version Mismatch - Products updated, Checkout hasn't
  5. Load Failure - remoteEntry.js fails to load
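
One way to keep these five modes explicit in code is a discriminated union, so the compiler can flag any mode that is left unhandled. The sketch below is illustrative only: `RemoteFailure`, its fields, and `describeFailure` are hypothetical names, not part of Angular or Module Federation.

```typescript
// Hypothetical model of the five failure modes listed above.
type RemoteFailure =
  | { kind: 'remote-unavailable'; remote: string }
  | { kind: 'partial-failure'; remote: string; feature: string }
  | { kind: 'network-timeout'; remote: string; timeoutMs: number }
  | { kind: 'version-mismatch'; remote: string; expected: string; actual: string }
  | { kind: 'load-failure'; remote: string; url: string }

// Maps each failure mode to a message suitable for a fallback UI.
function describeFailure(failure: RemoteFailure): string {
  switch (failure.kind) {
    case 'remote-unavailable':
      return `${failure.remote} is temporarily unavailable`
    case 'partial-failure':
      return `${failure.feature} in ${failure.remote} is not working right now`
    case 'network-timeout':
      return `${failure.remote} did not respond within ${failure.timeoutMs} ms`
    case 'version-mismatch':
      return `${failure.remote} is on version ${failure.actual}, expected ${failure.expected}`
    case 'load-failure':
      return `Could not load ${failure.url}`
    default: {
      // Compile-time exhaustiveness check: adding a new kind without a case fails here
      const unreachable: never = failure
      return unreachable
    }
  }
}
```

Each strategy in this module handles one or more of these kinds.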

The Question: How do we handle each failure mode gracefully?


Strategy 1: Remote Unavailable at Runtime

The Problem

Products app is down. User navigates to /products. What happens?

The Solution: Graceful Degradation

typescript.js
// Host App - Route with error handling
const routes: Routes = [
  {
    path: 'products',
    loadChildren: () => import('products/ProductsModule')
      .then(m => m.ProductsModule)
      .catch(err => {
        console.error('Products app unavailable', err)
        // Return fallback module
        return import('./fallback/products-fallback.module')
          .then(m => m.ProductsFallbackModule)
      })
  }
]
typescript.js
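// Fallback Module - Shows an error state when the remote cannot be loaded
import { CommonModule } from '@angular/common'
import { Component, NgModule } from '@angular/core'
import { RouterModule } from '@angular/router'

// Declared before the module so the class exists when @NgModule references it
@Component({
  template: `
    <div class="error-state">
      <h2>Products temporarily unavailable</h2>
      <p>We're working on it. Please try again later.</p>
      <button (click)="retry()">Retry</button>
    </div>
  `
})
export class ProductsFallbackComponent {
  retry() {
    // A full reload restarts the host, which then tries to load the remote again
    window.location.reload()
  }
}

@NgModule({
  declarations: [ProductsFallbackComponent],
  imports: [
    CommonModule,
    // A lazy-loaded module needs a child route to render its component
    RouterModule.forChild([{ path: '', component: ProductsFallbackComponent }])
  ]
})
export class ProductsFallbackModule {}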

Why This Works

App Doesn't Crash - Host continues working

User Sees Message - Not a blank screen

Retry Option - User can try again

Other Apps Work - Checkout still functional
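
The catch-and-fallback pattern repeats for every remote route, so it can be pulled into a small helper. The following is a minimal sketch, assuming a hypothetical `loadWithFallback` function (it is not an Angular or Module Federation API) that takes the remote loader and the fallback loader as plain functions.

```typescript
// Hypothetical helper: tries the remote first and falls back if loading rejects.
async function loadWithFallback<T>(
  loadRemote: () => Promise<T>,
  loadFallback: () => Promise<T>,
  onError: (err: unknown) => void = err => console.error('Remote unavailable', err)
): Promise<T> {
  try {
    return await loadRemote()
  } catch (err) {
    // Report the failure, then hand the router something it can still render
    onError(err)
    return loadFallback()
  }
}

// Intended use in a route definition (shown as a comment because the
// module paths only resolve inside the host app):
//
// loadChildren: () => loadWithFallback(
//   () => import('products/ProductsModule').then(m => m.ProductsModule),
//   () => import('./fallback/products-fallback.module').then(m => m.ProductsFallbackModule)
// )
```

If the fallback import itself rejects, the error still propagates to the router, so the fallback module should live in the host bundle rather than in another remote.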


Strategy 2: Partial App Failure

The Problem

The Products app loads, but its API calls fail. Products should show an error while the rest of the app keeps working.

The Solution: Error Boundaries Per Remote

typescript.js
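// Host App - Error boundary component
import { Component, OnDestroy, OnInit } from '@angular/core'
import { Router } from '@angular/router'
import { Subscription } from 'rxjs'

@Component({
  selector: 'app-remote-container',
  template: `
    <ng-container *ngIf="!hasError; else errorTemplate">
      <router-outlet></router-outlet>
    </ng-container>
    <ng-template #errorTemplate>
      <app-error-fallback [error]="error"></app-error-fallback>
    </ng-template>
  `
})
export class RemoteContainerComponent implements OnInit, OnDestroy {
  hasError = false
  error: any = null
  private subscription?: Subscription
  
  constructor(
    private router: Router,
    private errorHandler: ErrorHandlerService
  ) {}
  
  ngOnInit() {
    // Listen for errors reported by remotes
    this.subscription = this.errorHandler.errors$.subscribe(error => {
      if (error.source === 'products') {
        this.hasError = true
        this.error = error
      }
    })
  }
  
  ngOnDestroy() {
    // Stop listening when the container is destroyed so the subscription does not leak
    this.subscription?.unsubscribe()
  }
}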
typescript.js
// Products App - Error handling
export class ProductsComponent implements OnInit {
  products: Product[] = [] // Product is the app's own model type

  constructor(
    private productsService: ProductsService,
    private errorHandler: ErrorHandlerService
  ) {}
  
  ngOnInit() {
    this.productsService.getProducts().subscribe({
      next: products => {
        this.products = products
      },
      error: err => {
        // Report error to host
        this.errorHandler.reportError({
          source: 'products',
          message: 'Failed to load products',
          error: err
        })
      }
    })
  }
}
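
Both components above depend on an `ErrorHandlerService` that is never shown. The sketch below is one possible shape for it, written without dependencies so it runs on its own. In an Angular app it would more likely be an `@Injectable` wrapping an RxJS `Subject`, and it would have to be shared as a single instance between host and remotes for the host to see errors a remote reports. Both of those points are assumptions, as is the `RemoteError` interface.

```typescript
// Hypothetical shape of an error reported by a remote.
interface RemoteError {
  source: string
  message: string
  error?: unknown
}

type Listener = (error: RemoteError) => void

// Minimal stand-in for the ErrorHandlerService used by host and remote.
class ErrorHandlerService {
  private listeners = new Set<Listener>()

  // Mirrors the `errors$.subscribe(...)` call made by the host container
  readonly errors$ = {
    subscribe: (listener: Listener) => {
      this.listeners.add(listener)
      return {
        unsubscribe: () => {
          this.listeners.delete(listener)
        }
      }
    }
  }

  reportError(error: RemoteError): void {
    this.listeners.forEach(listener => {
      try {
        listener(error)
      } catch (e) {
        // A failing listener must not stop the others from being notified
        console.error('Error listener failed', e)
      }
    })
  }
}
```

The host filters on `source`, so every remote should report under a stable, unique name.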

Strategy 3: Timeouts and Retries

The Problem

Network is slow. Products remoteEntry.js takes 30 seconds to load. User gives up.

The Solution: Timeout with Retry

typescript.js
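// Host App - Timeout wrapper
export function loadRemoteWithTimeout<T>(
  loadFn: () => Promise<T>,
  timeout: number = 10000,
  retries: number = 3 // Maximum number of attempts in total
): Promise<T> {
  return new Promise((resolve, reject) => {
    let attempts = 0
    let done = false // Set once a load succeeds so no further attempts start
    
    const attempt = () => {
      if (done) return
      attempts++
      const current = attempts
      let timedOut = false
      
      const timeoutId = setTimeout(() => {
        // This attempt is abandoned; a late rejection from it must not retry again
        timedOut = true
        if (current < retries) {
          console.warn(`Remote load timeout, retrying... (${current}/${retries})`)
          attempt()
        } else {
          reject(new Error('Remote load timeout after retries'))
        }
      }, timeout)
      
      loadFn()
        .then(module => {
          clearTimeout(timeoutId)
          done = true
          resolve(module) // A late success from a timed-out attempt still counts
        })
        .catch(err => {
          clearTimeout(timeoutId)
          if (timedOut) return // Already handled by the timeout branch
          if (current < retries) {
            console.warn(`Remote load failed, retrying... (${current}/${retries})`)
            setTimeout(attempt, 1000 * 2 ** (current - 1)) // Exponential backoff: 1s, 2s, 4s, ...
          } else {
            reject(err)
          }
        })
    }
    
    attempt()
  })
}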
typescript.js
// Use in routes
const routes: Routes = [
  {
    path: 'products',
    loadChildren: () => loadRemoteWithTimeout(
      () => import('products/ProductsModule').then(m => m.ProductsModule),
      10000, // 10 second timeout per attempt
      3      // Up to 3 attempts in total
    ).catch(err => {
      console.error('Products app unavailable after retries', err)
      return import('./fallback/products-fallback.module')
        .then(m => m.ProductsFallbackModule)
    })
  }
]

Retry Strategy

  • Exponential Backoff - Wait longer between retries
  • Max Retries - Don't retry forever
  • User Feedback - Show loading/retry status
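
The first two bullets can be captured in a single delay function. This is a minimal sketch, assuming a hypothetical `backoffDelay` helper. It doubles the delay per attempt, caps it, and adds random jitter so that many clients do not retry at the same instant. The jitter is an addition to what the wrapper above does, and the default values are placeholders.

```typescript
// Hypothetical helper: delay in milliseconds before retry number `attempt` (1-based).
function backoffDelay(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30000,
  random: () => number = Math.random // Injectable so the function can be tested
): number {
  // Exponential growth: baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs
  const exponential = Math.min(maxMs, baseMs * 2 ** (attempt - 1))
  // "Equal jitter": half of the delay is fixed, the other half is randomized
  const half = exponential / 2
  return half + random() * half
}
```

With the defaults, the first retry waits between 0.5 and 1 second, the third between 2 and 4 seconds, and no retry waits longer than 30 seconds.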

Strategy 4: Fallback UI Strategies

Option 1: Cached Content

typescript.js
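// Show cached products if remote fails
export class ProductsFallbackComponent implements OnInit {
  cachedProducts: Product[] = []
  
  ngOnInit() {
    // Load from cache
    const cached = localStorage.getItem('products-cache')
    if (cached) {
      try {
        this.cachedProducts = JSON.parse(cached)
      } catch {
        // Corrupt cache entry: discard it and keep the empty list
        localStorage.removeItem('products-cache')
      }
    }
  }
}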
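
The fallback above only reads the cache, so something has to write it while Products is healthy, and stale data should eventually expire. Below is a minimal sketch under those assumptions. `writeCache`, `readCache`, and the `KeyValueStore` interface are hypothetical names; `localStorage` satisfies the interface in a browser.

```typescript
// The subset of the Web Storage API the helpers need; localStorage matches it.
interface KeyValueStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
  removeItem(key: string): void
}

interface CacheEntry<T> {
  savedAt: number // Epoch milliseconds at write time
  data: T
}

// Called by the healthy Products app after a successful load.
function writeCache<T>(store: KeyValueStore, key: string, data: T, now: number = Date.now()): void {
  try {
    const entry: CacheEntry<T> = { savedAt: now, data }
    store.setItem(key, JSON.stringify(entry))
  } catch {
    // Quota exceeded or storage disabled: caching is best effort
  }
}

// Called by the fallback; returns null when the entry is missing, corrupt, or too old.
function readCache<T>(store: KeyValueStore, key: string, maxAgeMs: number, now: number = Date.now()): T | null {
  const raw = store.getItem(key)
  if (raw === null) return null
  try {
    const entry = JSON.parse(raw) as CacheEntry<T> | null
    if (typeof entry?.savedAt !== 'number' || now - entry.savedAt > maxAgeMs) {
      store.removeItem(key)
      return null
    }
    return entry.data
  } catch {
    store.removeItem(key)
    return null
  }
}
```

Showing the age of cached data next to it (for example "prices as of 10 minutes ago") keeps the fallback honest about what the user is looking at.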

Option 2: Disable Features

typescript.js
// Disable checkout if payment service down
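export class CheckoutComponent implements OnInit {
  canCheckout = true
  
  // PaymentService and showMessage() are assumed to exist elsewhere in the Checkout app
  constructor(private paymentService: PaymentService) {}
  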
  ngOnInit() {
    this.paymentService.healthCheck().subscribe({
      next: () => this.canCheckout = true,
      error: () => {
        this.canCheckout = false
        this.showMessage('Checkout temporarily unavailable')
      }
    })
  }
}

Option 3: Feature Kill-Switches

typescript.js
// Kill switch for features
@Injectable({ providedIn: 'root' })
export class FeatureFlagService {
  private flags: Map<string, boolean> = new Map()
  
  constructor(private http: HttpClient) {
    this.loadFlags()
  }
  
  isEnabled(feature: string): boolean {
    return this.flags.get(feature) ?? true
  }
  
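  private loadFlags() {
    // Load from config service
    this.http.get<Record<string, boolean>>('/api/feature-flags').subscribe({
      next: flags => {
        Object.entries(flags).forEach(([key, value]) => {
          this.flags.set(key, value)
        })
      },
      // If the flag service is down, keep the defaults (every feature enabled)
      error: err => console.error('Failed to load feature flags', err)
    })
  }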
}
typescript.js
// Use kill switch
export class ProductsComponent {
  constructor(private featureFlags: FeatureFlagService) {}
  
  ngOnInit() {
    if (!this.featureFlags.isEnabled('products')) {
      // Show disabled message
      this.showDisabledMessage()
    }
  }
}
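
One detail of the service above is that `isEnabled` returns `true` for any flag that has not loaded yet, so every feature is "fail open". Some features may be safer switched off until the flag service confirms them. The sketch below shows one way to make that choice per feature. `FlagStore` and its methods are hypothetical names, and it has no Angular dependency so it can run on its own.

```typescript
// Hypothetical flag store with explicit per-feature defaults.
class FlagStore {
  private flags = new Map<string, boolean>()

  // defaults: value to use for a feature until (or unless) a remote value arrives
  constructor(private defaults: Record<string, boolean> = {}) {}

  // Accepts the raw payload from the flag endpoint and ignores non-boolean values
  update(remote: Record<string, unknown>): void {
    for (const [key, value] of Object.entries(remote)) {
      if (typeof value === 'boolean') {
        this.flags.set(key, value)
      }
    }
  }

  isEnabled(feature: string): boolean {
    // Remote value first, then the configured default, then fail open
    return this.flags.get(feature) ?? this.defaults[feature] ?? true
  }
}
```

For example, `new FlagStore({ 'new-checkout': false })` keeps an experimental checkout off until the flag service explicitly enables it, while unlisted features stay on.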

Strategy 5: Health Checks

The Problem

How do we know if a remote is available before trying to load it?

The Solution: Pre-flight Health Check

typescript.js
// Health check service
@Injectable({ providedIn: 'root' })
export class RemoteHealthService {
  private healthCache: Map<string, boolean> = new Map()
  
  async checkHealth(remoteUrl: string): Promise<boolean> {
    // Check cache first
    const cached = this.healthCache.get(remoteUrl)
    if (cached !== undefined) {
      return cached
    }
    
    let isHealthy = false
    try {
      const response = await fetch(`${remoteUrl}/health`, {
        method: 'HEAD',
        signal: AbortSignal.timeout(5000) // 5 second timeout
      })
      isHealthy = response.ok
    } catch {
      // Network error, CORS rejection, or timeout: treat the remote as unhealthy
      isHealthy = false
    }
    
    // Cache both outcomes for 30 seconds, so one failed check
    // does not mark the remote as down permanently
    this.healthCache.set(remoteUrl, isHealthy)
    setTimeout(() => {
      this.healthCache.delete(remoteUrl)
    }, 30000)
    
    return isHealthy
  }
}
typescript.js
// Use in route loading
const routes: Routes = [
  {
    path: 'products',
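    loadChildren: async () => {
      // inject() must run before the first await and needs an injection context;
      // whether loadChildren provides one depends on the Angular version in use
      const healthService = inject(RemoteHealthService)
      const isHealthy = await healthService.checkHealth('https://products.example.com')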
      
      if (isHealthy) {
        return import('products/ProductsModule').then(m => m.ProductsModule)
      } else {
        return import('./fallback/products-fallback.module')
          .then(m => m.ProductsFallbackModule)
      }
    }
  }
]

Error Tracking

The Problem

How do we know when failures happen in production?

The Solution: Centralized Error Tracking

typescript.js
// Error tracking service
@Injectable({ providedIn: 'root' })
export class ErrorTrackingService {
  constructor(private http: HttpClient) {}
  
  trackError(error: Error, context: ErrorContext) {
    const errorReport = {
      message: error.message,
      stack: error.stack,
      source: context.source, // 'products', 'checkout', 'host'
      url: window.location.href,
      userAgent: navigator.userAgent,
      timestamp: new Date().toISOString()
    }
    
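    // Send to error tracking service. Failures are swallowed so that a broken
    // tracking endpoint cannot produce new errors that get tracked in turn
    this.http.post('/api/errors', errorReport).subscribe({
      error: () => {}
    })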
    
    // Also log to console in development
    if (!environment.production) {
      console.error('Error tracked:', errorReport)
    }
  }
}
typescript.js
// Global error handler
@Injectable()
export class GlobalErrorHandler implements ErrorHandler {
  constructor(private errorTracking: ErrorTrackingService) {}
  
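  handleError(error: unknown) {
    // Thrown values are not always Error instances
    const normalized = error instanceof Error ? error : new Error(String(error))
    this.errorTracking.trackError(normalized, {
      source: 'unknown'
    })
  }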
}
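
Code can throw strings, plain objects, or anything else, so the report builder should not assume it receives an `Error`. The following sketch shows one way to normalize a thrown value before sending it. `toErrorReport`, `safeStringify`, and the reduced `ErrorReport` shape are hypothetical and leave out the browser-only fields (`url`, `userAgent`) so the example runs anywhere.

```typescript
// Reduced, hypothetical version of the report object built in trackError.
interface ErrorReport {
  message: string
  stack?: string | undefined
  source: string
  timestamp: string
}

// Best-effort string form of an arbitrary value; never throws.
function safeStringify(value: unknown): string {
  try {
    // JSON.stringify returns undefined for values such as undefined or functions
    return JSON.stringify(value) ?? String(value)
  } catch {
    // Circular structures and similar cases end up here
    return String(value)
  }
}

// Builds a report from whatever was thrown.
function toErrorReport(thrown: unknown, source: string, now: Date = new Date()): ErrorReport {
  const timestamp = now.toISOString()
  if (thrown instanceof Error) {
    return { message: thrown.message, stack: thrown.stack, source, timestamp }
  }
  const message = typeof thrown === 'string' ? thrown : safeStringify(thrown)
  return { message, source, timestamp }
}
```

Passing a real `source` here (for example the name of the remote that owns the failing route) makes reports far more useful than the `'unknown'` placeholder.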

Production Checklist

  • [ ] Error boundaries for each remote
  • [ ] Timeout strategies implemented
  • [ ] Retry logic with backoff
  • [ ] Fallback UI for each remote
  • [ ] Health checks for remotes
  • [ ] Error tracking configured
  • [ ] Feature kill-switches implemented
  • [ ] Graceful degradation tested

Key Takeaways

  1. Failure is the default - Design for it
  2. Graceful degradation - App should never crash
  3. User feedback - Show what's happening
  4. Retry strategies - But don't retry forever
  5. Monitor failures - Track what breaks

Remember: In production, the question isn't "Will it fail?" It's "What happens when it fails?"

The next module: How to test this architecture.