Failure Is the Default (Error & Resilience)

The Real Problem

Scenario: It's 2 AM. A Products app deployment fails and the app goes down while users are trying to check out.

The Question: What happens to those users?

This is where most courses chicken out. Don't.

In production, failure is not an exception. Failure is the default.

The Production Reality

Failure Modes

Remote Unavailable - Products app is down
Partial Failure - Products fails, Checkout works
Network Failure - Slow network, timeouts
Version Mismatch - Products updated, Checkout hasn't
Load Failure - remoteEntry.js fails to load

The Question: How do we handle each failure mode gracefully?

Strategy 1: Remote Unavailable at Runtime

The Problem

Products app is down. User navigates to /products. What happens?

The Solution: Graceful Degradation

typescript.js

// Host App - Route with error handling
const routes: Routes = [
  {
    path: 'products',
    loadChildren: () => import('products/ProductsModule')
      .then(m => m.ProductsModule)
      .catch(err => {
        console.error('Products app unavailable', err)
        // Return fallback module
        return import('./fallback/products-fallback.module')
          .then(m => m.ProductsFallbackModule)
      })
  }
]

typescript.js

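// Fallback Module - Shows cached or error state
import { CommonModule } from '@angular/common'
import { Component, NgModule } from '@angular/core'
import { RouterModule } from '@angular/router'

// Declared before the module: decorators run when the class is defined,
// so the component must exist before the module references it
@Component({
  template: `
    <div class="error-state">
      <h2>Products temporarily unavailable</h2>
      <p>We're working on it. Please try again later.</p>
      <button (click)="retry()">Retry</button>
    </div>
  `
})
export class ProductsFallbackComponent {
  retry() {
    // A full reload re-runs the host bootstrap and retries the remote load
    window.location.reload()
  }
}

@NgModule({
  declarations: [ProductsFallbackComponent],
  imports: [
    CommonModule,
    // A lazy-loaded module needs a route of its own, or nothing renders
    RouterModule.forChild([{ path: '', component: ProductsFallbackComponent }])
  ]
})
export class ProductsFallbackModule {}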

Why This Works

✅ App Doesn't Crash - Host continues working

✅ User Sees Message - Not a blank screen

✅ Retry Option - User can try again

✅ Other Apps Work - Checkout still functional

Strategy 2: Partial App Failure

The Problem

The Products app loads, but its API calls fail. Products should show an error while the rest of the app keeps working.

The Solution: Error Boundaries Per Remote

typescript.js

// Host App - Error boundary component
// ErrorHandlerService is a shared service (not shown here)
import { Component, OnDestroy, OnInit } from '@angular/core'
import { Subscription } from 'rxjs'

@Component({
  selector: 'app-remote-container',
  template: `
    <ng-container *ngIf="!hasError; else errorTemplate">
      <router-outlet></router-outlet>
    </ng-container>
    <ng-template #errorTemplate>
      <app-error-fallback [error]="error"></app-error-fallback>
    </ng-template>
  `
})
export class RemoteContainerComponent implements OnInit, OnDestroy {
  hasError = false
  error: unknown = null
  private subscription?: Subscription

  constructor(private errorHandler: ErrorHandlerService) {}

  ngOnInit() {
    // Listen for errors from remotes
    this.subscription = this.errorHandler.errors$.subscribe(error => {
      if (error.source === 'products') {
        this.hasError = true
        this.error = error
      }
    })
  }

  ngOnDestroy() {
    // Avoid leaking the subscription when the container is destroyed
    this.subscription?.unsubscribe()
  }
}

typescript.js

// Products App - Error handling
// (@Component decorator and template omitted for brevity)
import { OnInit } from '@angular/core'

export class ProductsComponent implements OnInit {
  products: Product[] = []

  constructor(
    private productsService: ProductsService,
    private errorHandler: ErrorHandlerService
  ) {}

  ngOnInit() {
    this.productsService.getProducts().subscribe({
      next: products => {
        this.products = products
      },
      error: err => {
        // Report error to host
        this.errorHandler.reportError({
          source: 'products',
          message: 'Failed to load products',
          error: err
        })
      }
    })
  }
}
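
Both snippets above rely on an ErrorHandlerService that this module never defines. The following is a minimal, framework-free sketch of what such a service might look like. The RemoteErrorBus class, the RemoteError shape, and the subscribe method are assumptions made for illustration; in an Angular app the same idea would more likely wrap an RxJS Subject and expose it as errors$.

```typescript
// Hypothetical shape of an error reported by a remote
interface RemoteError {
  source: string
  message: string
  error?: unknown
}

type Listener = (error: RemoteError) => void

// Minimal publish/subscribe bus shared between host and remotes
class RemoteErrorBus {
  private listeners = new Set<Listener>()

  // Register a listener; the returned function unsubscribes it
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener)
    return () => {
      this.listeners.delete(listener)
    }
  }

  // Deliver an error to every listener; one failing listener must not block the rest
  reportError(error: RemoteError): void {
    Array.from(this.listeners).forEach(listener => {
      try {
        listener(error)
      } catch (listenerError) {
        console.error('Error listener failed', listenerError)
      }
    })
  }
}

// Usage: the host reacts only to errors from the remote it wraps
const bus = new RemoteErrorBus()
const seen: string[] = []
const unsubscribe = bus.subscribe(e => {
  if (e.source === 'products') seen.push(e.message)
})

bus.reportError({ source: 'products', message: 'Failed to load products' })
bus.reportError({ source: 'checkout', message: 'Ignored by this listener' })
unsubscribe()
bus.reportError({ source: 'products', message: 'Not delivered after unsubscribe' })
```

For the host and the remote to see the same errors, they would need to share one instance of this service across the federation boundary; two separate instances would never hear each other.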

Strategy 3: Timeouts and Retries

The Problem

Network is slow. Products remoteEntry.js takes 30 seconds to load. User gives up.

The Solution: Timeout with Retry

typescript.js

// Host App - Timeout wrapper
// Note: `retries` is the total number of attempts, including the first one
export function loadRemoteWithTimeout<T>(
  loadFn: () => Promise<T>,
  timeout: number = 10000,
  retries: number = 3
): Promise<T> {
  return new Promise((resolve, reject) => {
    let attempts = 0

    const attempt = () => {
      attempts++
      const current = attempts
      let handled = false // Set once this attempt has timed out or failed

      const retryOrFail = (reason: string, err: unknown, delay: number) => {
        if (current < retries) {
          console.warn(`Remote load ${reason}, retrying... (${current}/${retries})`)
          setTimeout(attempt, delay)
        } else {
          reject(err)
        }
      }

      const timeoutId = setTimeout(() => {
        handled = true
        retryOrFail('timeout', new Error('Remote load timeout after retries'), 0)
      }, timeout)

      loadFn()
        .then(module => {
          // A late success from a timed-out attempt is still a success
          clearTimeout(timeoutId)
          resolve(module)
        })
        .catch(err => {
          if (handled) return // The timeout already scheduled the next attempt
          handled = true
          clearTimeout(timeoutId)
          retryOrFail('failed', err, 1000 * 2 ** (current - 1)) // Exponential backoff: 1s, 2s, 4s...
        })
    }

    attempt()
  })
}

typescript.js

// Use in routes
const routes: Routes = [
  {
    path: 'products',
    loadChildren: () => loadRemoteWithTimeout(
      () => import('products/ProductsModule').then(m => m.ProductsModule),
      10000, // 10 second timeout
      3      // 3 attempts in total
    ).catch(err => {
      return import('./fallback/products-fallback.module')
        .then(m => m.ProductsFallbackModule)
    })
  }
]

Retry Strategy

Exponential Backoff - Wait longer between retries
Max Retries - Don't retry forever
User Feedback - Show loading/retry status
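
The three points above can be sketched as two small pure functions. The names backoffDelayMs and retryStatus, the 1-second base delay, and the 8-second cap are assumptions chosen for illustration, not part of any library.

```typescript
// Delay before the next attempt, doubling each time and capped at maxMs.
// `attempt` is 1-based: after attempt 1 fails, wait baseMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 8000): number {
  const exponential = baseMs * Math.pow(2, attempt - 1)
  return Math.min(exponential, maxMs)
}

// Status text a loading indicator could show between attempts
function retryStatus(attempt: number, maxAttempts: number): string {
  if (attempt >= maxAttempts) {
    return 'Still unavailable. Please try again later.'
  }
  const seconds = backoffDelayMs(attempt) / 1000
  return `Attempt ${attempt} of ${maxAttempts} failed. Retrying in ${seconds}s...`
}

const delays = [1, 2, 3, 4, 5].map(attempt => backoffDelayMs(attempt))
// delays is [1000, 2000, 4000, 8000, 8000]

const firstStatus = retryStatus(1, 3)
// firstStatus is 'Attempt 1 of 3 failed. Retrying in 1s...'
```

Capping the delay keeps a long outage from pushing the wait into minutes, and the status string gives the user something better than a frozen spinner.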

Strategy 4: Fallback UI Strategies

Option 1: Cached Content

typescript.js

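// Show cached products if remote fails
export class ProductsFallbackComponent {
  cachedProducts: Product[] = []
  
  ngOnInit() {
    // Load from cache
    const cached = localStorage.getItem('products-cache')
    if (cached) {
      try {
        this.cachedProducts = JSON.parse(cached)
      } catch {
        // Corrupt cache entry: drop it and fall back to the plain error state
        localStorage.removeItem('products-cache')
      }
    }
  }
}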
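
The fallback above only reads the cache; something has to write it while the remote is healthy, and stale entries should eventually expire. A possible sketch follows. The saveToCache and readFromCache helpers, the KeyValueStore interface, and the wrapped entry format are assumptions for illustration. Because each entry is stored together with a timestamp, the fallback would read it through readFromCache rather than calling JSON.parse directly.

```typescript
// Subset of the Web Storage API used here; localStorage satisfies it
interface KeyValueStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
  removeItem(key: string): void
}

interface CacheEntry<T> {
  savedAt: number // epoch milliseconds
  data: T
}

// Call after every successful load so the fallback has something to show
function saveToCache<T>(store: KeyValueStore, key: string, data: T, now = Date.now()): void {
  try {
    const entry: CacheEntry<T> = { savedAt: now, data }
    store.setItem(key, JSON.stringify(entry))
  } catch {
    // Storage full or unavailable: caching is best-effort
  }
}

// Returns null when the entry is missing, unreadable, or older than maxAgeMs
function readFromCache<T>(store: KeyValueStore, key: string, maxAgeMs: number, now = Date.now()): T | null {
  const raw = store.getItem(key)
  if (raw === null) return null
  try {
    const entry = JSON.parse(raw) as CacheEntry<T>
    if (typeof entry.savedAt !== 'number' || now - entry.savedAt > maxAgeMs) {
      store.removeItem(key)
      return null
    }
    return entry.data
  } catch {
    store.removeItem(key)
    return null
  }
}

// In-memory store standing in for localStorage in this example
const memory = new Map<string, string>()
const store: KeyValueStore = {
  getItem: key => memory.get(key) ?? null,
  setItem: (key, value) => { memory.set(key, value) },
  removeItem: key => { memory.delete(key) }
}

type CachedProduct = { id: number; name: string }
saveToCache<CachedProduct[]>(store, 'products-cache', [{ id: 1, name: 'Keyboard' }], 0)
const fresh = readFromCache<CachedProduct[]>(store, 'products-cache', 60_000, 30_000)  // within max age
const stale = readFromCache<CachedProduct[]>(store, 'products-cache', 60_000, 120_000) // too old, removed
```

Passing the store and the current time as parameters keeps the helpers testable without a browser.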

Option 2: Disable Features

typescript.js

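// Disable checkout if payment service down
export class CheckoutComponent {
  canCheckout = true
  message = ''

  constructor(private paymentService: PaymentService) {}

  ngOnInit() {
    this.paymentService.healthCheck().subscribe({
      next: () => this.canCheckout = true,
      error: () => {
        this.canCheckout = false
        this.showMessage('Checkout temporarily unavailable')
      }
    })
  }

  private showMessage(text: string) {
    this.message = text // Bind `message` in the template
  }
}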

Option 3: Feature Kill-Switches

typescript.js

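// Kill switch for features
import { HttpClient } from '@angular/common/http'
import { Injectable } from '@angular/core'

@Injectable({ providedIn: 'root' })
export class FeatureFlagService {
  private flags: Map<string, boolean> = new Map()
  
  constructor(private http: HttpClient) {
    this.loadFlags()
  }
  
  // Unknown flags, and flags that have not loaded yet, default to enabled
  isEnabled(feature: string): boolean {
    return this.flags.get(feature) ?? true
  }
  
  private loadFlags() {
    // Load from config service
    this.http.get<Record<string, boolean>>('/api/feature-flags').subscribe({
      next: flags => {
        Object.entries(flags).forEach(([key, value]) => {
          this.flags.set(key, value)
        })
      },
      // If the flag service itself is down, keep the defaults instead of throwing
      error: err => console.error('Failed to load feature flags', err)
    })
  }
}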

typescript.js

// Use kill switch
export class ProductsComponent {
  constructor(private featureFlags: FeatureFlagService) {}
  
  ngOnInit() {
    if (!this.featureFlags.isEnabled('products')) {
      // Show disabled message
      this.showDisabledMessage()
    }
  }
}
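
The FeatureFlagService above trusts whatever /api/feature-flags returns and treats every unknown flag as enabled. A sketch of validating the payload and choosing the default per feature is shown here. The standalone parseFlags and isEnabled functions and the flag names are hypothetical, used only to illustrate the idea.

```typescript
// Keep only boolean entries from an untrusted payload
function parseFlags(payload: unknown): Map<string, boolean> {
  const flags = new Map<string, boolean>()
  if (typeof payload !== 'object' || payload === null || Array.isArray(payload)) {
    return flags
  }
  const record = payload as Record<string, unknown>
  for (const key of Object.keys(record)) {
    const value = record[key]
    if (typeof value === 'boolean') flags.set(key, value)
  }
  return flags
}

// Missing flags fall back to `defaultValue`: true keeps a feature on when the
// flag service is down (fail open), false switches it off (fail closed)
function isEnabled(flags: Map<string, boolean>, feature: string, defaultValue = true): boolean {
  const value = flags.get(feature)
  return value === undefined ? defaultValue : value
}

const flags = parseFlags({ products: false, checkout: true, banner: 'yes' })

const productsOn = isEnabled(flags, 'products')        // false: kill switch is active
const searchOn = isEnabled(flags, 'search')            // true: unknown flag, fail open
const betaOn = isEnabled(flags, 'beta-pricing', false) // false: unknown flag, fail closed
```

Failing open suits core features that should survive a flag-service outage; failing closed suits risky or experimental ones.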

Strategy 5: Health Checks

The Problem

How do we know if a remote is available before trying to load it?

The Solution: Pre-flight Health Check

typescript.js

// Health check service
@Injectable({ providedIn: 'root' })
export class RemoteHealthService {
  private healthCache: Map<string, boolean> = new Map()
  
  async checkHealth(remoteUrl: string): Promise<boolean> {
    // Check cache first
    const cached = this.healthCache.get(remoteUrl)
    if (cached !== undefined) {
      return cached
    }
    
    let isHealthy = false
    try {
      const response = await fetch(`${remoteUrl}/health`, {
        method: 'HEAD',
        signal: AbortSignal.timeout(5000) // 5 second timeout
      })
      isHealthy = response.ok
    } catch {
      // Network error or timeout: treat the remote as unhealthy
      isHealthy = false
    }
    
    // Cache both outcomes for 30 seconds, so a failed check
    // does not mark the remote as down forever
    this.healthCache.set(remoteUrl, isHealthy)
    setTimeout(() => {
      this.healthCache.delete(remoteUrl)
    }, 30000)
    
    return isHealthy
  }
}
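
The service above expires cache entries with setTimeout. An alternative is to store an expiry timestamp with each entry and check it on read, which avoids pending timers and is easier to test. The TtlCache class and its injectable clock are assumptions for illustration, not part of the original service.

```typescript
// Cache whose entries expire `ttlMs` after they were written.
// The clock is injectable so tests can control time.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>()

  constructor(
    private ttlMs: number,
    private now: () => number = () => Date.now()
  ) {}

  // Returns undefined for missing or expired entries
  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (entry === undefined) return undefined
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }
}

// Usage with a fake clock: an unhealthy result is remembered for 30 seconds only
let clock = 0
const health = new TtlCache<boolean>(30_000, () => clock)

health.set('https://products.example.com', false)
clock = 10_000
const duringTtl = health.get('https://products.example.com') // false: still cached
clock = 30_000
const afterTtl = health.get('https://products.example.com')  // undefined: expired, check again
```

Because get returns undefined once an entry expires, a cached false (remote down) stays distinguishable from "no answer yet".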

typescript.js

// Use in route loading
const routes: Routes = [
  {
    path: 'products',
    loadChildren: async () => {
      // inject() must be called synchronously, before the first await
      const healthService = inject(RemoteHealthService)
      const isHealthy = await healthService.checkHealth('https://products.example.com')
      
      if (isHealthy) {
        return import('products/ProductsModule').then(m => m.ProductsModule)
      } else {
        return import('./fallback/products-fallback.module')
          .then(m => m.ProductsFallbackModule)
      }
    }
  }
]

Error Tracking

The Problem

How do we know when failures happen in production?

The Solution: Centralized Error Tracking

typescript.js

// Error tracking service
export interface ErrorContext {
  source: string // 'products', 'checkout', 'host'
}

@Injectable({ providedIn: 'root' })
export class ErrorTrackingService {
  constructor(private http: HttpClient) {}
  
  trackError(error: Error, context: ErrorContext) {
    const errorReport = {
      message: error.message,
      stack: error.stack,
      source: context.source,
      url: window.location.href,
      userAgent: navigator.userAgent,
      timestamp: new Date().toISOString()
    }
    
    // Send to error tracking service. Swallow failures of the report itself:
    // an unhandled error here would reach the global error handler, which
    // would send another report, and so on in a loop.
    this.http.post('/api/errors', errorReport).subscribe({ error: () => {} })
    
    // Also log to console in development
    if (!environment.production) {
      console.error('Error tracked:', errorReport)
    }
  }
}

typescript.js

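// Global error handler
// Register it in the host: { provide: ErrorHandler, useClass: GlobalErrorHandler }
@Injectable()
export class GlobalErrorHandler implements ErrorHandler {
  constructor(private errorTracking: ErrorTrackingService) {}
  
  handleError(error: unknown) {
    // Thrown values are not always Error instances
    const normalized = error instanceof Error ? error : new Error(String(error))
    this.errorTracking.trackError(normalized, {
      source: 'unknown'
    })
  }
}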
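
The global handler above tags every report with source 'unknown', which makes it hard to tell which remote is failing. One possible way to narrow that down is to look for a known remote origin in the stack trace. This is a rough sketch: the toError and guessSource helpers and the origin map (including the checkout URL) are assumptions, and minified or cross-origin stacks may not contain usable URLs.

```typescript
// Turn any thrown value into an Error so message and stack are always present
function toError(value: unknown): Error {
  if (value instanceof Error) return value
  if (typeof value === 'string') return new Error(value)
  let text: string | undefined
  try {
    text = JSON.stringify(value)
  } catch {
    text = undefined // e.g. circular structures
  }
  return new Error(text === undefined ? String(value) : text)
}

// Guess which app an error came from by looking for a known origin in the stack.
// Returns 'unknown' when no origin matches.
function guessSource(error: Error, origins: Record<string, string>): string {
  const stack = error.stack ?? ''
  for (const source of Object.keys(origins)) {
    if (stack.indexOf(origins[source]) !== -1) return source
  }
  return 'unknown'
}

// Hypothetical origin map; a real deployment would supply its own
const origins: Record<string, string> = {
  products: 'https://products.example.com',
  checkout: 'https://checkout.example.com'
}

const remoteError = new Error('Cannot read properties of undefined')
remoteError.stack =
  'TypeError: Cannot read properties of undefined\n' +
  '    at render (https://products.example.com/main.js:10:5)'

const source = guessSource(remoteError, origins) // 'products'
const fromString = toError('boom').message       // 'boom'
const fromObject = toError({ code: 503 }).message // '{"code":503}'
```

A remote that reports its own errors with an explicit source, as in Strategy 2, is more reliable than this guess; the stack lookup is a fallback for errors nobody caught.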

Production Checklist

[ ] Error boundaries for each remote
[ ] Timeout strategies implemented
[ ] Retry logic with backoff
[ ] Fallback UI for each remote
[ ] Health checks for remotes
[ ] Error tracking configured
[ ] Feature kill-switches implemented
[ ] Graceful degradation tested

Key Takeaways

Failure is the default - Design for it
Graceful degradation - App should never crash
User feedback - Show what's happening
Retry strategies - But don't retry forever
Monitor failures - Track what breaks

Remember: In production, the question isn't "Will it fail?" It's "What happens when it fails?"

The next module: How to test this architecture.

