
Module 2: The Problem Statement

Chapter 2 • Beginner

30 min

The Problem Kafka Solves

Before we learn how Kafka works, we need to clearly understand why Kafka exists at all. Modern systems generate huge amounts of real-time data, and traditional architectures struggle badly under this load.

This module focuses on the pain points of traditional systems and sets up the need for event-driven architecture and event streaming platforms like Kafka.


🎯 What You Will Learn

By the end of this module, you will be able to:

  • Identify the main problems with traditional, API-based integrations
  • Recognize issues like tight coupling, data loss, low throughput, and cascading failures
  • Understand why real-time, high-volume systems need something beyond simple APIs or basic message queues
  • Explain the benefits of the publish-subscribe (pub/sub) model
  • Understand where event streaming (Kafka) goes beyond traditional message queues

⚡ The Challenge of Real-Time Data

Modern applications generate continuous streams of events:

  • User clicks, page views, searches
  • Orders, payments, inventory updates
  • Logs, metrics, monitoring data
  • IoT sensor readings

If we try to handle all of this using only direct API calls and databases, we quickly run into serious problems.


๐Ÿ—๏ธ Traditional System Problems

1. Direct API Integrations (Tight Coupling)

In many legacy systems, services talk to each other directly via synchronous APIs:

code
    Service A → API Call → Service B
    Service A → API Call → Service C
    Service A → API Call → Service D
    

Problems:

  • Tight coupling: Service A must know about B, C, and D
  • Single point of failure: If B is down, A may also fail
  • Hard to scale: Every new consumer means more API calls and more load
  • Blocking calls: A waits for responses, increasing latency

The more services you add, the more complex this web of dependencies becomes.
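The tight coupling and blocking behavior described above can be sketched in a few lines of Python. The service functions here are hypothetical stand-ins for network calls; the point is that latency adds up across the chain and one failing dependency fails the whole operation.

```python
import time

# Hypothetical downstream services; each call blocks the caller.
def inventory_service(order):
    time.sleep(0.05)  # simulated network + processing delay
    return {"reserved": True}

def payment_service(order):
    time.sleep(0.05)
    return {"paid": True}

def email_service(order):
    # One dependency being down is enough to break the caller.
    raise ConnectionError("email service is down")

def place_order(order):
    # Service A must know about B, C, and D, call each one directly,
    # and wait for every response before it can answer the user.
    results = {}
    results["inventory"] = inventory_service(order)
    results["payment"] = payment_service(order)
    results["email"] = email_service(order)  # failure here fails the whole order
    return results

start = time.time()
try:
    place_order({"id": 1})
except ConnectionError as exc:
    # Total time is the sum of all the blocking calls that ran before the failure.
    print(f"order failed after {time.time() - start:.2f}s: {exc}")
```

Every new consumer of order data would mean yet another blocking call inside `place_order`, which is exactly the dependency web the text warns about.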


2. Data Loss and Reliability Issues

In synchronous, API-based flows:

  • Network failures lead to lost requests
  • If a downstream service is down, data is often dropped or retried blindly
  • There is usually no durable log of events
  • Recovery is hard because you can’t easily replay what happened

There’s no central place where events are reliably stored.


3. Low Throughput and Performance Bottlenecks

Traditional architectures suffer from:

  • Synchronous processing that blocks threads while waiting for responses
  • Database bottlenecks when everything is written/read via a single DB
  • Limited horizontal scaling
  • CPU and memory constraints on individual services

As traffic grows, these bottlenecks become painful and expensive.


4. Tight Coupling Everywhere

Coupling appears at multiple levels:

  • Services depend on each other’s APIs, payloads, and availability
  • Adding a new consumer (e.g., analytics, notifications) often means code changes in multiple services
  • Rolling out new features is risky because one small change can break many other services

This slows down development and increases operational risk.


🌍 Real-World Pain: Example Scenarios

Scenario 1: E-commerce Platform (Synchronous)

code
    User Action → Order Service → Inventory Service
                        ↓
                  Payment Service → Email Service
                        ↓
                  Analytics Service → Recommendation Service
    

What goes wrong here?

  • If Inventory Service is down → the entire order flow fails
  • Payment Service waits for Inventory → user sees slow checkout
  • Adding a new service (e.g. SMS notifications) requires new calls
  • Analytics data is lost if the Analytics Service is temporarily down

This creates a fragile system that’s hard to evolve.


Scenario 2: Social Media Platform

code
    User Post → Content Service → Notification Service
                        ↓
                  Analytics Service → Feed Service
                        ↓
                  Search Service → Cache Service
    

Problems:

  • High latency: user waits while multiple services are called
  • Data inconsistency: some services update, others fail or lag behind
  • Traffic spikes (e.g., viral posts) overload synchronous APIs
  • Error handling becomes complex and difficult to reason about

As the platform grows, this architecture becomes increasingly unstable.


📨 The Need for a Publish-Subscribe Model

Clearly, we need something better.

What We Actually Need

  1. Decoupling

Services shouldn’t depend heavily on each other’s availability or APIs.

  2. Reliability

Messages (events) must not be silently lost.

  3. Scalability

The system should handle increasing load by adding more machines.

  4. Fault Tolerance

Parts of the system can fail without bringing everything down.

  5. Real-Time Processing

Events should be processed with low latency.

  6. Replay Capability

We should be able to reprocess events later (for bugs, analytics, new features).


Publish-Subscribe (Pub/Sub) Benefits

In a publish-subscribe model:

  • Producers (publishers) send events to a central system (like a topic)
  • Consumers (subscribers) read the events they are interested in

Publishers don’t know who is consuming, and consumers don’t know who produced the event.

Benefits:

  • Loose coupling: Producers and consumers are independent
  • Scalability: New consumers can be added without impacting producers
  • Reliability: Events can be persisted and retried
  • Performance: Asynchronous processing reduces latency
  • Flexibility: Multiple consumers can process the same event for different purposes

This is the conceptual foundation of event-driven architecture.
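The pub/sub model described above can be illustrated with a minimal in-memory sketch. The `EventBus` class and its method names are invented for illustration (this is not Kafka's API); it just shows producers publishing to a topic without knowing how many consumers exist.

```python
from collections import defaultdict

class EventBus:
    """A minimal in-memory pub/sub sketch (illustrative, not Kafka's API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handler callables

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher never knows who (or how many) consumers exist.
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
received = []

# Two independent consumers of the same topic, added without touching the producer.
bus.subscribe("orders", lambda e: received.append(("analytics", e["id"])))
bus.subscribe("orders", lambda e: received.append(("email", e["id"])))

bus.publish("orders", {"id": 42})
print(received)  # → [('analytics', 42), ('email', 42)]
```

Note how adding a third consumer would be one more `subscribe` call, with zero changes on the publishing side — the loose coupling the bullet list promises.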


📦 Traditional Message Queues vs Event Streaming

Many teams start with traditional message queues like RabbitMQ or ActiveMQ. These are good tools but have limitations when you move into large-scale event streaming.

Traditional Message Queues (RabbitMQ, ActiveMQ)

  • Point-to-point messaging (one consumer processes a message)
  • Message is typically consumed once and removed
  • Limited retention (messages are not stored long-term)
  • Good for task queues but not ideal for replaying history
  • Throughput is usually lower than Kafka’s at scale

Event Streaming (Kafka)

  • Publish-subscribe model with topics and partitions
  • Messages are written to a log and can be kept for hours, days, or longer
  • Multiple consumers can read the same data
  • High throughput, designed for large-scale, continuous streams
  • Built-in replay capability (you can re-read past events)

Kafka is not just about passing messages; it’s about storing, streaming, and replaying event data at scale.
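The key structural difference — a retained log with per-consumer read positions, rather than a queue that deletes on consume — can be sketched like this. The class and method names are illustrative, and real Kafka partitions, brokers, and consumer groups are far richer; this only models retention, independent offsets, and replay.

```python
class EventLog:
    """Sketch of a Kafka-style append-only log (names are illustrative)."""

    def __init__(self):
        self._log = []        # events are retained, not deleted when read
        self._offsets = {}    # each consumer tracks its own read position

    def append(self, event):
        self._log.append(event)

    def poll(self, consumer):
        # Return everything past this consumer's offset, then advance it.
        offset = self._offsets.get(consumer, 0)
        batch = self._log[offset:]
        self._offsets[consumer] = len(self._log)
        return batch

    def rewind(self, consumer, offset=0):
        # Replay: reset a consumer's offset to re-read past events.
        self._offsets[consumer] = offset

log = EventLog()
log.append("OrderCreated")
log.append("PaymentCompleted")

print(log.poll("billing"))    # → ['OrderCreated', 'PaymentCompleted']
print(log.poll("analytics"))  # same events again: offsets are independent
log.rewind("billing")
print(log.poll("billing"))    # replayed from the beginning
```

In a traditional queue, the first consumer's read would have removed the messages; here both consumers see the full history, and either can rewind — the replay capability the comparison highlights.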


🧠 The Solution: Event-Driven Architecture

All of this leads to a new way of designing systems: event-driven architecture.

With event-driven architecture:

  • Services publish events when something happens (OrderCreated, PaymentCompleted, UserRegistered, etc.)
  • Other services subscribe to the events they care about
  • Services can process events in parallel and independently
  • New services can be added without changing existing ones
  • Failures are isolated, and the system can recover more gracefully

Kafka is a core building block for this style of architecture.
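Two of the bullet points above — adding consumers without changing existing services, and isolating failures — can be demonstrated with a tiny event-dispatch sketch. All names here are hypothetical; a real system would publish to Kafka topics instead of an in-process dict.

```python
from collections import defaultdict

subscribers = defaultdict(list)  # event type -> list of handlers
processed = []

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    # Failures are isolated: one crashing consumer doesn't stop the others.
    for handler in subscribers[event_type]:
        try:
            handler(payload)
        except Exception as exc:
            processed.append(f"error isolated: {exc}")

def flaky_payment(order):
    raise RuntimeError("payment service down")

subscribe("OrderCreated", lambda o: processed.append(f"inventory reserved for order {o['id']}"))
subscribe("OrderCreated", flaky_payment)
subscribe("OrderCreated", lambda o: processed.append(f"email queued for order {o['id']}"))

# A new consumer (e.g. SMS) is added without touching the publisher.
subscribe("OrderCreated", lambda o: processed.append(f"sms queued for order {o['id']}"))

# The Order Service just publishes; it doesn't know its consumers.
publish("OrderCreated", {"id": 7})
print(processed)
```

Even with the payment handler crashing, inventory, email, and the newly added SMS consumer all still process the event — the failure isolation that the synchronous scenarios earlier in the module could not provide.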


✅ Key Takeaways

  • Traditional, API-only architectures become fragile and slow as systems grow
  • Common problems: tight coupling, data loss, low throughput, cascading failures
  • Real-time, high-volume applications need decoupled, reliable, scalable communication
  • The publish-subscribe model solves many of these issues
  • Traditional message queues help, but event streaming platforms like Kafka go further
  • Kafka enables event-driven architecture, which is the modern answer to these integration problems

📚 What’s Next?

In the next module, you will learn:

“How Kafka Solves the Problem” – a detailed look at how Kafka’s architecture (topics, partitions, brokers, consumer groups, persistence, replication) addresses exactly the problems we discussed here.

Continue with: Module 3 – How Kafka Solves the Problem.

Hands-on Examples

Traditional vs Event-Driven Architecture

code
    # Traditional Synchronous System (Problems)

    ## Order Processing Flow:
    1. User places order
    2. Order Service calls Inventory Service (SYNC)
    3. Wait for inventory check
    4. Order Service calls Payment Service (SYNC)
    5. Wait for payment processing
    6. Order Service calls Email Service (SYNC)
    7. Wait for email confirmation
    8. Return response to user
    
    ## Problems:
    - High latency (sum of all service calls)
    - Single point of failure
    - Tight coupling
    - Difficult to scale
    - Complex error handling
    
    # Event-Driven System (Solution)
    
    ## Order Processing Flow:
    1. User places order
    2. Order Service publishes "OrderCreated" event
    3. Multiple services consume the event:
      - Inventory Service: Reserve items
      - Payment Service: Process payment
      - Analytics Service: Track metrics
      - Email Service: Send confirmation
    4. Each service publishes its own events
    5. Order Service updates status based on events
    
    ## Benefits:
    - Lower latency (asynchronous)
    - Fault tolerant
    - Loose coupling
    - Easier to scale
    - Simpler error handling

This comparison shows how event-driven architecture solves the fundamental problems of traditional synchronous systems by decoupling services and using asynchronous processing.