
Module 2: The Problem Statement

Chapter 2 • Beginner

30 min

The Problem Kafka Solves

Before we learn how Kafka works, we need to clearly understand why Kafka exists at all. Modern systems generate huge amounts of real-time data, and traditional architectures struggle badly under this load.

This module focuses on the pain points of traditional systems and sets up the need for event-driven architecture and event streaming platforms like Kafka.


🎯 What You Will Learn

By the end of this module, you will be able to:

  • Identify the main problems with traditional, API-based integrations
  • Recognize issues like tight coupling, data loss, low throughput, and cascading failures
  • Understand why real-time, high-volume systems need something beyond simple APIs or basic message queues
  • Explain the benefits of the publish-subscribe (pub/sub) model
  • Understand where event streaming (Kafka) goes beyond traditional message queues

⚡ The Challenge of Real-Time Data

Modern applications generate continuous streams of events:

  • User clicks, page views, searches
  • Orders, payments, inventory updates
  • Logs, metrics, monitoring data
  • IoT sensor readings

If we try to handle all of this using only direct API calls and databases, we quickly run into serious problems.


๐Ÿ—๏ธ Traditional System Problems

1. Direct API Integrations (Tight Coupling)

In many legacy systems, services talk to each other directly via synchronous APIs:

code
    Service A → API Call → Service B
    Service A → API Call → Service C
    Service A → API Call → Service D
    

Problems:

  • Tight coupling: Service A must know about B, C, and D
  • Single point of failure: If B is down, A may also fail
  • Hard to scale: Every new consumer means more API calls and more load
  • Blocking calls: A waits for responses, increasing latency

The more services you add, the more complex this web of dependencies becomes.
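The tight coupling and blocking behavior described above can be sketched in a few lines of Python. The service functions here are hypothetical stand-ins for network calls; the point is that latency adds up across the chain and one failing dependency fails the whole operation.

```python
import time

# Hypothetical downstream services; each call blocks the caller.
def inventory_service(order):
    time.sleep(0.05)  # simulated network + processing delay
    return {"reserved": True}

def payment_service(order):
    time.sleep(0.05)
    return {"paid": True}

def email_service(order):
    # One dependency being down is enough to break the caller.
    raise ConnectionError("email service is down")

def place_order(order):
    # Service A must know about B, C, and D, call each one directly,
    # and wait for every response before it can answer the user.
    results = {}
    results["inventory"] = inventory_service(order)
    results["payment"] = payment_service(order)
    results["email"] = email_service(order)  # failure here fails the whole order
    return results

start = time.time()
try:
    place_order({"id": 1})
except ConnectionError as exc:
    # Total time is the sum of all the blocking calls that ran before the failure.
    print(f"order failed after {time.time() - start:.2f}s: {exc}")
```

Every new consumer of order data would mean yet another blocking call inside `place_order`, which is exactly the dependency web the text warns about.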


2. Data Loss and Reliability Issues

In synchronous, API-based flows:

  • Network failures lead to lost requests
  • If a downstream service is down, data is often dropped or retried blindly
  • There is usually no durable log of events
  • Recovery is hard because you can’t easily replay what happened

There’s no central place where events are reliably stored.


3. Low Throughput and Performance Bottlenecks

Traditional architectures suffer from:

  • Synchronous processing that blocks threads while waiting for responses
  • Database bottlenecks when everything is written/read via a single DB
  • Limited horizontal scaling
  • CPU and memory constraints on individual services

As traffic grows, these bottlenecks become painful and expensive.


4. Tight Coupling Everywhere

Coupling appears at multiple levels:

  • Services depend on each other’s APIs, payloads, and availability
  • Adding a new consumer (e.g., analytics, notifications) often means code changes in multiple services
  • Rolling out new features is risky because one small change can break many other services

This slows down development and increases operational risk.


🌍 Real-World Pain: Example Scenarios

Scenario 1: E-commerce Platform (Synchronous)

code
    User Action → Order Service → Inventory Service
                        ↓
                  Payment Service → Email Service
                        ↓
                  Analytics Service → Recommendation Service
    

What goes wrong here?

  • If Inventory Service is down → the entire order flow fails
  • Payment Service waits for Inventory → user sees slow checkout
  • Adding a new service (e.g. SMS notifications) requires new calls
  • Analytics data is lost if the Analytics Service is temporarily down

This creates a fragile system that’s hard to evolve.


Scenario 2: Social Media Platform

code
    User Post → Content Service → Notification Service
                        ↓
                  Analytics Service → Feed Service
                        ↓
                  Search Service → Cache Service
    

Problems:

  • High latency: user waits while multiple services are called
  • Data inconsistency: some services update, others fail or lag behind
  • Traffic spikes (e.g., viral posts) overload synchronous APIs
  • Error handling becomes complex and difficult to reason about

As the platform grows, this architecture becomes increasingly unstable.


📨 The Need for a Publish-Subscribe Model

Clearly, we need something better.

What We Actually Need

  1. Decoupling

Services shouldn’t depend heavily on each other’s availability or APIs.

  2. Reliability

Messages (events) must not be silently lost.

  3. Scalability

The system should handle increasing load by adding more machines.

  4. Fault Tolerance

Parts of the system can fail without bringing everything down.

  5. Real-Time Processing

Events should be processed with low latency.

  6. Replay Capability

We should be able to reprocess events later (for bugs, analytics, new features).


Publish-Subscribe (Pub/Sub) Benefits

In a publish-subscribe model:

  • Producers (publishers) send events to a central system (like a topic)
  • Consumers (subscribers) read the events they are interested in

Publishers don’t know who is consuming, and consumers don’t know who produced the event.

Benefits:

  • Loose coupling: Producers and consumers are independent
  • Scalability: New consumers can be added without impacting producers
  • Reliability: Events can be persisted and retried
  • Performance: Asynchronous processing reduces latency
  • Flexibility: Multiple consumers can process the same event for different purposes

This is the conceptual foundation of event-driven architecture.
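The pub/sub model described above can be illustrated with a minimal in-memory sketch. The `EventBus` class and its method names are invented for illustration (this is not Kafka's API); it just shows producers publishing to a topic without knowing how many consumers exist.

```python
from collections import defaultdict

class EventBus:
    """A minimal in-memory pub/sub sketch (illustrative, not Kafka's API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handler callables

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher never knows who (or how many) consumers exist.
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
received = []

# Two independent consumers of the same topic, added without touching the producer.
bus.subscribe("orders", lambda e: received.append(("analytics", e["id"])))
bus.subscribe("orders", lambda e: received.append(("email", e["id"])))

bus.publish("orders", {"id": 42})
print(received)  # → [('analytics', 42), ('email', 42)]
```

Note how adding a third consumer would be one more `subscribe` call, with zero changes on the publishing side — the loose coupling the bullet list promises.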


📦 Traditional Message Queues vs Event Streaming

Many teams start with traditional message queues like RabbitMQ or ActiveMQ. These are good tools but have limitations when you move into large-scale event streaming.

Traditional Message Queues (RabbitMQ, ActiveMQ)

  • Point-to-point messaging (one consumer processes a message)
  • Message is typically consumed once and removed
  • Limited retention (messages are not stored long-term)
  • Good for task queues but not ideal for replaying history
  • Throughput is usually lower than Kafka’s at scale

Event Streaming (Kafka)

  • Publish-subscribe model with topics and partitions
  • Messages are written to a log and can be kept for hours, days, or longer
  • Multiple consumers can read the same data
  • High throughput, designed for large-scale, continuous streams
  • Built-in replay capability (you can re-read past events)

Kafka is not just about passing messages; it’s about storing, streaming, and replaying event data at scale.
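The key structural difference — a retained log with per-consumer read positions, rather than a queue that deletes on consume — can be sketched like this. The class and method names are illustrative, and real Kafka partitions, brokers, and consumer groups are far richer; this only models retention, independent offsets, and replay.

```python
class EventLog:
    """Sketch of a Kafka-style append-only log (names are illustrative)."""

    def __init__(self):
        self._log = []        # events are retained, not deleted when read
        self._offsets = {}    # each consumer tracks its own read position

    def append(self, event):
        self._log.append(event)

    def poll(self, consumer):
        # Return everything past this consumer's offset, then advance it.
        offset = self._offsets.get(consumer, 0)
        batch = self._log[offset:]
        self._offsets[consumer] = len(self._log)
        return batch

    def rewind(self, consumer, offset=0):
        # Replay: reset a consumer's offset to re-read past events.
        self._offsets[consumer] = offset

log = EventLog()
log.append("OrderCreated")
log.append("PaymentCompleted")

print(log.poll("billing"))    # → ['OrderCreated', 'PaymentCompleted']
print(log.poll("analytics"))  # same events again: offsets are independent
log.rewind("billing")
print(log.poll("billing"))    # replayed from the beginning
```

In a traditional queue, the first consumer's read would have removed the messages; here both consumers see the full history, and either can rewind — the replay capability the comparison highlights.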


🧠 The Solution: Event-Driven Architecture

All of this leads to a new way of designing systems: event-driven architecture.

With event-driven architecture:

  • Services publish events when something happens (OrderCreated, PaymentCompleted, UserRegistered, etc.)
  • Other services subscribe to the events they care about
  • Services can process events in parallel and independently
  • New services can be added without changing existing ones
  • Failures are isolated, and the system can recover more gracefully

Kafka is a core building block for this style of architecture.
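Two of the bullet points above — adding consumers without changing existing services, and isolating failures — can be demonstrated with a tiny event-dispatch sketch. All names here are hypothetical; a real system would publish to Kafka topics instead of an in-process dict.

```python
from collections import defaultdict

subscribers = defaultdict(list)  # event type -> list of handlers
processed = []

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    # Failures are isolated: one crashing consumer doesn't stop the others.
    for handler in subscribers[event_type]:
        try:
            handler(payload)
        except Exception as exc:
            processed.append(f"error isolated: {exc}")

def flaky_payment(order):
    raise RuntimeError("payment service down")

subscribe("OrderCreated", lambda o: processed.append(f"inventory reserved for order {o['id']}"))
subscribe("OrderCreated", flaky_payment)
subscribe("OrderCreated", lambda o: processed.append(f"email queued for order {o['id']}"))

# A new consumer (e.g. SMS) is added without touching the publisher.
subscribe("OrderCreated", lambda o: processed.append(f"sms queued for order {o['id']}"))

# The Order Service just publishes; it doesn't know its consumers.
publish("OrderCreated", {"id": 7})
print(processed)
```

Even with the payment handler crashing, inventory, email, and the newly added SMS consumer all still process the event — the failure isolation that the synchronous scenarios earlier in the module could not provide.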


✅ Key Takeaways

  • Traditional, API-only architectures become fragile and slow as systems grow
  • Common problems: tight coupling, data loss, low throughput, cascading failures
  • Real-time, high-volume applications need decoupled, reliable, scalable communication
  • The publish-subscribe model solves many of these issues
  • Traditional message queues help, but event streaming platforms like Kafka go further
  • Kafka enables event-driven architecture, which is the modern answer to these integration problems

📚 What’s Next?

In the next module, you will learn:

“How Kafka Solves the Problem” – a detailed look at how Kafka’s architecture (topics, partitions, brokers, consumer groups, persistence, replication) addresses exactly the problems we discussed here.

Continue with: Module 3 – How Kafka Solves the Problem.

Hands-on Examples

Traditional vs Event-Driven Architecture

code
    # Traditional Synchronous System (Problems)

    ## Order Processing Flow:
    1. User places order
    2. Order Service calls Inventory Service (SYNC)
    3. Wait for inventory check
    4. Order Service calls Payment Service (SYNC)
    5. Wait for payment processing
    6. Order Service calls Email Service (SYNC)
    7. Wait for email confirmation
    8. Return response to user
    
    ## Problems:
    - High latency (sum of all service calls)
    - Single point of failure
    - Tight coupling
    - Difficult to scale
    - Complex error handling
    
    # Event-Driven System (Solution)
    
    ## Order Processing Flow:
    1. User places order
    2. Order Service publishes "OrderCreated" event
    3. Multiple services consume the event:
      - Inventory Service: Reserve items
      - Payment Service: Process payment
      - Analytics Service: Track metrics
      - Email Service: Send confirmation
    4. Each service publishes its own events
    5. Order Service updates status based on events
    
    ## Benefits:
    - Lower latency (asynchronous)
    - Fault tolerant
    - Loose coupling
    - Easier to scale
    - Simpler error handling

This comparison shows how event-driven architecture solves the fundamental problems of traditional synchronous systems by decoupling services and using asynchronous processing.