Module 2: The Problem Statement
Chapter 2 • Beginner
The Problem Kafka Solves
Before we learn how Kafka works, we need to clearly understand why Kafka exists at all. Modern systems generate huge amounts of real-time data, and traditional architectures struggle badly under this load.
This module focuses on the pain points of traditional systems and sets up the need for event-driven architecture and event streaming platforms like Kafka.
🎯 What You Will Learn
By the end of this module, you will be able to:
- Identify the main problems with traditional, API-based integrations
- Recognize issues like tight coupling, data loss, low throughput, and cascading failures
- Understand why real-time, high-volume systems need something beyond simple APIs or basic message queues
- Explain the benefits of the publish-subscribe (pub/sub) model
- Understand where event streaming (Kafka) goes beyond traditional message queues
⚡ The Challenge of Real-Time Data
Modern applications generate continuous streams of events:
- User clicks, page views, searches
- Orders, payments, inventory updates
- Logs, metrics, monitoring data
- IoT sensor readings
If we try to handle all of this using only direct API calls and databases, we quickly run into serious problems.
🏗️ Traditional System Problems
1. Direct API Integrations (Tight Coupling)
In many legacy systems, services talk to each other directly via synchronous APIs:
Service A → API Call → Service B
Service A → API Call → Service C
Service A → API Call → Service D
Problems:
- Tight coupling: Service A must know about B, C, and D
- Single point of failure: If B is down, A may also fail
- Hard to scale: Every new consumer means more API calls and more load
- Blocking calls: A waits for responses, increasing latency
The more services you add, the more complex this web of dependencies becomes.
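To make the coupling concrete, here is a minimal Python sketch (using the `requests` library and hypothetical service URLs, not code from any real system): the caller has to know every downstream endpoint, blocks on each call in turn, and the whole flow fails if any single dependency is unreachable.

```python
import requests

# Hypothetical URLs - the calling service must know every downstream service.
INVENTORY_URL = "http://inventory-service/api/reserve"
PAYMENT_URL = "http://payment-service/api/charge"
EMAIL_URL = "http://email-service/api/send"

def place_order(order):
    # Each call blocks until the downstream service responds.
    # If any one of them is down, the whole flow fails (cascading failure).
    inventory = requests.post(INVENTORY_URL, json=order, timeout=2)
    inventory.raise_for_status()

    payment = requests.post(PAYMENT_URL, json=order, timeout=2)
    payment.raise_for_status()

    email = requests.post(EMAIL_URL, json=order, timeout=2)
    email.raise_for_status()

    # Total latency is roughly the sum of all three round trips.
    return {"status": "confirmed"}
```

Every new downstream service means another hard-coded call inside this function, which is exactly the dependency web described above.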
2. Data Loss and Reliability Issues
In synchronous, API-based flows:
- Network failures lead to lost requests
- If a downstream service is down, data is often dropped or retried blindly
- There is usually no durable log of events
- Recovery is hard because you can't easily replay what happened
There's no central place where events are reliably stored.
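A small illustrative sketch of how that loss happens in practice (hypothetical analytics endpoint, plain Python with `requests`): when the downstream service or the network fails, the event simply disappears because nothing durable ever recorded it.

```python
import requests

def track_click(event):
    try:
        # Fire the event directly at the analytics service.
        requests.post("http://analytics-service/api/events", json=event, timeout=1)
    except requests.RequestException:
        # The service was down or the network failed.
        # There is no durable log to retry from, so the event is simply gone.
        pass
```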
3. Low Throughput and Performance Bottlenecks
Traditional architectures suffer from:
- Synchronous processing that blocks threads while waiting for responses
- Database bottlenecks when everything is written/read via a single DB
- Limited horizontal scaling
- CPU and memory constraints on individual services
As traffic grows, these bottlenecks become painful and expensive.
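To put a rough number on the blocking cost, the toy simulation below (pure Python, no real services) stands in for three downstream calls of about 100 ms each; handled sequentially, the caller pays roughly the sum of all of them.

```python
import time

def fake_call(service, delay=0.1):
    # Stand-in for a synchronous HTTP call that takes ~100 ms.
    time.sleep(delay)
    return f"{service} ok"

start = time.perf_counter()
for service in ["inventory", "payment", "email"]:
    fake_call(service)
elapsed = time.perf_counter() - start

# Prints roughly 0.30s: latency adds up with every synchronous dependency.
print(f"Sequential total: {elapsed:.2f}s")
```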
4. Tight Coupling Everywhere
Coupling appears at multiple levels:
- Services depend on each other's APIs, payloads, and availability
- Adding a new consumer (e.g., analytics, notifications) often means code changes in multiple services
- Rolling out new features is risky because one small change can break many other services
This slows down development and increases operational risk.
💥 Real-World Pain: Example Scenarios
Scenario 1: E-commerce Platform (Synchronous)
User Action → Order Service → Inventory Service
                    ↓
             Payment Service → Email Service
                    ↓
             Analytics Service → Recommendation Service
What goes wrong here?
- If Inventory Service is down → the entire order flow fails
- Payment Service waits for Inventory → user sees slow checkout
- Adding a new service (e.g. SMS notifications) requires new calls
- Analytics data is lost if the Analytics Service is temporarily down
This creates a fragile system that's hard to evolve.
Scenario 2: Social Media Platform
User Post → Content Service → Notification Service
                   ↓
            Analytics Service → Feed Service
                   ↓
            Search Service → Cache Service
Problems:
- High latency: user waits while multiple services are called
- Data inconsistency: some services update, others fail or lag behind
- Traffic spikes (e.g., viral posts) overload synchronous APIs
- Error handling becomes complex and difficult to reason about
As the platform grows, this architecture becomes increasingly unstable.
🚨 The Need for a Publish-Subscribe Model
Clearly, we need something better.
What We Actually Need
- Decoupling
Services shouldn't depend heavily on each other's availability or APIs.
- Reliability
Messages (events) must not be silently lost.
- Scalability
The system should handle increasing load by adding more machines.
- Fault Tolerance
Parts of the system can fail without bringing everything down.
- Real-Time Processing
Events should be processed with low latency.
- Replay Capability
We should be able to reprocess events later (for bugs, analytics, new features).
Publish-Subscribe (Pub/Sub) Benefits
In a publish-subscribe model:
- Producers (publishers) send events to a central system (like a topic)
- Consumers (subscribers) read the events they are interested in
Publishers don't know who is consuming, and consumers don't know who produced the event.
Benefits:
- Loose coupling: Producers and consumers are independent
- Scalability: New consumers can be added without impacting producers
- Reliability: Events can be persisted and retried
- Performance: Asynchronous processing reduces latency
- Flexibility: Multiple consumers can process the same event for different purposes
This is the conceptual foundation of event-driven architecture.
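The idea can be sketched in a few lines of plain Python. This is only an in-memory toy broker (no persistence, no network, invented names), but it shows the key property: the producer publishes to a topic and never references its consumers.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a pub/sub broker (no persistence, no network)."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        # The producer never references consumers directly.
        for callback in self.subscribers[topic]:
            callback(event)

broker = ToyBroker()
broker.subscribe("orders", lambda e: print("Inventory saw:", e))
broker.subscribe("orders", lambda e: print("Analytics saw:", e))

# Adding another subscriber above requires no change to this producer line.
broker.publish("orders", {"type": "OrderCreated", "order_id": 42})
```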
📦 Traditional Message Queues vs Event Streaming
Many teams start with traditional message queues like RabbitMQ or ActiveMQ. These are good tools but have limitations when you move into large-scale event streaming.
Traditional Message Queues (RabbitMQ, ActiveMQ)
- Point-to-point messaging (one consumer processes a message)
- Message is typically consumed once and removed
- Limited retention (messages are not stored long-term)
- Good for task queues but not ideal for replaying history
- Throughput is usually lower compared to Kafka at scale
Event Streaming (Kafka)
- Publish-subscribe model with topics and partitions
- Messages are written to a log and can be kept for hours, days, or longer
- Multiple consumers can read the same data
- High throughput, designed for large-scale, continuous streams
- Built-in replay capability (you can re-read past events)
Kafka is not just about passing messages; it's about storing, streaming, and replaying event data at scale.
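As a small preview of what replay looks like in practice, the sketch below uses the third-party `kafka-python` client and assumes a broker running on `localhost:9092` with an existing `orders` topic (both assumptions for illustration). Because Kafka keeps messages in a log, a brand-new consumer group can start from the earliest offset and re-read history that other consumers already processed.

```python
from kafka import KafkaConsumer  # pip install kafka-python
import json

# Assumes a local broker and an existing "orders" topic (illustrative only).
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="late-arriving-analytics",   # a brand-new consumer group
    auto_offset_reset="earliest",         # start from the beginning of the log
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

# Events already consumed by other services are still in the log,
# so this loop replays them from the start for this new group.
for message in consumer:
    print(message.offset, message.value)
```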
🔧 The Solution: Event-Driven Architecture
All of this leads to a new way of designing systems: event-driven architecture.
With event-driven architecture:
- Services publish events when something happens (OrderCreated, PaymentCompleted, UserRegistered, etc.)
- Other services subscribe to the events they care about
- Services can process events in parallel and independently
- New services can be added without changing existing ones
- Failures are isolated, and the system can recover more gracefully
Kafka is a core building block for this style of architecture.
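A minimal producer-side sketch of that idea, again assuming `kafka-python`, a local broker on `localhost:9092`, and an illustrative `orders` topic: the Order Service only records the fact that an order was created and never calls the other services directly.

```python
from kafka import KafkaProducer  # pip install kafka-python
import json

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed local broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The Order Service publishes an event and moves on; it does not know
# (or care) whether inventory, payment, email, or analytics consume it.
producer.send("orders", {"type": "OrderCreated", "order_id": 42, "amount": 99.90})
producer.flush()
```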
✅ Key Takeaways
- Traditional, API-only architectures become fragile and slow as systems grow
- Common problems: tight coupling, data loss, low throughput, cascading failures
- Real-time, high-volume applications need decoupled, reliable, scalable communication
- The publish-subscribe model solves many of these issues
- Traditional message queues help, but event streaming platforms like Kafka go further
- Kafka enables event-driven architecture, which is the modern answer to these integration problems
🚀 What's Next?
In the next module, you will learn:
"How Kafka Solves the Problem": a detailed look at how Kafka's architecture (topics, partitions, brokers, consumer groups, persistence, replication) addresses exactly the problems we discussed here.
Continue with Module 3: How Kafka Solves the Problem.
Hands-on Examples
Traditional vs Event-Driven Architecture
# Traditional Synchronous System (Problems)
## Order Processing Flow:
1. User places order
2. Order Service calls Inventory Service (SYNC)
3. Wait for inventory check
4. Order Service calls Payment Service (SYNC)
5. Wait for payment processing
6. Order Service calls Email Service (SYNC)
7. Wait for email confirmation
8. Return response to user
## Problems:
- High latency (sum of all service calls)
- Single point of failure
- Tight coupling
- Difficult to scale
- Complex error handling
# Event-Driven System (Solution)
## Order Processing Flow:
1. User places order
2. Order Service publishes "OrderCreated" event
3. Multiple services consume the event:
- Inventory Service: Reserve items
- Payment Service: Process payment
- Analytics Service: Track metrics
- Email Service: Send confirmation
4. Each service publishes its own events
5. Order Service updates status based on events
## Benefits:
- Lower latency (asynchronous)
- Fault tolerant
- Loose coupling
- Easier to scale
- Simpler error handling

This comparison shows how event-driven architecture solves the fundamental problems of traditional synchronous systems by decoupling services and using asynchronous processing.
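For completeness, here is a sketch of the consuming side of that flow, under the same assumptions as before (`kafka-python`, a local broker, an illustrative `orders` topic): each service subscribes with its own consumer group, so every service receives its own copy of the `OrderCreated` event and can fail, lag, or scale independently.

```python
from kafka import KafkaConsumer  # pip install kafka-python
import json

def run_service(name, group_id):
    """Each service uses its own consumer group, so all of them
    receive every OrderCreated event independently."""
    consumer = KafkaConsumer(
        "orders",
        bootstrap_servers="localhost:9092",  # assumed local broker
        group_id=group_id,
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        event = message.value
        if event.get("type") == "OrderCreated":
            print(f"[{name}] handling order {event['order_id']}")

# In a real system each of these would run as a separate process/deployment:
# run_service("inventory-service", "inventory-group")
# run_service("email-service", "email-group")
# run_service("analytics-service", "analytics-group")
```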