Module 4: Kafka Architecture (Deep Dive)
Chapter 4 • Intermediate
Kafka Architecture (Deep Dive)
Understanding Kafka's internal architecture is crucial for designing reliable, scalable event-driven systems. In this module, you will go under the hood and see how producers, brokers, topics, partitions, replication, and controllers all fit together.
🎯 What You Will Learn
By the end of this module, you will be able to:
- Visualize Kafka's high-level architecture (producers, brokers, consumers, topics, partitions)
- Explain the internal architecture of producers, brokers, and consumers
- Understand how replication, leader election, and ISR provide fault tolerance
- Describe the difference between Zookeeper-based Kafka and KRaft (modern Kafka)
- Apply basic performance tuning, monitoring, and security best practices for Kafka clusters
🗺️ High-Level Kafka Architecture
At a high level, Kafka connects producers, brokers, and consumers using topics and partitions.
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│  Producer   │   │  Producer   │   │  Producer   │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         │
       ┌─────────────────┼─────────────────┐
       │                 │                 │
┌──────┴──────┐   ┌──────┴──────┐   ┌──────┴──────┐
│  Broker 1   │   │  Broker 2   │   │  Broker 3   │
│             │   │             │   │             │
│  Topic A    │   │  Topic A    │   │  Topic A    │
│ Partition 0 │   │ Partition 1 │   │ Partition 2 │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         │
       ┌─────────────────┼─────────────────┐
       │                 │                 │
┌──────┴──────┐   ┌──────┴──────┐   ┌──────┴──────┐
│  Consumer   │   │  Consumer   │   │  Consumer   │
│  Group A    │   │  Group A    │   │  Group A    │
└─────────────┘   └─────────────┘   └─────────────┘
- Producers send messages to topics
- Topics are split into partitions, distributed across brokers
- Consumers read messages from partitions, usually as part of a consumer group
The rest of this module dives into how each part is implemented internally.
🧩 1. Producer Architecture
Producers are responsible for publishing records to Kafka topics in an efficient and reliable way.
Producer Components
- RecordAccumulator
Buffers messages in memory to send them in batches for better throughput.
- Sender Thread
Runs in the background and sends accumulated batches to the appropriate brokers.
- Metadata
Caches topic and partition information (which broker is leader for which partition).
- Serializer
Converts keys and values (e.g., Java objects, JSON) into bytes before sending.
Producer Flow
- Message Creation → The application creates a record with a key and value.
- Serialization → Key and value are serialized to bytes.
- Partitioning → The producer chooses a partition (by key hash, or round-robin/sticky when there is no key).
- Batching → Records for the same partition are accumulated into batches.
- Sending → Batches are sent to the broker that is the leader for the partition.
- Acknowledgment → The producer waits for acknowledgment (based on the acks config).
Example Producer Configuration
# Reliability
acks=all # Wait for all in-sync replicas
retries=3 # Retry failed sends
enable.idempotence=true # Prevent duplicates
# Performance
batch.size=16384 # Batch size in bytes
linger.ms=5 # Wait up to 5ms to batch messages
compression.type=lz4 # Compress messages on the wire
- Higher reliability (acks=all, idempotence) → safer, but slightly higher latency
- Higher batching (batch.size, linger.ms) → better throughput, slightly more latency
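Putting this together, here is a minimal Java producer sketch that applies the configuration above. It is an illustrative sketch rather than production code: the broker address (localhost:9092), topic name (user-events), and record contents are placeholders.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Reliability settings from the example configuration
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // Batching and compression settings
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
        props.put(ProducerConfig.LINGER_MS_CONFIG, "5");
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("user-42") determines the partition: records with the same key
            // always land on the same partition.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("user-events", "user-42", "{\"action\":\"login\"}");

            // send() is asynchronous: the record goes into the RecordAccumulator and the
            // Sender thread ships it in a batch; the callback fires once the broker
            // acknowledges (per acks) or the send fails.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("Wrote to %s-%d at offset %d%n",
                        metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        } // close() flushes any outstanding batches
    }
}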
🖥️ 2. Broker Architecture
A Kafka broker is a server that stores data and serves client requests. A Kafka cluster consists of multiple brokers.
Broker Components
- Log Manager
Manages topic partitions stored as log segments on disk.
- Replica Manager
Handles replication between leader and follower replicas.
- Controller
One broker in the cluster is elected controller; it manages cluster metadata, leader elections, and partition assignments.
- Request Handler
Handles client requests from producers and consumers (produce, fetch, metadata, etc.).
Topic and Partition Structure
Topic: user-events
├── Partition 0 (Leader: Broker 1, Replicas: 1,2,3)
├── Partition 1 (Leader: Broker 2, Replicas: 2,3,1)
└── Partition 2 (Leader: Broker 3, Replicas: 3,1,2)
Each partition has:
- A leader replica that handles reads/writes
- One or more follower replicas that replicate data from the leader
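To see this layout on a running cluster, you can query partition leaders, replicas, and ISR with the Java Admin client. This is a minimal sketch assuming a recent (3.x) Kafka client, a broker at localhost:9092, and a topic named user-events (all placeholders).
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        try (Admin admin = Admin.create(props)) {
            TopicDescription desc = admin.describeTopics(List.of("user-events"))
                                         .allTopicNames().get()
                                         .get("user-events");
            // Print the leader, full replica set, and in-sync replicas for each partition
            desc.partitions().forEach(p ->
                System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                    p.partition(), p.leader(), p.replicas(), p.isr()));
        }
    }
}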
Partition Storage Layout
On disk, each partition is stored as a sequence of segment files:
/kafka-logs/user-events-0/
├── 00000000000000000000.log        # Segment file (data)
├── 00000000000000000000.index      # Offset index
├── 00000000000000000000.timeindex  # Timestamp index
└── leader-epoch-checkpoint         # Leader epoch tracking
This log-based design allows:
- Fast sequential writes
- Efficient retention and compaction
- Simple offset-based reads
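To make offset-based reads concrete, the sketch below reads a single partition directly, starting from a chosen offset, without joining a consumer group (consumer groups are covered in the next section). The broker address, topic, partition, and starting offset are placeholders.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadFromOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition p0 = new TopicPartition("user-events", 0);
            consumer.assign(List.of(p0)); // read one partition directly, no consumer group
            consumer.seek(p0, 100L);      // jump straight to offset 100 in the partition log
            consumer.poll(Duration.ofMillis(500))
                    .forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
        }
    }
}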
📥 3. Consumer Architecture
Consumers read records from Kafka and process them. Often, they are part of consumer groups to share the load.
Consumer Components
- Fetcher
Fetches batches of messages from brokers.
- Coordinator
Communicates with the Group Coordinator broker to manage group membership and partition assignments.
- Offset Manager
Tracks and commits offsets to Kafka (or an external store).
- Deserializer
Converts bytes back into objects (e.g., strings, JSON, custom types).
Consumer Group Coordination (High-Level)
- Group Join → Consumers join a group using group.id.
- Rebalance → The coordinator assigns partitions to the consumers in the group.
- Offset Commit → Consumers commit offsets after processing messages.
- Heartbeat → Consumers send heartbeats to remain active in the group.
You will explore consumer groups in much more detail in Module 5.
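For now, here is a minimal poll-loop sketch showing the pieces above working together: a group.id for the group join, deserializers, and manual offset commits. The broker address, group id, and topic name are placeholders.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class UserEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "user-events-processor");    // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit manually after processing

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe() triggers the group join; the coordinator assigns partitions on rebalance
            consumer.subscribe(List.of("user-events"));
            while (true) {
                // poll() fetches batches of records; regular polling (plus the background
                // heartbeat thread) keeps this consumer an active member of the group
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                        record.partition(), record.offset(), record.key(), record.value());
                }
                // Commit offsets only after the batch has been processed
                consumer.commitSync();
            }
        }
    }
}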
🧠 4. Zookeeper vs KRaft (Kafka 2.8+)
Historically, Kafka used Zookeeper to store cluster metadata. Modern Kafka supports KRaft mode, which removes the Zookeeper dependency.
Zookeeper Mode (Legacy)
┌─────────────┐        ┌─────────────┐
│  Zookeeper  │ ←────→ │    Kafka    │
│   Cluster   │        │   Brokers   │
└─────────────┘        └─────────────┘
- Zookeeper stores metadata about brokers, topics, partitions, and ACLs.
- Kafka brokers rely on Zookeeper for controller election and cluster coordination.
KRaft Mode (Modern Kafka)
┌─────────────┐        ┌─────────────┐
│    Kafka    │ ←────→ │    Kafka    │
│ Controller  │        │   Brokers   │
└─────────────┘        └─────────────┘
- Kafka uses its own Raft-based consensus (KRaft) for metadata.
- No external Zookeeper cluster is required.
- Simplifies deployment and improves metadata scalability.
New Kafka deployments increasingly use KRaft mode, while older clusters may still run with Zookeeper.
🛡️ Replication and Fault Tolerance
Replication is at the heart of Kafka's durability and availability story.
Replication Factor
- Replication Factor = N: Each partition has N replicas (1 leader + N-1 followers)
- Leader: Handles all reads and writes
- Followers: Replicate data from the leader
- ISR (In-Sync Replicas): Replicas that have caught up with the leader
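As an illustration, the sketch below creates a topic with replication factor 3 and min.insync.replicas=2 using the Java Admin client. It assumes a cluster with at least three brokers; the broker address and topic name are placeholders.
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        try (Admin admin = Admin.create(props)) {
            // 3 partitions, replication factor 3: each partition gets 1 leader + 2 followers
            NewTopic topic = new NewTopic("user-events", 3, (short) 3)
                // With acks=all, writes succeed only while at least 2 replicas are in sync
                .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
With this setup and acks=all on the producer, a write succeeds only while the leader plus at least one follower are in sync, so a single broker failure neither loses acknowledged data nor blocks producers.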
Leader Election
- Failure Detection → The controller detects that a leader broker is down.
- ISR Check → It identifies which replicas are still in-sync.
- New Leader Selection → A new leader is chosen from the ISR.
- Metadata Update → Cluster metadata is updated with the new leader.
- Client Notification → Producers and consumers get updated metadata.
This process ensures that the cluster continues to operate even when individual brokers fail.
🔒 Data Durability and Acknowledgments
Kafka provides tuneable durability guarantees via acknowledgment levels:
Acknowledgment Levels
- `acks=0` → Fire-and-forget. The producer does not wait for any acknowledgment.
- `acks=1` → Wait for the leader to write the record.
- `acks=all` → Wait for all in-sync replicas to acknowledge the write.
Durability Trade-offs
- Higher Durability (acks=all, more replicas)
- Safer, more resilient to failures
- Slightly higher latency and lower throughput
- Lower Durability (acks=0 or 1)
- Faster and higher throughput
- Higher risk of data loss on failures
Choosing the right configuration depends on your business requirements (e.g., payments vs logs).
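As a rough illustration of the two ends of this spectrum, the hypothetical presets below contrast a durability-first producer configuration with a latency/throughput-first one (the class and method names are made up for this example).
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class DurabilityPresets {
    // Safer preset: wait for all in-sync replicas, retry aggressively, and deduplicate
    static Properties durable() {
        Properties p = new Properties();
        p.put(ProducerConfig.ACKS_CONFIG, "all");
        p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        p.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        return p;
    }

    // Faster preset: fire-and-forget, acceptable only for loss-tolerant data such as debug logs
    static Properties fast() {
        Properties p = new Properties();
        p.put(ProducerConfig.ACKS_CONFIG, "0");
        return p;
    }
}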
⚙️ Performance Optimization
Kafka is fast out of the box, but real systems often require tuning.
Producer Optimization
- Batching → Increase batch.size and linger.ms for better throughput.
- Compression → Use compression.type (e.g., lz4, snappy) to reduce bandwidth.
- Async Sending → Use asynchronous send patterns to avoid blocking.
- Partitioning Strategy → Ensure keys are well-distributed across partitions.
Consumer Optimization
- Fetch Size → Tune fetch.min.bytes and max.partition.fetch.bytes to balance batching and latency (see the sketch after this list).
- Session Timeout → session.timeout.ms should balance failure detection with stability.
- Auto Commit → Decide between auto commit and manual commit based on processing guarantees.
- Consumer Groups → Use groups to parallelize processing.
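The sketch below shows one hypothetical set of values for these consumer settings; the numbers are illustrative starting points rather than recommendations.
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class ConsumerTuning {
    static Properties tunedConsumerProps() {
        Properties p = new Properties();
        // Batching vs latency: wait for up to 64 KB or 500 ms before a fetch returns
        p.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 64 * 1024);
        p.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
        // Upper bound on data returned per partition per fetch
        p.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1024 * 1024);
        // Failure detection vs stability
        p.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30_000);
        p.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 3_000);
        // Manual commits for at-least-once processing
        p.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return p;
    }
}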
Broker Optimization
- Disk I/O → Prefer SSDs and ensure good disk throughput.
- Memory → Tune JVM heap and OS page cache usage.
- Network → Optimize network settings for high throughput.
- Compression & Retention → Configure log retention and compression appropriately.
📊 Monitoring and Observability
A Kafka cluster without monitoring is a ticking time bomb.
Key Metrics to Track
- Throughput → Messages per second (produce and fetch).
- Latency → End-to-end and client-side request latency.
- Consumer Lag → How far behind consumers are from the latest offsets.
- Disk Usage → Storage consumption per broker and topic.
- Network Usage → Ingress and egress bandwidth.
Common Monitoring Tools
- Kafka Manager → Web-based, open-source Kafka management tool.
- Confluent Control Center → Enterprise-grade monitoring and management.
- JMX Metrics → Built-in Java metrics exposed by brokers and clients.
- Grafana + Prometheus → Popular stack for custom dashboards and alerting.
Good monitoring lets you catch issues like lag, disk pressure, or slow consumers before they become outages.
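Consumer lag, for example, can be inspected with the kafka-consumer-groups.sh tool, or computed programmatically with the Java Admin client as in the sketch below (the broker address and group id are placeholders).
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerLagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        String groupId = "user-events-processor";                                // placeholder group id

        try (Admin admin = Admin.create(props)) {
            // Offsets the group has committed, per partition
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets(groupId)
                     .partitionsToOffsetAndMetadata().get();

            // Latest (end) offsets for the same partitions
            Map<TopicPartition, OffsetSpec> request = new HashMap<>();
            committed.keySet().forEach(tp -> request.put(tp, OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                admin.listOffsets(request).all().get();

            // Lag = latest offset - committed offset
            committed.forEach((tp, meta) -> {
                if (meta == null) return; // no committed offset yet for this partition
                long lag = latest.get(tp).offset() - meta.offset();
                System.out.printf("%s lag=%d%n", tp, lag);
            });
        }
    }
}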
🔐 Security Considerations
Kafka runs in many production environments where security is critical.
Authentication
- SASL → Pluggable authentication (e.g., SASL/PLAIN, SASL/SCRAM).
- SSL/TLS → Encrypts communication between clients and brokers.
- Kerberos → Common in enterprise environments with existing Kerberos infrastructure.
Authorization
- ACLs (Access Control Lists) → Control which users can access which topics, consumer groups, or cluster operations.
- RBAC (Role-Based Access Control) → Role-based permissions (often via enterprise platforms).
- Resource-Level Controls → Fine-grained permissions per topic, group, or cluster resource.
Encryption
- Data in Transit → Protect with SSL/TLS.
- Data at Rest → Use disk encryption (OS-level or infrastructure-level).
- Message-Level Encryption → Application-level encryption when needed.
Security should be part of the design from the beginning, not an afterthought.
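As a concrete starting point, the sketch below shows client-side properties for SASL/SCRAM authentication over TLS. The mechanism, credentials, and truststore path are placeholders; real deployments should load secrets from a secure store rather than hard-coding them.
import java.util.Properties;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.common.config.SaslConfigs;

public class SecureClientConfig {
    static Properties secureProps() {
        Properties p = new Properties();
        // TLS-encrypted connection with SASL authentication
        p.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
        p.put(SaslConfigs.SASL_MECHANISM, "SCRAM-SHA-256");
        // Placeholder credentials
        p.put(SaslConfigs.SASL_JAAS_CONFIG,
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"app-user\" password=\"app-password\";");
        // Truststore so the client can verify the brokers' TLS certificates
        p.put("ssl.truststore.location", "/path/to/client.truststore.jks"); // placeholder path
        p.put("ssl.truststore.password", "changeit");                       // placeholder password
        return p;
    }
}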
✅ Key Takeaways
- Kafka's architecture is log-based, partitioned, and replicated, enabling durability and high throughput.
- Producers batch and compress data, while brokers store it efficiently as append-only logs.
- Consumers use offsets and consumer groups to process data reliably and in parallel.
- Replication, leader election, and ISR provide fault tolerance and high availability.
- Modern Kafka can run in KRaft mode, removing the dependency on Zookeeper.
- Performance tuning, monitoring, and security are essential for real-world Kafka deployments.
🚀 What's Next?
In the next module, you'll focus on one of Kafka's most powerful features:
"Consumer Groups in Kafka": how Kafka distributes partitions across consumers, manages offsets, and scales message processing horizontally.
Continue with: Module 5 β Consumer Groups in Kafka.
Hands-on Examples
Complete Kafka Architecture Diagram
# Complete Kafka Architecture
## Cluster Overview:
┌───────────────────────────────────────────────────────┐
│                     Kafka Cluster                      │
├───────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │  Broker 1   │   │  Broker 2   │   │  Broker 3   │  │
│  │             │   │             │   │             │  │
│  │  Topic A    │   │  Topic A    │   │  Topic A    │  │
│  │ Partition 0 │   │ Partition 1 │   │ Partition 2 │  │
│  │  (Leader)   │   │  (Leader)   │   │  (Leader)   │  │
│  │             │   │             │   │             │  │
│  │  Topic B    │   │  Topic B    │   │  Topic B    │  │
│  │ Partition 0 │   │ Partition 1 │   │ Partition 2 │  │
│  │ (Follower)  │   │ (Follower)  │   │ (Follower)  │  │
│  └─────────────┘   └─────────────┘   └─────────────┘  │
└───────────────────────────────────────────────────────┘
                            │
                            │
┌───────────────────────────────────────────────────────┐
│                     Client Layer                       │
├───────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │  Producer   │   │  Consumer   │   │  Consumer   │  │
│  │             │   │  Group A    │   │  Group B    │  │
│  │ Batch Size  │   │             │   │             │  │
│  │ Compression │   │ Partition 0 │   │ Partition 1 │  │
│  │  Acks=all   │   │ Partition 1 │   │ Partition 2 │  │
│  └─────────────┘   └─────────────┘   └─────────────┘  │
└───────────────────────────────────────────────────────┘
## Data Flow:
1. Producer sends message to topic
2. Producer chooses the partition (hash(key) % numPartitions when a key is present)
3. Message written to partition log
4. Replicated to follower brokers
5. Acknowledgment sent to producer
6. Consumer fetches from partition
7. Offset committed to broker
## Replication Details:
- Replication Factor: 3
- Min In-Sync Replicas: 2
- Leader handles all read/write
- Followers replicate leader data
- Controller manages leader election
This architecture diagram shows how producers, brokers, topics, partitions, and consumers interact in a Kafka cluster to provide reliable, scalable event streaming.