Module 4: Kafka Architecture (Deep Dive)

Chapter 4 • Intermediate

45 min

Understanding Kafka's internal architecture is crucial for designing reliable, scalable event-driven systems. In this module, you will go under the hood and see how producers, brokers, topics, partitions, replication, and controllers all fit together.


🎯 What You Will Learn

By the end of this module, you will be able to:

  • Visualize Kafka's high-level architecture (producers, brokers, consumers, topics, partitions)
  • Explain the internal architecture of producers, brokers, and consumers
  • Understand how replication, leader election, and ISR provide fault tolerance
  • Describe the difference between ZooKeeper-based Kafka and KRaft (modern Kafka)
  • Apply basic performance tuning, monitoring, and security best practices for Kafka clusters

🗺️ High-Level Kafka Architecture

At a high level, Kafka connects producers, brokers, and consumers using topics and partitions.

    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │  Producer   │    │  Producer   │    │  Producer   │
    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
           │                  │                  │
           └──────────────────┼──────────────────┘
                              │
           ┌──────────────────┼──────────────────┐
           │                  │                  │
    ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
    │  Broker 1   │    │  Broker 2   │    │  Broker 3   │
    │             │    │             │    │             │
    │ Topic A     │    │ Topic A     │    │ Topic A     │
    │ Partition 0 │    │ Partition 1 │    │ Partition 2 │
    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
           │                  │                  │
           └──────────────────┼──────────────────┘
                              │
           ┌──────────────────┼──────────────────┐
           │                  │                  │
    ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
    │  Consumer   │    │  Consumer   │    │  Consumer   │
    │  Group A    │    │  Group A    │    │  Group A    │
    └─────────────┘    └─────────────┘    └─────────────┘
    
  • Producers send messages to topics
  • Topics are split into partitions, distributed across brokers
  • Consumers read messages from partitions, usually as part of a consumer group

The rest of this module dives into how each part is implemented internally.


🧩 1. Producer Architecture

Producers are responsible for publishing records to Kafka topics in an efficient and reliable way.

Producer Components

  • RecordAccumulator

Buffers messages in memory to send them in batches for better throughput.

  • Sender Thread

Runs in the background and sends accumulated batches to the appropriate brokers.

  • Metadata

Caches topic and partition information (which broker is leader for which partition).

  • Serializer

Converts keys and values (e.g., Java objects, JSON) into bytes before sending.

Producer Flow

  1. Message Creation – Application creates a record with key/value.
  2. Serialization – Key/value are serialized to bytes.
  3. Partitioning – The producer's partitioner chooses a partition (hash of the key, or sticky/round-robin assignment when the key is null).
  4. Batching – Records for the same partition are accumulated into batches.
  5. Sending – Batches are sent to the broker that is leader for the partition.
  6. Acknowledgment – Producer waits for acknowledgment (based on acks config).
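
Steps 2 through 4 of the flow above can be sketched in a few lines of Python. This is a toy model, not the real client: `toy_partition` and `ToyAccumulator` are hypothetical names, `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to key bytes, and batches are closed by record count rather than by `batch.size` bytes or `linger.ms`.

```python
import zlib
from collections import defaultdict


def toy_partition(key: bytes, num_partitions: int) -> int:
    """Map a serialized key to a partition (crc32 is a stand-in hash)."""
    return zlib.crc32(key) % num_partitions


class ToyAccumulator:
    """Groups serialized records into per-partition batches."""

    def __init__(self, num_partitions: int, max_batch_records: int = 3):
        self.num_partitions = num_partitions
        self.max_batch_records = max_batch_records
        self.open_batches = defaultdict(list)  # partition -> pending records
        self.ready_batches = []                # (partition, records) ready to send

    def append(self, key: str, value: str) -> int:
        # Step 2: serialization
        key_bytes, value_bytes = key.encode("utf-8"), value.encode("utf-8")
        # Step 3: partitioning
        partition = toy_partition(key_bytes, self.num_partitions)
        # Step 4: batching
        batch = self.open_batches[partition]
        batch.append((key_bytes, value_bytes))
        if len(batch) >= self.max_batch_records:
            self.ready_batches.append((partition, self.open_batches.pop(partition)))
        return partition


acc = ToyAccumulator(num_partitions=3)
partitions = [acc.append("user-42", f"event-{i}") for i in range(3)]
```

Because all three records share the key `user-42`, they land in the same partition and fill one batch, which is what preserves per-key ordering.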

Example Producer Configuration

properties.js
    # Reliability
    # Wait for all in-sync replicas to acknowledge
    acks=all
    # Retry failed sends
    retries=3
    # Prevent duplicates caused by retries
    enable.idempotence=true
    
    # Performance
    # Batch size in bytes
    batch.size=16384
    # Wait up to 5 ms to fill a batch
    linger.ms=5
    # Compress batches on the wire
    compression.type=lz4
    
  • Higher reliability (acks=all, idempotence) → safer, but slightly higher latency
  • Higher batching (batch.size, linger.ms) → better throughput, slightly more latency
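
The reliability settings interact: an idempotent producer needs `acks=all`, `retries` greater than 0, and at most 5 in-flight requests per connection. The check below is a minimal sketch of those three rules. `idempotence_conflicts` is a hypothetical helper, not part of any Kafka client, and the fallback values passed to `.get` are placeholders chosen for the sketch.

```python
def idempotence_conflicts(config: dict) -> list:
    """Return settings that conflict with enable.idempotence=true."""
    if str(config.get("enable.idempotence", "false")).lower() != "true":
        return []  # nothing to check when idempotence is off
    problems = []
    if str(config.get("acks", "all")) not in ("all", "-1"):
        problems.append("acks must be 'all' when idempotence is enabled")
    if int(config.get("retries", 1)) <= 0:
        problems.append("retries must be greater than 0")
    if int(config.get("max.in.flight.requests.per.connection", 5)) > 5:
        problems.append("max.in.flight.requests.per.connection must be at most 5")
    return problems


module_config = {"acks": "all", "retries": 3, "enable.idempotence": "true"}
risky_config = {"acks": "1", "retries": 0, "enable.idempotence": "true"}

print(idempotence_conflicts(module_config))
print(idempotence_conflicts(risky_config))
```

The configuration from this module passes, while the second one is flagged for both `acks` and `retries`.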

🖥️ 2. Broker Architecture

A Kafka broker is a server that stores data and serves client requests. A Kafka cluster consists of multiple brokers.

Broker Components

  • Log Manager

Manages topic partitions stored as log segments on disk.

  • Replica Manager

Handles replication between leader and follower replicas.

  • Controller (in the cluster)

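The node responsible for managing cluster metadata, leader elections, and partition assignments. In ZooKeeper mode it is one elected broker; in KRaft mode it is the active member of a controller quorum.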

  • Request Handler

Handles client requests from producers and consumers (produce, fetch, metadata, etc.).

Topic and Partition Structure

    Topic: user-events
    ├── Partition 0 (Leader: Broker 1, Replicas: 1,2,3)
    ├── Partition 1 (Leader: Broker 2, Replicas: 2,3,1)
    └── Partition 2 (Leader: Broker 3, Replicas: 3,1,2)
    

Each partition has:

  • A leader replica that handles all writes and, by default, all reads
  • One or more follower replicas that replicate data from the leader
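
The replica lists in the `user-events` example follow a simple rotation, which the sketch below reproduces. `rotate_replicas` is a hypothetical helper. Kafka's actual assignment logic also randomizes the starting broker and can take racks into account, so treat this only as an illustration of why leaders end up spread across brokers.

```python
def rotate_replicas(broker_ids, num_partitions, replication_factor):
    """Assign replicas by rotating the broker list one step per partition."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        start = partition % len(broker_ids)
        assignment[partition] = [
            broker_ids[(start + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
    return assignment


layout = rotate_replicas([1, 2, 3], num_partitions=3, replication_factor=3)
# The first replica in each list is the preferred leader.
preferred_leaders = {p: replicas[0] for p, replicas in layout.items()}
```

With three brokers and three partitions, each broker is the preferred leader for exactly one partition, matching the layout shown above.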

Partition Storage Layout

On disk, each partition is stored as a sequence of segment files:

    /kafka-logs/user-events-0/
    ├── 00000000000000000000.log         # Segment file (data)
    ├── 00000000000000000000.index       # Offset index
    ├── 00000000000000000000.timeindex   # Timestamp index
    └── leader-epoch-checkpoint          # Leader epoch tracking
    

This log-based design allows:

  • Fast sequential writes
  • Efficient retention and compaction
  • Simple offset-based reads
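
Segment files are named after the first offset they contain, zero-padded to 20 digits, which is what makes offset-based reads simple. The sketch below shows how a reader could locate the segment for a given offset with a binary search. `find_segment` and the sample base offsets are assumptions for illustration, not broker code.

```python
from bisect import bisect_right


def segment_file_name(base_offset: int) -> str:
    """Build the .log file name for a segment starting at base_offset."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, target_offset):
    """Return the segment file holding target_offset.

    base_offsets must be sorted ascending, one entry per segment.
    """
    index = bisect_right(base_offsets, target_offset) - 1
    if index < 0:
        raise ValueError("offset is below the first retained segment")
    return segment_file_name(base_offsets[index])


# Hypothetical partition with three segments.
base_offsets = [0, 1500, 3200]
first = find_segment(base_offsets, 1499)
second = find_segment(base_offsets, 1500)
```

Within the chosen segment, the `.index` file then narrows the search to a byte position, so a read never has to scan the whole partition.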

📥 3. Consumer Architecture

Consumers read records from Kafka and process them. Often, they are part of consumer groups to share the load.

Consumer Components

  • Fetcher

Fetches batches of messages from brokers.

  • Coordinator

Communicates with the Group Coordinator broker to manage group membership and partition assignments.

  • Offset Manager

Tracks and commits offsets to Kafka (or an external store).

  • Deserializer

Converts bytes back into objects (e.g., strings, JSON, custom types).

Consumer Group Coordination (High-Level)

  1. Group Join – Consumers join a group using group.id.
  2. Rebalance – The coordinator assigns partitions to consumers in the group.
  3. Offset Commit – Consumers commit offsets after processing messages.
  4. Heartbeat – Consumers send heartbeats to remain active in the group.
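
To make the rebalance step concrete, here is a minimal sketch of a round-robin assignment and what happens when one member stops sending heartbeats. `assign_round_robin` is a hypothetical function. Kafka ships several assignors (range, round-robin, sticky, cooperative sticky) whose real behavior is more involved.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions over group members in round-robin order."""
    members = sorted(consumers)
    if not members:
        raise ValueError("a group needs at least one consumer")
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


# Three consumers join the group and share six partitions.
before = assign_round_robin(range(6), ["c1", "c2", "c3"])

# c2 misses its heartbeats, so the group rebalances without it.
after = assign_round_robin(range(6), ["c1", "c3"])
```

After the rebalance every partition is still owned by exactly one consumer in the group, only the load per consumer has grown.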

You will explore consumer groups in much more detail in Module 5.


🧠 4. ZooKeeper vs KRaft (Kafka 2.8+)

Historically, Kafka used ZooKeeper to store cluster metadata. KRaft mode, introduced as early access in Kafka 2.8 and declared production-ready in 3.3, removes the ZooKeeper dependency.

ZooKeeper Mode (Legacy)

    ┌─────────────┐       ┌─────────────┐
    │  ZooKeeper  │ ◄──── │   Kafka     │
    │   Cluster   │       │  Brokers    │
    └─────────────┘       └─────────────┘
    
  • ZooKeeper stores metadata about brokers, topics, partitions, and ACLs.
  • Kafka brokers rely on ZooKeeper for controller election and cluster coordination.

KRaft Mode (Modern Kafka)

    ┌─────────────┐       ┌─────────────┐
    │   Kafka     │ ◄──── │   Kafka     │
    │ Controller  │       │  Brokers    │
    └─────────────┘       └─────────────┘
    
  • Kafka uses its own Raft-based consensus (KRaft) for metadata.
  • No external ZooKeeper cluster is required.
  • Simplifies deployment and improves metadata scalability.

New Kafka deployments use KRaft mode, and Kafka 4.0 removed ZooKeeper mode entirely; older clusters may still run with ZooKeeper until they are migrated.
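
Because KRaft metadata changes are committed by a majority of the controller quorum, the number of controllers determines how many controller failures the cluster can absorb. The arithmetic is sketched below. The function names are hypothetical and the code models only majority counting, not the Raft protocol itself.

```python
def majority(voters: int) -> int:
    """Smallest number of controllers that forms a majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Controllers that can fail while a majority remains available."""
    return voters - majority(voters)


for voters in (1, 3, 4, 5):
    print(voters, "controllers tolerate", tolerated_failures(voters), "failure(s)")
```

An even-sized quorum tolerates no more failures than the next smaller odd size, which is why controller quorums are usually sized at 3 or 5.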


🛡️ Replication and Fault Tolerance

Replication is at the heart of Kafka's durability and availability story.

Replication Factor

  • Replication Factor = N: Each partition has N replicas (1 leader + N-1 followers)
  • Leader: Handles all writes and, by default, all reads
  • Followers: Replicate data from the leader
  • ISR (In-Sync Replicas): The leader plus any followers that have kept up with it within replica.lag.time.max.ms

Leader Election

  1. Failure Detection – The controller detects that a leader broker is down.
  2. ISR Check – It identifies which replicas are still in-sync.
  3. New Leader Selection – A new leader is chosen from the ISR.
  4. Metadata Update – Cluster metadata is updated with the new leader.
  5. Client Notification – Producers and consumers get updated metadata.

This process ensures that the cluster continues to operate even when individual brokers fail.
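
Steps 2 and 3 can be sketched as a small selection function. This is a simplified model with a hypothetical `elect_leader` helper: it picks the first replica in the assignment order that is both alive and in the ISR, and returns `None` when no such replica exists, which corresponds to the partition going offline unless unclean leader election (`unclean.leader.election.enable`) is turned on.

```python
def elect_leader(assigned_replicas, isr, live_brokers):
    """Pick the first assigned replica that is alive and in sync."""
    for broker_id in assigned_replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id
    return None  # no eligible replica: partition is offline


# Broker 1 led the partition and has just failed.
replicas = [1, 2, 3]
new_leader = elect_leader(replicas, isr={1, 2, 3}, live_brokers={2, 3})

# If only the failed leader was in sync, nobody is eligible.
no_leader = elect_leader(replicas, isr={1}, live_brokers={2, 3})
```

Restricting the choice to the ISR is what prevents acknowledged writes from disappearing when leadership moves.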


🔒 Data Durability and Acknowledgments

Kafka provides tuneable durability guarantees via acknowledgment levels:

Acknowledgment Levels

  • `acks=0` – Fire-and-forget. Producer does not wait for any acknowledgment.
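  • `acks=1` – Wait for the leader to write the record to its local log, without waiting for followers.
  • `acks=all` – Wait for all in-sync replicas to acknowledge the write; pair it with min.insync.replicas to set the minimum number of replicas that must be in sync.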

Durability Trade-offs

  • Higher Durability (acks=all, more replicas)
      • Safer, more resilient to failures
      • Slightly higher latency and lower throughput
  • Lower Durability (acks=0 or 1)
      • Faster and higher throughput
      • Higher risk of data loss on failures

Choosing the right configuration depends on your business requirements (e.g., payments vs logs).
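
The interaction between `acks` and the broker-side `min.insync.replicas` setting is easier to see as a decision function. The sketch below is a simplified model with a hypothetical `produce_outcome` helper. It ignores timeouts and retries and describes only what the producer is told for a single write.

```python
def produce_outcome(acks: str, isr_size: int, min_insync_replicas: int) -> str:
    """Describe how a write is acknowledged under the given settings."""
    if acks == "0":
        return "no acknowledgment"
    if acks == "1":
        return "acknowledged by leader"
    if acks in ("all", "-1"):
        if isr_size < min_insync_replicas:
            # The broker refuses the write instead of accepting it unsafely.
            return "rejected: not enough in-sync replicas"
        return "acknowledged by all in-sync replicas"
    raise ValueError(f"unknown acks value: {acks}")


# Replication factor 3 with min.insync.replicas=2.
healthy = produce_outcome("all", isr_size=3, min_insync_replicas=2)
degraded = produce_outcome("all", isr_size=1, min_insync_replicas=2)
```

With these settings one broker can fail without blocking writes, while a second failure makes `acks=all` producers see errors rather than silently lose durability.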


⚙️ Performance Optimization

Kafka is fast out of the box, but real systems often require tuning.

Producer Optimization

  • Batching – Increase batch.size and linger.ms for better throughput.
  • Compression – Use compression.type (e.g., lz4, snappy) to reduce bandwidth.
  • Async Sending – Use asynchronous send patterns to avoid blocking.
  • Partitioning Strategy – Ensure keys are well-distributed across partitions.

Consumer Optimization

  • Fetch Size – Tune fetch.min.bytes and max.partition.fetch.bytes for batching vs latency.
  • Session Timeout – session.timeout.ms should balance failure detection with stability.
  • Auto Commit – Decide between auto commit and manual commit based on processing guarantees.
  • Consumer Groups – Use groups to parallelize processing.

Broker Optimization

  • Disk I/O – Prefer SSDs and ensure good disk throughput.
  • Memory – Tune JVM heap and OS page cache usage.
  • Network – Optimize network settings for high throughput.
  • Compression & Retention – Configure log retention and compression appropriately.

📊 Monitoring and Observability

A Kafka cluster without monitoring is a ticking time bomb.

Key Metrics to Track

  • Throughput – Messages per second (produce and fetch).
  • Latency – End-to-end and client-side request latency.
  • Consumer Lag – How far behind consumers are from the latest offsets.
  • Disk Usage – Storage consumption per broker and topic.
  • Network Usage – Ingress and egress bandwidth.
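
Consumer lag is simple arithmetic once the offsets are known: for each partition, subtract the group's committed offset from the log end offset. The sketch below uses hand-written sample offsets and a hypothetical `consumer_lag` helper; in practice the numbers would come from a monitoring tool or the admin API. A partition with no committed offset is counted from offset 0 here, which is an assumption of the sketch.

```python
def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag: log end offset minus committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


# Sample values for a three-partition topic (illustrative only).
log_end_offsets = {0: 100, 1: 250, 2: 80}
committed_offsets = {0: 100, 1: 200}

lag = consumer_lag(log_end_offsets, committed_offsets)
total_lag = sum(lag.values())
```

Alerting on a lag that keeps growing, rather than on a fixed threshold, tends to separate a slow consumer from a normal traffic burst.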

Common Monitoring Tools

  • Kafka Manager (now CMAK) – Open-source, web-based tool for managing Kafka clusters.
  • Confluent Control Center – Enterprise-grade monitoring and management.
  • JMX Metrics – Built-in Java metrics exposed by brokers and clients.
  • Grafana + Prometheus – Popular stack for custom dashboards and alerting.

Good monitoring lets you catch issues like lag, disk pressure, or slow consumers before they become outages.


🔐 Security Considerations

Kafka runs in many production environments where security is critical.

Authentication

  • SASL – Pluggable authentication framework (e.g., SASL/PLAIN, SASL/SCRAM).
  • SSL/TLS – Encrypts communication between clients and brokers; with client certificates (mutual TLS) it also authenticates clients.
  • Kerberos – Used through SASL/GSSAPI; common in enterprise environments with existing Kerberos infrastructure.

Authorization

  • ACLs (Access Control Lists) – Control which users can access which topics, consumer groups, or cluster operations.
  • RBAC (Role-Based Access Control) – Role-based permissions (often via enterprise platforms).
  • Resource-Level Controls – Fine-grained permissions per topic, group, or cluster resource.

Encryption

  • Data in Transit – Protect with SSL/TLS.
  • Data at Rest – Use disk encryption (OS-level or infrastructure-level).
  • Message-Level Encryption – Application-level encryption when needed.

Security should be part of the design from the beginning, not an afterthought.


✅ Key Takeaways

  • Kafka's architecture is log-based, partitioned, and replicated, enabling durability and high throughput.
  • Producers batch and compress data, while brokers store it efficiently as append-only logs.
  • Consumers use offsets and consumer groups to process data reliably and in parallel.
  • Replication, leader election, and ISR provide fault tolerance and high availability.
  • Modern Kafka runs in KRaft mode, removing the dependency on ZooKeeper.
  • Performance tuning, monitoring, and security are essential for real-world Kafka deployments.

📚 What's Next?

In the next module, you'll focus on one of Kafka's most powerful features:

"Consumer Groups in Kafka" – how Kafka distributes partitions across consumers, manages offsets, and scales message processing horizontally.

Continue with: Module 5 – Consumer Groups in Kafka.

Hands-on Examples

Complete Kafka Architecture Diagram

    # Complete Kafka Architecture
    
    ## Cluster Overview:
    ┌────────────────────────────────────────────────────────────┐
    │                    Kafka Cluster                           │
    ├────────────────────────────────────────────────────────────┤
    │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
    │  │   Broker 1  │  │   Broker 2  │  │   Broker 3  │         │
    │  │             │  │             │  │             │         │
    │  │ Topic A     │  │ Topic A     │  │ Topic A     │         │
    │  │ Partition 0 │  │ Partition 1 │  │ Partition 2 │         │
    │  │ (Leader)    │  │ (Leader)    │  │ (Leader)    │         │
    │  │             │  │             │  │             │         │
    │  │ Topic B     │  │ Topic B     │  │ Topic B     │         │
    │  │ Partition 0 │  │ Partition 1 │  │ Partition 2 │         │
    │  │ (Follower)  │  │ (Follower)  │  │ (Follower)  │         │
    │  └─────────────┘  └─────────────┘  └─────────────┘         │
    └────────────────────────────────────────────────────────────┘
                                   │
                                   │
    ┌────────────────────────────────────────────────────────────┐
    │                    Client Layer                            │
    ├────────────────────────────────────────────────────────────┤
    │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
    │  │  Producer   │  │  Consumer   │  │  Consumer   │         │
    │  │             │  │   Group A   │  │   Group B   │         │
    │  │ Batch Size  │  │             │  │             │         │
    │  │ Compression │  │ Partition 0 │  │ Partition 1 │         │
    │  │ Acks=all    │  │ Partition 1 │  │ Partition 2 │         │
    │  └─────────────┘  └─────────────┘  └─────────────┘         │
    └────────────────────────────────────────────────────────────┘
    
    ## Data Flow:
    1. Producer determines partition (hash(key) % partitions for keyed records)
    2. Producer sends message to the partition leader
    3. Leader writes message to partition log
    4. Followers replicate from the leader
    5. Acknowledgment sent to producer
    6. Consumer fetches from partition
    7. Consumer commits offset to the group coordinator broker
    
    ## Replication Details:
    - Replication Factor: 3
    - Min In-Sync Replicas: 2
    - Leader handles all writes and, by default, all reads
    - Followers replicate leader data
    - Controller manages leader election

This architecture diagram shows how producers, brokers, topics, partitions, and consumers interact in a Kafka cluster to provide reliable, scalable event streaming.