Module 4: Kafka Architecture (Deep Dive)

Chapter 4 • Intermediate

45 min

Understanding Kafka's internal architecture is crucial for designing reliable, scalable event-driven systems. In this module, you will go under the hood and see how producers, brokers, topics, partitions, replication, and controllers all fit together.

🎯 What You Will Learn

By the end of this module, you will be able to:

Visualize Kafka's high-level architecture (producers, brokers, consumers, topics, partitions)
Explain the internal architecture of producers, brokers, and consumers
Understand how replication, leader election, and ISR provide fault tolerance
Describe the difference between ZooKeeper-based Kafka and KRaft (modern Kafka)
Apply basic performance tuning, monitoring, and security best practices for Kafka clusters

🗺️ High-Level Kafka Architecture

At a high level, Kafka connects producers, brokers, and consumers using topics and partitions.

    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │  Producer   │    │  Producer   │    │  Producer   │
    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
           │                  │                  │
           └──────────────────┼──────────────────┘
                              │
           ┌──────────────────┼──────────────────┐
           │                  │                  │
    ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
    │  Broker 1   │    │  Broker 2   │    │  Broker 3   │
    │             │    │             │    │             │
    │ Topic A     │    │ Topic A     │    │ Topic A     │
    │ Partition 0 │    │ Partition 1 │    │ Partition 2 │
    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
           │                  │                  │
           └──────────────────┼──────────────────┘
                              │
           ┌──────────────────┼──────────────────┐
           │                  │                  │
    ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
    │  Consumer   │    │  Consumer   │    │  Consumer   │
    │  Group A    │    │  Group A    │    │  Group A    │
    └─────────────┘    └─────────────┘    └─────────────┘

Producers send messages to topics
Topics are split into partitions, distributed across brokers
Consumers read messages from partitions, usually as part of a consumer group

The rest of this module dives into how each part is implemented internally.

🧩 1. Producer Architecture

Producers are responsible for publishing records to Kafka topics in an efficient and reliable way.

Producer Components

RecordAccumulator

Buffers messages in memory to send them in batches for better throughput.

Sender Thread

Runs in the background and sends accumulated batches to the appropriate brokers.

Metadata

Caches topic and partition information (which broker is leader for which partition).

Serializer

Converts keys and values (e.g., Java objects, JSON) into bytes before sending.

Producer Flow

Message Creation – Application creates a record with key/value.
Serialization – Key/value are serialized to bytes.
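Partitioning – The producer's partitioner chooses a partition: by hashing the key when one is present, otherwise by spreading keyless records across partitions (round-robin in older clients, sticky batching since Kafka 2.4).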
Batching – Records for the same partition are accumulated into batches.
Sending – Batches are sent to the broker that is leader for the partition.
Acknowledgment – Producer waits for acknowledgment (based on acks config).
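
The partitioning step above can be sketched in a few lines. This is a toy model, not the client's implementation: the class name `SketchPartitioner` is hypothetical, CRC32 stands in for the murmur2 hash the Java client uses, and keyless records are rotated round-robin rather than batched stickily.

```python
import zlib
from typing import Optional


class SketchPartitioner:
    """Toy partitioner: hash keyed records, rotate keyless ones."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        self._next = 0  # round-robin cursor for records without a key

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            chosen = self._next
            self._next = (self._next + 1) % self.num_partitions
            return chosen
        # CRC32 is a stand-in (assumption); the Java client hashes with murmur2.
        return zlib.crc32(key) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
# The same key always maps to the same partition, which preserves per-key order.
assert partitioner.partition(b"user-42") == partitioner.partition(b"user-42")
print([partitioner.partition(None) for _ in range(4)])  # [0, 1, 2, 0]
```

Because the partition is derived from the key and the partition count, adding partitions to a topic later changes where existing keys land.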

Example Producer Configuration

    # Reliability
    acks=all                    # Wait for all in-sync replicas
    retries=3                   # Retry failed sends
    enable.idempotence=true     # Prevent duplicates
    
    # Performance
    batch.size=16384            # Batch size in bytes
    linger.ms=5                 # Wait up to 5ms to batch messages
    compression.type=lz4        # Compress batches (saves network and disk)

Higher reliability (acks=all, idempotence) → safer, but slightly higher latency
Higher batching (batch.size, linger.ms) → better throughput, slightly more latency
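
To make the batch.size and linger.ms trade-off concrete, here is a minimal sketch of per-partition batching. The names `SketchAccumulator` and `Batch` are hypothetical, time is passed in explicitly so the behavior is deterministic, and the real RecordAccumulator does considerably more (memory pooling, compression, retries).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Batch:
    created_ms: int
    records: List[bytes] = field(default_factory=list)
    size_bytes: int = 0


class SketchAccumulator:
    """Toy per-partition batching driven by batch.size and linger.ms."""

    def __init__(self, batch_size: int = 16384, linger_ms: int = 5) -> None:
        self.batch_size = batch_size
        self.linger_ms = linger_ms
        self._open: Dict[int, Batch] = {}

    def append(self, partition: int, record: bytes, now_ms: int) -> List[Batch]:
        """Add a record; return any batch that became ready to send."""
        ready: List[Batch] = []
        batch = self._open.get(partition)
        # Close the open batch first if this record would overflow it.
        if batch is not None and batch.size_bytes + len(record) > self.batch_size:
            ready.append(self._open.pop(partition))
            batch = None
        if batch is None:
            batch = self._open[partition] = Batch(created_ms=now_ms)
        batch.records.append(record)
        batch.size_bytes += len(record)
        return ready

    def drain_expired(self, now_ms: int) -> List[Batch]:
        """Return batches that have waited at least linger_ms."""
        expired = [p for p, b in self._open.items()
                   if now_ms - b.created_ms >= self.linger_ms]
        return [self._open.pop(p) for p in expired]


acc = SketchAccumulator(batch_size=10, linger_ms=5)
acc.append(0, b"aaaaaa", now_ms=0)          # 6 bytes, batch stays open
full = acc.append(0, b"bbbbbb", now_ms=1)   # would exceed 10 bytes
print(len(full), full[0].records)           # 1 [b'aaaaaa']
print(len(acc.drain_expired(now_ms=3)))     # 0: second batch is only 2 ms old
print(len(acc.drain_expired(now_ms=6)))     # 1: linger.ms has elapsed
```

A batch leaves the buffer when it fills up or when it has lingered long enough, whichever comes first, which is why raising either setting trades latency for throughput.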

🖥️ 2. Broker Architecture

A Kafka broker is a server that stores data and serves client requests. A Kafka cluster consists of multiple brokers.

Broker Components

Log Manager

Manages topic partitions stored as log segments on disk.

Replica Manager

Handles replication between leader and follower replicas.

Controller (in the cluster)

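A node responsible for managing cluster metadata, leader elections, and partition assignments. In ZooKeeper mode it is one broker elected as controller; in KRaft mode it is a member of a dedicated controller quorum.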

Request Handler

Handles client requests from producers and consumers (produce, fetch, metadata, etc.).

Topic and Partition Structure

    Topic: user-events
    ├── Partition 0 (Leader: Broker 1, Replicas: 1,2,3)
    ├── Partition 1 (Leader: Broker 2, Replicas: 2,3,1)
    └── Partition 2 (Leader: Broker 3, Replicas: 3,1,2)

Each partition has:

A leader replica that handles all writes and, by default, all reads
One or more follower replicas that replicate data from the leader

Partition Storage Layout

On disk, each partition is stored as a sequence of segment files:

    /kafka-logs/user-events-0/
    ├── 00000000000000000000.log         # Segment file (data)
    ├── 00000000000000000000.index       # Offset index
    ├── 00000000000000000000.timeindex   # Timestamp index
    └── leader-epoch-checkpoint          # Leader epoch tracking

This log-based design allows:

Fast sequential writes
Efficient retention and compaction
Simple offset-based reads
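
Segment files are named after the offset of their first record, zero-padded to 20 digits, so finding the segment that holds a given offset is a binary search over base offsets. The sketch below illustrates that idea only; the function names and the sample base offsets are hypothetical, and the broker additionally consults the .index file to locate the record inside the segment.

```python
from bisect import bisect_right
from typing import List


def segment_file_name(base_offset: int) -> str:
    """Segment files are named after their base offset, padded to 20 digits."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets: List[int], target_offset: int) -> str:
    """Return the segment file whose offset range contains target_offset.

    base_offsets must be sorted ascending, as a directory listing would be.
    """
    idx = bisect_right(base_offsets, target_offset) - 1
    if idx < 0:
        raise ValueError("offset is older than the earliest retained segment")
    return segment_file_name(base_offsets[idx])


segments = [0, 5000, 12000]  # hypothetical base offsets of three segments
print(find_segment(segments, 7321))  # 00000000000000005000.log
```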

📥 3. Consumer Architecture

Consumers read records from Kafka and process them. Often, they are part of consumer groups to share the load.

Consumer Components

Fetcher

Fetches batches of messages from brokers.

Coordinator

Communicates with the Group Coordinator broker to manage group membership and partition assignments.

Offset Manager

Tracks and commits offsets to Kafka (the internal __consumer_offsets topic) or to an external store.

Deserializer

Converts bytes back into objects (e.g., strings, JSON, custom types).

Consumer Group Coordination (High-Level)

Group Join – Consumers join a group using group.id.
Rebalance – The coordinator assigns partitions to consumers in the group.
Offset Commit – Consumers commit offsets after processing messages.
Heartbeat – Consumers send heartbeats to remain active in the group.
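
The rebalance step can be illustrated with a range-style assignment for a single topic. This is a simplified sketch in the spirit of Kafka's range assignor, not its implementation; the function name `range_assign` and the member IDs are hypothetical.

```python
from typing import Dict, List


def range_assign(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Split the partitions into contiguous ranges, one range per member."""
    if not members:
        return {}
    ordered_members = sorted(members)
    ordered_partitions = sorted(partitions)
    base, extra = divmod(len(ordered_partitions), len(ordered_members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(ordered_members):
        size = base + (1 if i < extra else 0)  # first `extra` members get one more
        assignment[member] = ordered_partitions[start:start + size]
        start += size
    return assignment


print(range_assign([0, 1, 2, 3, 4], ["c1", "c2"]))
# {'c1': [0, 1, 2], 'c2': [3, 4]}
print(range_assign([0, 1, 2, 3, 4], ["c1", "c2", "c3"]))  # after c3 joins
# {'c1': [0, 1], 'c2': [2, 3], 'c3': [4]}
```

When a member joins or leaves, the assignment is recomputed, which is why a rebalance can move partitions between consumers that did not change.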

You will explore consumer groups in much more detail in Module 5.

🧠 4. ZooKeeper vs KRaft (Kafka 2.8+)

Historically, Kafka used ZooKeeper to store cluster metadata. KRaft mode, which removes the ZooKeeper dependency, arrived as early access in Kafka 2.8 and was declared production-ready in Kafka 3.3.

ZooKeeper Mode (Legacy)

    ┌─────────────┐       ┌─────────────┐
    │  ZooKeeper  │ ◄──── │   Kafka     │
    │   Cluster   │       │  Brokers    │
    └─────────────┘       └─────────────┘

ZooKeeper stores metadata about brokers, topics, partitions, and ACLs.
Kafka brokers rely on ZooKeeper for controller election and cluster coordination.

KRaft Mode (Modern Kafka)

    ┌─────────────┐       ┌─────────────┐
    │   Kafka     │ ◄──── │   Kafka     │
    │ Controller  │       │  Brokers    │
    └─────────────┘       └─────────────┘

Kafka uses its own Raft-based consensus (KRaft) for metadata.
No external ZooKeeper cluster is required.
Simplifies deployment and improves metadata scalability.
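
Raft-style consensus needs a majority of voters to agree, which determines how many controller failures a quorum can survive. The arithmetic below is a generic majority-quorum sketch with hypothetical function names, not code taken from Kafka.

```python
def quorum_majority(voters: int) -> int:
    """Smallest number of voters that forms a majority."""
    if voters < 1:
        raise ValueError("a quorum needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can fail while a majority is still reachable."""
    return voters - quorum_majority(voters)


for n in (1, 3, 4, 5):
    print(n, quorum_majority(n), tolerated_failures(n))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
```

Note that four voters tolerate no more failures than three, which is why controller quorums are usually sized with an odd number of nodes.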

New Kafka deployments should use KRaft mode. Kafka 4.0 removed ZooKeeper support entirely, so older ZooKeeper-based clusters must migrate to KRaft before upgrading to 4.x.

🛡️ Replication and Fault Tolerance

Replication is at the heart of Kafka's durability and availability story.

Replication Factor

Replication Factor = N: Each partition has N replicas (1 leader + N-1 followers)
Leader: Handles all writes and, by default, all reads
Followers: Replicate data by fetching from the leader
ISR (In-Sync Replicas): The leader plus the followers that have kept up with it within replica.lag.time.max.ms

Leader Election

Failure Detection – The controller detects that a leader broker is down.
ISR Check – It identifies which replicas are still in-sync.
New Leader Selection – A new leader is chosen from the ISR.
Metadata Update – Cluster metadata is updated with the new leader.
Client Notification – Producers and consumers get updated metadata.

This process ensures that the cluster continues to operate even when individual brokers fail.
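
The election steps can be sketched as a small function. It assumes the common rule of picking the first replica in the assignment order that is both alive and in the ISR, and it assumes unclean leader election is disabled (the default), so no out-of-sync replica is ever promoted. The function name and broker IDs are hypothetical.

```python
from typing import List, Optional, Set


def elect_leader(replicas: List[int], isr: Set[int], live: Set[int]) -> Optional[int]:
    """Pick the first assigned replica that is both alive and in the ISR."""
    for broker_id in replicas:
        if broker_id in live and broker_id in isr:
            return broker_id
    return None  # no eligible replica: the partition stays offline


replicas = [1, 2, 3]           # assignment order for one partition
isr = {1, 2, 3}
live = {2, 3}                  # broker 1 (the old leader) has failed
print(elect_leader(replicas, isr & live, live))  # 2
print(elect_leader(replicas, {1}, live))         # None: only the dead broker was in sync
```

The second call shows the availability cost of strict durability: if the only in-sync replica is down, the partition is unavailable until it returns.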

🔒 Data Durability and Acknowledgments

Kafka provides tuneable durability guarantees via acknowledgment levels:

Acknowledgment Levels

`acks=0` – Fire-and-forget. Producer does not wait for any acknowledgment.
`acks=1` – Wait for leader to write the record.
`acks=all` – Wait for all in-sync replicas to acknowledge the write. Pair it with min.insync.replicas so that writes are rejected when too few replicas are in sync.

Durability Trade-offs

Higher Durability (acks=all, more replicas)
Safer, more resilient to failures
Slightly higher latency and lower throughput

Lower Durability (acks=0 or 1)
Faster and higher throughput
Higher risk of data loss on failures

Choosing the right configuration depends on your business requirements (e.g., payments vs logs).
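
The interaction between acks and min.insync.replicas is easier to see as a decision function. This is a simplified model with a hypothetical function name and return strings; it ignores timeouts and retries, and only reflects that min.insync.replicas is enforced for acks=all.

```python
def write_outcome(acks: str, isr_size: int, min_insync_replicas: int) -> str:
    """Describe what the producer observes for one write (simplified model)."""
    if acks == "0":
        return "sent, no acknowledgment awaited"
    if acks == "1":
        return "acknowledged once the leader has written the record"
    if acks == "all":
        # min.insync.replicas is only enforced for acks=all.
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return f"acknowledged after all {isr_size} in-sync replicas have the record"
    raise ValueError(f"unknown acks value: {acks!r}")


# Replication factor 3, min.insync.replicas=2
print(write_outcome("all", isr_size=3, min_insync_replicas=2))
print(write_outcome("all", isr_size=1, min_insync_replicas=2))  # two brokers down
print(write_outcome("1", isr_size=1, min_insync_replicas=2))    # still accepted
```

With replication factor 3 and min.insync.replicas=2, the topic keeps accepting acks=all writes with one broker down and refuses them with two down, rather than silently accepting under-replicated data.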

⚙️ Performance Optimization

Kafka is fast out of the box, but real systems often require tuning.

Producer Optimization

Batching – Increase batch.size and linger.ms for better throughput.
Compression – Use compression.type (e.g., lz4, snappy) to reduce bandwidth.
Async Sending – Use asynchronous send patterns to avoid blocking.
Partitioning Strategy – Ensure keys are well-distributed across partitions.

Consumer Optimization

Fetch Size – Tune fetch.min.bytes and max.partition.fetch.bytes for batching vs latency.
Session Timeout – session.timeout.ms should balance failure detection with stability.
Auto Commit – Decide between auto commit and manual commit based on processing guarantees.
Consumer Groups – Use groups to parallelize processing.

Broker Optimization

Disk I/O – Prefer SSDs and ensure good disk throughput.
Memory – Tune JVM heap and OS page cache usage.
Network – Optimize network settings for high throughput.
Compression & Retention – Configure log retention and compression appropriately.
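
When planning disk capacity and retention, a back-of-the-envelope estimate is often enough to start with. The sketch below is an assumption-laden approximation with a hypothetical function name and made-up input figures: it ignores index files, segment rounding, and headroom for rebalancing.

```python
def estimated_disk_bytes(ingress_bytes_per_sec: float,
                         retention_hours: float,
                         replication_factor: int,
                         compression_ratio: float = 1.0) -> float:
    """Rough cluster-wide storage needed to hold the retained data."""
    raw_bytes = ingress_bytes_per_sec * retention_hours * 3600
    return raw_bytes * replication_factor / compression_ratio


# Hypothetical workload: 5 MB/s in, 7 days retention, RF=3, 2x compression
total = estimated_disk_bytes(5_000_000, 24 * 7, 3, compression_ratio=2.0)
print(f"{total / 1e12:.2f} TB")  # 4.54 TB
```

Divide the result by the number of brokers for a per-broker figure, and leave spare capacity so that a broker failure or partition reassignment does not fill the remaining disks.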

📊 Monitoring and Observability

A Kafka cluster without monitoring is a ticking time bomb.

Key Metrics to Track

Throughput – Messages per second (produce and fetch).
Latency – End-to-end and client-side request latency.
Consumer Lag – How far behind consumers are from the latest offsets.
Disk Usage – Storage consumption per broker and topic.
Network Usage – Ingress and egress bandwidth.
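
Consumer lag is the most Kafka-specific of these metrics: per partition, it is the log end offset minus the group's committed offset. The sketch below uses hypothetical offsets and, as a simplifying assumption, treats a partition with no committed offset as unread from offset 0.

```python
from typing import Dict


def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: records written but not yet committed by the group."""
    lag: Dict[int, int] = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption: a missing commit means nothing has been consumed yet.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


end_offsets = {0: 1500, 1: 980, 2: 2100}   # hypothetical log end offsets
committed = {0: 1500, 1: 900}              # partition 2 has no commit yet
lag = consumer_lag(end_offsets, committed)
print(lag, sum(lag.values()))  # {0: 0, 1: 80, 2: 2100} 2180
```

Alerting on lag that keeps growing, rather than on a fixed threshold, tends to separate a slow consumer from a temporary traffic spike.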

Common Monitoring Tools

Kafka Manager (now CMAK) – Open-source, web-based cluster management tool.
Confluent Control Center – Enterprise-grade monitoring and management.
JMX Metrics – Built-in Java metrics exposed by brokers and clients.
Grafana + Prometheus – Popular stack for custom dashboards and alerting.

Good monitoring lets you catch issues like lag, disk pressure, or slow consumers before they become outages.

🔐 Security Considerations

Kafka runs in many production environments where security is critical.

Authentication

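SASL – Pluggable authentication framework (e.g., SASL/PLAIN, SASL/SCRAM, SASL/OAUTHBEARER).
SSL/TLS – Authenticates brokers to clients with certificates and, with mutual TLS, clients to brokers; it also encrypts the connection.
Kerberos – Used through SASL/GSSAPI; common in enterprise environments with existing Kerberos infrastructure.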

Authorization

ACLs (Access Control Lists) – Control which users can access which topics, consumer groups, or cluster operations.
RBAC (Role-Based Access Control) – Role-based permissions (often via enterprise platforms).
Resource-Level Controls – Fine-grained permissions per topic, group, or cluster resource.

Encryption

Data in Transit – Protect with SSL/TLS.
Data at Rest – Use disk encryption (OS-level or infrastructure-level).
Message-Level Encryption – Application-level encryption when needed.

Security should be part of the design from the beginning, not an afterthought.

✅ Key Takeaways

Kafka’s architecture is log-based, partitioned, and replicated, enabling durability and high throughput.
Producers batch and compress data, while brokers store it efficiently as append-only logs.
Consumers use offsets and consumer groups to process data reliably and in parallel.
Replication, leader election, and ISR provide fault tolerance and high availability.
Modern Kafka can run in KRaft mode, removing the dependency on ZooKeeper.
Performance tuning, monitoring, and security are essential for real-world Kafka deployments.

📚 What’s Next?

In the next module, you’ll focus on one of Kafka’s most powerful features:

“Consumer Groups in Kafka” – how Kafka distributes partitions across consumers, manages offsets, and scales message processing horizontally.

Continue with: Module 5 – Consumer Groups in Kafka.

Hands-on Examples

Complete Kafka Architecture Diagram

    # Complete Kafka Architecture
    
    ## Cluster Overview:
    ┌─────────────────────────────────────────────────────────────┐
    │                    Kafka Cluster                           │
    ├─────────────────────────────────────────────────────────────┤
    │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
    │  │   Broker 1  │  │   Broker 2  │  │   Broker 3  │        │
    │  │             │  │             │  │             │        │
    │  │ Topic A     │  │ Topic A     │  │ Topic A     │        │
    │  │ Partition 0 │  │ Partition 1 │  │ Partition 2 │        │
    │  │ (Leader)    │  │ (Leader)    │  │ (Leader)    │        │
    │  │             │  │             │  │             │        │
    │  │ Topic B     │  │ Topic B     │  │ Topic B     │        │
    │  │ Partition 0 │  │ Partition 1 │  │ Partition 2 │        │
    │  │ (Follower)  │  │ (Follower)  │  │ (Follower)  │        │
    │  └─────────────┘  └─────────────┘  └─────────────┘        │
    └─────────────────────────────────────────────────────────────┘
                                    │
                                    │
    ┌─────────────────────────────────────────────────────────────┐
    │                    Client Layer                             │
    ├─────────────────────────────────────────────────────────────┤
    │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
    │  │  Producer   │  │  Consumer   │  │  Consumer   │        │
    │  │             │  │   Group A   │  │   Group B   │        │
    │  │ Batch Size  │  │             │  │             │        │
    │  │ Compression │  │ Partition 0 │  │ Partition 1 │        │
    │  │ Acks=all    │  │ Partition 1 │  │ Partition 2 │        │
    │  └─────────────┘  └─────────────┘  └─────────────┘        │
    └─────────────────────────────────────────────────────────────┘
    
    ## Data Flow:
    1. Producer determines partition (hash(key) % partitions)
    2. Producer sends message to the partition's leader broker
    3. Message written to partition log
    4. Replicated to follower brokers
    5. Acknowledgment sent to producer
    6. Consumer fetches from partition
    7. Offset committed to broker
    
    ## Replication Details:
    - Replication Factor: 3
    - Min In-Sync Replicas: 2
    - Leader handles all writes (and reads by default)
    - Followers replicate leader data
    - Controller manages leader election

This architecture diagram shows how producers, brokers, topics, partitions, and consumers interact in a Kafka cluster to provide reliable, scalable event streaming.
