05

Module 5: Consumer Groups in Kafka

Chapter 5 β€’ Intermediate

40 min

Consumer Groups in Kafka

Consumer Groups are one of Kafka's most powerful features for scaling message processing and distributing load across multiple consumers. They enable parallel processing while keeping message ordering per partition and providing strong delivery guarantees (typically at-least-once, and effectively once with the right patterns).

In this module, you'll learn how consumer groups work, how partitions are assigned, how offsets are managed, and how to scale consumers safely.


🎯 What You Will Learn

By the end of this module, you will be able to:

  • Explain what a consumer group is and why it exists
  • Describe how partitions are assigned to consumers in a group
  • Understand rebalancing, its impact, and different assignment strategies
  • Manage offsets, consumer lag, and reset policies
  • Scale consumers horizontally and apply best practices for performance
  • Recognize common patterns and troubleshoot typical consumer group issues

πŸ” What is a Consumer Group?

A Consumer Group is a collection of consumers that work together to consume data from one or more topics.

  • All consumers in the same group share the load
  • Each partition of a topic is processed by at most one consumer in that group
  • Different consumer groups can read the same topic independently

This lets you scale processing while preserving ordering per partition.


🧠 Key Concepts

1. Partition Assignment

  • Each partition in a topic is assigned to only one consumer within a group at a time
  • Multiple consumers in the same group cannot read the same partition simultaneously
  • If you have more consumers than partitions, some consumers will be idle

Rule of thumb:

For maximum parallelism, aim for consumers ≀ partitions per consumer group.


2. Load Balancing and Rebalancing

  • Kafka automatically distributes partitions among consumers in a group
  • When a consumer joins, leaves, or crashes, Kafka triggers a rebalance
  • During a rebalance:
  • Consumers temporarily stop processing
  • Partitions are reassigned
  • Consumers resume with their new assignments

Rebalancing is necessary for fault tolerance and elasticity, but too frequent rebalancing hurts performance.


3. Offset Management

Offsets track how far each consumer group has progressed in each partition.

  • Each consumer group maintains its own offsets per partition
  • Different groups can read the same messages at different speeds
  • Offsets are typically stored in a special internal topic: __consumer_offsets

Proper offset management is key to avoiding duplicates or data loss.


πŸ—οΈ Consumer Group Architecture

Consider a topic with three partitions:

code
    Topic: user-events (3 partitions)
    β”œβ”€β”€ Partition 0: [msg1, msg4, msg7, ...]
    β”œβ”€β”€ Partition 1: [msg2, msg5, msg8, ...]
    └── Partition 2: [msg3, msg6, msg9, ...]
    

Consumer Group "analytics"

code
    Consumer Group "analytics":
    β”œβ”€β”€ Consumer 1 β†’ Partition 0
    β”œβ”€β”€ Consumer 2 β†’ Partition 1
    └── Consumer 3 β†’ Partition 2
    

Here, the group processes the topic in parallel, with each consumer handling a different partition.

Consumer Group "notifications"

code
    Consumer Group "notifications":
    β”œβ”€β”€ Consumer A β†’ All partitions (independent processing)
    β”œβ”€β”€ Consumer B β†’ All partitions (independent processing)
    └── Consumer C β†’ All partitions (independent processing)
    

Another group can read the same data but perform different processing (e.g., notifications vs analytics).


🧭 Consumer Group Coordination

Group Coordinator

Kafka designates one broker as the group coordinator for each consumer group. It:

  • Manages consumer group membership
  • Handles partition assignment and rebalancing
  • Stores and updates consumer group metadata

Rebalancing Process (High-Level)

  1. Consumer Joins – New consumer starts and joins the group.
  2. Rebalance Trigger – Coordinator detects membership change (join/leave/failure).
  3. Stop Processing – Consumers temporarily stop processing.
  4. Partition Assignment – Coordinator calculates a new assignment.
  5. Resume Processing – Consumers resume with new partition assignments.

Rebalancing is necessary but should be controlled to avoid constant redistributions.


Rebalancing Strategies

Kafka supports several partition assignment strategies:

  • Range

Assigns consecutive partitions to consumers (e.g., 0–3 to C1, 4–7 to C2).

  • Round Robin

Distributes partitions evenly in a round-robin fashion.

  • Sticky

Tries to minimize partition movement between rebalances.

  • Cooperative Sticky (Kafka 2.4+)

Performs incremental rebalancing, reducing pauses and improving stability.

Choosing the right strategy can reduce disruption and improve throughput.


πŸ“ Offset Management

Offsets tell Kafka:

β€œFor this consumer group, up to which message in this partition have we successfully processed?”

Offset Storage

  • Stored in Kafka’s internal topic: __consumer_offsets
  • Each consumer group has its own offset entries
  • Enables independent processing per group

Commit Strategies

  • Automatic Commit
properties.js
      enable.auto.commit=true
      auto.commit.interval.ms=5000
      
  • Kafka commits offsets automatically at intervals
  • Simple, but can cause duplicates or lost messages if the consumer crashes between commit and processing
  • Manual Commit (Recommended for control)

Your application decides when to commit, usually after successful processing.

  • Synchronous Commit – Blocking, safer, simpler error handling
  • Asynchronous Commit – Non-blocking, higher throughput, requires careful error handling

Offset Reset Policies

  • `earliest` – Start from the beginning of the partition if no offset is found
  • `latest` – Start from the latest message
  • `none` – Fail if no offset is found (forces explicit offset handling)

These are used when a consumer group appears for the first time or when offsets are invalid.


⏱️ Consumer Lag and Monitoring

What is Consumer Lag?

  • Lag = Latest offset in the partition – Consumer’s committed offset
  • High Lag: Consumer is falling behind
  • Zero (or low) Lag: Consumer is keeping up with real-time data

Lag is one of the most important metrics in Kafka systems.

Monitoring Consumer Groups

Common ways to monitor consumer lag and health:

  • Kafka Manager – Web-based management tool
  • Confluent Control Center – Enterprise monitoring
  • JMX Metrics – Built-in metrics for monitoring
  • Grafana / Prometheus – Popular combo for dashboards and alerts

Set alerts for high lag to detect slow consumers or insufficient capacity.


πŸ“ˆ Scaling Consumers Horizontally

Adding More Consumers to a Group

  1. Scale Up – Start more consumer instances with the same group.id.
  2. Partition Limit – Only up to one consumer per partition in a group can be active.
  3. Rebalance – Kafka automatically reassigns partitions across consumers.
  4. Result – Improved throughput and better fault tolerance (if one consumer fails, others take over).

Best Practices

  • Plan partition count with future scaling in mind.
  • Try to keep consumer count ≀ partition count.
  • Avoid frequent restarts/rescaling that cause constant rebalancing.
  • Always monitor consumer lag and rebalance frequency.

🌍 Real-World Examples

Example 1: E-commerce Analytics

code
    Topic: user-events (12 partitions)
    
    Analytics Group (4 consumers):
    β”œβ”€β”€ Consumer 1 β†’ Partitions 0, 1, 2
    β”œβ”€β”€ Consumer 2 β†’ Partitions 3, 4, 5
    β”œβ”€β”€ Consumer 3 β†’ Partitions 6, 7, 8
    └── Consumer 4 β†’ Partitions 9, 10, 11
    
    Throughput: 100,000 events/second
    Latency: < 100ms
    

This group processes user behavior in parallel, keeping up with high event volumes.


Example 2: Real-time Notifications

code
    Topic: notifications (6 partitions)
    
    Notification Group (3 consumers):
    β”œβ”€β”€ Consumer A β†’ Partitions 0, 1
    β”œβ”€β”€ Consumer B β†’ Partitions 2, 3
    └── Consumer C β†’ Partitions 4, 5
    
    Processing: Email, SMS, Push notifications
    Latency: < 50ms
    

Here, the group ensures that notification events are processed quickly and independently.


βš™οΈ Consumer Group Configuration

Key Configuration Parameters

properties.js
    # Group Settings
    group.id=my-consumer-group
    group.instance.id=consumer-1
    
    # Session Management
    session.timeout.ms=30000
    heartbeat.interval.ms=3000
    
    # Offset Management
    enable.auto.commit=true
    auto.commit.interval.ms=5000
    auto.offset.reset=latest
    
    # Fetch Settings
    fetch.min.bytes=1
    fetch.max.wait.ms=500
    max.partition.fetch.bytes=1048576
    

Performance Tuning Tips

  • Session Timeout – Balance responsiveness vs stability
  • Heartbeat Interval – Frequent heartbeats for faster failure detection
  • Fetch Size – Larger fetches for better throughput
  • Commit Frequency – Balance between performance and durability

🧱 Common Patterns with Consumer Groups

Pattern 1: Fan-out Processing

  • Multiple consumer groups process the same data
  • Each group has different processing logic (e.g., analytics, notifications, billing)
  • Independent scaling and fault tolerance for each group

Pattern 2: Pipeline Processing

  • Sequential processing through multiple topics
  • Each stage has its own consumer group
  • Enables complex data transformations and streaming ETL

Pattern 3: Microservices Integration

  • Each microservice has its own consumer group
  • Decoupled processing and scaling
  • Independent deployment and monitoring per service

πŸ› οΈ Troubleshooting Consumer Groups

Common Issues

  1. Consumer Lag – Consumers are falling behind.
  2. Frequent Rebalancing – Constant partition movement, causing pauses.
  3. Offset Management Problems – Lost or duplicate messages.
  4. Uneven Partition Assignment – Some consumers do much more work than others.

Solutions

  1. Scale Consumers – Add more consumers or partitions to reduce lag.
  2. Optimize Processing – Improve consumer processing speed (DB, I/O, logic).
  3. Tune Configuration – Adjust session timeouts, fetch sizes, and commit strategies.
  4. Monitor Metrics – Track consumer lag, rebalancing, and error rates.

βœ… Key Takeaways

  • A consumer group is how Kafka scales message processing horizontally while preserving partition ordering.
  • Partition assignment ensures only one consumer per partition within a group.
  • Kafka handles rebalancing, but too many rebalances can hurt performance.
  • Offsets and commit strategies determine your guarantees around duplicates and data loss.
  • Consumer lag is the main health signal for consumer groups.
  • Consumer groups power patterns like fan-out, pipelines, and microservice integration.

πŸ“š What’s Next?

After understanding consumer groups conceptually, the next step is to work with real code:

Build Kafka consumers, run them as a group, observe rebalancing and lag, and experiment with offset commit strategies.

That’s where you’ll see these concepts turn into hands-on experience.

Hands-on Examples

Consumer Group Load Balancing Demo

# Consumer Group Load Balancing Example
    
    ## Scenario: E-commerce Event Processing
    Topic: "order-events" (6 partitions)
    Consumer Group: "order-processors"
    
    ## Initial Setup (3 consumers):
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚                Consumer Group: order-processors         β”‚
    β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
    β”‚  Consumer 1    Consumer 2    Consumer 3                β”‚
    β”‚      β”‚             β”‚             β”‚                     β”‚
    β”‚   Partition 0   Partition 2   Partition 4             β”‚
    β”‚   Partition 1   Partition 3   Partition 5             β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    
    ## Adding Consumer 4 (Rebalancing):
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚                Consumer Group: order-processors         β”‚
    β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
    β”‚  Consumer 1    Consumer 2    Consumer 3    Consumer 4  β”‚
    β”‚      β”‚             β”‚             β”‚             β”‚        β”‚
    β”‚   Partition 0   Partition 1   Partition 3   Partition 5β”‚
    β”‚   Partition 2   Partition 4                             β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    
    ## Processing Flow:
    1. Order created β†’ Topic "order-events"
    2. Message routed to partition (hash(order_id) % 6)
    3. Consumer processes message from assigned partition
    4. Offset committed after successful processing
    5. Next message processed from same partition
    
    ## Benefits:
    - Parallel processing across partitions
    - Automatic load balancing
    - Fault tolerance through replication
    - Independent scaling of consumers

This example shows how consumer groups automatically distribute partitions among consumers, enabling parallel processing and fault tolerance.