Module 5: Consumer Groups in Kafka
Chapter 5 • Intermediate
Consumer Groups in Kafka
Consumer Groups are one of Kafka's most powerful features for scaling message processing and distributing load across multiple consumers. They enable parallel processing while preserving message ordering within each partition and providing delivery guarantees (at-least-once by default, and effectively exactly-once with the right patterns).
In this module, you'll learn how consumer groups work, how partitions are assigned, how offsets are managed, and how to scale consumers safely.
What You Will Learn
By the end of this module, you will be able to:
- Explain what a consumer group is and why it exists
- Describe how partitions are assigned to consumers in a group
- Understand rebalancing, its impact, and different assignment strategies
- Manage offsets, consumer lag, and reset policies
- Scale consumers horizontally and apply best practices for performance
- Recognize common patterns and troubleshoot typical consumer group issues
What is a Consumer Group?
A Consumer Group is a collection of consumers that work together to consume data from one or more topics.
- All consumers in the same group share the load
- Each partition of a topic is processed by at most one consumer in that group
- Different consumer groups can read the same topic independently
This lets you scale processing while preserving ordering per partition.
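For a concrete feel, the sketch below shows a minimal consumer that joins a group simply by setting group.id. It assumes a broker at localhost:9092, the confluent-kafka Python client, and illustrative topic/group names; it is a starting point, not a production consumer.

```python
from confluent_kafka import Consumer

# Illustrative settings: broker address, topic, and group name are assumptions.
conf = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "analytics",          # every instance with this id shares the load
    "auto.offset.reset": "earliest",  # where to start when the group has no committed offset
}

consumer = Consumer(conf)
consumer.subscribe(["user-events"])   # Kafka assigns this instance a subset of the partitions

try:
    while True:
        msg = consumer.poll(1.0)      # fetch the next message (None on timeout)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        print(f"partition={msg.partition()} offset={msg.offset()} value={msg.value()}")
finally:
    consumer.close()                  # leave the group cleanly, triggering a rebalance
```

Running the same script several times with the same group.id is all it takes to form a group: Kafka splits the topic's partitions among the running instances.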
Key Concepts
1. Partition Assignment
- Each partition in a topic is assigned to only one consumer within a group at a time
- Multiple consumers in the same group cannot read the same partition simultaneously
- If you have more consumers than partitions, some consumers will be idle
Rule of thumb:
For maximum parallelism, aim for consumers ≤ partitions per consumer group.
2. Load Balancing and Rebalancing
- Kafka automatically distributes partitions among consumers in a group
- When a consumer joins, leaves, or crashes, Kafka triggers a rebalance
- During a rebalance:
- Consumers temporarily stop processing
- Partitions are reassigned
- Consumers resume with their new assignments
Rebalancing is necessary for fault tolerance and elasticity, but too frequent rebalancing hurts performance.
3. Offset Management
Offsets track how far each consumer group has progressed in each partition.
- Each consumer group maintains its own offsets per partition
- Different groups can read the same messages at different speeds
- Offsets are typically stored in a special internal topic:
__consumer_offsets
Proper offset management is key to avoiding duplicates or data loss.
Consumer Group Architecture
Consider a topic with three partitions:
Topic: user-events (3 partitions)
├── Partition 0: [msg1, msg4, msg7, ...]
├── Partition 1: [msg2, msg5, msg8, ...]
└── Partition 2: [msg3, msg6, msg9, ...]
Consumer Group "analytics"
Consumer Group "analytics":
├── Consumer 1 → Partition 0
├── Consumer 2 → Partition 1
└── Consumer 3 → Partition 2
Here, the group processes the topic in parallel, with each consumer handling a different partition.
Consumer Group "notifications"
Consumer Group "notifications":
├── Consumer A → Partition 0
├── Consumer B → Partition 1
└── Consumer C → Partition 2
Another group can read the same data but perform different processing (e.g., notifications vs analytics).
Consumer Group Coordination
Group Coordinator
Kafka designates one broker as the group coordinator for each consumer group. It:
- Manages consumer group membership
- Handles partition assignment and rebalancing
- Stores and updates consumer group metadata
Rebalancing Process (High-Level)
1. Consumer Joins → A new consumer starts and joins the group.
2. Rebalance Trigger → The coordinator detects a membership change (join/leave/failure).
3. Stop Processing → Consumers temporarily stop processing.
4. Partition Assignment → The coordinator calculates a new assignment.
5. Resume Processing → Consumers resume with their new partition assignments.
Rebalancing is necessary but should be controlled to avoid constant redistributions.
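Most clients let you observe these steps directly by registering rebalance callbacks. A minimal sketch with the confluent-kafka Python client (broker address and names are illustrative assumptions):

```python
from confluent_kafka import Consumer

def on_assign(consumer, partitions):
    # Called after the coordinator hands this instance its new partitions.
    print("Assigned:", [(p.topic, p.partition) for p in partitions])

def on_revoke(consumer, partitions):
    # Called before partitions are taken away; a good place to commit
    # in-flight offsets so the next owner does not reprocess them.
    print("Revoked:", [(p.topic, p.partition) for p in partitions])

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "group.id": "order-processors",         # illustrative group name
})
consumer.subscribe(["order-events"], on_assign=on_assign, on_revoke=on_revoke)
# The callbacks fire from within consumer.poll() whenever a rebalance happens.
```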
Rebalancing Strategies
Kafka supports several partition assignment strategies:
- Range
Assigns consecutive partitions to consumers (e.g., partitions 0–3 to C1, 4–7 to C2).
- Round Robin
Distributes partitions evenly in a round-robin fashion.
- Sticky
Tries to minimize partition movement between rebalances.
- Cooperative Sticky (Kafka 2.4+)
Performs incremental rebalancing, reducing pauses and improving stability.
Choosing the right strategy can reduce disruption and improve throughput.
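How the strategy is selected depends on the client. With librdkafka-based clients such as confluent-kafka it is the partition.assignment.strategy setting, as sketched below (names are illustrative); the Java client instead takes assignor class names such as org.apache.kafka.clients.consumer.CooperativeStickyAssignor.

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "analytics",
    # Cooperative (incremental) rebalancing: only the partitions that actually
    # move are paused, instead of the whole group stopping. This client also
    # accepts "range" and "roundrobin" here.
    "partition.assignment.strategy": "cooperative-sticky",
})
consumer.subscribe(["user-events"])
```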
Offset Management
Offsets tell Kafka:
"For this consumer group, up to which message in this partition have we successfully processed?"
Offset Storage
- Stored in Kafka's internal topic __consumer_offsets
- Each consumer group has its own offset entries
- Enables independent processing per group
Commit Strategies
- Automatic Commit
enable.auto.commit=true
auto.commit.interval.ms=5000
- Kafka commits offsets automatically at intervals
- Simple, but it can cause lost messages (offsets committed before processing finishes) or duplicates (processing finishes but the consumer crashes before the next commit)
- Manual Commit (Recommended for control)
Your application decides when to commit, usually after successful processing.
- Synchronous Commit → Blocking, safer, simpler error handling (see the sketch below)
- Asynchronous Commit → Non-blocking, higher throughput, requires careful error handling
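A sketch of the synchronous manual-commit pattern with the confluent-kafka Python client: the offset is committed only after the message has been handled, so a crash mid-processing leads to a reprocessed message rather than a lost one (process() and the broker address are placeholders).

```python
from confluent_kafka import Consumer

def process(msg):
    # Placeholder for your business logic.
    print(f"processing offset {msg.offset()} from partition {msg.partition()}")

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processors",
    "enable.auto.commit": False,   # the application decides when offsets are committed
})
consumer.subscribe(["order-events"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        process(msg)                                       # handle the message first
        consumer.commit(message=msg, asynchronous=False)   # then commit, blocking until stored
finally:
    consumer.close()
```

Passing asynchronous=True gives the non-blocking variant, at the cost of having to detect and handle failed commits yourself.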
Offset Reset Policies
- `earliest` → Start from the beginning of the partition if no committed offset is found
- `latest` → Start from the latest message
- `none` → Fail if no offset is found (forces explicit offset handling)
These are used when a consumer group appears for the first time or when offsets are invalid.
Consumer Lag and Monitoring
What is Consumer Lag?
- Lag = Latest offset in the partition − Consumer's committed offset
- High Lag: Consumer is falling behind
- Zero (or low) Lag: Consumer is keeping up with real-time data
Lag is one of the most important metrics in Kafka systems.
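As a rough illustration, per-partition lag can be computed from a client by comparing the partition's end offset (high watermark) with the group's committed offset. A sketch assuming the confluent-kafka Python client and illustrative names:

```python
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed broker
    "group.id": "analytics",                # group whose progress we inspect
})

# Ask the broker for the group's committed offset on each partition.
partitions = [TopicPartition("user-events", p) for p in range(3)]
for tp in consumer.committed(partitions, timeout=10):
    low, high = consumer.get_watermark_offsets(tp, timeout=10)
    # A negative offset means the group has never committed on this partition.
    lag = high - tp.offset if tp.offset >= 0 else high - low
    print(f"partition {tp.partition}: committed={tp.offset} end={high} lag={lag}")

consumer.close()
```

In practice you would normally rely on the tools below or on exported metrics rather than polling this by hand.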
Monitoring Consumer Groups
Common ways to monitor consumer lag and health:
- Kafka Manager → Web-based management tool
- Confluent Control Center → Enterprise monitoring
- JMX Metrics → Built-in metrics for monitoring
- Grafana / Prometheus → Popular combo for dashboards and alerts
Set alerts for high lag to detect slow consumers or insufficient capacity.
Scaling Consumers Horizontally
Adding More Consumers to a Group
- Scale Up → Start more consumer instances with the same group.id.
- Partition Limit → At most one consumer per partition in a group can be active.
- Rebalance → Kafka automatically reassigns partitions across the consumers.
- Result → Improved throughput and better fault tolerance (if one consumer fails, others take over).
Best Practices
- Plan partition count with future scaling in mind.
- Try to keep consumer count ≤ partition count.
- Avoid frequent restarts/rescaling that cause constant rebalancing.
- Always monitor consumer lag and rebalance frequency.
Real-World Examples
Example 1: E-commerce Analytics
Topic: user-events (12 partitions)
Analytics Group (4 consumers):
├── Consumer 1 → Partitions 0, 1, 2
├── Consumer 2 → Partitions 3, 4, 5
├── Consumer 3 → Partitions 6, 7, 8
└── Consumer 4 → Partitions 9, 10, 11
Throughput: 100,000 events/second
Latency: < 100ms
This group processes user behavior in parallel, keeping up with high event volumes.
Example 2: Real-time Notifications
Topic: notifications (6 partitions)
Notification Group (3 consumers):
├── Consumer A → Partitions 0, 1
├── Consumer B → Partitions 2, 3
└── Consumer C → Partitions 4, 5
Processing: Email, SMS, Push notifications
Latency: < 50ms
Here, the group ensures that notification events are processed quickly and independently.
Consumer Group Configuration
Key Configuration Parameters
# Group Settings
group.id=my-consumer-group
group.instance.id=consumer-1
# Session Management
session.timeout.ms=30000
heartbeat.interval.ms=3000
# Offset Management
enable.auto.commit=true
auto.commit.interval.ms=5000
auto.offset.reset=latest
# Fetch Settings
fetch.min.bytes=1
fetch.max.wait.ms=500
max.partition.fetch.bytes=1048576
Performance Tuning Tips
- Session Timeout → Balance responsiveness vs stability
- Heartbeat Interval → Frequent heartbeats for faster failure detection
- Fetch Size → Larger fetches for better throughput
- Commit Frequency → Balance between performance and durability
Common Patterns with Consumer Groups
Pattern 1: Fan-out Processing
- Multiple consumer groups process the same data
- Each group has different processing logic (e.g., analytics, notifications, billing)
- Independent scaling and fault tolerance for each group (see the sketch below)
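Fan-out needs no special API: each group simply uses a different group.id against the same topic, and Kafka tracks their offsets separately. A minimal sketch (names are illustrative):

```python
from confluent_kafka import Consumer

def build_consumer(group_id):
    # Same topic, different group.id: each group receives its own copy of the
    # stream and commits its own offsets, so they scale and fail independently.
    c = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": group_id,
    })
    c.subscribe(["user-events"])
    return c

analytics = build_consumer("analytics")          # e.g., aggregates behaviour metrics
notifications = build_consumer("notifications")  # e.g., sends email/SMS/push
```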
Pattern 2: Pipeline Processing
- Sequential processing through multiple topics
- Each stage has its own consumer group
- Enables complex data transformations and streaming ETL
Pattern 3: Microservices Integration
- Each microservice has its own consumer group
- Decoupled processing and scaling
- Independent deployment and monitoring per service
Troubleshooting Consumer Groups
Common Issues
- Consumer Lag → Consumers are falling behind.
- Frequent Rebalancing → Constant partition movement, causing pauses.
- Offset Management Problems → Lost or duplicate messages.
- Uneven Partition Assignment → Some consumers do much more work than others.
Solutions
- Scale Consumers → Add more consumers or partitions to reduce lag.
- Optimize Processing → Improve consumer processing speed (DB, I/O, logic).
- Tune Configuration → Adjust session timeouts, fetch sizes, and commit strategies.
- Monitor Metrics → Track consumer lag, rebalancing, and error rates.
Key Takeaways
- A consumer group is how Kafka scales message processing horizontally while preserving partition ordering.
- Partition assignment ensures only one consumer per partition within a group.
- Kafka handles rebalancing, but too many rebalances can hurt performance.
- Offsets and commit strategies determine your guarantees around duplicates and data loss.
- Consumer lag is the main health signal for consumer groups.
- Consumer groups power patterns like fan-out, pipelines, and microservice integration.
What's Next?
After understanding consumer groups conceptually, the next step is to work with real code:
Build Kafka consumers, run them as a group, observe rebalancing and lag, and experiment with offset commit strategies.
That's where you'll see these concepts turn into hands-on experience.
Hands-on Examples
Consumer Group Load Balancing Demo
# Consumer Group Load Balancing Example
## Scenario: E-commerce Event Processing
Topic: "order-events" (6 partitions)
Consumer Group: "order-processors"
## Initial Setup (3 consumers):
Consumer Group: order-processors
├── Consumer 1 → Partitions 0, 1
├── Consumer 2 → Partitions 2, 3
└── Consumer 3 → Partitions 4, 5
## Adding Consumer 4 (Rebalancing):
Consumer Group: order-processors
├── Consumer 1 → Partitions 0, 2
├── Consumer 2 → Partition 1
├── Consumer 3 → Partitions 3, 4
└── Consumer 4 → Partition 5
## Processing Flow:
1. Order created → Topic "order-events"
2. Message routed to a partition by its key, e.g. roughly hash(order_id) % 6 (see the producer sketch after this list)
3. Consumer processes the message from its assigned partition
4. Offset committed after successful processing
5. Next message processed from the same partition
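Step 2 works because the producer sends each event with a key, so every event for a given order lands in the same partition and is processed in order by a single consumer. A sketch with the confluent-kafka Python client; note that Kafka's default partitioner hashes the key (murmur2), so hash(order_id) % 6 above is only an approximation of the idea.

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker

def delivery(err, msg):
    # Called once the broker acknowledges (or rejects) the message.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"order event -> partition {msg.partition()}, offset {msg.offset()}")

order_id = "order-12345"  # illustrative key
producer.produce(
    "order-events",
    key=order_id,                    # same key -> same partition -> ordered processing
    value=b'{"status": "created"}',
    callback=delivery,
)
producer.flush()  # wait for outstanding deliveries before exiting
```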
## Benefits:
- Parallel processing across partitions
- Automatic load balancing
- Fault tolerance through automatic partition reassignment
- Independent scaling of consumers

This example shows how consumer groups automatically distribute partitions among consumers, enabling parallel processing and fault tolerance.