What Are Consumer Groups and How Do They Achieve Load Balancing?
The Fundamental Parallelism Constraint
The maximum parallelism in a consumer group equals the partition count. If a topic has 100 partitions, at most 100 consumers can actively process work simultaneously. Each partition is assigned to exactly one consumer; no two consumers share a partition. Deploy 50 consumers and each handles exactly 2 partitions. Deploy 150 consumers and 50 sit idle, waiting for partitions that will never come. This makes partition count a critical decision at design time because it sets the ceiling for horizontal scaling.
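The arithmetic above can be sketched as a small helper. This is an illustrative function, not part of any client library; it just shows how partitions divide across a group and why extra consumers end up with zero:

```python
def partitions_per_consumer(num_partitions: int, num_consumers: int) -> list[int]:
    """Return how many partitions each consumer receives.

    Partitions divide as evenly as possible; when consumers outnumber
    partitions, the surplus consumers receive zero and sit idle.
    """
    base, extra = divmod(num_partitions, num_consumers)
    return [base + (1 if i < extra else 0) for i in range(num_consumers)]

# 100 partitions, 50 consumers: everyone gets exactly 2.
# 100 partitions, 150 consumers: 100 consumers get 1, 50 get nothing.
```

Running it with the numbers from the text confirms the ceiling: parallelism never exceeds the partition count, regardless of how many consumers you deploy.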
How Partition Assignment Works
A group coordinator (typically running on one of the brokers) manages partition assignment. When group membership changes (a consumer joins, leaves gracefully, or crashes and stops sending heartbeats, the periodic signals that indicate it is alive), the coordinator triggers a rebalance. During a rebalance, the coordinator revokes existing partition assignments and redistributes partitions among the surviving members. This ensures that every partition has exactly one active consumer, preventing duplicate processing while maximizing parallelism. The assignment algorithm (round-robin, range, or sticky) determines how partitions map to consumers.
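To make the assignment algorithms concrete, here is a simplified single-topic sketch of the range and round-robin strategies (the real strategies also handle multiple topics and incremental rebalancing; these helpers and names are illustrative only):

```python
def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Range: each consumer gets a contiguous chunk of the partition list."""
    consumers = sorted(consumers)
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, member in enumerate(consumers):
        count = per + (1 if i < extra else 0)  # earlier members absorb the remainder
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment

def round_robin_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin: deal partitions out one at a time, like cards."""
    consumers = sorted(consumers)
    assignment = {m: [] for m in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

For 5 partitions and consumers `c1`, `c2`, range assignment yields `{c1: [0, 1, 2], c2: [3, 4]}` while round-robin yields `{c1: [0, 2, 4], c2: [1, 3]}`; both leave each partition with exactly one owner.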
Offset Tracking for Fault Tolerance
Each consumer group tracks offsets (per-partition cursors indicating the position of the next record to consume) independently. When a consumer processes records, it periodically commits its offset to durable storage. If that consumer crashes, the coordinator reassigns its partitions to surviving consumers, which resume from the last committed offset rather than from the beginning. This checkpoint-and-resume mechanism provides fault tolerance. However, the gap between processing and committing creates the possibility of duplicate processing: records processed after the last commit but before the crash will be replayed when a new consumer takes over.
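The checkpoint-and-resume mechanism, and the replay window it leaves open, can be simulated with a toy per-partition cursor. The class and method names here are invented for illustration; a real client separates the in-memory position from the durably committed offset in the same way:

```python
class PartitionCursor:
    """Toy model of one partition's offsets for one consumer group."""

    def __init__(self):
        self.committed = 0   # durable checkpoint: where a new consumer resumes
        self.position = 0    # in-memory: next offset this consumer will read

    def poll(self, records: list, max_records: int = 3) -> list:
        batch = records[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        self.committed = self.position  # persist progress

    def crash_and_reassign(self):
        # The replacement consumer knows only the committed offset,
        # so uncommitted progress is lost and will be replayed.
        self.position = self.committed

records = list(range(10))
cursor = PartitionCursor()
first = cursor.poll(records)     # processes offsets 0-2
cursor.commit()                  # checkpoint at offset 3
second = cursor.poll(records)    # processes offsets 3-5, but...
cursor.crash_and_reassign()      # ...crashes before committing
replayed = cursor.poll(records)  # offsets 3-5 come back a second time
```

The batch processed after the last commit but before the crash (`second`) is exactly the batch the replacement consumer sees again (`replayed`): fault tolerance is preserved, at the cost of duplicates.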
At-Least-Once Delivery Default
The standard pattern is process-then-commit: consume a batch of records, process them, then commit the offset. This provides at-least-once delivery: every record is processed at least once, but crashes between processing and committing result in some records being processed twice. The alternative, commit-then-process, risks data loss if a crash occurs after committing but before processing. Most systems prefer duplicate processing (which downstream systems can handle with idempotency) over data loss (which is often unrecoverable).
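One common way downstream systems absorb those duplicates is idempotent, keyed writes: replaying a batch rewrites the same keys with the same values, so the end state is unchanged. A minimal sketch, with an in-memory dict standing in for a keyed store (all names are illustrative):

```python
def process_batch(batch: list[tuple[int, str]], sink: dict) -> None:
    """Idempotent processing: each record is written under a stable key
    (here, its offset), so replaying the batch is a harmless overwrite."""
    for record_id, value in batch:
        sink[record_id] = value

sink: dict[int, str] = {}
batch = [(0, "a"), (1, "b"), (2, "c")]
process_batch(batch, sink)  # first delivery, before the commit
process_batch(batch, sink)  # replay after a crash-before-commit
# sink still holds exactly one entry per record
```

This is why at-least-once plus idempotent sinks is the common default: duplicates collapse on write, whereas records lost under commit-then-process cannot be recovered at all.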