Message Queues & Streaming • Consumer Groups & Load Balancing • Easy • ⏱️ ~3 min
What Are Consumer Groups and How Do They Achieve Load Balancing?
Consumer groups are a pull-based pattern where multiple worker processes cooperate to consume a partitioned stream. Each partition is assigned to at most one consumer in the group at any moment, giving you parallel processing while preserving order within each partition. The key insight is that the partition is your unit of parallelism: records within a partition are always processed in order, but different partitions can be processed concurrently by different consumers.
The hard constraint is that maximum concurrency equals partition count. If you have 100 partitions, at most 100 consumers can actively process work at once. Deploy 50 consumers and each will handle roughly 2 partitions; deploy 150 consumers and 50 will sit idle. This makes partition count a critical scaling decision at design time.
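The arithmetic above can be sketched directly. This is a simplified round-robin distribution model (an assumption for illustration, not any specific client's assignor), showing how consumer count beyond partition count yields idle members:

```python
# Sketch: how partitions spread across group members under a simple
# round-robin model (assumed for illustration, not a real client's assignor).
def assign(partitions, consumers):
    """Return how many partitions each consumer owns."""
    counts = [0] * consumers
    for p in range(partitions):
        counts[p % consumers] += 1  # partition p goes to consumer p mod N
    return counts

print(assign(100, 50))   # every consumer owns 2 partitions
print(assign(100, 150))  # 100 consumers own 1 partition each, 50 sit idle
```

The second call makes the design-time tradeoff concrete: once consumer count exceeds partition count, the surplus members receive nothing.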
Load balancing happens through partition assignment coordinated by a group coordinator (the broker or control plane). When membership changes (a consumer joins, leaves, or crashes after missing heartbeats), the coordinator triggers a rebalance: it revokes old assignments and redistributes partitions among the surviving members. Strategies like round-robin aim for equal partition counts per consumer, while sticky assignment tries to minimize partition movement to reduce disruption.
Each consumer group tracks offsets (per-partition cursors) separately, so a crashed consumer can resume from the last committed position. This gives you at-least-once delivery by default, because a crash between processing and committing means replaying some records. Exactly-once semantics require idempotent writes or transactionally coupling processing with offset commits.
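The replay behavior can be sketched with a minimal single-partition model (hypothetical class names; commit semantics only, no broker involved). Side effects happen before the offset commit, so a crash in between replays the batch:

```python
# Sketch: at-least-once delivery when offsets are committed after processing.
# A crash between processing and committing replays the uncommitted batch.
class PartitionConsumer:          # hypothetical illustration class
    def __init__(self, records):
        self.records, self.committed = records, 0
        self.processed = []       # stand-in for side effects (DB writes, etc.)

    def poll_and_process(self, n, crash_before_commit=False):
        start = self.committed
        end = min(start + n, len(self.records))
        for offset in range(start, end):
            self.processed.append(self.records[offset])  # side effect first
        if crash_before_commit:
            return                # crash: batch processed, offset never committed
        self.committed = end      # commit only after the whole batch succeeds

c = PartitionConsumer(["r0", "r1", "r2", "r3"])
c.poll_and_process(2, crash_before_commit=True)  # processes r0, r1; commits nothing
c.poll_and_process(2)                            # resumes at offset 0: r0, r1 again
print(c.processed)  # ['r0', 'r1', 'r0', 'r1'] — duplicates, hence at-least-once
```

This is exactly why downstream writes must be idempotent (or transactional with the offset commit): the restarted consumer cannot distinguish replayed records from new ones.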
💡 Key Takeaways
•Maximum concurrency is bounded by partition count: 200 partitions support at most 200 active consumers per group; extra consumers remain idle until rebalancing shifts work.
•Partition assignment preserves ordering within each partition while enabling parallelism across partitions; global ordering across all events requires limiting to one consumer.
•Rebalancing is triggered when consumers join, leave, or miss heartbeats (typically a 3 to 10 second session timeout); sticky strategies minimize partition movement to reduce stop-the-world pauses.
•Offsets are tracked per partition per group, allowing crashed consumers to resume from the last commit; default at-least-once delivery means duplicates on crash, so exactly-once semantics require idempotent or transactional processing.
•Amazon Kinesis enforces 1 MB/s write and 2 MB/s shared read per shard; enhanced fan-out gives each consumer 2 MB/s of dedicated read throughput per shard, removing the shared read bottleneck.
•Idle consumers beyond partition count waste resources; autoscaling must use lag and assignment metrics, not just CPU or memory, to avoid scaling up uselessly.
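The last point, lag-driven scaling capped at partition count, can be sketched as a scaling decision (the function name, throughput figure, and drain target are all hypothetical assumptions for illustration):

```python
# Sketch: a lag-based scaling decision (hypothetical inputs and thresholds).
# Desired consumer count grows with backlog but is capped at partition count,
# since extra members beyond that would only sit idle.
import math

def desired_consumers(partition_lags, records_per_consumer_sec, target_drain_sec):
    total_lag = sum(partition_lags)
    needed = math.ceil(total_lag / (records_per_consumer_sec * target_drain_sec))
    return min(max(needed, 1), len(partition_lags))  # never exceed partition count

# 8 partitions with 200,000 records of backlog; each consumer drains
# 1,000 rec/s and we want the backlog gone within 10 s. Raw demand is
# 20 consumers, but the cap holds it at 8 — more would be wasted.
print(desired_consumers([25_000] * 8, 1_000, 10))  # 8
```

CPU-based autoscaling misses both ends of this: idle-but-assigned consumers look busy enough to keep, and lagging partitions do not raise CPU on the members that cannot take them.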
📌 Examples
LinkedIn processes trillions of messages per day and petabytes of data using consumer groups with tens to hundreds of thousands of partitions across large Kafka clusters.
Amazon Kinesis Lambda event source mappings run one concurrent Lambda invocation per shard by default: a 200-shard stream drives up to 200 concurrent consumer invocations.
Agoda ingests 1.5 million price updates per minute from a single supplier using consumer groups; standard round-robin assignment left 5 of 40 partitions chronically backlogged due to heterogeneous hardware.