Offset Management and At Least Once vs Exactly Once Semantics
Offsets are per-partition cursors that track how far a consumer has progressed. Each consumer group maintains its own set of offsets, allowing multiple groups to consume the same topic independently. When a consumer crashes, the group coordinator reassigns its partitions to survivors, which resume from the last committed offset. This commit model determines your delivery guarantees and recovery behavior.
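A minimal sketch of those per-group, per-partition cursors using the Kafka Java client, assuming a broker at localhost:9092, a topic named "orders", and a group id "billing" (all placeholders). Auto commit is disabled so commits are explicit, and the group's committed positions are read back per partition.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetInspection {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing");          // offsets are tracked per group
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");  // commit explicitly
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));

            for (ConsumerRecord<String, String> r : records) {
                // Each record carries its partition and offset: the per-partition cursor.
                System.out.printf("partition=%d offset=%d value=%s%n",
                        r.partition(), r.offset(), r.value());
            }

            // The group's committed position for each assigned partition; a restarted or
            // reassigned consumer resumes from here. Values are null if nothing was committed yet.
            Set<TopicPartition> assignment = consumer.assignment();
            Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(assignment);
            committed.forEach((tp, om) -> System.out.println(tp + " committed at " + om));
        }
    }
}
```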
The fundamental tension is the ordering of processing and committing. Commit before processing and a crash loses data (gaps). Commit after processing and a crash causes duplicates (records between the last commit and the crash are replayed). At least once delivery is the default: process then commit, accepting duplicates on failure. Most systems tolerate duplicates by making downstream writes idempotent (upserts, deduplication tables keyed by event ID). Exactly once requires either transactional coupling (read-process-write in one atomic transaction) or idempotent sinks that naturally deduplicate.
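A sketch of the process-then-commit (at least once) loop. The JDBC table order_events, the PostgreSQL-style ON CONFLICT upsert, and the assumption that the record key is the event ID are all illustrative; the point is that the commit happens only after the batch is durably written, so a crash replays records that the idempotent sink then absorbs.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceLoop {
    static void run(KafkaConsumer<String, String> consumer, Connection db) throws Exception {
        consumer.subscribe(List.of("orders"));
        // Idempotent write: duplicates from a replay hit the unique event_id and are ignored.
        String upsert = "INSERT INTO order_events (event_id, payload) VALUES (?, ?) "
                      + "ON CONFLICT (event_id) DO NOTHING";
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> r : records) {
                try (PreparedStatement ps = db.prepareStatement(upsert)) {
                    ps.setString(1, r.key());      // event ID assumed to be the record key
                    ps.setString(2, r.value());
                    ps.executeUpdate();
                }
            }
            // Commit only after the batch is written. A crash between the writes and this
            // commit replays the batch on the next assignment; the upsert absorbs duplicates.
            consumer.commitSync();
        }
    }
}
```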
Commit frequency is a critical tuning parameter. Frequent commits (every 100 records or every 5 seconds) minimize the replay window on crash but increase overhead and broker load. Batched commits (every 10000 records or every 60 seconds) improve throughput by 10 to 30 percent but increase the replay cost on failure. For example, committing every 10000 records at 2000 records per second of throughput means up to 5 seconds of replay on crash. Monitor commit latency and failure rates; spikes indicate broker contention or network issues.
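A sketch of that tuning knob: commit every BATCH records or every FLUSH_MS milliseconds, whichever comes first. The thresholds mirror the batched numbers above and are the values to adjust; the topic name and the process method are placeholders.

```java
import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BatchedCommits {
    static final int BATCH = 10_000;        // commit every 10000 records...
    static final long FLUSH_MS = 60_000;    // ...or every 60 seconds, whichever comes first

    static void run(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(List.of("orders"));
        int sinceCommit = 0;
        long lastCommit = System.currentTimeMillis();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            records.forEach(r -> process(r.value()));   // application-specific work
            sinceCommit += records.count();

            boolean countDue = sinceCommit >= BATCH;
            boolean timeDue = System.currentTimeMillis() - lastCommit >= FLUSH_MS;
            if (countDue || timeDue) {
                consumer.commitSync();                  // replay window is at most BATCH records
                sinceCommit = 0;
                lastCommit = System.currentTimeMillis();
            }
        }
    }

    static void process(String value) { /* placeholder for downstream work */ }
}
```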
Exactly once semantics in Kafka require transactional producers and consumers, idempotent producers (to avoid duplicates from retries), and the consumer isolation level set to read_committed. This prevents consumers from seeing uncommitted (in-flight) records but adds latency (typically 10 to 50 milliseconds) and limits some operational flexibility. Many systems achieve effective exactly once without full transactions by storing offsets and results atomically in the same sink database, using outbox patterns, or relying on natural idempotency in the domain.
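A sketch of the transactional read-process-write loop. The topic names and the transform step are placeholders; it assumes the producer was configured with a transactional.id (which also enables idempotence), and that the consumer runs with isolation.level=read_committed and auto commit disabled so offsets are committed only through the transaction.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class ExactlyOnceLoop {
    static void run(KafkaConsumer<String, String> consumer,
                    KafkaProducer<String, String> producer) {
        consumer.subscribe(List.of("payments-in"));
        producer.initTransactions();   // requires transactional.id on the producer

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;

            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> r : records) {
                    producer.send(new ProducerRecord<>("payments-out", r.key(), transform(r.value())));
                    offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1));   // commit points past the record
                }
                // Output records and consumer offsets commit (or abort) together.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction();   // read_committed consumers never see the aborted writes
            }
        }
    }

    static String transform(String value) { return value; }   // placeholder processing
}
```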
💡 Key Takeaways
•At least once is default: commit after processing accepts duplicates on crash; the replay window equals the commit batch size divided by the processing rate (commit every 10000 records at 2000 rec/s means up to 5 seconds of replay).
•Commit frequency trades off throughput versus recovery cost: committing every 5 seconds or 100 records minimizes replay but increases broker load; batching to 60 seconds or 10000 records improves throughput 10 to 30 percent but increases replay window.
•Exactly once in Kafka requires transactional producers and consumers with read committed isolation, adding 10 to 50 milliseconds latency; many systems achieve effective exactly once with idempotent sinks or outbox patterns instead.
•Store offsets and processing results atomically in the same database to avoid write versus offset inconsistencies; this is simpler than full transactions and works for many use cases (see the sketch after this list).
•Monitor commit latency and failure rates as leading indicators of broker contention, network issues, or rebalancing storms; spikes correlate with downstream write latency that slows processing.
•Idempotent writes (upserts with immutable event IDs, deduplication tables) are cheaper than transactions and sufficient for most at least once deployments; design your data model to tolerate replays naturally.
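A sketch of the "offsets in the sink database" takeaway, assuming a JDBC connection, a ledger table, a consumer_offsets table, and a topic name that are all illustrative. Result and offset are written in one database transaction, and on startup the consumer seeks to the stored offset rather than relying on Kafka-committed offsets.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class DbStoredOffsets {
    static void run(KafkaConsumer<String, String> consumer, Connection db) throws Exception {
        TopicPartition tp = new TopicPartition("ledger-events", 0);
        consumer.assign(List.of(tp));               // manual assignment: progress lives in the DB
        consumer.seek(tp, loadOffset(db, tp));      // resume from the last stored position
        db.setAutoCommit(false);

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> r : records) {
                try (PreparedStatement result = db.prepareStatement(
                         "INSERT INTO ledger (event_id, payload) VALUES (?, ?) ON CONFLICT DO NOTHING");
                     PreparedStatement offset = db.prepareStatement(
                         "UPDATE consumer_offsets SET next_offset = ? WHERE topic = ? AND kafka_partition = ?")) {
                    result.setString(1, r.key());
                    result.setString(2, r.value());
                    result.executeUpdate();
                    offset.setLong(1, r.offset() + 1);   // next offset to read
                    offset.setString(2, r.topic());
                    offset.setInt(3, r.partition());
                    offset.executeUpdate();
                    db.commit();                         // result and offset succeed or fail together
                } catch (Exception e) {
                    db.rollback();
                    throw e;
                }
            }
            // No consumer.commitSync(): the database is the source of truth for progress.
        }
    }

    static long loadOffset(Connection db, TopicPartition tp) throws Exception {
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT next_offset FROM consumer_offsets WHERE topic = ? AND kafka_partition = ?")) {
            ps.setString(1, tp.topic());
            ps.setInt(2, tp.partition());
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : 0L;
            }
        }
    }
}
```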
📌 Examples
A financial ledger system commits offsets and balance updates in the same PostgreSQL transaction using an outbox table, achieving exactly once without Kafka transactions.
An analytics pipeline commits every 10000 records; at 5000 rec/s sustained rate, a crash causes up to 2 seconds of replay, which is acceptable given idempotent upserts into the data warehouse.
Kafka transactions with read committed isolation add 20 milliseconds P99 latency but eliminate duplicates; used for critical payment processing where duplicates would cause financial errors.