Offset Management and At-Least-Once vs. Exactly-Once Semantics
What Offsets Are and Why They Matter
Offsets are per-partition cursors that track how far a consumer has progressed through a partition. Each record in a partition has a unique sequential offset (0, 1, 2, ...). When a consumer commits offset 1000, it is declaring: "I have successfully processed all records up to and including offset 999; the next time I consume this partition, start at offset 1000." Consumer groups store offsets in durable storage (typically a special internal topic), allowing any consumer that takes over a partition to resume from the last committed position rather than reprocessing from the beginning.
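The cursor semantics above can be sketched with an in-memory stand-in for Kafka's internal offset storage (the class, group, and topic names here are illustrative, not a real client API):

```python
# Minimal sketch of per-partition offset tracking, assuming an in-memory
# store in place of Kafka's durable internal offsets topic.

class OffsetStore:
    """Maps (group, topic, partition) -> the next offset to consume."""
    def __init__(self):
        self._committed = {}

    def commit(self, group, topic, partition, offset):
        # Committing offset N declares: records 0..N-1 are processed; resume at N.
        self._committed[(group, topic, partition)] = offset

    def resume_position(self, group, topic, partition, default=0):
        # A consumer taking over the partition starts here, not at the beginning.
        return self._committed.get((group, topic, partition), default)

store = OffsetStore()
store.commit("analytics", "clicks", 0, 1000)

# A replacement consumer in the same group resumes at the committed cursor.
print(store.resume_position("analytics", "clicks", 0))   # 1000
print(store.resume_position("analytics", "clicks", 1))   # 0 (no commit yet)
```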
The Commit Timing Trade-off
The timing of offset commits relative to processing creates a fundamental trade-off between data loss and duplicate processing. Commit-before-process: if the consumer commits offset 1000, then crashes before processing the record at offset 999, that record is lost forever; the next consumer starts at 1000, skipping 999. Process-before-commit: if the consumer processes records 1000-1050, then crashes before committing, the next consumer resumes at 1000 and reprocesses records 1000-1050. Process-before-commit is the standard pattern because duplicate processing can be handled (via idempotency or deduplication), while lost data often cannot be recovered.
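A minimal simulation of the two orderings makes the asymmetry concrete. A "crash" is modeled as an early return between the commit step and the process step at a chosen offset:

```python
# Sketch: each record goes through two steps in the chosen order; a crash
# between them at `crash_between_at` shows which failure mode you get.

def _do(step, state, records, offset):
    if step == "commit":
        state["committed"] = offset + 1          # claim everything up to this record
    else:
        state["processed"].append(records[offset])

def consume(records, state, order, crash_between_at=None):
    """Run one consumer session from the committed cursor to the end."""
    offset = state["committed"]
    while offset < len(records):
        first, second = order
        _do(first, state, records, offset)
        if offset == crash_between_at:
            return                                # crash: the second step never runs
        _do(second, state, records, offset)
        offset += 1

records = ["a", "b", "c", "d"]

# Commit-before-process, crash at offset 1: "b" is committed but never processed.
s1 = {"committed": 0, "processed": []}
consume(records, s1, ("commit", "process"), crash_between_at=1)
consume(records, s1, ("commit", "process"))       # restart, no further crash
print(s1["processed"])   # ['a', 'c', 'd'] -- "b" lost

# Process-before-commit, crash at offset 1: "b" is processed but not committed.
s2 = {"committed": 0, "processed": []}
consume(records, s2, ("process", "commit"), crash_between_at=1)
consume(records, s2, ("process", "commit"))
print(s2["processed"])   # ['a', 'b', 'b', 'c', 'd'] -- "b" duplicated
```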
At-Least-Once Delivery Semantics
Process-then-commit provides at-least-once delivery: every record is guaranteed to be processed at least once, but some records may be processed multiple times on failure. This is the default behavior and is acceptable when downstream systems can tolerate duplicates. Examples: updating a database with idempotent writes (the same update applied twice has the same effect), sending events to a deduplication-capable system, or incrementing counters where slight overcounting is acceptable. The replay window (the records that might be duplicated) spans from the last committed offset to the point of the crash.
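As a sketch of the idempotent-write case (the event shape and the dict standing in for a database are assumptions for illustration), replaying a record from the replay window leaves the destination unchanged:

```python
# Sketch: at-least-once replay is harmless when the downstream write is a
# keyed upsert. Hypothetical event shape: (event_id, user_id, new_status).

db = {}

def apply_event(event):
    event_id, user_id, new_status = event
    db[user_id] = new_status      # the same update applied twice has the same effect

events = [
    ("e1", "u1", "active"),
    ("e2", "u2", "suspended"),
]

for e in events:
    apply_event(e)

# Simulate the replay window after a crash: e2 is delivered a second time.
apply_event(events[1])

print(db)   # {'u1': 'active', 'u2': 'suspended'} -- unchanged by the replay
```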
Commit Frequency Tuning
Commit frequency trades off replay window size against broker load. Frequent commits (every 100 records or 5 seconds) minimize the replay window: a crash replays at most 100 records. However, each commit is a write to the offset storage, adding latency and broker load. Batched commits (every 10,000 records or 60 seconds) can improve throughput by 10-30% but widen the replay window: a crash could replay up to 10,000 records. Choose based on your duplicate tolerance and throughput requirements.
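The count-or-time policy can be sketched as follows (the class name and the injectable clock are illustrative; the thresholds are the ones from the text):

```python
# Sketch: commit whenever max_records have accumulated OR max_interval_s has
# elapsed since the last commit, whichever comes first.

import time

class BatchedCommitter:
    def __init__(self, commit_fn, max_records=100, max_interval_s=5.0,
                 clock=time.monotonic):
        self._commit_fn = commit_fn
        self._max_records = max_records
        self._max_interval_s = max_interval_s
        self._clock = clock
        self._uncommitted = 0
        self._last_commit = clock()
        self._next_offset = None

    def record_processed(self, offset):
        self._next_offset = offset + 1
        self._uncommitted += 1
        if (self._uncommitted >= self._max_records
                or self._clock() - self._last_commit >= self._max_interval_s):
            self.flush()

    def flush(self):
        if self._uncommitted:
            self._commit_fn(self._next_offset)   # one write covers the whole batch
            self._uncommitted = 0
            self._last_commit = self._clock()

commits = []
c = BatchedCommitter(commits.append, max_records=100, max_interval_s=5.0)
for offset in range(250):
    c.record_processed(offset)
c.flush()       # commit the tail on clean shutdown
print(commits)  # [100, 200, 250] -- three offset writes for 250 records
```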
Exactly-Once Without Transactions
Many systems achieve effectively exactly-once processing without complex distributed transactions. The key insight: if you store the offset and the result atomically in the same destination database, you get exactly-once semantics. When resuming, check the stored offset in the destination; if already processed, skip. The outbox pattern (write business result and offset to database, then asynchronously publish) provides this guarantee. Natural idempotency in the domain (updating a row to a specific state, setting a flag) also eliminates duplicate concerns. Full transactional exactly-once (coordinating offset commits with producer writes) adds significant complexity and latency; prefer simpler approaches when possible.
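A sketch of the atomic offset-plus-result pattern, using SQLite as a stand-in for the destination database (the table names and event fields are hypothetical). Because the business write and the offset update share one transaction, a redelivered record is detected by the stored cursor and skipped:

```python
# Sketch: result and consumer offset commit or roll back together, so a
# replayed record is recognized and skipped on resume.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (user_id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE offsets (partition INTEGER PRIMARY KEY, next_offset INTEGER)")
conn.execute("INSERT INTO offsets VALUES (0, 0)")
conn.commit()

def process(partition, offset, user_id, status):
    (next_offset,) = conn.execute(
        "SELECT next_offset FROM offsets WHERE partition = ?", (partition,)
    ).fetchone()
    if offset < next_offset:
        return False            # already applied before a crash: skip the duplicate
    with conn:                  # one transaction: result and offset together
        conn.execute(
            "INSERT INTO results VALUES (?, ?) "
            "ON CONFLICT(user_id) DO UPDATE SET status = excluded.status",
            (user_id, status),
        )
        conn.execute(
            "UPDATE offsets SET next_offset = ? WHERE partition = ?",
            (offset + 1, partition),
        )
    return True

applied = process(0, 0, "u1", "active")
replayed = process(0, 0, "u1", "active")   # same record delivered again
print(applied, replayed)   # True False
```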