Exactly Once Semantics and State Recovery
The Guarantee Breakdown:
Exactly once semantics (EOS) in Kafka Streams means that reading input records, updating state stores, and writing output records happen as one atomic operation from Kafka's perspective. Even if your application crashes mid-processing, downstream consumers will never see duplicate or missing results, and state stores will remain consistent.
This relies on three Kafka features working together. First, idempotent producers ensure that retries don't create duplicates in output topics. Second, transactional writes group multiple topic writes into a single atomic commit. Third, the read committed isolation level ensures that downstream consumers only see fully committed transactions.
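In recent Kafka Streams releases, all three features are switched on together through a single setting. A minimal configuration sketch, where the application id and broker address are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class EosConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payments-pipeline"); // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");    // placeholder broker
        // This one setting enables idempotent producers, transactional
        // writes, and read_committed consumption across the whole topology.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}
```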
How It Actually Works:
Each task groups its work into transactions. A task reads a batch of records from input partitions, processes them through the topology, updates local state stores, writes to output topics and changelog topics, then commits the transaction. This commit atomically advances the consumer offset and makes all writes visible. If the application crashes before commit, the entire transaction is aborted. On recovery, processing restarts from the last committed offset.
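Kafka Streams drives this loop internally, but the same read-process-write cycle can be sketched with the plain consumer and producer clients. This is an illustration of the mechanism, not Streams' actual implementation; the topic name and batch handling are made up:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class TransactionalLoop {
    static void run(KafkaConsumer<String, String> consumer,
                    KafkaProducer<String, String> producer) {
        producer.initTransactions(); // also fences zombie instances sharing a transactional.id
        while (true) {
            ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(100));
            if (batch.isEmpty()) continue;
            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (TopicPartition tp : batch.partitions()) {
                    List<ConsumerRecord<String, String>> records = batch.records(tp);
                    for (ConsumerRecord<String, String> rec : records) {
                        // topology processing and state store updates happen here
                        producer.send(new ProducerRecord<>("output-topic", rec.key(), rec.value()));
                    }
                    offsets.put(tp, new OffsetAndMetadata(records.get(records.size() - 1).offset() + 1));
                }
                // Offsets commit inside the same transaction as the output writes,
                // so a crash before commit aborts everything as one unit.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (RuntimeException e) {
                producer.abortTransaction(); // read_committed consumers never see aborted writes
            }
        }
    }
}
```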
The changelog topic is crucial here. Every state store update is written both to the local disk and to a Kafka changelog topic within the same transaction. When a task moves to a different instance due to rebalancing or failure, the new owner first restores state by reading the entire changelog up to the last committed offset. This makes state recovery deterministic and bounded.
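For stores built through the DSL, changelog logging is on by default. A sketch of a counting store with an explicit logging override; the topic, store name, and config value are illustrative:

```java
import java.util.Map;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class ChangelogStore {
    public static StreamsBuilder topology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("events") // illustrative input topic
            .groupByKey()
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("event-counts")
                // Logging is on by default; overrides here are applied to the
                // internal <application-id>-event-counts-changelog topic.
                .withLoggingEnabled(Map.of("min.compaction.lag.ms", "3600000")));
        return builder;
    }
}
```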
State Store Recovery Reality:
Recovery time depends on state store size and changelog compaction. A well compacted changelog might be 20 GB for a state store tracking millions of keys. At typical restore rates of 50 to 100 MB per second per task, that's 3 to 7 minutes of downtime for that partition. If your state stores grow to hundreds of gigabytes because of large retention windows or high cardinality keys, recovery can stretch to tens of minutes.
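Restore time is directly observable: Kafka Streams exposes a restore listener hook that fires as changelogs are replayed. A minimal sketch that times each store's recovery:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class RestoreTimer implements StateRestoreListener {
    private final Map<String, Long> startMs = new ConcurrentHashMap<>();

    private static String key(TopicPartition tp, String store) {
        return store + "-" + tp;
    }

    @Override
    public void onRestoreStart(TopicPartition tp, String store, long startOffset, long endOffset) {
        startMs.put(key(tp, store), System.currentTimeMillis());
        System.out.printf("restoring %s %s: %d records to replay%n", store, tp, endOffset - startOffset);
    }

    @Override
    public void onBatchRestored(TopicPartition tp, String store, long batchEndOffset, long numRestored) {
        // fires once per restored batch; a natural hook for progress gauges
    }

    @Override
    public void onRestoreEnd(TopicPartition tp, String store, long totalRestored) {
        long elapsed = System.currentTimeMillis() - startMs.getOrDefault(key(tp, store), 0L);
        System.out.printf("%s %s: %d records restored in %d ms%n", store, tp, totalRestored, elapsed);
    }
}
```

The listener is registered with streams.setGlobalStateRestoreListener(new RestoreTimer()) before calling streams.start().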
This is why teams carefully tune state store sizes. Windowed aggregations need bounded retention. High cardinality keys might require key space reduction or sampling. The goal is keeping state stores small enough that p99 recovery time stays under SLA requirements.
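As one example of bounded retention, a windowed count can cap how long old windows are kept, which caps both the local store size and the changelog that must be replayed. The window size, grace period, and retention below are illustrative values:

```java
import java.time.Duration;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

public class BoundedWindowStore {
    public static StreamsBuilder topology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("clicks") // illustrative input topic
            .groupByKey()
            // 5-minute windows with 1 minute of grace for late records
            .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(1)))
            .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("click-counts")
                // Retention bounds state size and changelog replay: expired
                // windows are dropped instead of accumulating forever.
                .withRetention(Duration.ofHours(1)));
        return builder;
    }
}
```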
Exactly Once Performance Cost: roughly 5 to 15 percent throughput loss and higher p99 latency.
✓ In Practice: Many teams use exactly once only for critical pipelines where duplicates are extremely costly, like financial transactions or billing. For analytics or monitoring pipelines, at least once mode with idempotent downstream processing offers better throughput.
Commit Interval Trade-Off:
Kafka Streams commits transactions periodically based on commit.interval.ms. A smaller interval (for example 100 ms) means less reprocessing after a crash but more transaction overhead. A larger interval (for example 1000 ms) improves throughput but means up to 1 second of records must be reprocessed on failure. Teams balance this based on acceptable Recovery Point Objective (RPO) and throughput requirements.
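A configuration sketch of that dial; the 250 ms value is illustrative:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class CommitTuning {
    public static Properties apply(Properties props) {
        // Default is 30 s, or 100 ms when exactly once is enabled.
        // Smaller values tighten RPO (less reprocessing after a crash)
        // at the cost of more frequent transaction commits.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 250); // illustrative value
        return props;
    }
}
```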
💡 Key Takeaways
✓ Exactly once semantics combines idempotent producers, transactional writes, and read committed isolation to guarantee atomic read-process-write cycles
✓ Changelog topics replicate every state update within transactions, enabling deterministic state recovery by replaying up to the last committed offset
✓ State store size directly impacts recovery time: 20 GB stores take 3 to 7 minutes to restore at 50 to 100 MB per second
✓ Transaction overhead costs 5 to 15 percent throughput, leading teams to selectively use exactly once only for critical pipelines
✓ Commit interval tuning balances throughput versus reprocessing cost: smaller intervals reduce RPO but increase transaction overhead
📌 Examples
1. A financial transaction pipeline uses exactly once mode to ensure no duplicate charges, accepting the 10 percent throughput reduction for correctness guarantees
2. An analytics pipeline processing clickstream data runs in at least once mode, using idempotent aggregations downstream to handle occasional duplicates while maximizing throughput
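One common way to make the downstream handling in the second example idempotent is to write results as keyed overwrites rather than increments, so a redelivered record lands on the same cell instead of double counting. A toy sketch, with an in-memory map standing in for a real sink:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentSink {
    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    // Results are keyed by (window, key), so processing the same record
    // twice overwrites the same entry rather than adding to it.
    public void upsert(String windowedKey, long count) {
        counts.put(windowedKey, count);
    }
}
```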