
Production Implementation: Request Token Idempotency at Scale

Implementing request-token idempotency for client-facing unsafe operations such as payments, orders, and resource creation requires careful design of the deduplication store, the token lifecycle, and concurrency control. The client generates a unique operation token, typically a UUID version 4, and sends it with the request in a header such as Idempotency-Key. The server maintains a deduplication store keyed by token: a relational database table with a unique index, a distributed cache like Redis with atomic operations, or a dedicated key-value store.

On first sighting, the server atomically reserves the token with an initial status such as received or processing, executes the business operation, records the final status (succeeded or failed) and response metadata, and returns the result to the client. On a duplicate request with the same token, the server looks up the token, finds the existing record, and returns the previously recorded outcome without re-executing the business logic. Enforcing a uniqueness constraint on the token column is critical to prevent concurrent double execution: two requests with the same token arriving simultaneously must be serialized. One will successfully insert or compare-and-swap the token record; the other will fail the uniqueness check and retry the lookup.

Store a hash of the request parameters alongside the token to detect and reject parameter drift. If the same idempotency key arrives with different request parameters, such as a different payment amount, the server must reject the request with a parameter-mismatch error, because allowing drift breaks the idempotency guarantee and risks unintended state changes. Stripe implements this check: duplicate keys with matching parameters return the original result, while duplicate keys with mismatched parameters return an error.

Token lifecycle and storage sizing are critical trade-offs.
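The flow above can be sketched as follows. This is a minimal illustration, assuming a SQLite-backed store; the table name, column names, and `IdempotencyStore` class are hypothetical, not a specific vendor's implementation. The primary-key constraint serializes concurrent requests, and the parameter hash rejects drift.

```python
import hashlib
import json
import sqlite3


def params_hash(params: dict) -> str:
    # Canonical JSON so logically equal parameters hash identically.
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()


class IdempotencyStore:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS idempotency_tokens ("
            "  token TEXT PRIMARY KEY,"      # uniqueness constraint on the key
            "  params_hash TEXT NOT NULL,"
            "  status TEXT NOT NULL,"        # processing -> succeeded
            "  response TEXT)"
        )

    def execute(self, token: str, params: dict, operation):
        h = params_hash(params)
        try:
            # Atomic reservation: the first writer wins the unique index.
            self.conn.execute(
                "INSERT INTO idempotency_tokens (token, params_hash, status) "
                "VALUES (?, ?, 'processing')",
                (token, h),
            )
            self.conn.commit()
        except sqlite3.IntegrityError:
            # Duplicate token: return the recorded outcome, but reject drift.
            row = self.conn.execute(
                "SELECT params_hash, status, response "
                "FROM idempotency_tokens WHERE token = ?",
                (token,),
            ).fetchone()
            if row[0] != h:
                raise ValueError("idempotency key reused with different parameters")
            if row[2] is None:
                # Original request still in flight; caller should retry later.
                raise RuntimeError("request in progress")
            return json.loads(row[2])

        result = operation(params)  # business logic runs exactly once
        self.conn.execute(
            "UPDATE idempotency_tokens SET status = 'succeeded', response = ? "
            "WHERE token = ?",
            (json.dumps(result), token),
        )
        self.conn.commit()
        return result
```

With this sketch, a retried request with the same token and parameters returns the stored response without re-invoking the operation, while the same token with a changed amount raises a parameter-mismatch error.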
Associate a time-to-live with each token based on your retry horizon: minutes for interactive APIs, hours to 24 hours for payment and order systems that may see delayed retries or client-side queueing. A payments service at 1,000 requests per second with a 24-hour window stores up to approximately 86.4 million keys. At 200 bytes per record (key, parameter hash, status, timestamps, and a response pointer), raw storage is approximately 17 gigabytes per day plus index overhead.

Shard the deduplication store by token hash to avoid hot partitions and enable horizontal scaling. Store only the minimal response metadata needed to reconstruct or redirect to the same response; for large response bodies, store a reference such as an object identifier or ETag rather than the full payload. Implement time-to-live based eviction or scheduled cleanup jobs to cap storage growth. If the deduplication window is shorter than the retry horizon (for example, a client retries hours later but the server evicted the key after 1 hour), duplicates slip through, so align the key time-to-live with the maximum expected retry delay, including queued or delayed retries.
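The sizing figures above follow from straightforward arithmetic, reproduced here so the numbers can be adjusted for other workloads:

```python
# Back-of-envelope sizing for the deduplication store, using the
# figures from the text: 1,000 rps, 24-hour TTL, ~200 bytes/record.

requests_per_second = 1_000
ttl_seconds = 24 * 60 * 60    # 24-hour deduplication window
bytes_per_record = 200        # key + params hash + status + timestamps + pointer

keys_stored = requests_per_second * ttl_seconds
raw_bytes = keys_stored * bytes_per_record

print(keys_stored)            # 86400000 keys (~86.4 million)
print(raw_bytes / 1e9)        # 17.28 GB/day, before index overhead
```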
💡 Key Takeaways
Enforce a uniqueness constraint on the idempotency token to serialize concurrent requests and prevent race conditions that could double-execute operations.
Store a hash of request parameters and reject duplicate tokens with mismatched parameters to prevent misuse and guarantee that the same token always represents the same logical operation.
At 1,000 requests per second with 24-hour deduplication, storage reaches approximately 86.4 million keys and 17 gigabytes per day, requiring sharding by token hash and time-to-live eviction.
Stripe returns the original result for duplicate idempotency keys with matching parameters, but explicitly rejects requests where the same key arrives with different parameters such as a different charge amount.
Align token time-to-live with maximum retry horizons, including delayed network replays and client-side queueing; if the time-to-live is shorter than retry delays, duplicates can slip through after eviction.
Store minimal response metadata or a reference such as an object identifier rather than full response bodies to reduce storage and cache pressure, especially for large payloads.
📌 Examples
Stripe idempotency: POST /v1/charges with Idempotency-Key: abc123 and amount: 5000. Server inserts {key: abc123, params_hash: hash(amount=5000), status: processing, charge_id: ch_xyz} with unique constraint on key. Duplicate with same key and amount returns ch_xyz; duplicate with amount: 6000 returns error.
High throughput sharding: A payment service shards its deduplication store across 16 partitions by hash(idempotency_key) mod 16. At 10,000 requests per second, each shard handles approximately 625 requests per second and stores approximately 54 million keys per day with a 24-hour time-to-live.
Concurrency race: Two threads receive the same idempotency key simultaneously. Thread A attempts the INSERT and succeeds. Thread B's INSERT fails on the unique constraint violation, so it immediately issues a SELECT to retrieve the result recorded by Thread A.
Amazon order token: Client generates token order_abc123 for checkout. Order service attempts INSERT into idempotency_tokens with unique constraint. On success, creates order and stores order_id. On duplicate token, SELECT returns existing order_id and responds with same order confirmation.
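The shard routing from the high-throughput example can be sketched as below. This is an illustrative sketch, assuming a stable cryptographic hash for routing; the `shard_for` helper is hypothetical.

```python
import hashlib

NUM_SHARDS = 16  # partition count from the sharding example above


def shard_for(idempotency_key: str) -> int:
    # Use a stable cryptographic hash rather than Python's salted hash()
    # so the same key always routes to the same shard across processes
    # and restarts.
    digest = hashlib.sha256(idempotency_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Per-shard load at 10,000 requests per second with a 24-hour window:
total_rps = 10_000
per_shard_rps = total_rps // NUM_SHARDS                     # 625 rps
per_shard_keys_per_day = total_rps * 86_400 // NUM_SHARDS   # 54,000,000 keys
```

Consistent routing matters: if the shard function changes between the first request and a retry, the retry lands on a different shard, misses the stored token, and re-executes the operation.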