Message Queues & Streaming • Notification System Design (Push Notifications)
Idempotency, Ordering, and Duplicate Prevention
At least once delivery semantics in message queues guarantee that messages are delivered but allow duplicates during retries or reprocessing after consumer failures. Without idempotency, a user might receive the same one time password or promotional offer multiple times, degrading the user experience and inflating provider costs. The solution is storing an idempotency record before sending each notification, keyed by a combination of tenant identifier, user identifier, campaign identifier, and collapse key. When a duplicate message arrives, the system checks this record and skips delivery if the notification was already sent within a configurable time to live (TTL) window (typically 24 to 48 hours).
Collapse keys (used by both Apple Push Notification service (APNs) collapse identifiers and Firebase Cloud Messaging (FCM) collapse keys) serve a dual purpose: deduplication and update coalescing. For a shopping cart badge, the collapse key might be cart count. If three updates arrive (5 items, 6 items, 7 items) while the device is offline, the provider delivers only the latest (7 items) when the device reconnects, reducing notification fatigue. Your backend should use the same collapse key in the idempotency record to prevent sending intermediate updates that will be discarded anyway.
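The coalescing behaviour described above can be modelled in a few lines. This is a sketch of what the provider does for an offline device, not APNs or FCM code; the message dictionary shape and the `coalesce` function are assumptions for illustration.

```python
def coalesce(pending: list[dict]) -> list[dict]:
    """Keep only the newest message per collapse key, mirroring how
    APNs collapse identifiers and FCM collapse keys behave while a
    device is offline."""
    latest: dict[str, dict] = {}
    for msg in pending:
        latest[msg["collapse_key"]] = msg  # later messages overwrite earlier ones
    return list(latest.values())

# Three cart updates queued while the device is offline:
offline_queue = [
    {"collapse_key": "cart_count", "body": "5 items in your cart"},
    {"collapse_key": "cart_count", "body": "6 items in your cart"},
    {"collapse_key": "cart_count", "body": "7 items in your cart"},
]
# coalesce(offline_queue) keeps only the final "7 items" update
```

Using the same collapse key in the backend idempotency record means the intermediate "5 items" and "6 items" sends are suppressed before they ever reach the provider.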
Ordering is the harder problem. Standard message queues like Amazon Simple Queue Service (SQS) Standard maximize throughput but deliver messages out of order. If a user places an order, then cancels it, they might see the cancellation delivered before the order confirmation if workers process in parallel. First In First Out (FIFO) queues preserve order but at significant cost: SQS FIFO caps at roughly 3,000 messages per second with batching, versus effectively unlimited for Standard. The compromise is partition based ordering: hash the user identifier to a partition (for example, one of 256 partitions), route all messages for that user to the same partition, and process partitions in parallel. This gives per user ordering at scale (10,000 per second across 256 partitions is roughly 40 per second per partition, well under the limit).
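The routing step can be sketched as a stable hash of the user identifier. A cryptographic hash is used here because Python's built in `hash` is randomized per process, which would break routing across workers; the function name `partition_for` is an assumption.

```python
import hashlib

NUM_PARTITIONS = 256  # from the example in the text

def partition_for(user_id: str) -> int:
    """Map a user identifier to one of 256 partitions. The same user
    always maps to the same partition, so that user's notifications
    are processed in order while partitions run in parallel."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS
```

With SQS FIFO, the partition number would typically become the message group identifier (MessageGroupId), since FIFO queues guarantee ordering per message group rather than across the whole queue.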
Implement idempotency storage efficiently. Writing to a relational database on every send adds latency and database load. Instead, use a distributed cache like Redis with a TTL or a key value store like DynamoDB with automatic item expiry. For 10,000 notifications per second, expect roughly 864 million idempotency records per day; with 48 hour TTL, steady state holds roughly 1.7 billion records. At 100 bytes per record, that is 170 gigabytes (GB) in cache, easily handled by a moderately sized Redis cluster or DynamoDB table with on demand pricing.
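The capacity figures above follow from straightforward arithmetic, reproduced here so the estimate can be checked or re-run with different assumptions:

```python
# Back-of-envelope sizing from the text.
SENDS_PER_SECOND = 10_000
BYTES_PER_RECORD = 100
TTL_HOURS = 48

records_per_day = SENDS_PER_SECOND * 86_400      # 864,000,000 records/day
steady_state = records_per_day * (TTL_HOURS // 24)  # ~1.73 billion live records
total_bytes = steady_state * BYTES_PER_RECORD    # ~172.8 GB, i.e. roughly 170 GB
```

Halving the TTL to 24 hours halves the steady state footprint, which is one reason the window is worth making configurable.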
💡 Key Takeaways
•At least once delivery allows duplicates during retries. Store idempotency records keyed by tenant, user, campaign, and collapse key with 24 to 48 hour time to live (TTL). For 10,000 per second, expect roughly 1.7 billion records at steady state (170 gigabytes at 100 bytes each).
•Collapse keys serve a dual purpose: idempotency and update coalescing. For cart badge notifications, use cart count as the collapse key so providers deliver only the latest count (7 items) instead of intermediate updates (5, 6, 7) when the device reconnects.
•First In First Out (FIFO) queues preserve order but cap throughput at roughly 3,000 messages per second. Partition based ordering hashes user identifiers to 256 partitions, giving per user ordering at 10,000 per second (roughly 40 per second per partition).
•Implement idempotency storage in Redis with TTL or DynamoDB with automatic item expiry. Avoid relational database writes on the hot path: each send requires a cache check (sub millisecond) versus a database round trip (5 to 10 milliseconds), which adds prohibitive latency at scale.
•Apple Push Notification service (APNs) collapse identifiers and Firebase Cloud Messaging (FCM) collapse keys coalesce updates at provider layer. Use same collapse key in backend idempotency logic to prevent sending intermediate updates that providers will discard.
📌 Examples
Shopping app sends cart updates with collapse key cart_count. User adds 3 items while offline; backend sends 3 notifications (5, 6, 7 items) but Firebase Cloud Messaging (FCM) delivers only latest (7) when device reconnects, preventing notification spam.
Banking app uses an idempotency key combining tenant bank1, user U123, and campaign fraud_alert_txn_456 to prevent duplicate fraud alerts during retry storms. Redis stores the key with a 24 hour TTL; duplicate retries are dropped in a sub millisecond cache lookup.
Social media platform partitions 10 million users into 256 Simple Queue Service (SQS) First In First Out (FIFO) queues by user_id hash. Per user notifications stay ordered (post, comment, like) while total throughput reaches 10,000 per second (roughly 40 per second per queue).