Message Queues & Streaming • Notification System Design (Push Notifications)Medium⏱️ ~3 min
Idempotency, Ordering, and Duplicate Prevention
Duplicate Prevention:
At-least-once delivery allows duplicates during retries. Without idempotency, users receive same OTP or promo multiple times. Store idempotency record before sending, keyed by tenant_id + user_id + campaign_id + collapse_key. Check record before delivery, skip if exists within TTL window (24-48 hours).
Collapse Keys:
Used by both APNs and FCM for deduplication and update coalescing. For shopping cart badge with collapse key "cart_count": if 3 updates arrive (5, 6, 7 items) while device offline, provider delivers only latest (7 items). Use same collapse key in idempotency record to prevent sending intermediate updates that will be discarded.
Ordering Trade-offs:
SQS Standard maximizes throughput but delivers out-of-order. If user places order then cancels, they might see cancel before confirm. FIFO preserves order but caps at ~3K/sec vs unlimited Standard. Compromise: partition-based ordering—hash user_id to 256 partitions, route all messages for that user to same partition. Per-user ordering at scale (10K/sec ÷ 256 = ~40/sec per partition).
Idempotency Storage Math
10K/sec × 86,400 = 864M records/day
With 48h TTL: ~1.7B records steady state. At 100 bytes/record = 170GB in cache. Use Redis with TTL or DynamoDB with item expiry.
💡 Key Takeaways
✓At least once delivery allows duplicates during retries. Store idempotency records keyed by tenant, user, campaign, and collapse key with 24 to 48 hour time to live (TTL). For 10,000 per second, expect roughly 1.7 billion records at steady state (170 gigabytes at 100 bytes each).
✓Collapse keys serve dual purpose: idempotency and update coalescing. For cart badge notifications, use cart as collapse key so providers deliver only latest count (7 items) instead of intermediate updates (5, 6, 7) when device reconnects.
✓First In First Out (FIFO) queues preserve order but cap throughput at roughly 3,000 messages per second. Partition based ordering hashes user identifiers to 256 partitions, giving per user ordering at 10,000 per second (roughly 40 per second per partition).
✓Implement idempotency storage in Redis with TTL or DynamoDB with automatic item expiry. Avoid relational database writes on hot path: each send requires cache check (sub millisecond) versus database round trip (5 to 10 milliseconds), adding prohibitive latency at scale.
✓Apple Push Notification service (APNs) collapse identifiers and Firebase Cloud Messaging (FCM) collapse keys coalesce updates at provider layer. Use same collapse key in backend idempotency logic to prevent sending intermediate updates that providers will discard.
📌 Interview Tips
1Shopping app sends cart updates with collapse key cart_count. User adds 3 items while offline; backend sends 3 notifications (5, 6, 7 items) but Firebase Cloud Messaging (FCM) delivers only latest (7) when device reconnects, preventing notification spam.
2Banking app uses idempotency key tenant:bank1 user:U123 campaign:fraud_alert_txn_456 to prevent duplicate fraud alerts during retry storms. Redis stores key with 24 hour TTL; duplicate retries are dropped in sub millisecond cache lookup.
3Social media platform partitions 10 million users into 256 Simple Queue Service (SQS) First In First Out (FIFO) queues by user_id hash. Per user notifications stay ordered (post, comment, like) while total throughput reaches 10,000 per second (roughly 40 per second per queue).