Push Notification System Architecture Overview
Definition
A push notification system delivers messages from your servers to users' devices: the "Your driver is arriving" alert, the "Someone liked your photo" ping, or the "Your order shipped" update.
💡 Key Takeaways
✓ An asynchronous, queue-based architecture decouples producers from delivery, so the system can buffer spikes without blocking producers. At a peak of 10,000 notifications per second, you need roughly 100 workers if each processes 100 messages per second (10,000 / 100 = 100).
✓ At-least-once delivery means the same message can arrive more than once, so idempotency keys are required to suppress duplicates. Store idempotency records keyed by tenant, user, campaign, and collapse key, with a time to live (TTL) of 24 to 48 hours.
✓ Priority lane isolation is critical: separate topics and worker pools for high-priority traffic (one-time passwords, fraud alerts) and promotional traffic keep latency under 1 second for critical flows even during bulk campaigns.
✓ Push delivery has two phases: server to provider typically takes tens of milliseconds, but provider to device varies with radio state and operating system power management. Android Doze can defer normal-priority messages for minutes.
✓ Storage splits into a hot cache layer for preferences (single-digit-millisecond reads via DynamoDB with a DAX cache) and a cold, append-only tracking store. For 10 million sends per day, expect roughly 2 GB of daily growth in tracking data, which works out to about 200 bytes per send.
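The sizing arithmetic and the idempotency-key takeaway above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `workers_needed`, `bytes_per_tracking_record`, and `IdempotencyStore` are hypothetical names, and the in-memory dictionary stands in for whatever TTL-capable key-value store a real system would use.

```python
import math
import time


def workers_needed(peak_per_sec: int, per_worker_per_sec: int) -> int:
    """Workers required to keep up with peak load (no headroom included)."""
    return math.ceil(peak_per_sec / per_worker_per_sec)


def bytes_per_tracking_record(daily_growth_bytes: int, sends_per_day: int) -> float:
    """Average tracking-record size implied by daily growth and send volume."""
    return daily_growth_bytes / sends_per_day


class IdempotencyStore:
    """In-memory stand-in (assumption) for a key-value store with TTL."""

    def __init__(self, ttl_seconds: float, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._expires_at = {}

    def first_seen(self, tenant: str, user: str, campaign: str, collapse_key: str) -> bool:
        """Return True if this send is new, False if it is a duplicate within the TTL."""
        key = (tenant, user, campaign, collapse_key)
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # already handled: skip the send
        self._expires_at[key] = now + self._ttl
        return True
```

A worker would call `first_seen(...)` before handing a message to the provider and drop the message when it returns `False`. In a shared store the check and the write would have to be a single atomic operation (for example, a conditional write), which this single-process sketch does not need.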
📌 Interview Tips
1. AWS reference architecture: Events published to Amazon Simple Notification Service (SNS) Standard topics (publish quota of up to 30,000 messages per second, depending on Region) fan out to Amazon Simple Queue Service (SQS) queues consumed by Lambda or container workers. Preferences are stored in DynamoDB with a DAX cache (at least 3 nodes recommended for production) for sub-10-millisecond reads at 2,000 to 5,000 events per second.
2. Apple APNs uses HTTP/2 with multiplexed streams and token-based authentication. Collapse identifiers (the apns-collapse-id header) coalesce updates (for example, cart count badges) so only the latest value reaches the device, reducing notification fatigue.
3. Google FCM provides high and normal priority levels. High priority wakes devices immediately, but apps that misuse it may have their messages deprioritized by Google. Normal priority respects Doze mode and can defer delivery by minutes to hours depending on device state.
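To make the collapse and priority options in tips 2 and 3 concrete, here is a hedged sketch that only builds the request metadata and sends nothing. The helper names `apns_headers` and `fcm_message` are hypothetical. The header and field names (`apns-collapse-id`, `apns-priority`, `android.priority`, `android.collapse_key`) follow the APNs and FCM HTTP v1 documentation as best understood here, so check them against the current provider docs before relying on them. Authentication (the APNs bearer token, the FCM OAuth token) is deliberately left out.

```python
from typing import Optional


def apns_headers(topic: str, collapse_id: Optional[str] = None, urgent: bool = False) -> dict:
    """Build APNs HTTP/2 request headers for an alert notification."""
    headers = {
        "apns-topic": topic,  # usually the app's bundle ID
        "apns-push-type": "alert",
        # 10 = send immediately, 5 = let the device deliver power-efficiently
        "apns-priority": "10" if urgent else "5",
    }
    if collapse_id is not None:
        # APNs limits the collapse identifier to 64 bytes
        if len(collapse_id.encode("utf-8")) > 64:
            raise ValueError("apns-collapse-id must not exceed 64 bytes")
        headers["apns-collapse-id"] = collapse_id
    return headers


def fcm_message(token: str, title: str, body: str,
                collapse_key: Optional[str] = None, urgent: bool = False) -> dict:
    """Build an FCM HTTP v1 request body for a single device token."""
    android = {"priority": "HIGH" if urgent else "NORMAL"}
    if collapse_key is not None:
        android["collapse_key"] = collapse_key
    return {
        "message": {
            "token": token,
            "notification": {"title": title, "body": body},
            "android": android,
        }
    }
```

For example, a cart badge update might use `apns_headers("com.example.shop", collapse_id="cart-count")` (the bundle ID is made up), while a one-time password would pass `urgent=True` and no collapse key so every code is delivered.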