Push Notification System Architecture Overview
A production-grade notification system decouples event producers from delivery using asynchronous message queues and channel-specific workers. When an application needs to send a notification, it calls an ingress service that validates the request, authenticates the caller, enforces rate limits, and publishes an event to a priority-separated topic instead of calling Apple Push Notification service (APNs) or Firebase Cloud Messaging (FCM) directly. This asynchronous design keeps upstream services from blocking on slow delivery and lets the system buffer traffic spikes.
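The ingress step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the topic names, the `NotificationRequest` fields, the token-bucket parameters, and the in-memory queues standing in for a real broker are all hypothetical, and caller authentication is omitted for brevity.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass

# Hypothetical topic names; the text only says topics are separated by priority.
TOPIC_BY_PRIORITY = {"high": "notifications.high", "bulk": "notifications.bulk"}


@dataclass
class NotificationRequest:
    caller_id: str
    user_id: str
    priority: str  # "high" (OTP, fraud alert) or "bulk" (promotional)
    body: str


class TokenBucket:
    """Per-caller rate limiter: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now):
        # Refill based on elapsed time, then spend one token if available.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Ingress:
    def __init__(self, rate=5.0, capacity=10.0):
        self.queues = defaultdict(deque)  # in-memory stand-in for a message broker
        self.buckets = {}
        self.rate = rate
        self.capacity = capacity

    def submit(self, req, now=None):
        now = time.monotonic() if now is None else now
        # 1. Validate the request.
        if req.priority not in TOPIC_BY_PRIORITY or not req.body:
            return "rejected: invalid request"
        # 2. Enforce a per-caller rate limit.
        bucket = self.buckets.setdefault(
            req.caller_id, TokenBucket(self.rate, self.capacity, now)
        )
        if not bucket.allow(now):
            return "rejected: rate limited"
        # 3. Publish to the topic for this priority and return immediately.
        self.queues[TOPIC_BY_PRIORITY[req.priority]].append(req)
        return "accepted"
```

The caller gets an answer as soon as the event is enqueued; nothing in `submit` waits on APNs or FCM, which is the point of the decoupling.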
Downstream workers consume from these topics and handle the complexity of delivery. They resolve user and device preferences from a cache-backed preference store, select appropriate channels (push, email, Short Message Service (SMS), in-app), render localized templates, enforce per-user and per-campaign limits, and finally deliver through the platform providers. The system provides at-least-once delivery through retries and dead letter queues (DLQs), and idempotency keys keep the resulting redeliveries from reaching users as duplicate notifications.
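The retry, dead-letter, and idempotency behavior described above might look like the following sketch. The message shape, the `TransientError` type, the retry budget of three attempts, and the in-memory `seen_keys` set are assumptions made for illustration; a real worker would requeue with backoff and keep idempotency records in a shared store.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget, not taken from the text


class TransientError(Exception):
    """Hypothetical error type for provider failures that are worth retrying."""


def process(queue, dlq, send, seen_keys, max_attempts=MAX_ATTEMPTS):
    """Drain `queue`, retrying failed sends and dead-lettering exhausted ones."""
    while queue:
        msg = queue.popleft()
        if msg["idempotency_key"] in seen_keys:
            # At-least-once delivery can hand us the same event twice; skip it.
            continue
        try:
            send(msg)
            seen_keys.add(msg["idempotency_key"])
        except TransientError:
            msg["attempts"] = msg.get("attempts", 0) + 1
            if msg["attempts"] >= max_attempts:
                dlq.append(msg)  # give up and park the message for inspection
            else:
                queue.append(msg)  # a real system would requeue with backoff
```

Recording the key only after a successful send means a crash between the two steps can still produce a duplicate; closing that gap requires an atomic check-and-set in the idempotency store.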
Push notifications follow a two-step path: from your server to the provider (APNs or FCM), then from the provider to the device. Providers maintain persistent connections to millions of devices and apply operating-system-specific policies. Android Doze mode defers normal-priority messages to save battery, while APNs uses collapse identifiers to coalesce multiple updates of the same type on iOS. The key architectural decision is to isolate high-priority flows (one-time passwords, fraud alerts) from bulk promotional campaigns using separate topics and worker pools, which keeps latency bounded for critical messages even under heavy load.
For capacity planning, take 1 million daily active users (DAU) each receiving 10 notifications per day: that is 10 million sends per day, or an average of about 116 sends per second (10,000,000 / 86,400). Peak traffic can spike to 10 to 100 times the average during campaigns. Tracking and audit storage grows at approximately 2 gigabytes (GB) per day, assuming 200 bytes per notification record.
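The arithmetic behind these estimates is easy to reproduce. The sketch below uses the figures from the paragraph above (1 million DAU, 10 notifications per user per day, 200 bytes per record, a peak multiplier of up to 100); the function name and the returned dictionary keys are illustrative choices only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def capacity_estimate(dau, per_user_per_day, bytes_per_record, peak_multiplier):
    """Back-of-the-envelope send rate and storage growth."""
    sends_per_day = dau * per_user_per_day
    avg_per_second = sends_per_day / SECONDS_PER_DAY
    return {
        "sends_per_day": sends_per_day,
        "avg_per_second": avg_per_second,
        "peak_per_second": avg_per_second * peak_multiplier,
        "storage_gb_per_day": sends_per_day * bytes_per_record / 1e9,
    }


est = capacity_estimate(
    dau=1_000_000, per_user_per_day=10, bytes_per_record=200, peak_multiplier=100
)
print(est)
```

With these inputs the average works out to roughly 115.7 sends per second, the 100x peak to roughly 11,600 per second, and storage growth to 2 GB per day (using decimal gigabytes).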
💡 Key Takeaways
•An asynchronous, queue-based architecture decouples producers from delivery, letting the system buffer spikes and keeping callers from blocking. At a peak of 10,000 notifications per second, you need roughly 100 workers if each processes 100 messages per second.
•At-least-once delivery semantics require idempotency keys to prevent duplicates. Store idempotency records keyed by tenant, user, campaign, and collapse key, with a time to live (TTL) of 24 to 48 hours.
•Priority lane isolation is critical: separate topics and worker pools for high-priority traffic (one-time passwords, fraud alerts) versus promotional traffic keep latency under 1 second for critical flows even during bulk campaigns.
•Push delivery has two phases: server to provider typically takes tens of milliseconds, but provider to device varies with radio state and operating system power management. Android Doze can defer normal-priority messages for minutes.
•Storage splits into a hot, cached layer for preferences (for example, DynamoDB fronted by a DAX cache, giving single-digit-millisecond reads or better) and a cold, append-only tracking store. For 10 million sends per day, expect roughly 2 GB of daily growth in tracking data.
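Two of the takeaways above, the worker-count estimate and the TTL-bound idempotency record, can be sketched in a few lines. The class below is an in-memory stand-in, which is an assumption made to keep the example self-contained: a real deployment would use a shared store with native TTL support, and the `tenant:user:campaign:collapse_key` key layout is only one plausible encoding of the four fields named in the text.

```python
import math
import time


def workers_needed(peak_per_second, per_worker_per_second):
    """Number of workers required to keep up with peak throughput."""
    return math.ceil(peak_per_second / per_worker_per_second)


class IdempotencyStore:
    """In-memory stand-in for a shared key-value store with per-key TTL."""

    def __init__(self, ttl_seconds=24 * 3600):
        self.ttl = ttl_seconds
        self._expires_at = {}

    @staticmethod
    def make_key(tenant, user, campaign, collapse_key):
        # Hypothetical key layout combining the four fields from the text.
        return f"{tenant}:{user}:{campaign}:{collapse_key}"

    def first_time(self, key, now=None):
        """Return True and record the key if it is new or its record expired."""
        now = time.time() if now is None else now
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate within the TTL window
        self._expires_at[key] = now + self.ttl
        return True


store = IdempotencyStore(ttl_seconds=24 * 3600)
key = IdempotencyStore.make_key("acme", "user-1", "spring-sale", "cart")
```

A worker would call `store.first_time(key)` before sending and skip the send when it returns False. The TTL bounds how long records accumulate, at the cost of allowing a repeat once the window has passed.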
📌 Examples
AWS reference architecture: Events are published to Amazon Simple Notification Service (SNS) standard topics (publish quota of up to about 30,000 messages per second, depending on Region) and fan out to Amazon Simple Queue Service (SQS) queues consumed by Lambda or container workers. Preferences are stored in DynamoDB with a DAX cache (three or more nodes recommended for production fault tolerance) for sub-10-millisecond reads at 2,000 to 5,000 events per second.
Apple APNs uses HTTP/2 with multiplexed streams and supports token-based authentication. Collapse identifiers coalesce updates (for example, cart count badges) so that only the latest value is shown on the device, reducing notification fatigue.
Google FCM offers high and normal priority levels. High priority attempts to wake the device and deliver immediately, but apps that misuse it can have their messages deprioritized by Google. Normal priority respects Doze mode, so delivery can be deferred by minutes to hours depending on device state.
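To make the provider-specific knobs concrete, the sketch below builds (but does not send) an APNs request and an FCM HTTP v1 message. The header names `apns-topic`, `apns-push-type`, `apns-priority`, and `apns-collapse-id`, and the FCM fields `android.priority` and `android.collapse_key`, come from the providers' public APIs; the helper function names, the device tokens, and the bundle ID are hypothetical, and authentication is left out entirely.

```python
import json


def build_apns_request(device_token, bundle_id, title, body,
                       collapse_id=None, urgent=False):
    """Assemble path, headers, and JSON body for an APNs HTTP/2 request."""
    headers = {
        "apns-topic": bundle_id,
        "apns-push-type": "alert",
        # 10 asks for immediate delivery; 5 lets the device conserve power.
        "apns-priority": "10" if urgent else "5",
    }
    if collapse_id is not None:
        if len(collapse_id.encode("utf-8")) > 64:
            raise ValueError("apns-collapse-id must not exceed 64 bytes")
        headers["apns-collapse-id"] = collapse_id
    payload = {"aps": {"alert": {"title": title, "body": body}}}
    return {
        "path": f"/3/device/{device_token}",
        "headers": headers,
        "body": json.dumps(payload),
    }


def build_fcm_message(device_token, title, body,
                      collapse_key=None, urgent=False):
    """Assemble the JSON body for an FCM HTTP v1 `messages:send` call."""
    android = {"priority": "HIGH" if urgent else "NORMAL"}
    if collapse_key is not None:
        android["collapse_key"] = collapse_key
    return {
        "message": {
            "token": device_token,
            "notification": {"title": title, "body": body},
            "android": android,
        }
    }


# Hypothetical values for illustration only.
otp = build_fcm_message("device-token-123", "Login code", "Your code is 482913",
                        urgent=True)
cart = build_apns_request("abcdef0123", "com.example.shop", "Cart updated",
                          "3 items in your cart", collapse_id="cart-count")
```

A one-time password is marked urgent so it is not deferred by Doze, while the cart update carries a collapse identifier so a newer update replaces an older one rather than stacking.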