
Priority Lane Isolation and Throughput Planning

Shared notification pipelines create a critical failure mode: bulk promotional campaigns can starve time-sensitive traffic such as one-time passwords or fraud alerts. The solution is priority lane isolation: separate message queue topics with dedicated worker pools. For a system targeting 10,000 notifications per second at peak, you might split traffic into high-priority (one-time passwords and fraud alerts, 10% of traffic), medium-priority (transactional updates, 30%), and low-priority (marketing, 60%) topics. Each lane gets dedicated consumers sized to handle its own peak independently.

Throughput planning starts with worker capacity. A single worker using batched provider Application Programming Interfaces (APIs) can reliably process roughly 100 notifications per second, including preference lookup, template rendering, and provider delivery with retries. To sustain 10,000 per second, you need approximately 100 workers, assuming even distribution. With priority lanes, however, you size each pool for its own peak: high priority might need 34 workers for 3,333 per second, medium another 34, and low priority 67 workers for 6,667 per second. Over-provisioning high-priority lanes by 50% provides headroom when bulk campaigns spike low-priority queues.

Batching dramatically improves efficiency but adds latency. If you batch 10 notifications before calling provider APIs, the worker count drops from 100 to roughly 10 for the same throughput, because network round trips dominate processing time. The trade-off is latency: a batch must wait until it fills or times out (typically 50 to 200 milliseconds). High-priority lanes often disable batching to achieve sub-100-millisecond processing, accepting higher worker costs, while low-priority lanes batch aggressively to minimize infrastructure spend.

Amazon Web Services (AWS) Simple Notification Service (SNS) and Simple Queue Service (SQS) demonstrate these limits concretely.
SNS Standard topics sustain roughly 30,000 messages per second for fan-out, while First In First Out (FIFO) topics drop to about 3,000 per second because of their ordering guarantees. SQS Standard queues scale horizontally with effectively no throughput limit, but FIFO queues enforce per-message-group ordering at reduced throughput. If you need 10,000 per second with some ordering (say, per user), partition users into 256 message groups in an SQS FIFO queue: each group carries roughly 40 per second, well within limits, while preserving per-user sequencing.
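The per-user partitioning above can be sketched with a stable hash from user ID to message group. This is a minimal illustration, not a prescribed implementation: `message_group_for` and the notification fields are assumed names, and the `boto3` call shown in the comment is only a rough shape of how the group ID would be attached.

```python
# Sketch: partition users into a fixed number of SQS FIFO message groups so
# per-user ordering is preserved while per-group throughput stays within
# limits (~40/s per group at 10,000/s across 256 groups).
import hashlib

NUM_GROUPS = 256  # from the text: 256 message groups

def message_group_for(user_id: str) -> str:
    """Stable hash of a user ID into one of NUM_GROUPS message groups.

    All notifications for the same user map to the same group, so SQS FIFO
    preserves their relative order; different users spread across groups.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return f"group-{int.from_bytes(digest[:4], 'big') % NUM_GROUPS}"

# With boto3, attaching the group ID looks roughly like:
#   sqs.send_message(QueueUrl=queue_url, MessageBody=payload,
#                    MessageGroupId=message_group_for(user_id),
#                    MessageDeduplicationId=notification_id)
```

A cryptographic hash is overkill for distribution but guarantees stability across processes and languages, which matters because the user-to-group mapping must never change while ordered messages are in flight.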
💡 Key Takeaways
Worker capacity planning: a single worker processes approximately 100 notifications per second with batched APIs. For a 10,000 per second peak, provision roughly 100 workers, or partition into priority lanes with independent sizing (34 high, 34 medium, 67 low).
Batching reduces worker count by 10x (from 100 to 10 workers for the same throughput) by amortizing network round trips, but adds 50 to 200 milliseconds of latency waiting for batch fill or timeout. Disable batching for high-priority lanes that need sub-100-millisecond processing.
Amazon Web Services (AWS) Simple Notification Service (SNS) Standard topics sustain roughly 30,000 messages per second, but First In First Out (FIFO) topics drop to 3,000 per second with ordering. Partition users into 256 message groups to reach 10,000 per second with per-user ordering at roughly 40 per second per group.
Over-provision high-priority worker pools by 50% to absorb spikes when low-priority campaigns saturate shared infrastructure such as caches or databases. Monitor queue depth and message age: alert if high-priority message age exceeds 5 to 10 seconds.
Failure mode: without lane isolation, a single bulk campaign sending 1 million notifications can backlog the entire system for minutes, causing one-time passwords to time out and fraud alerts to miss Service Level Objectives (SLOs). Separate topics prevent this cascading failure.
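The fill-or-timeout behavior behind the batching trade-off can be sketched as a small consumer-side helper. This is an illustrative synchronous sketch, not a production consumer: `next_batch` is a hypothetical name, and the batch size and timeout are the figures from the text.

```python
# Sketch: fill-or-timeout batching. A batch is sent when it reaches
# BATCH_SIZE or when BATCH_TIMEOUT elapses, whichever comes first; the
# timeout bounds the extra latency batching adds to each notification.
import time
from queue import Queue, Empty

BATCH_SIZE = 10       # notifications per provider API call
BATCH_TIMEOUT = 0.2   # 200 ms: worst-case added latency waiting for fill

def next_batch(q: Queue) -> list:
    """Collect up to BATCH_SIZE items, waiting at most BATCH_TIMEOUT total."""
    batch = []
    deadline = time.monotonic() + BATCH_TIMEOUT
    while len(batch) < BATCH_SIZE:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout: ship a partial batch rather than wait longer
        try:
            batch.append(q.get(timeout=remaining))
        except Empty:
            break  # queue drained and timeout expired
    return batch
```

Under heavy load the queue is never empty, so batches fill instantly and the timeout is irrelevant; under light load every notification pays up to the full timeout, which is why high-priority lanes set `BATCH_SIZE = 1` instead.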
📌 Examples
Banking notification system with 1 million Daily Active Users (DAU): the high-priority lane handles 10,000 one-time passwords per day (an average of 0.1 per second, peaking at 10 per second during login storms) with 2 workers and no batching for sub-200-millisecond latency.
E-commerce platform during Black Friday: the low-priority marketing lane peaks at 50,000 per second (500 workers with batch size 10), while the high-priority order-confirmation lane maintains 1,000 per second (10 dedicated workers, batch size 1) on isolated capacity.
Ride-sharing app separating driver-arrival notifications (high priority, 5,000 per second peak, 50 workers, no batching) from promotional offers (low priority, 20,000 per second, 200 workers, batch size 10) to guarantee sub-1-second driver notifications even during campaign blasts.
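The worker counts in these examples all follow from the same back-of-envelope model: peak rate divided by per-worker throughput, rounded up, with optional headroom. A minimal sketch, assuming the ~100 notifications per second per worker figure from this section (`workers_needed` is a made-up helper name):

```python
# Sketch: back-of-envelope worker-pool sizing for a priority lane.
# PER_WORKER_RPS = 100 is the assumed single-worker throughput from the
# text (preference lookup + template rendering + batched provider call).
import math

PER_WORKER_RPS = 100

def workers_needed(peak_rps: float, headroom: float = 1.0) -> int:
    """Workers to sustain peak_rps, optionally over-provisioned by headroom."""
    return math.ceil(peak_rps * headroom / PER_WORKER_RPS)

# Checks against the numbers in this section:
#   10,000/s total        -> 100 workers
#   3,333/s high priority ->  34 workers (50 with the 50% headroom applied)
#   6,667/s low priority  ->  67 workers
```

The same arithmetic reproduces the example configurations: 50,000/s needs 500 workers, 5,000/s needs 50, and 20,000/s needs 200.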