
Scaling Throughput with Partitions: Cloud Service Limits

Partitioning is the primary mechanism for scaling throughput while preserving per-key ordering. By spreading keys across many partitions, a system increases parallelism and aggregate throughput: 10 partitions handling 1,000 messages per second each yield 10,000 messages per second of total system throughput. Each partition maintains its own ordered log and dedicated consumer, allowing independent parallel processing.

Cloud messaging services publish per-partition throughput ceilings that directly constrain this scaling math. Amazon Kinesis shards support up to 1 MB per second or 1,000 records per second of writes (whichever limit is hit first) and 2 MB per second of reads. Azure Event Hubs provisions throughput units (TUs), each providing roughly 1 MB per second of ingress and 2 MB per second of egress. Amazon SQS FIFO queues cap at 3,000 messages per second with batching (300 messages per second without) per queue; with high-throughput FIFO you scale further by spreading traffic across multiple message groups within the queue, each group acting like a partition.

Capacity planning works backwards from consumer throughput. If a consumer can process 500 messages per second per partition while preserving serial order and you need 10,000 messages per second in total, you need at least 20 partitions; add 30 to 50 percent headroom for rebalances, spikes, and uneven key distribution. LinkedIn runs clusters with millions of partitions across thousands of brokers to handle trillions of messages daily, keeping per-partition throughput manageable (typically hundreds to low thousands of messages per second per partition).

Adding partitions is not free. Hash-based mapping (hash of key mod N) changes when N changes, remapping many keys to different partitions and disrupting consumer cache locality and downstream storage affinity. Consumer group rebalances pause consumption during reassignment, and naive implementations create stop-the-world effects. Start with fewer, hotter partitions to minimize coordination overhead, then scale out as load grows and monitoring reveals bottlenecks.
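A quick sketch of the capacity arithmetic described above (the helper name and the 40 percent default headroom are illustrative assumptions, not any library's API):

```python
import math

def partitions_needed(target_msgs_per_sec: float,
                      per_partition_msgs_per_sec: float,
                      headroom: float = 0.4) -> int:
    """Work backwards from what one consumer can drain from a single partition,
    then pad for rebalances, spikes, and uneven key distribution."""
    base = target_msgs_per_sec / per_partition_msgs_per_sec
    return math.ceil(base * (1 + headroom))

# The worked example from the text: 10,000 msg/s target, 500 msg/s per consumer.
print(partitions_needed(10_000, 500, headroom=0.0))  # 20 -- the bare minimum
print(partitions_needed(10_000, 500, headroom=0.4))  # 28 -- with 40% headroom
```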
💡 Key Takeaways
Partitions scale throughput linearly with partition count. Ten partitions at 1,000 messages per second each equal 10,000 messages per second total; doubling partitions doubles aggregate throughput while maintaining per-key order.
Cloud services have hard per-partition limits. Kinesis shards: 1 MB per second write, 2 MB per second read, 1,000 records per second write. Event Hubs TUs: roughly 1 MB per second ingress, 2 MB per second egress. SQS FIFO: 3,000 messages per second per queue with batching.
Plan capacity from the consumer processing rate. Required partitions equal target throughput divided by per-consumer throughput: if consumers handle 500 messages per second and you need 10,000 messages per second, provision 20+ partitions with headroom.
Adding partitions disrupts key mapping and triggers rebalances. Hash of key mod N changes when N changes, remapping keys to new partitions (see the sketch after these takeaways). Consumer rebalances pause processing; cooperative protocols minimize disruption but still add latency.
Start conservative and scale out based on observed load. Fewer initial partitions reduce coordination overhead; LinkedIn commonly starts new topics at 4 to 32 partitions, monitoring per partition throughput and lag before expanding.
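A minimal sketch of the remapping effect, assuming a simple hash(key) mod N assignment (real brokers use their own hash functions, for example Kafka's murmur2, but the effect is the same):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash so the result is reproducible across runs; mod by partition count.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

keys = [f"order-{i}" for i in range(1_000)]
before = {k: partition_for(k, 10) for k in keys}  # 10 partitions
after = {k: partition_for(k, 12) for k in keys}   # scaled out to 12

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys changed partitions")  # typically well over 80%
```

With a uniform hash, going from 10 to 12 partitions moves roughly five of every six keys, which is exactly the cache-locality and downstream-affinity disruption described above.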
📌 Examples
Kinesis stream scaling: application needs 50 MB per second write throughput, each shard supports 1 MB per second write, provision 50+ shards with headroom (typically 65 to 75 shards for 30 to 50 percent buffer)
Azure Event Hubs: event ingestion at 80 MB per second requires 80 TUs at 1 MB per second ingress each; commonly provision 16 partitions (5 MB per second per partition) to distribute load and enable 16 concurrent consumers
Amazon SQS FIFO: order processing needs 9,000 messages per second with ordering per order ID; use 3+ message groups (each supporting 3,000 messages per second) and partition orders across groups by hash of order ID (the arithmetic for all three examples is sketched below)
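Restated as code, using the per-unit limits quoted in the examples above (the 40 percent headroom factor for the Kinesis case is an assumption within the stated 30 to 50 percent range):

```python
import math

# Kinesis: 50 MB/s write target at 1 MB/s write per shard, plus ~40% headroom.
shards = math.ceil(50 / 1.0 * 1.4)          # 70, inside the 65-75 range above

# Event Hubs: 80 MB/s ingress at roughly 1 MB/s ingress per TU.
throughput_units = math.ceil(80 / 1.0)      # 80 TUs

# SQS FIFO: 9,000 msg/s target at ~3,000 msg/s (batched) per message group.
message_groups = math.ceil(9_000 / 3_000)   # 3 groups minimum

print(shards, throughput_units, message_groups)
```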