Message Queues & Streaming • Message Ordering & PartitioningMedium⏱️ ~2 min
Partition Key Design and Hot Partition Problem
Choosing Partition Keys:
The partition key choice fundamentally shapes both correctness and performance. Choose keys that align with business invariants requiring order: account ID for financial transactions, order ID for order state transitions, device ID for telemetry streams. The key ensures all state changes for that entity flow through a single partition in sequence.
Cardinality Requirements:
High cardinality is critical for load distribution. A rule of thumb is maintaining at least 10 to 20 times more active keys than partitions. With 100 partitions, aim for 1,000+ active keys to smooth the distribution. Low cardinality keys (like status codes or country codes) create severe imbalance.
⚠️ Common Pitfall: A celebrity user generating 10,000 events/sec while average users generate 1/sec creates a hot partition handling 10,000x more load. Amazon reports that even 1% of keys being hot can degrade p99 latencies by 10x or more.
Hot Partition Mitigation:
You can split hot keys using sub-partitioning (key = entityID + hash of operation bucket), but this trades away strict per-key order for throughput. Microsoft Azure Event Hubs users commonly monitor per-partition throughput and redesign keying when observing skew.💡 Key Takeaways
✓Partition keys must align with ordering invariants. Use entity identifiers (account ID, order ID, device ID) where business logic requires sequential processing of state changes for that entity.
✓Maintain 10 to 20x key cardinality vs partition count. With 100 partitions, 1,000+ active keys smooth distribution; low cardinality keys like boolean flags or country codes create severe imbalance.
✓Hot keys cause head of line blocking for entire partition. One celebrity user at 10,000 events per second saturates the partition, stalling all other keys hashed to it and degrading p99 latency by 10x or more.
✓Sub partitioning trades order for throughput. Using key equals entityID plus bucket allows spreading hot entities across partitions but breaks strict per entity ordering; only viable when business rules permit.
✓Monitor per partition skew with concrete metrics. Track Gini coefficient or p95/p99 key volume share; alert when single partition exceeds 2x average throughput or accumulates persistent lag.
📌 Interview Tips
1Amazon SQS FIFO uses message group ID as partition key: order queue with high cardinality order IDs distributes well, but low cardinality status updates (pending, shipped, delivered) create 3 hot groups
2LinkedIn activity streams: partition by member ID works for 900M+ users, but company pages with millions of followers use sub keys (pageID plus follower bucket) to distribute fan out writes
3Gaming leaderboards: partitioning by player ID maintains per player order, but global top 10 players generate 100x more events; solution uses playerID plus event type hash to parallelize while accepting relaxed ordering