Event Streaming Architecture: Append-Only Logs and Partitioned Parallelism
Event streaming centers on an immutable, append-only log where producers write events to named topics and consumers read sequentially using self-managed offsets. The critical architectural insight is partitioning: topics are sharded into partitions to scale throughput linearly, but ordering is guaranteed only within a single partition, never across an entire topic. This design choice trades global ordering for massive parallelism.
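As a concrete illustration, here is a minimal sketch using the standard Java producer client. The topic name (user-events), keys, and broker address are hypothetical; the point is the keying behavior: records that share a key hash to the same partition, which preserves per-key order while spreading different keys across partitions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records sharing a key hash to the same partition, so the two events
            // for user-42 are appended (and later read) in order; events for other
            // users land on other partitions and are processed in parallel.
            producer.send(new ProducerRecord<>("user-events", "user-42", "page_view:/home"));
            producer.send(new ProducerRecord<>("user-events", "user-42", "click:signup"));
            producer.send(new ProducerRecord<>("user-events", "user-77", "page_view:/pricing"));
        }
    }
}
```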
Consumers control their read position via offsets, enabling powerful replay and backfill capabilities. If you discover a bug in your analytics pipeline from three days ago, you can reset your consumer offset and reprocess millions of events. This time decoupling means producers and consumers evolve independently. A new analytics team can start consuming historical data without impacting existing consumers or requiring producer changes.
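One way that replay looks in practice, sketched with the Java consumer API (topic name, group id, and broker address are illustrative): offsetsForTimes maps a wall-clock timestamp to an offset per partition, and seek rewinds the consumer there before polling.

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.time.Instant;
import java.util.*;
import java.util.stream.Collectors;

public class ReplayFromTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical
        props.put("group.id", "analytics-backfill");        // hypothetical group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign all partitions of the topic explicitly so we can seek freely.
            List<TopicPartition> partitions = consumer.partitionsFor("user-events").stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);

            // Find the earliest offset at or after "three days ago" in each partition...
            long threeDaysAgo = Instant.now().minus(Duration.ofDays(3)).toEpochMilli();
            Map<TopicPartition, Long> query = partitions.stream()
                    .collect(Collectors.toMap(tp -> tp, tp -> threeDaysAgo));

            // ...and rewind to it. Subsequent polls re-deliver the historical events.
            for (Map.Entry<TopicPartition, OffsetAndTimestamp> e :
                    consumer.offsetsForTimes(query).entrySet()) {
                if (e.getValue() != null) {
                    consumer.seek(e.getKey(), e.getValue().offset());
                }
            }

            // Poll a few batches for the sketch; a real backfill job would loop
            // until it catches up to the live end of the log.
            for (int i = 0; i < 10; i++) {
                consumer.poll(Duration.ofMillis(500))
                        .forEach(r -> System.out.printf("%s %s%n", r.key(), r.value()));
            }
        }
    }
}
```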
Production-scale numbers are impressive. Modern brokers with NVMe storage and 25 to 100 Gbps networking sustain 200 to 800 MB/s per broker with 1 to 10 KB messages. LinkedIn processes trillions of messages daily across many clusters. Netflix ingests hundreds of billions of events per day with sub-second to low-seconds end-to-end latency for personalization and A/B testing. The key enablers are batching, sequential disk I/O, and heavy use of the operating system page cache rather than application-managed memory.
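The batching lever is mostly producer configuration. The sketch below shows one plausible set of Java producer settings; the specific values are illustrative only, and the right ones depend on message sizes and latency budget.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

import java.util.Properties;

public class BatchingProducerConfig {
    public static KafkaProducer<byte[], byte[]> build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        // Batching: wait up to 10 ms to fill batches of up to 256 KB per partition,
        // amortizing per-message overhead across large sequential writes.
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 256 * 1024);

        // Compression shrinks network and disk I/O at the cost of some CPU.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        // acks=all waits for the in-sync replicas, trading a little latency for durability.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        return new KafkaProducer<>(props);
    }
}
```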
The tradeoff is operational complexity. You must plan partition counts carefully (typically 2 to 3 times the maximum consumer concurrency), choose partition keys that balance load while preserving locality, and monitor metrics such as in-sync replica (ISR) health and consumer lag distributions. Poor key selection creates hotspots where a single partition becomes a bottleneck, spiking tail latencies from milliseconds to seconds.
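Consumer lag can be spot-checked from inside a consumer by comparing the broker's end offsets with the consumer's current positions, as in the minimal sketch below; production deployments usually rely on dedicated lag-monitoring tooling rather than application code like this.

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Set;

public class LagCheck {
    // Computes per-partition lag for the partitions this consumer currently owns:
    // lag = broker's latest offset (end of log) minus this consumer's next read position.
    public static void logLag(KafkaConsumer<?, ?> consumer) {
        Set<TopicPartition> assigned = consumer.assignment();
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(assigned);
        for (TopicPartition tp : assigned) {
            long lag = endOffsets.get(tp) - consumer.position(tp);
            System.out.printf("%s-%d lag=%d%n", tp.topic(), tp.partition(), lag);
        }
    }
}
```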
💡 Key Takeaways
• Partitions enable linear scaling: each partition is an independent ordered log. Clusters commonly run tens to hundreds of thousands of partitions, with practical limits around 4,000 to 10,000 partitions per broker due to metadata overhead.
• Offset management enables replay: consumers store their position and can rewind to any historical offset within the retention window (hours to weeks). This supports backfill, A/B test analysis on historical data, and disaster recovery.
• Throughput is hardware-bound, not architecture-bound: 200 to 800 MB/s per broker sustained with modern NVMe and networking. Batching reduces per-message overhead from microseconds to amortized nanoseconds.
• Consumer groups divide partitions among members: each partition is assigned to exactly one consumer in a group, preserving order while scaling reads horizontally. Adding a 10th consumer helps only if the topic has at least 10 partitions.
• Retention policies decouple storage from processing speed: time-based retention (delete after 7 days) suits event transport; log compaction (keep only the latest value per key) suits event-sourced state, retaining the current value of every entity in bounded space (see the topic-configuration sketch after this list).
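A rough sketch of how those two retention styles might be declared with the Java AdminClient; the topic names, partition counts, and replication factor are illustrative assumptions, not recommendations.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class TopicRetentionSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical

        try (AdminClient admin = AdminClient.create(props)) {
            // Event transport: keep everything for 7 days, then delete old segments.
            NewTopic clickstream = new NewTopic("clickstream", 64, (short) 3)
                    .configs(Map.of(
                            "cleanup.policy", "delete",
                            "retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));

            // Event-sourced state: compact to the latest value per key, so the topic
            // stays bounded while still holding the current state of every entity.
            NewTopic userProfiles = new NewTopic("user-profiles", 64, (short) 3)
                    .configs(Map.of("cleanup.policy", "compact"));

            admin.createTopics(List.of(clickstream, userProfiles)).all().get();
        }
    }
}
```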
📌 Examples
LinkedIn partitions activity stream events by user_id to guarantee per-user ordering for feed generation while parallelizing across millions of users. With 10,000 partitions and 100 consumers per group, each consumer handles 100 partitions.
Netflix uses high partition counts (thousands per topic) to distribute personalization events. Each viewer interaction is keyed by viewer_id, ensuring all events for a viewer land in the same partition for correct session analysis.
Uber built uReplicator to handle tens of thousands of topics across regions. Each topic's partitions replicate independently with per-topic throttling, isolating failures and preventing a single hot topic from saturating cross-datacenter bandwidth.