Kappa Architecture at Production Scale
Real-World System Design
Consider a large e-commerce platform with 50 million daily active users. The system generates 200,000 events per second on average, spiking to 1 million per second during flash sales. Every click, search query, cart action, and purchase flows into a central event log.
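To make the scale concrete, a back-of-the-envelope partition count can be derived from these rates. The sketch below assumes roughly 1 KB per event and about 10 MB/s of sustainable write throughput per log partition; those two numbers are assumptions, and only the 200K/1M events-per-second figures come from the scenario.

```python
# Rough partition sizing for the event log. The per-event size and
# per-partition throughput are assumptions; only the 200K/1M events/sec
# figures come from the scenario above.
AVG_EVENT_BYTES = 1_000          # assumed ~1 KB per event
PER_PARTITION_MBPS = 10          # assumed sustainable MB/s per partition

def partitions_needed(events_per_sec: int, headroom: float = 1.5) -> int:
    mb_per_sec = events_per_sec * AVG_EVENT_BYTES / 1_000_000
    return int(mb_per_sec * headroom / PER_PARTITION_MBPS) + 1

print(partitions_needed(200_000))    # average load    -> 31 partitions
print(partitions_needed(1_000_000))  # flash-sale peak -> 151 partitions
```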
Multiple stream processing jobs run in parallel, each serving a different use case. One maintains per-user feature vectors for the recommendation engine, producing updates within 1 to 2 seconds of a user action. Another aggregates transaction data per merchant and per card for fraud detection, targeting p99 latency under 2 seconds from event arrival to score availability. A third computes near-real-time business metrics such as Gross Merchandise Value (GMV), conversion rate, and funnel analytics with a p99 latency of 30 to 60 seconds.
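In a Kafka-style log, each of these jobs typically runs as an independent consumer group reading the same topic at its own pace. A minimal sketch with the confluent-kafka Python client follows; the broker address, topic name, and group IDs are placeholders, not details from the scenario.

```python
from confluent_kafka import Consumer

# Each job runs as its own consumer group, so all three independently read
# the full event stream from the same topic at their own pace.
def make_consumer(group_id: str) -> Consumer:
    return Consumer({
        "bootstrap.servers": "localhost:9092",  # placeholder broker address
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    })

feature_job = make_consumer("recs-user-features")   # hypothetical group IDs
fraud_job = make_consumer("fraud-aggregates")
metrics_job = make_consumer("business-metrics")

for consumer in (feature_job, fraud_job, metrics_job):
    consumer.subscribe(["ecommerce-events"])        # hypothetical topic name

# In production each job polls in its own process; a single poll shown here.
msg = feature_job.poll(timeout=1.0)
if msg is not None and msg.error() is None:
    print(msg.topic(), msg.partition(), msg.offset())
```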
Serving Layer Optimization
Each job writes to a serving store matched to its access pattern. User features go into a replicated key-value store delivering 1 to 5 millisecond p99 read latency at hundreds of thousands of queries per second. Business metrics land in a columnar store optimized for analytical queries that scan millions of rows. Fraud scores are pushed to a cache colocated with the transaction processing system for ultra-low-latency lookups.
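As an illustration of the feature-store write path, the sketch below uses redis-py as a stand-in for the replicated key-value store; the key layout, TTL, and field names are invented for the example.

```python
import json
import redis

# Illustrative write path for the user-feature job: the latest features are
# upserted under a per-user key so the recommendation service can do a
# single low-latency point read. Key layout and TTL are invented.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def write_user_features(user_id: str, features: dict) -> None:
    r.set(f"features:{user_id}", json.dumps(features), ex=7 * 24 * 3600)

def read_user_features(user_id: str) -> dict:
    raw = r.get(f"features:{user_id}")
    return json.loads(raw) if raw else {}

write_user_features("u-42", {"clicks_1h": 17, "cart_adds_1h": 2})
print(read_user_features("u-42"))
```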
These serving stores are treated as disposable. If you lose an entire cluster, you do not restore from backups. You rerun the Kappa pipeline from the event log to rebuild the views, accepting some catch-up time. This mindset shift is central: the event log is the source of truth, and everything downstream is a derived cache that can be regenerated.
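Put differently, a serving view is just a deterministic function of the event sequence, so replaying the log from offset zero reproduces it. A self-contained sketch of that fold, with invented event shapes:

```python
# A serving view is just a fold over the event log: replaying the same
# events from offset zero rebuilds the same view, so losing the store
# is recoverable rather than catastrophic.
from collections import defaultdict

def rebuild_order_counts(event_log):
    view = defaultdict(int)                 # disposable derived state
    for event in event_log:                 # replay from the beginning
        if event["type"] == "purchase":
            view[event["user_id"]] += 1
    return view

event_log = [
    {"type": "click", "user_id": "u1"},
    {"type": "purchase", "user_id": "u1"},
    {"type": "purchase", "user_id": "u2"},
]
print(dict(rebuild_order_counts(event_log)))  # {'u1': 1, 'u2': 1}
```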
Operational Concerns
Monitoring focuses on consumer lag (how far behind the tail each job is), state store size, checkpoint duration, and processing latency distributions. Schema evolution is managed with a registry to ensure producers and consumers can evolve independently. When scale increases 10x, teams scale out by adding partitions to the log, increasing consumer instances to match partitions, and horizontally scaling serving stores while keeping latency within Service Level Objectives (SLOs).
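Consumer lag itself is straightforward to compute: it is the log end offset minus the group's committed offset, per partition. A sketch using the confluent-kafka client is below; broker, topic, and group names are placeholders, and in practice this is usually scraped by an exporter rather than an ad-hoc script.

```python
from confluent_kafka import Consumer, TopicPartition

# Lag per partition = log end offset (high watermark) - committed offset.
# Broker, topic, and group names are placeholders.
def consumer_lag(consumer, topic, partitions):
    lag = {}
    committed = consumer.committed([TopicPartition(topic, p) for p in partitions])
    for tp in committed:
        _, high = consumer.get_watermark_offsets(tp)   # (low, high) watermarks
        lag[tp.partition] = high - max(tp.offset, 0)   # offset is negative if nothing committed yet
    return lag

c = Consumer({"bootstrap.servers": "localhost:9092", "group.id": "recs-user-features"})
print(consumer_lag(c, "ecommerce-events", [0, 1, 2]))
```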
✓ In Practice: LinkedIn and Confluent have publicly described similar architectures where a durable log and stream processing power both real time systems and reprocessing, eliminating separate batch layers.
Reprocessing in Production
When the data science team deploys a new recommendation model that needs different input features, they launch a new version of the streaming pipeline. This new job reads 90 days of clickstream and transaction events from offset zero in the event log. To avoid overwhelming the cluster, replay is throttled to 5 to 10 times the live ingest rate.
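The throttling can be as simple as a rate-capped replay loop that sleeps whenever it gets ahead of the budget. A self-contained sketch, where the batch size and the read/apply callbacks are illustrative:

```python
import time

LIVE_INGEST_RATE = 200_000   # events/sec, from the scenario above
REPLAY_MULTIPLIER = 5        # throttle replay to 5x the live rate
BATCH_SIZE = 10_000          # events pulled per iteration (illustrative)

def throttled_replay(read_batch, apply_batch):
    """Replay from offset zero, sleeping whenever we get ahead of the cap."""
    budget_per_sec = LIVE_INGEST_RATE * REPLAY_MULTIPLIER
    window_start, processed = time.monotonic(), 0
    while True:
        batch = read_batch(BATCH_SIZE)   # e.g. a wrapper around consumer.consume()
        if not batch:
            break                        # caught up to the tail of the log
        apply_batch(batch)
        processed += len(batch)
        elapsed = time.monotonic() - window_start
        if elapsed > 0 and processed / elapsed > budget_per_sec:
            time.sleep(processed / budget_per_sec - elapsed)  # back off to the cap
```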
During replay, the new materialized views are built in parallel to the existing ones, often in separate database tables or namespaces. The original job continues serving live traffic with zero disruption. Once the new job catches up to the tail and validation passes (comparing sample outputs, checking metric deltas), an orchestration system switches traffic to the new views. At 5 to 10 times the live ingest rate, working through 90 days of history takes roughly one to three weeks of wall-clock time, but production users see no downtime at any point.
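The validation-then-switch step can be sketched as a spot check over a sample of keys in the old and new views, flipping a serving pointer only when they agree within tolerance. The sample size, tolerance, and pointer mechanism below are illustrative, not details from the scenario.

```python
import random

def views_agree(old_view: dict, new_view: dict,
                sample_size: int = 1_000, tolerance: float = 0.01) -> bool:
    """Spot-check a random sample of keys; allow a small fraction of mismatches."""
    keys = random.sample(list(old_view), min(sample_size, len(old_view)))
    mismatches = sum(1 for k in keys if old_view.get(k) != new_view.get(k))
    return mismatches / max(len(keys), 1) <= tolerance

# 'serving_pointer' stands in for whatever the orchestrator actually flips:
# a table alias, a config flag, or a service discovery entry.
serving_pointer = {"recs_features": "features_v1"}
if views_agree(old_view={"u1": 3, "u2": 5}, new_view={"u1": 3, "u2": 5}):
    serving_pointer["recs_features"] = "features_v2"   # cut traffic over to the new view
print(serving_pointer)
```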
Typical Latency Targets
- User features: 1-2 sec
- Fraud scores (p99): under 2 sec
- Business metrics (p99): 30-60 sec
💡 Key Takeaways
✓ Production systems handle 200K to 1M events/sec, with multiple parallel jobs serving recommendations (1-2 sec latency), fraud detection (2 sec p99), and analytics (30-60 sec p99)
✓ Serving stores are treated as disposable: if lost, they are rebuilt by replaying the event log rather than restored from backups
✓ Reprocessing runs at 5 to 10x the live ingest rate, with new jobs writing to parallel stores and traffic switching only after catch-up and validation
✓ Operational scaling adds log partitions, matching consumer instances, and serving store capacity while maintaining latency SLOs
📌 Examples
1. E-commerce with 50M daily users: a streaming job replays 90 days of events at 5 to 10x the live ingest rate (roughly one to three weeks of replay) to rebuild user profiles with the new model, with zero downtime for production.
2. Fraud system writes scores to a colocated cache with 1-5 ms p99 read latency at hundreds of thousands of QPS; if the cache cluster fails, it is rebuilt from the event log.