
How Kappa Architecture Works: Event Log and Stream Processing

The Foundation: Append-Only Event Log

At the heart of Kappa is a partitioned, replicated log that stores events in order and never modifies them. Think of a Kafka cluster with topics configured for 90-day retention. Every upstream system writes events like user_clicked, order_placed, or payment_authorized. These events are immutable facts about what happened. The log is partitioned for parallelism; for example, partitioning by user_id ensures all events for a user land in the same partition, preserving their order. A system handling 500,000 events per second at 1 kilobyte per event generates roughly 43 terabytes per day. With 90-day retention, that is about 3.9 petabytes of raw data before compression. Even with 3 to 4 times compression, you are storing over a petabyte.
Data Volume at Scale: 500K events/sec of ingest; roughly 3.9 PB of raw events at 90-day retention
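As a minimal sketch with the plain Kafka Java clients, the log setup described above might look like the following. The broker address, partition count, replication factor, and topic name are illustrative assumptions, not values taken from a specific production system.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.config.TopicConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventLogSetup {
    public static void main(String[] args) throws Exception {
        // Sizing context from the text: 500K events/s x 1 KB ~= 500 MB/s
        // ~= 43 TB/day ~= 3.9 PB over 90 days before compression.
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Append-only event topic: partitioned for parallelism, replicated for
        // durability, retained for 90 days so any consumer can replay history.
        try (AdminClient admin = AdminClient.create(adminProps)) {
            NewTopic clicks = new NewTopic("user_clicked", 64, (short) 3)
                .configs(Map.of(
                    TopicConfig.RETENTION_MS_CONFIG,
                    String.valueOf(90L * 24 * 60 * 60 * 1000),
                    TopicConfig.COMPRESSION_TYPE_CONFIG, "lz4"));
            admin.createTopics(List.of(clicks)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // Keying by user_id routes every event for a user to the same partition,
        // preserving per-user ordering.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>(
                "user_clicked", "user-42",
                "{\"event\":\"user_clicked\",\"item\":\"sku-123\"}"));
        }
    }
}
```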
Stream Processing Layer

Long-lived streaming jobs consume from these topics. Each job defines a topology: filters, joins, aggregations, and enrichments. For stateful operations like counting or windowing, the processor maintains local state that is periodically snapshotted. Many systems also write state changes to a changelog topic, which is itself part of the event log. This dual mechanism (snapshot plus changelog) allows fast recovery without replaying the entire input history. A job might maintain per-user feature vectors for recommendations: when a user clicks, the job updates their vector and writes it to a low-latency key-value store. The vector becomes a materialized view, a derived, query-optimized representation built from the raw events.

Real Time and Reprocessing with One Code Path

For real-time processing, jobs read from the tail of the log, consuming new events as they arrive with latencies under 2 seconds. When business logic changes, you deploy a new version that starts from the earliest offset. The same transformation code now processes months of historical events, often at 5 to 10 times the real-time rate to catch up quickly. During replay, the new job writes to separate output topics or separate tables in the serving layer while the old job continues serving production traffic. Once the new job catches up to real time and passes validation (comparing sample outputs, checking metrics), traffic switches over. This eliminates the Lambda Architecture problem of maintaining two code bases with different semantics.
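To make the single code path concrete, here is a small Kafka Streams sketch of a stateful job: it counts clicks per user in a local state store (recoverable from an automatically created changelog topic) and publishes the result as a materialized-view topic. The application ID, topic names, broker address, and the choice of a simple count instead of a full feature vector are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class ClickCountJob {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The application.id also names the job's internal changelog topics;
        // bumping it to "...-v2" starts a fresh deployment with empty state.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-counts-v1");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // A brand-new application.id with "earliest" replays the input from the
        // beginning; an existing application resumes from its committed offsets.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        StreamsBuilder builder = new StreamsBuilder();

        // Events are keyed by user_id, so each user's clicks arrive in order.
        KStream<String, String> clicks = builder.stream(
            "user_clicked", Consumed.with(Serdes.String(), Serdes.String()));

        // Stateful aggregation: the count lives in a local store backed by a
        // changelog topic, so recovery does not require replaying all input.
        KTable<String, Long> clicksPerUser = clicks
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as(
                "clicks-per-user-store"));

        // Publish the materialized view; a downstream consumer or connector can
        // load it into the low-latency serving key-value store.
        clicksPerUser.toStream()
            .to("user-click-counts-v1", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

Reprocessing reuses exactly this topology: deploy the same jar under a new application ID (say, click-counts-v2) writing to a v2 output topic, let it replay from the earliest offset while v1 keeps serving, then cut traffic over once v2 has caught up and been validated.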
💡 Key Takeaways
Event log is partitioned (e.g., by user_id) and replicated with long retention, typically 30 to 180 days, storing petabytes of immutable events
Stream processors maintain state using snapshots plus changelog topics, enabling fast recovery without full replay
Real-time processing reads from the tail with sub-2-second latency while reprocessing reads from the beginning at 5 to 10x real-time speed
New job versions build materialized views in parallel during replay, switching traffic only after catching up and validating correctness
📌 Examples
150 million daily active users generating 200K events/sec on average, 1M/sec during peak. Stream job updates user features within 1-2 seconds for recommendations.
Financial fraud detection consumes transaction events with p50 latency under 500ms, p99 under 2 seconds from event ingestion to score availability in serving cache.