When to Choose Lambda vs Alternatives: Architecture Trade-offs

The Core Trade-off: Lambda Architecture trades operational simplicity for the ability to get both low-latency results and the correctness that comes from full recomputation. You maintain two separate systems that must implement the same business logic. This doubles the testing surface area, increases deployment complexity, and creates an ongoing risk of the two code paths' logic diverging. The question is: when is this complexity worth it?
Lambda Architecture: two code paths, high correctness, complex operations
vs.
Kappa Architecture: one code path, stream replay, simpler to maintain
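A common way to contain the dual-code-path risk is to factor the core business logic into a single pure function that both layers call. Here is a minimal sketch of that idea; the event shape and function names are hypothetical, not taken from any particular framework:

from dataclasses import dataclass

@dataclass(frozen=True)
class Purchase:
    user_id: str
    amount_cents: int

def revenue_by_user(events: list[Purchase]) -> dict[str, int]:
    """Single source of truth for the aggregation.
    Pure function: both layers call this, so the logic cannot diverge."""
    totals: dict[str, int] = {}
    for e in events:
        totals[e.user_id] = totals.get(e.user_id, 0) + e.amount_cents
    return totals

def batch_view(master_dataset: list[Purchase]) -> dict[str, int]:
    """Batch layer: recompute the view from the immutable master dataset."""
    return revenue_by_user(master_dataset)

def update_speed_view(view: dict[str, int], recent: list[Purchase]) -> dict[str, int]:
    """Speed layer: fold a micro-batch of recent events into the running view."""
    for user, cents in revenue_by_user(recent).items():
        view[user] = view.get(user, 0) + cents
    return view

In practice the batch layer would run this logic inside a batch engine and the speed layer inside a stream processor, but keeping the aggregation itself in one shared module shrinks the divergence surface to the glue code around it.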
When Lambda Makes Sense: First, data volume matters. If you process 50 TB per day and need to reprocess historical data frequently, streaming replay becomes expensive. Replaying six months of events ingested at 200,000 events per second takes 35 to 40 hours even at a replay cluster's full throughput (over 100x the ingest rate). Batch processing the same data on a large cluster takes 4 to 8 hours. Lambda wins when reprocessing terabytes to petabytes of history.

Second, correctness requirements matter. Systems handling money (billing, payouts, financial reporting) or compliance audits need ironclad correctness. Being able to say "we can recompute any metric from scratch using immutable source data" is architecturally valuable. The batch layer provides this guarantee naturally. Pure streaming systems can achieve similar guarantees but require more sophisticated state management.

Third, real-time requirements matter but have limits. If you need sub-second p99 latency for complex aggregations over large state, Lambda's dedicated speed layer often performs better than trying to serve both historical and real-time queries from a single streaming system. However, if your real-time needs are simple counters or filtering, pure streaming suffices.

When to Choose Pure Batch: If you don't need low latency (everything can wait 15 minutes to 2 hours), pure batch is dramatically simpler: one code path, one storage system, one operational model. This works for nightly reporting, weekly ML model training, monthly financial closes, and regulatory reports with daily or weekly cadence. Pure batch also makes sense when event arrival patterns are bursty and unpredictable. If 80 percent of your daily events arrive in a 2-hour window due to batch uploads from partners, streaming infrastructure sits mostly idle. Batch processing can spin up for the window and shut down, saving costs.

When to Choose Kappa (Pure Streaming): Kappa Architecture uses a single streaming pipeline for both historical and real-time processing. Historical queries are answered by replaying the stream from the appropriate offset. This eliminates dual code paths but imposes two requirements. First, efficient stream replay: your log must retain sufficient history (weeks to months) and support fast replay. At 200,000 events per second, replaying one week of data at 10x speed takes roughly 16 hours. This is acceptable for occasional reprocessing but not for frequent backfills. Second, the streaming engine must handle both modes efficiently: processing live events with sub-second latency while also running backfill jobs that scan weeks of history creates resource contention. Some modern streaming platforms handle this well with separate consumer groups and rate limiting.
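To make the replay numbers above concrete, here is a back-of-the-envelope sketch. Note that the absolute event rate cancels out once replay speed is expressed as a multiple of the ingest rate:

def replay_hours(history_days: float, speedup: float) -> float:
    """Wall-clock hours to replay `history_days` of history when the
    replay consumer runs `speedup` times faster than live ingest.
    The absolute rate (e.g. 200,000 events/sec) cancels out."""
    return history_days * 24 / speedup

# One week at 10x speed -> 16.8 hours (the "roughly 16 hours" above).
print(replay_hours(7, 10))

# Six months (~182 days) needs a replay speedup well over 100x
# to land in the quoted 35-40 hour range.
print(replay_hours(182, 115))   # ~38.0 hours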
"Choose Lambda when you need both ironclad correctness and sub second freshness at tens of terabytes per day. Choose Kappa when you can accept slightly relaxed replay SLAs to gain operational simplicity."
Decision Framework: Ask these questions:

What's your daily data volume? Over 10 TB per day favors Lambda or pure batch; under 1 TB per day, Kappa is viable.
How often do you reprocess historical data? Weekly or monthly reprocessing favors Lambda; rare reprocessing (quarterly or for bug fixes) favors Kappa.
What's your correctness requirement? Financial systems favor Lambda's explicit batch layer; analytics and ML often work with Kappa.
What's your latency requirement? Sub-5-second p99 with complex state favors Lambda; relaxed latency or simple aggregations work with pure batch or Kappa.

A concrete example: a payment processing system with 30 TB daily volume, monthly financial reconciliation, and fraud detection requiring 2-second p99 latency clearly needs Lambda. A product analytics system with 2 TB daily volume, quarterly model retraining, and 1-minute acceptable latency could use Kappa or even pure batch.
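As a rough illustration only, the framework can be encoded directly; the cut-offs below are this article's heuristics, not universal constants:

def suggest_architecture(
    daily_tb: float,
    reprocesses_per_year: int,
    handles_money: bool,
    p99_latency_s: float,
) -> str:
    """Toy encoder of the decision framework above. Real decisions also
    weigh team skills, existing infrastructure, and cost."""
    if p99_latency_s >= 900:
        return "pure batch"    # everything tolerates 15 min to 2 h
    if daily_tb > 10 or handles_money or reprocesses_per_year >= 12:
        return "lambda"        # volume, correctness, or frequent reprocessing
    return "kappa"             # modest volume, rare reprocessing, relaxed needs

# The three worked examples from this section:
print(suggest_architecture(30, 12, True, 2))      # -> lambda (payment processor)
print(suggest_architecture(2, 4, False, 60))      # -> kappa (product analytics)
print(suggest_architecture(5, 12, True, 7200))    # -> pure batch (financial reporting)

Note that the latency check comes first: the financial reporting example lands on pure batch despite handling money, because every query tolerates a 2-hour delay.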
💡 Key Takeaways
Lambda makes sense when daily data volume exceeds 10 TB and you need both sub-5-second latency and frequent reprocessing (weekly or monthly) with high correctness guarantees
Pure batch is dramatically simpler (one code path) and sufficient when all use cases tolerate 15 minutes to 2 hours of latency, common for reporting and monthly analytics
Kappa (pure streaming with replay) works well under 1 TB per day with rare reprocessing, but replaying one week at 200k events/sec takes roughly 16 hours at 10x speed
The key decision factors are: daily data volume, reprocessing frequency, correctness requirements, and latency targets, not just "we need real time"
📌 Examples
1. A payment processor with 30 TB daily, monthly reconciliation, and 2-second fraud detection latency needs Lambda: batch for correctness, speed for alerts
2. A product analytics system with 2 TB daily, quarterly model retraining, and 1-minute acceptable latency can use Kappa: stream replay for occasional backfills, live stream for dashboards
3. A financial reporting system with 5 TB daily but all queries tolerating 2-hour latency uses pure batch: nightly jobs on a data lake, no streaming infrastructure needed