
Batch vs Streaming: Latency, Cost, and Operational Complexity

Batch processing accumulates data over windows of minutes to hours, runs transformations on entire datasets, then writes results atomically. Stream processing reacts to events as they arrive, maintaining incremental state and emitting results continuously. The choice hinges on latency requirements, operational cost, and tolerance for complexity.

Streaming delivers sub-second to minute latency and enables fine-grained updates for operational dashboards, anomaly detection, and low-latency Machine Learning (ML) features. However, achieving exactly-once semantics with idempotent sinks and watermark management adds operational burden. Out-of-order events require careful windowing: if p99 event-time lateness is 12 minutes but spikes to 30 minutes during peak traffic, a 15-minute watermark will drop tail data unless you add a late-data sweeper job to backfill (a sketch of this windowing pattern appears below). Streaming compute is also costlier: maintaining stateful workers and handling rebalancing requires persistent infrastructure.

Batch offers simplicity and high throughput at low cost. Set-based operations over entire partitions are easy to reason about and retry. Amazon retail teams run large batch jobs throttled to 100 to 200 megabytes per second per table to avoid overwhelming downstream systems while meeting hourly or daily Service Level Agreements (SLAs). Batch is ideal for heavy dimensional modeling, large joins, and backfills. The tradeoff is latency: batch windows of 1 to 24 hours are common.

In practice, hybrid architectures dominate. LinkedIn publicly reports multi-million messages per second in streaming for real-time metrics, while batch jobs curate fact tables overnight. Amazon uses partitioned streams provisioned for 1 megabyte per second writes and 2 megabytes per second reads per partition; for 100,000 events per second at 1 kilobyte each, teams provision around 100 write partitions, with autoscaling headroom for 2 to 5 times peak load. Micro-batch compaction runs every 5 to 10 minutes, landing curated data with p95 end-to-end latency of 2 to 10 minutes.
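To make the watermark tradeoff concrete, here is a minimal, self-contained sketch of event-time tumbling windows with a fixed allowed lateness. It is illustrative plain Python, not any specific engine's API; the `process` function, the `late_queue`, and the window and delay constants are assumptions chosen to match the numbers discussed above:

```python
from collections import defaultdict

WINDOW_SIZE = 60 * 60       # 1-hour tumbling windows (seconds)
WATERMARK_DELAY = 15 * 60   # allow 15 minutes of event-time lateness

windows = defaultdict(int)  # window_start -> event count
finalized = set()           # windows already emitted downstream
late_queue = []             # events for a late-data sweeper job to backfill
watermark = 0               # max event time seen minus WATERMARK_DELAY

def process(event_time: int, value: int = 1) -> None:
    global watermark
    watermark = max(watermark, event_time - WATERMARK_DELAY)
    window_start = event_time - event_time % WINDOW_SIZE
    if window_start in finalized:
        # Event missed the watermark: divert instead of silently dropping.
        late_queue.append((event_time, value))
        return
    windows[window_start] += value
    # Finalize and emit any window whose end is now behind the watermark.
    for ws in sorted(w for w in windows if w + WINDOW_SIZE <= watermark):
        print(f"emit window [{ws}, {ws + WINDOW_SIZE}) count={windows.pop(ws)}")
        finalized.add(ws)

process(3_600)  # first event in the [3600, 7200) window
process(9_000)  # watermark passes 7200 -> window [3600, 7200) is emitted
process(3_700)  # arrives after finalization -> routed to late_queue
```

An event arriving after its window has been finalized (like the 30-minute-late tail events in the peak-traffic scenario) lands in `late_queue` rather than vanishing, where a daily sweeper job can pick it up for backfill.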
💡 Key Takeaways
Streaming enables sub-second to minute latency for operational use cases but requires stateful infrastructure and careful watermark tuning for out-of-order events.
Batch provides simplicity and low cost for high-throughput jobs with hourly to daily SLAs, ideal for dimensional modeling and large joins.
LinkedIn scale reference: multi-million messages per second sustained, trillions per day. Amazon provisions around 100 stream partitions for 100,000 events per second with 2 to 5 times autoscaling headroom (worked sizing sketch below).
Watermark tuning is critical: p99 lateness of 12 minutes with 30-minute spikes requires a 15-minute base watermark plus daily late-data sweeper jobs to avoid dropping tail events.
Hybrid architectures dominate: streams feed real-time metrics, batch curates fact tables overnight. Amazon micro-batch compaction achieves p95 end-to-end latency of 2 to 10 minutes for curated tables.
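The partition math in these takeaways is simple enough to check mechanically. A back-of-envelope sketch, assuming Kinesis-style per-partition limits of 1 MB/s writes and decimal units (both taken from the figures above):

```python
import math

EVENTS_PER_SEC = 100_000
EVENT_SIZE_BYTES = 1_000         # 1 KB per event (decimal units)
PARTITION_WRITE_BPS = 1_000_000  # 1 MB/s write limit per partition

ingest_bps = EVENTS_PER_SEC * EVENT_SIZE_BYTES                 # 100 MB/s raw ingest
base_partitions = math.ceil(ingest_bps / PARTITION_WRITE_BPS)  # 100 partitions

print(f"steady state: {base_partitions} write partitions")
for headroom in (2, 5):  # autoscaling headroom for peak load
    print(f"{headroom}x peak headroom -> {base_partitions * headroom} partitions")
```

The steady-state count matches the roughly 100 partitions cited above; the multipliers show how quickly peak headroom, not baseline load, dominates the provisioned total.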
📌 Examples
Amazon retail backfills: throttle batch jobs to 100 to 200 MB/s per table to avoid overwhelming downstream systems during peak promotional traffic spikes of 5 to 10 times normal load (see the token-bucket throttle sketch below).
Micro-batch calculation: 200k events/s × 1 KB × 120 seconds = 24 GB raw per 2-minute window. After 8x columnar compression, write ~3 GB per window. Downstream must sustain the 200 MB/s raw ingest rate (about 25 MB/s of compressed writes) to keep up.
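That calculation can be reproduced directly as a runnable sketch (plain Python, decimal units; all constants come from the example above):

```python
EVENTS_PER_SEC = 200_000
EVENT_SIZE_BYTES = 1_000   # 1 KB per event
WINDOW_SECONDS = 120       # 2-minute micro-batch window
COMPRESSION_RATIO = 8      # columnar compression (e.g., Parquet)

raw_bytes = EVENTS_PER_SEC * EVENT_SIZE_BYTES * WINDOW_SECONDS
compressed_bytes = raw_bytes / COMPRESSION_RATIO

print(f"raw per window:        {raw_bytes / 1e9:.0f} GB")                         # 24 GB
print(f"compressed per window: {compressed_bytes / 1e9:.0f} GB")                  # 3 GB
print(f"raw ingest rate:       {raw_bytes / WINDOW_SECONDS / 1e6:.0f} MB/s")      # 200 MB/s
print(f"compressed write rate: {compressed_bytes / WINDOW_SECONDS / 1e6:.0f} MB/s")  # 25 MB/s
```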
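The backfill throttling in the first example is commonly implemented as a token bucket that blocks the writer once its byte budget is spent. A minimal sketch under that assumption; the `Throttle` class and the 150 MB/s target are illustrative, not any specific AWS or internal Amazon API:

```python
import time

class Throttle:
    """Token-bucket throttle capping a writer at a target bytes/second."""

    def __init__(self, rate_bytes_per_sec: float, burst_bytes: float):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def acquire(self, nbytes: int) -> None:
        """Block until nbytes of write budget is available."""
        while True:
            now = time.monotonic()
            # Refill tokens for elapsed time, capped at the burst capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

# Cap a batch writer at ~150 MB/s, mid-band of the 100-200 MB/s limit above.
throttle = Throttle(rate_bytes_per_sec=150e6, burst_bytes=300e6)
for chunk in (b"x" * 64_000_000 for _ in range(5)):  # 64 MB write chunks
    throttle.acquire(len(chunk))
    # write(chunk)  # downstream write call would go here
```

Sizing the burst at roughly twice the per-second rate lets the writer absorb short stalls without ever exceeding the sustained cap.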