Batch vs Stream Processing Trade-offs (Easy · ⏱️ ~3 min)
What is Batch vs Stream Processing?
Definition
Batch processing collects data over a period (hours or days) and processes it in bulk. Stream processing consumes events continuously as they arrive, targeting sub-second to sub-minute latencies.
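The contrast above can be sketched in a few lines of Python. This is a minimal illustration, not a framework API: the same total is computed once over a fully collected batch, and once incrementally as each event arrives (the event schema with an `"amount"` field is hypothetical).

```python
def batch_total(events):
    # Batch: all events are available before processing starts.
    return sum(e["amount"] for e in events)

class StreamTotal:
    # Stream: state is updated as each event arrives.
    def __init__(self):
        self.total = 0

    def on_event(self, event):
        self.total += event["amount"]
        return self.total  # an up-to-date result after every event

events = [{"amount": 10}, {"amount": 25}, {"amount": 5}]
print(batch_total(events))  # 40, computed once, after all data is in

s = StreamTotal()
for e in events:
    running = s.on_event(e)
print(running)  # also 40, but a result existed after the first event
```

The batch function sees the whole dataset (completeness); the stream class trades that for an answer that is always current.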
Typical Latency Targets
- Stream (p50): ~100 ms
- Batch (typical): 1–24 hours
💡 Key Takeaways
✓ Batch processing collects data over hours or days, then processes it in bulk, with typical latencies from 30 minutes to 24 hours
✓ Stream processing handles events continuously as they arrive, targeting sub-second to one-minute end-to-end latency
✓ Batch optimizes for completeness and cost: you see all the data, handle late arrivals naturally, and can scale clusters to zero between jobs
✓ Stream enables immediate reactions for fraud detection, real-time alerts, and operational decisions, but requires always-on resources
✓ At scale (500k to 5M events per second), most systems use both: streaming for low-latency needs, batch for historical analysis and the source of truth
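The "use both" takeaway can be sketched as a single ingest function that feeds two paths: a stream path that reacts immediately and a batch store that is processed in bulk later. All names here (`batch_store`, `alerts`, the 1000 threshold) are illustrative, not a real pipeline API.

```python
batch_store = []  # stands in for a data lake / warehouse table
alerts = []       # stands in for a real-time alerting sink

def on_event(event):
    # Stream path: react immediately (e.g., flag large amounts).
    if event["amount"] > 1000:
        alerts.append(event)
    # Batch path: append for later bulk processing.
    batch_store.append(event)

def nightly_batch_job():
    # Batch path: a complete aggregate over all stored data,
    # naturally including anything that arrived late.
    return sum(e["amount"] for e in batch_store)

for e in [{"amount": 50}, {"amount": 5000}, {"amount": 120}]:
    on_event(e)

print(len(alerts))          # 1 event flagged in real time
print(nightly_batch_job())  # 5170, computed in bulk later
```

The stream path gives low-latency reactions; the batch store remains the source of truth for historical analysis.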
📌 Interview Tips
1. Fraud detection at a payment provider targets p50 latency under 100 ms from event creation to blocking decision, using stream processing
2. Finance teams run daily batch jobs processing 500 GB of transaction data overnight, completing in 1 to 6 hours for regulatory reporting
3. Real-time recommendation feeds at Netflix use stream processing with 5 to 60 seconds of lag, while model training runs in batch over months of history
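Tip 1 frames stream latency as a p50 (median) from event creation to decision. A hedged sketch of how that metric is computed, with a stand-in `decide` function and a hypothetical `created_at` timestamp field on each event:

```python
import statistics
import time

def decide(event):
    # Stand-in for a fraud-blocking decision.
    return event["amount"] > 1000

def process(events):
    latencies_ms = []
    for e in events:
        decide(e)
        # Latency = now minus the event's creation timestamp.
        latencies_ms.append((time.time() - e["created_at"]) * 1000)
    return statistics.median(latencies_ms)  # p50, in milliseconds

now = time.time()
events = [{"amount": a, "created_at": now} for a in (10, 2000, 30)]
p50 = process(events)
# In practice this median would be compared against the 100 ms budget.
print(p50 >= 0)
```

Using the median (rather than the mean) matches how latency targets are usually stated, since it is robust to a few slow outliers.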