Definition
Batch processing collects data over a period (hours or days) and processes it in bulk. Stream processing consumes events continuously as they arrive, targeting sub-second to sub-minute latencies.
At large scale, you face a fundamental problem: turning a continuous firehose of events into reliable insights and actions. Think clickstreams, payments, app events, logs, and metrics generating anywhere from 10,000 to several million events per second.
The Core Tension: Not all consumers need the same latency. Fraud detection might need decisions in under 100 milliseconds. Finance teams might only need daily aggregates by morning. This mismatch drives the batch versus stream trade-off.
How Batch Processing Works: Data accumulates in storage for a period, maybe an hour or a full day. Then a scheduled job processes everything in bulk. You might scan 500 GB of events, perform complex joins across multiple datasets, aggregate results, and write outputs to a data warehouse. Typical end-to-end latencies range from 30 minutes to 24 hours.
The appeal is simplicity. You see the complete dataset. Late-arriving events are already there. You can sort, filter, and reprocess if bugs appear. Because work is scheduled, you spin up large clusters only when needed, then scale to zero. This makes batch cheaper per terabyte processed.
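To make the shape of a batch job concrete, here is a minimal Python sketch under assumed conditions: events land as newline-delimited JSON files in a hypothetical events/ directory with illustrative user_id and amount fields, and the "warehouse" is stood in by a local JSON file. A real job would run on a scheduler and a distributed engine, but the pattern is the same: scan everything at rest, aggregate in bulk, write the result.

```python
# Minimal batch-job sketch. Assumptions: newline-delimited JSON event files
# accumulate under ./events/, each event has hypothetical "user_id" and
# "amount" fields, and a local JSON file stands in for the warehouse table.
import json
import pathlib
from collections import defaultdict

EVENTS_DIR = pathlib.Path("events")              # assumed landing zone for accumulated files
OUTPUT_PATH = pathlib.Path("daily_totals.json")  # stand-in for a warehouse table

def run_daily_batch() -> None:
    totals = defaultdict(float)

    # Scan everything that accumulated since the last run. The full dataset is
    # already at rest, so late-arriving events are simply part of the scan.
    for path in sorted(EVENTS_DIR.glob("*.jsonl")):
        with path.open() as f:
            for line in f:
                event = json.loads(line)
                totals[event["user_id"]] += event["amount"]

    # Write the aggregates in bulk; rerunning after a bug fix simply
    # overwrites the output, which is what makes reprocessing cheap.
    OUTPUT_PATH.write_text(json.dumps(totals, indent=2))

if __name__ == "__main__":
    run_daily_batch()
```

In production the same pattern typically runs on a distributed engine under a scheduler, but the life cycle is identical: nothing runs between jobs, so the cluster can scale to zero.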
How Stream Processing Works: Events flow continuously from sources like mobile apps or backend services into a durable message bus. Stream processors read near the head of this log, applying transformations to individual events or small time windows. They maintain state in memory or fast storage for aggregates like counts per user or sliding activity windows.
Stream processing enables immediate reactions: blocking fraudulent transactions, updating real-time recommendation feeds, alerting on-call engineers. The cost is higher complexity, constant resource usage, and careful handling of ordering issues and duplicate events.
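For contrast, here is a minimal stream-processor sketch, again under stated assumptions: the event source is a hypothetical in-process generator standing in for a durable message bus, state is a per-user count kept in memory, the window is a 60-second tumbling window, and the fraud rule is a made-up fixed threshold. It shows the continuous read, update-state, react loop; it does not handle out-of-order or duplicate events, which real deployments must.

```python
# Minimal stream-processor sketch. Assumptions: events are dicts with
# hypothetical "user_id" and "ts" fields, a 60-second tumbling window,
# and a fixed per-user threshold standing in for a real fraud rule.
import time
from collections import defaultdict
from typing import Dict, Iterable, Iterator

WINDOW_SECONDS = 60    # tumbling window length (illustrative)
ALERT_THRESHOLD = 5    # events per user per window (made-up rule)

def process_stream(events: Iterable[dict]) -> None:
    window_start = None
    counts: Dict[str, int] = defaultdict(int)   # in-memory state keyed by user

    for event in events:                         # consume continuously, one event at a time
        ts = event["ts"]
        if window_start is None:
            window_start = ts

        # Close the current window and reset state once its span has elapsed.
        if ts - window_start >= WINDOW_SECONDS:
            counts.clear()
            window_start = ts

        counts[event["user_id"]] += 1
        if counts[event["user_id"]] == ALERT_THRESHOLD:
            # React immediately: in a real system, block the transaction or page on-call.
            print(f"ALERT: {event['user_id']} hit {ALERT_THRESHOLD} events in one window")

def demo_source() -> Iterator[dict]:
    # Stand-in for reading near the head of a durable log; a real consumer
    # would use a message-bus client instead of a local generator.
    base = time.time()
    for i in range(20):
        yield {"user_id": "user-1" if i % 2 == 0 else "user-2", "ts": base + i}

if __name__ == "__main__":
    process_stream(demo_source())
```

Note what the sketch leaves out: checkpointing of per-user state, handling of late or duplicate events, and backpressure. Those gaps are exactly the extra complexity mentioned above.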
✓ Batch processing collects data over hours or days, then processes in bulk with typical latencies from 30 minutes to 24 hours
✓ Stream processing handles events continuously as they arrive, targeting sub-second to one-minute end-to-end latency
✓ Batch optimizes for completeness and cost: you see all data, handle late arrivals naturally, and scale clusters to zero between jobs
✓ Stream enables immediate reactions for fraud detection, real-time alerts, and operational decisions but requires constant resources
✓ At scale (500k to 5M events per second), most systems use both: streaming for low-latency needs, batch for historical analysis and as the source of truth
1. Fraud detection at a payment provider targets p50 latency under 100 ms from event creation to blocking decision, using stream processing
2. Finance teams run daily batch jobs processing 500 GB of transaction data overnight, completing in 1 to 6 hours for regulatory reporting
3. Real-time recommendation feeds at Netflix use stream processing with a 5 to 60 second lag, while model training uses batch over months of history