Batch vs Real-time: Making the Choice
The Fundamental Trade-off
This is not about "better" or "worse." It is about the marginal value of freshness versus the marginal cost and operational complexity. Every second of reduced staleness has a cost, and every additional nine of availability in your Service Level Agreement (SLA) costs more than the last.
Decision Framework: Four Questions
First, what is acceptable freshness? If churn prediction for next month can use yesterday's model, batch wins. If payment fraud needs current transaction context, real-time is required.

Second, what is the per-interaction value? Low-value, high-volume interactions (email recommendations, content feeds) favor batch. High-value interactions (fraud gating, ad auctions where milliseconds equal dollars) justify real-time cost.

Third, what is your read-to-write ratio? Write-heavy systems (over 80% writes) like event logs should minimize online compute. Read-heavy systems (over 99% reads) like user profiles can afford online enrichment.

Fourth, can you decompose the problem? Most production systems can. Compute expensive embeddings and candidate sets offline; do lightweight re-ranking and contextualization online.
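The four questions can be sketched as a toy decision helper. This is illustrative only: the function name, thresholds (24-hour freshness cutoff, $0.01 per-interaction value, 50% read fraction), and return strings are all assumptions, not production guidance.

```python
def serving_recommendation(freshness_sla_s: float,
                           value_per_interaction_usd: float,
                           read_fraction: float,
                           decomposable: bool) -> str:
    """Toy encoding of the four-question framework; thresholds are
    illustrative assumptions."""
    # Q1: if day-old outputs are acceptable, batch is the default.
    if freshness_sla_s >= 24 * 3600:
        return "batch"
    # Q4: decomposable problems should precompute the heavy part offline.
    if decomposable:
        return "hybrid: heavy compute offline, light re-rank online"
    # Q2: high-value interactions justify real-time cost.
    if value_per_interaction_usd >= 0.01:
        return "real-time"
    # Q3: write-heavy systems should minimize online compute.
    if read_fraction < 0.5:
        return "batch"
    return "real-time"

# Monthly churn scoring: day-old model is fine.
print(serving_recommendation(86400, 0.001, 0.5, False))   # batch
# Payment fraud gating: sub-second freshness, high value, hard to split.
print(serving_recommendation(1, 1.0, 0.99, False))        # real-time
```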
The Hybrid Pattern: Best of Both
Netflix-style recommendations illustrate this perfectly. An offline batch job computes the top 1,000 candidate videos per user daily using heavy models and collaborative filtering; it runs for hours on massive clusters. The online service reads the precomputed candidates (one Redis lookup, under 5 ms), applies real-time filters (recently watched, device type, current session), and re-ranks with a lightweight model in under 100 milliseconds. Total cost: the batch job runs once daily, and the online path only pays for fast lookups and light models. Freshness: candidates refresh daily, contextualization is real-time. This is the pattern at YouTube, Pinterest, and LinkedIn feeds: heavy lifting offline, last mile online.
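The online half of this pattern can be sketched in a few lines. An in-memory dict stands in for the Redis candidate store, and a fixed device-type boost stands in for the lightweight re-ranking model; all names and scores are made up for illustration.

```python
# Stand-in for the precomputed candidate store (in production, one
# Redis lookup per request). Candidates and scores are hypothetical.
CANDIDATES = {
    "user_42": [("video_a", 0.9), ("video_b", 0.8), ("video_c", 0.7)],
}

def lightweight_rerank(candidates, session):
    """Toy online path: filter precomputed candidates with real-time
    context, then re-score cheaply (a stub for a small model)."""
    watched = session.get("recently_watched", set())
    fresh = [(vid, score) for vid, score in candidates if vid not in watched]
    # Placeholder for the light model: boost scores on TV sessions.
    boost = 0.05 if session.get("device") == "tv" else 0.0
    return sorted(((vid, score + boost) for vid, score in fresh),
                  key=lambda pair: pair[1], reverse=True)

def serve(user_id, session):
    candidates = CANDIDATES.get(user_id, [])  # the "Redis" lookup
    return [vid for vid, _ in lightweight_rerank(candidates, session)]

print(serve("user_42", {"recently_watched": {"video_b"}, "device": "tv"}))
# ['video_a', 'video_c']
```

The design point: everything expensive (candidate generation) happened offline; the request path is one lookup, one filter, and one cheap scoring pass.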
Cost Reality Check
Real-time serving can cost 5x to 20x more than batch for the same number of predictions. Why? You pay for peak capacity 24/7, not just the hours you are computing: warm pools to avoid cold-start penalties, redundancy for availability, networking and orchestration overhead. Batch scales to zero: spin up 10,000 cores for 2 hours, process 1 billion predictions, pay for 20,000 core-hours, done. Serving the same 1 billion predictions in real time at 10,000 per second takes 100,000 seconds (about 28 hours), and you must provision for peak QPS and keep that capacity running continuously.
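The arithmetic above can be reproduced as a back-of-envelope calculation. The $0.05 per core-hour price and the assumption that the real-time fleet matches the batch fleet at 10,000 cores are both hypothetical, chosen only to make the ratio concrete.

```python
PRICE_PER_CORE_HOUR = 0.05  # hypothetical price, for comparison only

# Batch: 10,000 cores for 2 hours, then scale to zero.
batch_core_hours = 10_000 * 2            # 20,000 core-hours
batch_cost = batch_core_hours * PRICE_PER_CORE_HOUR

# Real-time: 1 billion predictions at 10,000 QPS takes
# 1e9 / 1e4 = 100,000 seconds of wall time (~27.8 hours),
# during which the peak-sized fleet stays provisioned.
serving_seconds = 1_000_000_000 / 10_000
serving_hours = serving_seconds / 3600
realtime_core_hours = 10_000 * serving_hours  # fleet held hot throughout
realtime_cost = realtime_core_hours * PRICE_PER_CORE_HOUR

print(f"batch: ${batch_cost:,.0f}, real-time: ${realtime_cost:,.0f}, "
      f"ratio: {realtime_cost / batch_cost:.1f}x")
```

Under these assumptions the ratio lands around 14x, inside the 5x-20x range, and that is before adding redundancy and warm-pool overhead.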
When Hybrid Breaks Down
Hybrid assumes stable batch components and volatile online context. This fails when the stable part becomes volatile. Example: news recommendation during breaking events. Precomputed candidates from this morning miss the story everyone wants now, so you need either very frequent batch refreshes (every 15 minutes, expensive) or to shift more logic online (complex). Another failure mode is version skew: the online ranker expects feature schema version N+1 while the batch job produced version N, and predictions become garbage. Mitigation: enforce version pinning and atomic rollouts.
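A minimal sketch of the version-pinning mitigation: the online ranker refuses to serve from a batch artifact whose schema version differs from the one it was built against, rather than silently scoring garbage. The artifact layout, version number, and error class here are illustrative assumptions.

```python
EXPECTED_SCHEMA_VERSION = 7  # hypothetical pinned version

class SchemaSkewError(RuntimeError):
    """Raised when a batch artifact's schema does not match the pin."""

def load_candidates(artifact: dict) -> list:
    """Reject batch output produced under a different feature schema
    instead of serving garbage predictions."""
    version = artifact.get("schema_version")
    if version != EXPECTED_SCHEMA_VERSION:
        raise SchemaSkewError(
            f"ranker pinned to schema v{EXPECTED_SCHEMA_VERSION}, "
            f"batch artifact is v{version}; refusing to serve")
    return artifact["candidates"]

# Atomic-rollout idea: publish schema_version and candidates together,
# so readers never observe candidates without their version tag.
good = {"schema_version": 7, "candidates": ["a", "b"]}
stale = {"schema_version": 6, "candidates": ["a", "b"]}
print(load_candidates(good))  # ['a', 'b']
try:
    load_candidates(stale)
except SchemaSkewError as err:
    print("blocked:", err)
```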