
Streaming vs Batch Monitoring: Latency, Cost, and Complexity Tradeoffs

The Architecture Spectrum

Monitoring architecture sits on a spectrum between streaming (real-time aggregation) and batch (periodic computation). The choice affects detection latency, infrastructure cost, and operational complexity. Most production systems use a combination tuned to feature criticality and freshness requirements.

Streaming Monitoring

Streaming monitoring emits a lightweight event for every inference, maintaining rolling-window aggregates in memory with 1 to 5 minute detection latency. It requires always-on infrastructure (Flink, Kafka Streams, or custom services) that processes every request. Cost scales with request volume: at 10,000 QPS, processing 864 million events per day demands significant compute and storage. It is best suited for high-value, latency-sensitive features where catching drift within minutes justifies the cost.
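The rolling-window mechanics can be sketched with a minimal in-memory aggregator. Names and structure here are illustrative; a production system would hold approximate sketches (t-digest, HyperLogLog) rather than raw value buffers:

```python
import time
from collections import deque, defaultdict


class RollingFeatureMonitor:
    """Minimal in-memory rolling-window monitor (illustrative sketch)."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        # feature name -> deque of (timestamp, value), oldest first
        self.values = defaultdict(deque)

    def record(self, feature, value, ts=None):
        ts = time.time() if ts is None else ts
        q = self.values[feature]
        q.append((ts, value))
        # Evict events that fell out of the rolling window.
        while q and q[0][0] < ts - self.window:
            q.popleft()

    def stats(self, feature):
        vals = [v for _, v in self.values[feature]]
        if not vals:
            return None
        mean = sum(vals) / len(vals)
        return {"count": len(vals), "mean": mean,
                "min": min(vals), "max": max(vals)}
```

Alerting then reduces to comparing `stats()` output against a baseline on each window tick; the raw-buffer approach shown here is what sketch structures replace at high QPS.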

Batch Monitoring

Batch monitoring periodically processes logs (hourly or daily), computing feature statistics from cold storage with detection latency ranging from 1 to 24 hours. It is dramatically cheaper since compute runs only during batch windows and storage uses commodity object stores. For features where hourly detection suffices, batch monitoring costs 10 to 50x less than streaming equivalents. Most features at most companies can tolerate batch monitoring.
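The core of a batch job is a per-feature statistics pass over logged rows. A hypothetical sketch (field names are assumptions, and real jobs would run this in Spark or a warehouse query rather than pure Python):

```python
def batch_feature_stats(rows, feature):
    """Summary statistics for one feature over a batch of logged rows."""
    vals = [r[feature] for r in rows if r.get(feature) is not None]
    n = len(rows)
    missing_rate = 1 - len(vals) / n if n else 0.0
    if not vals:
        return {"missing_rate": missing_rate}
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return {"count": len(vals), "missing_rate": missing_rate,
            "mean": mean, "std": var ** 0.5}
```

Comparing each day's output against a stored baseline gives the drift, missing-value, and outlier signals at batch cadence.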

Tiered Strategy

Implement tiered monitoring where critical features (top 10 by importance, fraud signals, safety features) use streaming with 5 minute detection. Important features (top 50) use hourly batch. Remaining features use daily batch. This stratification optimizes cost while maintaining rapid detection for high impact signals.
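One way to encode such a tiered policy is a simple routing table that maps features to a monitoring tier, defaulting to the cheapest tier. The feature names and SLA strings below are invented for illustration:

```python
# Hypothetical tier configuration; feature names and SLAs are illustrative.
MONITORING_TIERS = {
    "streaming":    {"detection_sla": "5m",  "features": {"txn_amount", "device_risk"}},
    "hourly_batch": {"detection_sla": "1h",  "features": {"merchant_category"}},
    "daily_batch":  {"detection_sla": "24h", "features": set()},
}


def tier_for(feature):
    """Route a feature to its monitoring tier; unlisted features get daily batch."""
    for tier, cfg in MONITORING_TIERS.items():
        if feature in cfg["features"]:
            return tier
    return "daily_batch"
```

Keeping the mapping in config (rather than code) makes it easy to promote a feature to streaming when its importance rises.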

Log Sampling Trade-offs

For extremely high volume systems, sample logs for batch monitoring (1 to 10 percent sample rates). Statistical power degrades with sampling: detecting 5 percent drift requires larger samples than detecting 20 percent drift. Adaptive sampling oversamples rare events (errors, outliers) while undersampling common patterns.
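Adaptive sampling can be sketched as a per-event keep/drop decision that always favors rare events. The event fields and rates here are assumptions for illustration:

```python
import random


def adaptive_sample(event, base_rate=0.01, rare_rate=1.0, rng=random.random):
    """Decide whether to log an event.

    Rare events (errors, outliers) are kept at rare_rate (default: always);
    common events are down-sampled to base_rate.
    """
    rare = event.get("is_error") or event.get("is_outlier")
    rate = rare_rate if rare else base_rate
    return rng() < rate
```

Downstream statistics must reweight kept events by the inverse of their sampling rate, otherwise rare events appear far more frequent than they are.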

💡 Key Takeaways
- Streaming monitoring achieves 1 to 5 minute detection latency by maintaining in-memory rolling aggregates; a fraud system at 5k TPS detects issues within 60 to 180 seconds, limiting exposure to 300k to 900k transactions versus 18M with hourly batch
- CPU overhead for streaming: 1 to 2% of serving resources at 1:10 sampling; 5k TPS with 150 features requires two 8 vCPU aggregator instances to handle 75k updates per second with headroom for approximate algorithms (t-digest, HyperLogLog)
- Batch monitoring processes logs periodically (hourly to daily), zero marginal serving cost, suitable for low risk workloads where 12 to 24 hour detection latency is acceptable (weekly retrained models, low volume services)
- Sampling strategies: start with 1:10 for high QPS (reduces CPU by 90%), validate drift metric stability versus full stream; low traffic segments may need 1:1 sampling or extended windows to reach minimum 5k events per window
- Hybrid pattern: stream critical path features (identity, location, price) with 5 minute windows and SPC alerts, batch monitor secondary features with daily windows for trend analysis and correlation studies
- Storage tradeoff: streaming retains only rolling state (roughly 30 MB per model), batch requires sampled logs (0.1 to 1% of traffic) with 30 to 90 day retention for forensics, adding terabytes for high throughput systems
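The takeaways above do not prescribe a specific drift statistic; one standard choice usable in both streaming and batch tiers is the Population Stability Index (PSI) over binned feature distributions, sketched here as an assumption rather than the text's stated method:

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected/actual: per-bin proportions (each list sums to 1).
    Common rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate, > 0.25 major shift.
    """
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        score += (a - e) * math.log(a / e)
    return score
```

The same binned counts can be maintained incrementally in a streaming aggregator or computed once per batch window, which is what makes PSI convenient for a tiered setup.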
📌 Interview Tips
1. Uber Michelangelo: streaming for critical path features (user location, request time) with 5 min windows, batch for secondary features and daily trend dashboards; segmented by city to localize drift before global rollout
2. Netflix ranking: streaming monitors prediction distribution and acceptance rate (feedback loop risk, needs fast detection), batch monitoring for detailed per feature drift analysis across thousands of features, informing weekly retrain decisions
3. Airbnb pricing model: batch daily monitoring sufficient for 24 to 72 hour label delays; streaming overlay for prediction drift and approval rate as early warning, triggering investigation before labels arrive