Micro-batching Architecture in Production Systems
The Full System View:
In production, micro-batching sits between raw event ingestion and serving layers, orchestrating a pipeline that transforms streaming data into actionable insights. Consider a large-scale ad platform processing hundreds of thousands of events per second.
Mobile and web clients emit impression and click events to a message broker like Kafka, which buffers several minutes of traffic across partitions. The micro-batch engine periodically pulls events from Kafka, say every 5 seconds or every 50,000 messages, forming one micro-batch per partition. This dual trigger strategy (time or size, whichever comes first) prevents both excessive latency and oversized batches.
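A minimal sketch of that dual-trigger pull loop, using the confluent-kafka Python client; the broker address, topic name, consumer group, and the process() stub standing in for the downstream transformation stage are illustrative assumptions, not a reference implementation:

```python
import time
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",   # placeholder broker
    "group.id": "ad-microbatch",
    "enable.auto.commit": False,         # commit offsets only after the batch succeeds
})
consumer.subscribe(["ad_events"])        # placeholder topic

BATCH_INTERVAL_S = 5.0       # time trigger
BATCH_MAX_EVENTS = 50_000    # size trigger

def process(events):
    """Placeholder for the transformation stage described in the next section."""
    print(f"processing {len(events)} events")

def next_micro_batch():
    """Collect events until 5 seconds elapse or 50,000 messages arrive, whichever comes first."""
    batch, deadline = [], time.monotonic() + BATCH_INTERVAL_S
    while len(batch) < BATCH_MAX_EVENTS and time.monotonic() < deadline:
        msg = consumer.poll(0.1)
        if msg is None or msg.error():
            continue
        batch.append(msg.value())
    return batch

while True:
    events = next_micro_batch()
    process(events)
    consumer.commit(asynchronous=False)  # advance offsets once processing succeeds
```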
Processing Flow:
For each micro-batch, the engine applies a series of transformations. First, join events with campaign metadata from a key-value store. Second, compute per-campaign aggregates like click-through rates and conversion metrics. Third, update machine learning features for bidding models. Finally, write results to multiple sinks simultaneously.
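As a sketch of that per-batch flow in PySpark, assuming the batch carries campaign_id and event_type columns and that campaign metadata has already been loaded into a DataFrame (a real system would read it from the key-value store):

```python
from pyspark.sql import DataFrame, functions as F

def transform_batch(events_df: DataFrame, campaign_meta_df: DataFrame):
    # 1. Enrich raw events with campaign metadata; broadcasting keeps the join
    #    cheap because the metadata table is small relative to the event batch.
    enriched = events_df.join(F.broadcast(campaign_meta_df), "campaign_id")

    # 2. Per-campaign aggregates: impressions, clicks, and click-through rate.
    aggregates = (
        enriched.groupBy("campaign_id")
        .agg(
            F.count(F.when(F.col("event_type") == "impression", True)).alias("impressions"),
            F.count(F.when(F.col("event_type") == "click", True)).alias("clicks"),
        )
        .withColumn("ctr", F.col("clicks") / F.greatest(F.col("impressions"), F.lit(1)))
    )

    # 3. Feature updates for the bidding models would be derived from `enriched`
    #    here and pushed out in the sink step that follows.
    return enriched, aggregates
```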
These sinks serve different purposes. A real-time analytics store (like Druid or Pinot) powers dashboards showing campaign performance with 10 to 15 second freshness. A feature store serves online models that need updated user or campaign features. A columnar warehouse (like Parquet in S3 or Delta tables) archives data for later offline analysis.
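One way to express the fan-out is Structured Streaming's foreachBatch, which hands each micro-batch to a function that can write it to several sinks. The S3 paths, the Delta table standing in for the analytics store, and the events_stream variable (the streaming DataFrame read from Kafka) are assumptions for illustration:

```python
def write_to_sinks(batch_df, batch_id):
    batch_df.persist()  # the same batch is written more than once; cache it

    # Columnar archive for later offline analysis.
    batch_df.write.mode("append").parquet("s3://ads-archive/events/")

    # Serving-side aggregates; a real Druid/Pinot pipeline would use its
    # native ingestion API, so a Delta table stands in here.
    (batch_df.groupBy("campaign_id").count()
        .write.mode("append").format("delta")
        .save("s3://ads-serving/campaign_aggregates/"))

    batch_df.unpersist()

(events_stream.writeStream
    .foreachBatch(write_to_sinks)
    .option("checkpointLocation", "s3://ads-checkpoints/pipeline/")
    .trigger(processingTime="5 seconds")
    .start())
```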
Control Plane and Observability:
The system includes several supporting components. A scheduler orchestrates micro-batch jobs and manages resource allocation. A metadata layer stores source offsets, checkpoints, and schema versions for each batch. Monitoring tracks critical metrics: batch processing time, backlog size (number of queued batches), and per-batch failure rates.
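What the metadata layer records for each batch can be pictured as a small structure like the following; the field names are illustrative and not tied to any particular framework:

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class BatchRecord:
    batch_id: int
    source_offsets: Dict[int, int]   # Kafka partition -> last offset read
    schema_version: str              # schema in effect when the batch was read
    checkpoint_path: str             # durable state location used for recovery
    started_at: float = field(default_factory=time.time)
    finished_at: Optional[float] = None
    failed: bool = False

    @property
    def processing_seconds(self) -> Optional[float]:
        # Feeds the processing-time and failure-rate metrics mentioned above.
        if self.finished_at is None:
            return None
        return self.finished_at - self.started_at
```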
If processing time approaches the batch interval, you have a problem. With a 5 second interval, if batches start taking 4.5 to 5 seconds, you're running at capacity. Any spike will create a backlog. Autoscaling policies monitor this ratio and add executors when processing time exceeds 50 percent of the interval for several consecutive batches.
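A toy version of that scaling check, with the threshold, window size, and batch interval as configurable assumptions:

```python
def should_scale_out(recent_processing_times, batch_interval_s=5.0,
                     threshold=0.5, consecutive=3):
    """Return True when the last `consecutive` batches all ran hotter than
    `threshold` (processing time / batch interval)."""
    if len(recent_processing_times) < consecutive:
        return False
    return all(t / batch_interval_s > threshold
               for t in recent_processing_times[-consecutive:])

# Example: a 5 s interval with the last three batches taking 2.8-3.1 s trips the policy.
print(should_scale_out([2.1, 2.8, 2.9, 3.1]))  # True
```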
Real-World Examples:
Netflix has documented using micro-batch pipelines for near real-time recommendations, recomputing candidate sets every few seconds rather than per event. Databricks promotes micro-batch streaming on Delta tables, where each micro-batch becomes an atomic transaction. Uber processes trip and driver events with single-digit-second latencies to compute surge multipliers and estimated time of arrival features.
Production Pipeline Metrics: 200k events/sec • 5 s batch interval • 2-3 s processing time
✓ In Practice: Production systems aim to keep processing time at 40 to 60 percent of the batch interval, providing headroom for traffic spikes while maintaining a target SLO of under 30 seconds p99 end-to-end latency.
The goal is maintaining 99.9 percent availability via retries and replication, while keeping end-to-end p99 latency within the Service Level Objective (SLO), typically under 30 seconds for analytics workloads and under 10 seconds for user-facing features.
💡 Key Takeaways
✓ Production micro-batch systems process events from message brokers like Kafka using dual triggers: time-based (every 5 seconds) or size-based (every 50,000 events), whichever comes first
✓ Each micro-batch writes to multiple sinks simultaneously: real-time analytics stores for dashboards, feature stores for ML models, and data warehouses for offline analysis
✓ Autoscaling monitors the ratio of processing time to batch interval, adding capacity when processing time exceeds 50 to 60 percent of the interval to prevent backlog buildup
✓ Control planes track offsets, checkpoints, and schema versions while monitoring critical metrics like batch backlog size, processing time trends, and per-batch failure rates
📌 Examples
1. An ad platform pulls 1 million events per 5 second micro-batch from Kafka (200,000 events/sec), processes them in 2 to 3 seconds, and writes aggregated campaign metrics to Druid for dashboards, feature vectors to Redis for bidding models, and raw events to S3 Parquet for offline analysis
2. A ride-sharing service processes driver location and trip events every 3 seconds, computing surge pricing multipliers and updating them in a key-value store that serves the pricing API with 5 to 8 second end-to-end latency
3. A recommendation pipeline recomputes user preference vectors every 10 seconds from clickstream data, writing updated features to a feature store that serves 50,000 requests per second from online models