
Batching in Data Pipelines: Producer and Consumer Patterns

In streaming and messaging systems, batching happens on both the producer and consumer sides to reduce network overhead and increase pipeline throughput. Producers accumulate messages before sending them to the broker, and consumers prefetch messages in batches before processing them. This pattern is critical in high-volume machine learning data pipelines that ingest clickstreams, logs, or user interactions.

Producers batch by both size and time. A typical configuration accumulates messages until reaching tens of kilobytes, say 16 KB to 64 KB, or until a wait interval such as 5 milliseconds expires, whichever comes first. This adds up to 5 milliseconds of producer-side latency but reduces network calls per second by a factor of 5 to 10, which lowers broker CPU usage and increases end-to-end throughput. The broker receives fewer, larger writes instead of many tiny ones, which reduces protocol overhead and improves disk write efficiency.

On the consumer side, stream processors prefetch ahead and hand batches of 500 to 1,000 messages to processing functions, avoiding per-message deserialization and I/O overhead. Concurrency should be tuned so that the total number of in-flight messages equals batch size multiplied by worker count. For example, 32 workers each processing batches of 100 gives 3,200 messages in flight, which must fit within memory limits and visibility timeout windows.

The real-world impact is dramatic. One cloud function case moved from 1 message per invocation to 100 messages per invocation, reducing wall time for 1 million messages from 27.7 hours to 1.4 hours. Cost dropped by roughly a factor of 100 because the platform bills per invocation and per unit of compute time. The same pattern applies across Apache Kafka consumers, Amazon Kinesis readers, and Google Cloud Pub/Sub subscribers.
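
To make the size-plus-time producer settings concrete, here is a minimal sketch using the kafka-python client, whose batch_size and linger_ms options map directly onto the byte and wait thresholds above; the broker address, topic name, and payloads are placeholders, not values from the text.

```python
# Minimal sketch of size-plus-time producer batching with kafka-python.
# Broker address, topic, and payloads are illustrative placeholders.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker
    batch_size=32 * 1024,    # flush a partition batch once it reaches ~32 KB ...
    linger_ms=5,             # ... or after waiting at most 5 ms, whichever comes first
    compression_type="gzip", # optional: compress each batch as a unit
)

for i in range(1000):
    # send() is asynchronous: records accumulate in an in-memory batch that is
    # transmitted once the size or time threshold is hit.
    producer.send("clickstream", value=f"event-{i}".encode("utf-8"))

producer.flush()  # push out any partially filled batch before exiting
```

Raising linger_ms trades a few milliseconds of latency for fuller batches, while raising batch_size alone has no effect if traffic is too light to fill the batch before the linger timer fires.
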
💡 Key Takeaways
Producers batch by bytes (16 KB to 64 KB) and time (5 milliseconds), reducing broker network calls by 5 to 10 times at the cost of up to 5 ms of added latency
Consumers prefetch batches of 500 to 1,000 messages to amortize deserialization, avoiding per-message I/O and protocol overhead
Set concurrency so that total in-flight messages equal batch size times worker count; for example, 32 workers at a batch size of 100 gives 3,200 in flight (see the consumer sketch after this list)
Cloud function case study: 100 messages per invocation reduced 1 million message processing from 27.7 hours to 1.4 hours with roughly 100x lower cost
Ensure maximum batch processing time plus network variance fits within the visibility timeout or lease duration to avoid lock expiration
Monitor batch fill ratio and flush frequency to detect under-batching during low-traffic periods
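
The consumer-side prefetching and the in-flight sizing rule can be sketched with kafka-python as well; the topic, group id, and batch/worker constants below are the illustrative figures from the list above, not a prescribed configuration.

```python
# Sketch of consumer-side batch processing with kafka-python.
# Topic, group id, and sizing constants are illustrative.
from kafka import KafkaConsumer

BATCH_SIZE = 100                           # records handed to the processor at once
WORKER_COUNT = 32                          # parallel consumer instances (one shown here)
MAX_IN_FLIGHT = BATCH_SIZE * WORKER_COUNT  # 3,200 records held across all workers

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",  # placeholder broker
    group_id="feature-pipeline",         # placeholder consumer group
    enable_auto_commit=False,            # commit only after a batch succeeds
    max_poll_records=BATCH_SIZE,         # cap records returned by each poll()
)

def process_batch(records):
    # Deserialize and transform the whole batch in one call rather than
    # paying per-record function-call and I/O overhead.
    return [r.value for r in records]

while True:
    # poll() returns {TopicPartition: [ConsumerRecord, ...]}
    batches = consumer.poll(timeout_ms=1000, max_records=BATCH_SIZE)
    for _partition, records in batches.items():
        if records:
            process_batch(records)
    if batches:
        consumer.commit()  # advance offsets only after the batch is processed
```

Committing after the batch rather than per record keeps offset traffic low, but a worker crash mid-batch redelivers the whole batch, so the processing function should be idempotent.
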
📌 Examples
Kafka producer at Netflix: Batches up to 32 KB with a 10 ms linger time. At 100,000 messages per second, this cuts broker requests from 100,000 per second to 10,000 per second, reducing broker CPU by 60%.
Kinesis consumer at Uber: Prefetches 1,000 records per shard and processes them in batches of 200. The feature transformation pipeline achieves 50,000 records per second per worker, up from 5,000 with single-record processing.
Google Cloud Pub/Sub subscriber: A 100-message batch for a serverless function cuts invocation count by a factor of 100, for example from 10 million invocations down to 100,000, saving $15,000 in invocation costs per day (a batched pull sketch follows these examples).
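
The Pub/Sub example follows the same shape as the Kafka sketches above. As a rough illustration, not the configuration from the example, a batched synchronous pull with the google-cloud-pubsub client looks like this; the project and subscription names are placeholders.

```python
# Sketch of a batched synchronous pull with google-cloud-pubsub.
# Project and subscription names are placeholders.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Pull up to 100 messages in a single request instead of handling them one by one.
response = subscriber.pull(
    request={"subscription": subscription_path, "max_messages": 100}
)

ack_ids = []
for received in response.received_messages:
    payload = received.message.data  # raw bytes; deserialize as a batch downstream
    ack_ids.append(received.ack_id)

if ack_ids:
    # Acknowledge the whole batch at once; unacknowledged messages reappear after
    # the subscription's ack deadline (the "visibility" window) expires.
    subscriber.acknowledge(
        request={"subscription": subscription_path, "ack_ids": ack_ids}
    )
```

As the takeaways note, batch processing time plus network variance has to fit inside the ack deadline or lease duration, otherwise messages are redelivered mid-processing.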