
GPU Inference Scheduling and Batching Strategies

Dynamic Batching for Video

GPUs amortize kernel-launch and memory-transfer overhead across a batch, so batched inference is far more efficient than frame-by-frame processing: a batch of 8 frames takes 20ms, while 8 individual frames take 80ms. The cost is latency, since early frames sit idle waiting for the batch to fill.

Batch formation strategies: Wait for N frames (fixed batch size) or wait T milliseconds (timeout). Fixed size maximizes throughput but adds variable latency. Timeout caps latency but produces variable batch sizes.
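The two strategies are usually combined: ship a batch as soon as it reaches N frames, but never wait longer than T milliseconds. A minimal sketch of that hybrid policy (the function name and parameters are illustrative, not from any particular serving framework):

```python
import queue
import time

def form_batch(frame_queue, max_batch=8, timeout_ms=25):
    """Collect up to max_batch frames, but never wait longer than
    timeout_ms for the batch to fill (hybrid size/timeout policy)."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    batch = [frame_queue.get()]           # block only for the first frame
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                         # deadline hit: ship a partial batch
        try:
            batch.append(frame_queue.get(timeout=remaining))
        except queue.Empty:
            break                         # no more frames before the deadline
    return batch

# Usage: a full queue yields a full batch immediately; a sparse queue
# yields whatever arrived before the timeout.
q = queue.Queue()
for i in range(10):
    q.put(f"frame-{i}")
print(len(form_batch(q)))  # 8
```

Note the timeout clock starts when the first frame arrives, so an idle stream does not accumulate stale deadlines.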

Multi-Stream Batching

When processing multiple cameras, batch frames from different streams together. Camera A and Camera B each contribute 4 frames to an 8-frame batch. Both streams benefit from GPU efficiency without either waiting too long.

Stream prioritization: Some cameras matter more than others. Entrance cameras get priority over parking lot cameras. Priority affects batch ordering and timeout handling.
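One way to realize priority-affected batch ordering is a shared heap keyed on stream priority, with arrival order as a tie-breaker. A sketch under assumed stream names and priority values (lower number = more important):

```python
import heapq
import itertools

# Hypothetical priority table: entrance cameras outrank parking cameras.
STREAM_PRIORITY = {"entrance": 0, "lobby": 1, "parking": 2}
_arrival = itertools.count()  # tie-breaker preserving arrival order

def push_frame(heap, stream, frame):
    prio = STREAM_PRIORITY.get(stream, 99)
    heapq.heappush(heap, (prio, next(_arrival), stream, frame))

def pop_batch(heap, max_batch=8):
    """Drain up to max_batch frames, highest-priority streams first."""
    batch = []
    while heap and len(batch) < max_batch:
        _, _, stream, frame = heapq.heappop(heap)
        batch.append((stream, frame))
    return batch

heap = []
for i in range(4):
    push_frame(heap, "parking", f"p{i}")
for i in range(4):
    push_frame(heap, "entrance", f"e{i}")
# Entrance frames lead the batch even though they arrived later:
print([s for s, _ in pop_batch(heap)])
# ['entrance', 'entrance', 'entrance', 'entrance', 'parking', 'parking', 'parking', 'parking']
```

In a real scheduler the timeout would also be priority-dependent (shorter deadlines for entrance streams), which the table above can drive the same way.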

GPU Memory Management

Pre-allocation: Allocate GPU memory at startup. Avoid runtime allocation that causes fragmentation and unpredictable latency.
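The usual shape of this is a fixed pool of buffers created once at startup and recycled thereafter. A minimal sketch, with `bytearray` standing in for pinned or device memory (the class and its behavior on exhaustion are illustrative):

```python
class BufferPool:
    """Fixed set of frame buffers allocated once at startup; acquire/release
    recycles them instead of allocating at runtime."""

    def __init__(self, count, nbytes):
        self._free = [bytearray(nbytes) for _ in range(count)]

    def acquire(self):
        if not self._free:
            # Exhaustion means apply backpressure (drop or stall the stream),
            # never fall back to runtime allocation.
            raise RuntimeError("buffer pool exhausted")
        return self._free.pop()

    def release(self, buf):
        self._free.append(buf)

pool = BufferPool(count=4, nbytes=1920 * 1080 * 3)  # 4 Full-HD RGB frames
buf = pool.acquire()
# ... decode a frame into buf, enqueue it for the GPU ...
pool.release(buf)
```

Raising on exhaustion rather than allocating keeps latency predictable: overload shows up as explicit frame drops instead of allocator-induced jitter.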

Double buffering: While GPU processes batch N, CPU prepares batch N+1. Hides preprocessing latency by overlapping CPU and GPU work.
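The overlap can be sketched with a one-worker thread pool: inference on batch N runs on the main thread while preprocessing of batch N+1 runs on the worker. The `preprocess`/`infer` bodies below are simulated stand-ins, not a real model:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def preprocess(batch):      # CPU work: decode/resize (simulated)
    time.sleep(0.01)
    return [x * 2 for x in batch]

def infer(prepped):         # GPU work (simulated)
    time.sleep(0.02)
    return [x + 1 for x in prepped]

def pipeline(batches):
    """Double buffering: while infer() runs on batch N, preprocess()
    for batch N+1 runs on a separate thread."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as cpu:
        next_prepped = cpu.submit(preprocess, batches[0])
        for i in range(len(batches)):
            prepped = next_prepped.result()
            if i + 1 < len(batches):
                next_prepped = cpu.submit(preprocess, batches[i + 1])
            results.append(infer(prepped))  # overlaps with next preprocess
    return results

print(pipeline([[1, 2], [3, 4]]))  # [[3, 5], [7, 9]]
```

With N batches, total time approaches N x max(preprocess, infer) instead of N x (preprocess + infer), which is the whole point of hiding the CPU stage.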

Throughput vs Latency Tuning

Maximum throughput: Large batches (16-32 frames), long timeouts (50-100ms). Use for offline analysis or low-priority streams.

Minimum latency: Small batches (4-8 frames), short timeouts (10-20ms). Use for real-time alerts or safety-critical applications.

⚠️ Key Trade-off: Larger batches increase throughput but add latency. The optimal batch size depends on your latency requirements, not maximum GPU utilization.
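The trade-off can be made concrete with back-of-envelope arithmetic using the article's 20ms-per-8-frame figure. Assuming (hypothetically) a single 30fps camera, the first frame of a batch waits for the rest of the batch to arrive before anything runs:

```python
def batch_stats(batch_size, batch_ms, fps_per_stream=30.0):
    """Throughput and worst-case latency for a single stream feeding
    fixed-size batches (assumed 30 fps camera; batch_ms from the text)."""
    throughput_fps = batch_size / (batch_ms / 1000.0)
    frame_interval_ms = 1000.0 / fps_per_stream
    fill_wait_ms = (batch_size - 1) * frame_interval_ms  # first frame's wait
    worst_latency_ms = fill_wait_ms + batch_ms
    return throughput_fps, worst_latency_ms

tput, lat = batch_stats(batch_size=8, batch_ms=20)
print(round(tput), round(lat))  # 400 253
```

So an 8-frame batch delivers 400 fps of GPU throughput, but a lone 30fps camera pays roughly 253ms of worst-case latency, far beyond the processing time itself. This is exactly why multi-stream batching matters: eight cameras fill the same batch in one frame interval, keeping the wait near zero.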
💡 Key Takeaways
- Batching 8 frames takes 20ms vs 80ms for 8 individual frames: a 4x efficiency improvement
- Multi-stream batching combines frames from different cameras for better GPU utilization
- Pre-allocation and double buffering hide memory and preprocessing latency
- Large batches (16-32) for throughput; small batches (4-8) for latency-sensitive applications
📌 Interview Tips
1. Explain timeout-based batching to cap latency while still gaining batching benefits.
2. Mention double buffering: CPU prepares batch N+1 while GPU processes batch N.