
Real-Time Video Processing Pipeline Architecture

Definition
Real-time video processing analyzes video streams as frames arrive, producing results within strict latency budgets. Unlike batch processing, the system must keep pace with the stream: each frame must be decoded, analyzed, and acted on before the next frame arrives (about 33ms later at 30 FPS).

Pipeline Architecture

A real-time video pipeline consists of three stages running in parallel: decode, analyze, and act. Each stage operates on different frames simultaneously to maintain throughput.

Decode stage: Receive compressed video stream, decompress frames, convert to tensor format. Decoding in software is CPU-intensive. A single core handles roughly 100-200 frames per second for 1080p H.264 video.

Analyze stage: Run ML models on decoded frames: detection, classification, segmentation, tracking. This is GPU-intensive. Batching multiple frames improves throughput, though waiting to fill a batch adds latency to the frames already collected.
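One way to batch frames without stalling is to collect until the batch is full or a deadline passes. The sketch below is illustrative only: `collect_batch`, `max_batch`, and `timeout_s` are hypothetical names, and the queue holds plain integers in place of decoded frames.

```python
import queue
import time


def collect_batch(frames, max_batch, timeout_s):
    """Gather up to max_batch items, waiting at most timeout_s in total."""
    batch = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # deadline reached: run inference on what we have
        try:
            batch.append(frames.get(timeout=remaining))
        except queue.Empty:
            break  # no more frames arrived before the deadline
    return batch


# Usage: integers stand in for decoded frames.
frames = queue.Queue()
for i in range(5):
    frames.put(i)
first = collect_batch(frames, max_batch=4, timeout_s=0.01)
second = collect_batch(frames, max_batch=4, timeout_s=0.01)
```

The timeout caps how long the first frame in a batch waits, so the batching delay can be counted against the latency budget.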

Act stage: Process model outputs, trigger alerts, update dashboards, store results. This stage must not block the pipeline even when downstream systems are slow.
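The three stages can be sketched as threads connected by bounded queues. This is a minimal structural sketch, not a production design: the stage functions, `run_pipeline`, and the string stand-ins for decoding and inference are all assumptions made for illustration. The analyze stage hands results off with a non-blocking put, so a slow act stage causes drops rather than stalling the pipeline.

```python
import queue
import threading

STOP = object()  # sentinel marking end of stream


def decode_stage(packets, decoded):
    for packet in packets:
        decoded.put(f"frame({packet})")  # stand-in for real decoding
    decoded.put(STOP)


def analyze_stage(decoded, results, dropped):
    while True:
        frame = decoded.get()
        if frame is STOP:
            break
        detection = f"detections({frame})"  # stand-in for model inference
        try:
            results.put_nowait(detection)  # never wait on a slow consumer
        except queue.Full:
            dropped.append(detection)  # shed load instead of blocking
    results.put(STOP)  # blocking put so the shutdown signal is not lost


def act_stage(results, handled):
    while True:
        result = results.get()
        if result is STOP:
            break
        handled.append(result)  # stand-in for alerts, dashboards, storage


def run_pipeline(packets, queue_size=8):
    decoded = queue.Queue(maxsize=queue_size)
    results = queue.Queue(maxsize=queue_size)
    handled, dropped = [], []
    threads = [
        threading.Thread(target=decode_stage, args=(packets, decoded)),
        threading.Thread(target=analyze_stage, args=(decoded, results, dropped)),
        threading.Thread(target=act_stage, args=(results, handled)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled, dropped


handled, dropped = run_pipeline(range(20))
```

Bounded queues keep memory use fixed; whether to drop results or buffer them elsewhere when the act stage falls behind depends on how costly a missed result is.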

Latency Budget

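At 30 FPS, a new frame arrives every 33ms (1000ms / 30). Your entire pipeline must complete within this budget to process every frame. If total latency exceeds 33ms, you must either skip frames or accept growing queues. With pipelined stages, throughput is strictly limited by the slowest stage, so keeping the end-to-end total under the frame interval is the conservative target; it also bounds how stale each result is.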
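The trade-off between skipping frames and letting queues grow can be put into numbers. The helpers below are a simplified model under stated assumptions: frames arrive at a constant rate, each frame takes a fixed `service_ms` at the bottleneck, and the function names are hypothetical.

```python
import math


def frame_interval_ms(fps):
    """Time between frame arrivals, in milliseconds."""
    return 1000.0 / fps


def backlog_after(seconds, fps, service_ms):
    """Frames left waiting after `seconds` if none are skipped."""
    arrived = fps * seconds
    processed = min(arrived, seconds * 1000.0 / service_ms)
    return arrived - processed


def skip_stride(fps, service_ms):
    """Process every n-th frame so processing keeps up with arrivals."""
    return max(1, math.ceil(service_ms / frame_interval_ms(fps)))


# Usage: a 40ms-per-frame pipeline on a 30 FPS stream.
interval = frame_interval_ms(30)    # about 33.3
backlog = backlog_after(10, 30, 40) # 50.0 frames queued after 10 seconds
stride = skip_stride(30, 40)        # 2, i.e. process every second frame
```

In this model, a 40ms pipeline falls 5 frames further behind every second, which is why an unbounded queue is rarely an acceptable answer for live video.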

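Typical budget allocation: Decode 5-10ms, Model inference 15-25ms, Post-processing 2-5ms. The goal is to leave 3-10ms of slack for variability. Note that the upper ends of all three ranges sum to 40ms, so the stages cannot all run at their worst case and still fit in 33ms.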
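A quick arithmetic check of the budget is worth doing before committing to it. The sketch below uses the stage ranges quoted above; the measured values passed to `slack_ms` are made-up example numbers, and the function names are hypothetical.

```python
FRAME_INTERVAL_MS = 1000 / 30  # about 33.3ms at 30 FPS

# (low, high) estimates per stage in milliseconds, from the allocation above
STAGE_BUDGET_MS = {
    "decode": (5, 10),
    "inference": (15, 25),
    "post_processing": (2, 5),
}


def total_range_ms(budget):
    """Best-case and worst-case sums across all stages."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def slack_ms(measured_ms, frame_interval_ms=FRAME_INTERVAL_MS):
    """Positive: headroom left in the frame interval. Negative: overrun."""
    return frame_interval_ms - sum(measured_ms.values())


best, worst = total_range_ms(STAGE_BUDGET_MS)  # (22, 40)

# Example measurements (illustrative values, not benchmarks).
example = {"decode": 7, "inference": 18, "post_processing": 3}
headroom = slack_ms(example)  # about 5.3ms
```

The best case (22ms) leaves about 11ms of headroom and the worst case (40ms) overruns by about 7ms, so the slack only holds if measured stage times stay toward the lower or middle part of each range.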

💡 Key Takeaways
Three parallel stages: decode (CPU), analyze (GPU), act (output) - each operates on different frames
At 30 FPS, total pipeline latency must stay under 33ms to process every frame
Typical budget: decode 5-10ms, inference 15-25ms, post-processing 2-5ms, leaving 3-10ms slack
If latency exceeds frame interval, must skip frames or accept queue buildup
📌 Interview Tips
1. Interview Tip: Break down latency budget by stage - shows understanding of where bottlenecks occur
2. Interview Tip: Mention pipelining - different stages process different frames simultaneously to maximize throughput