
Common Preprocessing Failure Modes in Production

Preprocessing bugs in production often manifest as silent accuracy drops or catastrophic failures that are hard to debug because the error is upstream of the model. Understanding common failure modes helps you catch issues early and design robust pipelines.

Label corruption from geometric transforms is a frequent pitfall. When rotating or cropping images for object detection or segmentation, labels must be transformed with the same parameters and coordinate system. Off-by-one errors, incorrect interpolation modes, or forgetting to clip out-of-bounds boxes can corrupt training targets. A detection model trained with misaligned bounding boxes can drop mean Average Precision (mAP) by 10 to 20 points, and the error is not visible until validation metrics collapse. Always visualize a sampled batch with overlaid labels after augmentation to catch misalignments (see the first sketch below).

Normalization mismatch between training and inference is another silent killer. Training with ImageNet statistics but deploying with per-image scaling, forgetting to divide by 255, or swapping Red Green Blue (RGB) to Blue Green Red (BGR) channel order can drop top-1 accuracy by 2 to 10 percentage points. Quantization adds another layer of complexity: if training normalizes to [-1, 1] but inference quantization assumes [0, 255] input, the deployed model sees completely wrong distributions. Bake normalization into the model artifact and export it with the weights to guarantee consistency (see the second sketch below).

Over-augmentation can erase signal, especially for minority classes or small objects. Aggressive crops, heavy cutout, or extreme color jitter on datasets with small salient regions may improve validation loss through regularization but hurt production performance on clean data. Monitor per-class metrics; if recall for rare classes drops while overall accuracy improves, augmentation is likely too strong.

Temporal inconsistency in video tasks is a related failure: applying random photometric transforms independently to each frame breaks coherence and hurts tracking models. Use consistent augmentation parameters across clip windows (see the third sketch below).

Storage and IO failures manifest as step-time variance and GPU starvation. Reading millions of small files from distributed storage induces metadata storms, with open and seek latency spiking during job startup; a typical symptom is GPU utilization intermittently dropping from 95% to 40%. Sharding images into large sequential files (see the final sketch below) and warming up caches are essential mitigations.
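For the label-alignment check, here is a minimal sketch of a post-augmentation visualization pass using matplotlib. The `visualize_augmented_batch` helper, its (B, H, W, 3) batch layout, and the xyxy pixel-coordinate box format are illustrative assumptions; adapt them to whatever your pipeline actually emits:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def visualize_augmented_batch(images, boxes_per_image, out_path="aug_check.png", n=8):
    """images: arrays/tensors in [0, 1], shape (B, H, W, 3).
    boxes_per_image: list of (N_i, 4) arrays in xyxy pixel coords that
    have passed through the SAME geometric transform as the images."""
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 4))
    for ax, img, boxes in zip(axes, images[:n], boxes_per_image[:n]):
        ax.imshow(img)
        for x1, y1, x2, y2 in boxes:
            # Boxes that sit off the objects here mean the label
            # transform is buggy, even if training loss looks fine.
            ax.add_patch(patches.Rectangle(
                (x1, y1), x2 - x1, y2 - y1,
                fill=False, edgecolor="lime", linewidth=2))
        ax.axis("off")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
```

Running this on a handful of batches at the start of every training job is cheap insurance against silent rotation, flip, and crop bugs.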
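For baking normalization into the artifact, one common approach in PyTorch is to wrap the trained network in a module that owns the statistics as buffers, so they serialize with the weights and survive TorchScript or ONNX export. The `NormalizedModel` name and the raw [0, 255] input convention are assumptions for this sketch:

```python
import torch
from torch import nn

class NormalizedModel(nn.Module):
    """Wraps a trained model so normalization ships inside the exported
    graph; inference code feeds raw [0, 255] pixels and cannot drift."""
    def __init__(self, model: nn.Module, mean, std):
        super().__init__()
        self.model = model
        # Buffers are saved in the state dict and exported with the model;
        # shape (1, 3, 1, 1) broadcasts over NCHW batches.
        self.register_buffer("mean", torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x / 255.0                     # raw pixel-range input expected
        x = (x - self.mean) / self.std    # dataset stats applied in-graph
        return self.model(x)

# ImageNet stats shown; substitute whatever your training run used.
# wrapped = NormalizedModel(backbone, mean=[0.485, 0.456, 0.406],
#                           std=[0.229, 0.224, 0.225])
# torch.onnx.export(wrapped, ...)  # normalization travels with the weights
```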
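For temporal consistency, here is a sketch of clip-level photometric jitter using torchvision's functional API: the random factors are sampled once per clip and reused for every frame. The `jitter_clip` helper and its parameter ranges are illustrative:

```python
import random
import torchvision.transforms.functional as F

def jitter_clip(frames, brightness=0.2, contrast=0.2):
    """frames: list of (C, H, W) tensors from one clip.
    Sampling the jitter factors ONCE per clip and applying the same
    factors to every frame preserves motion cues; independent per-frame
    sampling would add flicker that tracking models learn to distrust."""
    b = random.uniform(1 - brightness, 1 + brightness)
    c = random.uniform(1 - contrast, 1 + contrast)
    return [F.adjust_contrast(F.adjust_brightness(f, b), c) for f in frames]
```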
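For the small-file problem, a sketch of packing loose images into large sequential tar shards using only the standard library; the directory layout and shard size are placeholder assumptions, and production pipelines often use purpose-built formats such as WebDataset or TFRecord instead:

```python
import pathlib
import tarfile

def shard_images(image_dir, out_dir, shard_size=10_000):
    """Pack loose image files into large sequential tar shards so the
    data loader issues big streaming reads instead of millions of
    open/seek calls against distributed storage."""
    paths = sorted(pathlib.Path(image_dir).glob("*.jpg"))
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(0, len(paths), shard_size):
        shard_path = out / f"shard-{i // shard_size:06d}.tar"
        with tarfile.open(shard_path, "w") as tar:
            for p in paths[i:i + shard_size]:
                tar.add(p, arcname=p.name)
```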
💡 Key Takeaways
Label corruption from geometric transforms drops detection mean Average Precision (mAP) by 10 to 20 points; always visualize augmented labels to catch coordinate misalignments
Normalization mismatch causes 2 to 10 percentage point accuracy drops; common causes are RGB-to-BGR channel order swaps, scale errors such as forgetting to divide by 255, and per-image versus dataset statistics
Over-augmentation hurts minority-class recall even when validation loss improves; heavy cutout on small-object detection can make rare classes invisible
Temporal inconsistency in video models: independent frame augmentation breaks tracking; use consistent transforms across 8 to 16 frame clips
Small file metadata storms collapse throughput; millions of images on distributed storage cause seek latency spikes and GPU utilization drops from 95% to 40%
📌 Examples
NVIDIA quantized model bug: training normalized to [-1, 1], inference assumed [0, 255] input, deployed model saw 100x wrong scale, accuracy dropped from 76% to 58%
Google detection pipeline: visualizes 100 random batches with bounding boxes overlaid post augmentation, caught rotation bug where boxes were not transformed
Tesla video model: applies same brightness and contrast jitter across 10 frame clips to maintain temporal coherence for motion prediction