What is Data Augmentation in Computer Vision?
WHY AUGMENTATION MATTERS
Deep neural networks have millions of parameters and can easily memorize their training data. Without augmentation, a model trained on 10,000 images might reach 99% training accuracy but only 70% on new images. Augmentation pushes the model toward features that are invariant to nuisance factors (a cat is still a cat regardless of position, lighting, or angle) instead of memorizing specific pixel patterns.
COMMON TRANSFORMATIONS
Geometric: Random crops (224x224 from 256x256), horizontal flips (50% probability), rotations (±15 degrees), scaling (0.8-1.2x).
Photometric: Brightness adjustment (±0.4), contrast changes, saturation shifts, Gaussian blur.
Regularization: Cutout (mask random patches), Mixup (blend two images and their labels), CutMix (paste a patch from one image onto another and mix the labels in proportion to the patch area).
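
To make the three categories concrete, here is a minimal NumPy sketch of one transform from each family, not a reference implementation. It assumes images are float arrays of shape (H, W, C) with values in [0, 1], and it reads the ±0.4 brightness range as a multiplicative factor in [0.6, 1.4], which is an assumption rather than something the text specifies. The function names, the 56-pixel Cutout patch, and the Mixup alpha of 0.2 are illustrative choices.

```python
import numpy as np

def random_crop(img, size, rng):
    """Cut a size x size window at a random position from an HxWxC image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def random_hflip(img, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def random_brightness(img, rng, max_delta=0.4):
    """Scale pixels by a factor in [1 - max_delta, 1 + max_delta] (assumed meaning of +/-0.4)."""
    factor = rng.uniform(1.0 - max_delta, 1.0 + max_delta)
    return np.clip(img * factor, 0.0, 1.0)

def cutout(img, rng, patch=56):
    """Zero out one randomly placed square patch (patch size is an illustrative choice)."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - patch + 1)
    left = rng.integers(0, w - patch + 1)
    out = img.copy()
    out[top:top + patch, left:left + patch] = 0.0
    return out

def mixup(img_a, label_a, img_b, label_b, rng, alpha=0.2):
    """Blend two images and their one-hot labels with a Beta(alpha, alpha) weight."""
    lam = rng.beta(alpha, alpha)
    return lam * img_a + (1 - lam) * img_b, lam * label_a + (1 - lam) * label_b

def augment(img, rng):
    """Apply geometric, photometric, then regularization transforms to one image."""
    img = random_crop(img, 224, rng)
    img = random_hflip(img, rng)
    img = random_brightness(img, rng)
    return cutout(img, rng)

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))   # stand-in for a decoded 256x256 RGB image
augmented = augment(image, rng)     # 224x224x3, values still in [0, 1]
```

Mixup is kept out of augment() because it needs a second sample and both labels, so it is usually applied per batch rather than per image.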
PERFORMANCE REQUIREMENTS
Online augmentation (applied on the fly during training) must not become a bottleneck. Target: 2,000-3,000 images per second in aggregate to keep 8 GPUs saturated. Budget: 1-2 milliseconds per image for all transforms combined. Allocate 4-8 CPU cores per GPU to avoid data loading bottlenecks.
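
The figures above can be sanity-checked with a little arithmetic. The sketch below assumes one data-loading worker per CPU core and perfect parallel scaling, both simplifications, and computes how much wall-clock time each worker may spend per image in total. The function name is illustrative.

```python
def per_image_budget_ms(target_images_per_s, num_gpus, cores_per_gpu):
    """Time one worker may spend per image, assuming one worker per core."""
    workers = num_gpus * cores_per_gpu
    per_worker_rate = target_images_per_s / workers  # images/s each worker must deliver
    return 1000.0 / per_worker_rate

for cores in (4, 8):
    for target in (2000, 3000):
        budget = per_image_budget_ms(target, num_gpus=8, cores_per_gpu=cores)
        print(f"{cores} cores/GPU, {target} img/s: {budget:.1f} ms per image per worker")
```

Under these assumptions each worker has roughly 10 to 32 ms per image, so a 1-2 ms transform budget leaves most of that time for file reads and image decoding, which is likely where much of the remainder goes in practice.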