Computer Vision Systems • Data Augmentation (AutoAugment, Mixup, Synthetic Data)
What is Data Augmentation in Computer Vision?
Data augmentation expands training datasets by applying transformations that preserve labels while creating variations of existing samples. Instead of collecting millions more images, you generate new training examples by rotating, flipping, cropping, adjusting colors, or adding noise to your existing data. The core principle is that these transformations approximate nearby points on the data manifold where the same label still applies.
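As a concrete illustration, here is a minimal sketch using torchvision (the text names no specific framework, so this is an assumption) that turns one labeled image into several label-preserving variants:

```python
from PIL import Image
from torchvision import transforms

# Label-preserving augmentations: a flipped or color-jittered cat
# is still a cat, so the original label carries over unchanged.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

image = Image.open("cat.jpg")  # hypothetical input file
label = "cat"

# Each call draws fresh random parameters, yielding a new variant.
for _ in range(3):
    variant = augment(image)
    # (variant, label) joins the training set; the label is unchanged.
```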
In production computer vision systems, augmentation runs in the data loading pipeline during training. Consider a typical setup at Meta or Google: training with 8 GPUs at 250 to 400 images per second per GPU requires total throughput of 2,000 to 3,200 images per second. Each augmentation operation must complete in under 1 to 2 milliseconds per image to keep GPU utilization from dropping below the 90 percent target. Teams run augmentation on host CPUs with asynchronous prefetch queues, keeping 2 to 4 batches ready ahead of GPU consumption.
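In PyTorch terms, this pattern maps onto a DataLoader with CPU worker processes and prefetching; the sketch below is illustrative, and the dataset path, batch size, and worker counts are all assumptions:

```python
import torch
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Hypothetical ImageFolder dataset; the transform executes inside
# the worker processes, on the host CPU, during loading.
dataset = datasets.ImageFolder("/data/train", transform=train_transform)

loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=256,
    shuffle=True,
    num_workers=8,        # CPU workers running decode + augmentation
    prefetch_factor=2,    # each worker keeps 2 batches staged ahead
    pin_memory=True,      # faster host-to-GPU copies
)
```

The workers fill the prefetch queue asynchronously, so the GPU consumes previously prepared batches while the CPUs augment the next ones.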
The distinction between cheap and expensive operations matters for throughput. Random horizontal flips and crops execute in 0.1 to 0.3 milliseconds. Color jitter and brightness adjustments take 0.3 to 0.8 milliseconds. Large rotations with resampling or complex photometric transforms can exceed 2 milliseconds, requiring 4 to 8 CPU cores per GPU to maintain saturation. Some teams offload expensive operations to GPU kernels to reduce host load.
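A rough way to see the cheap-versus-expensive split is to time individual operations. This harness is a sketch using torchvision's functional API; the blank test image is a stand-in, and exact numbers depend on hardware and image size:

```python
import time
from PIL import Image
from torchvision.transforms import functional as F, InterpolationMode

image = Image.new("RGB", (256, 256))  # stand-in for a real photo

def per_image_ms(op, n=1000):
    """Average wall-clock milliseconds per call over n repetitions."""
    start = time.perf_counter()
    for _ in range(n):
        op(image)
    return (time.perf_counter() - start) / n * 1000

# Cheap: a pure memory shuffle, no resampling.
print("hflip:     ", per_image_ms(F.hflip))
# Moderate: per-pixel arithmetic.
print("brightness:", per_image_ms(lambda im: F.adjust_brightness(im, 1.2)))
# Expensive: rotation pushes every pixel through an interpolation kernel.
print("rotate:    ", per_image_ms(
    lambda im: F.rotate(im, 15, interpolation=InterpolationMode.BILINEAR)))
```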
Typical gains from well-tuned augmentation pipelines range from 0.5 to 2 percentage points of top-1 accuracy on ImageNet-scale datasets. The benefit increases substantially on smaller datasets, where overfitting is more severe. Tesla and other autonomy companies report that augmentation is essential for handling the long tail of rare scenarios, such as unusual intersections or extreme weather conditions, that appear infrequently in collected data.
💡 Key Takeaways
• Throughput requirement: 2,000 to 3,200 images per second for 8-GPU training with a 90 percent utilization target
• Augmentation budget: 1 to 2 milliseconds per image for geometric and photometric transforms combined
• CPU allocation: 4 to 8 CPU cores per GPU needed to avoid data loading bottlenecks with online augmentation
• Accuracy gains: 0.5 to 2 percentage points of top-1 accuracy on ImageNet for common CNNs with well-tuned policies
• Storage tradeoff: Online augmentation saves 2 to 10 times the storage of precomputing variants but requires careful CPU engineering
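A back-of-envelope illustration of the storage tradeoff in the last takeaway (all numbers are hypothetical):

```python
# Storage for precomputed variants vs. online augmentation.
n_images = 1_000_000
avg_jpeg_kb = 120
variants = 5  # precomputed augmented copies per image

original_gb = n_images * avg_jpeg_kb / 1e6          # 120 GB
precomputed_gb = original_gb * (1 + variants)       # 720 GB
print(f"originals only (online aug): {original_gb:.0f} GB")
print(f"with {variants} precomputed variants: {precomputed_gb:.0f} GB")
```

Online augmentation stores only the originals and regenerates variants on the fly, at the cost of the CPU budget described above.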
📌 Examples
Google training pipeline: Random crop to 224x224, horizontal flip with 0.5 probability, color jitter with brightness and saturation range of 0.4, executing at 2,500 images per second on 12-core CPUs feeding 8 V100 GPUs
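That recipe maps directly onto a standard torchvision stack; this is a sketch of an equivalent pipeline, not Google's actual implementation, which is not public:

```python
from torchvision import transforms

# Mirrors the recipe above: 224x224 random crop, 50% horizontal
# flip, color jitter with brightness/saturation range 0.4.
imagenet_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.4, saturation=0.4),
    transforms.ToTensor(),
])
```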
Tesla autonomy pipeline: Processes dashcam images with random crops, brightness adjustments simulating different times of day, and synthetic rain/fog overlays to cover rare weather conditions in training data
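A toy version of a weather overlay is sketched below; it is purely illustrative, as production systems use far more sophisticated rendering or learned generators:

```python
import torch

def add_fog(image: torch.Tensor, intensity: float = 0.4) -> torch.Tensor:
    """Alpha-blend a flat gray 'fog' layer over a (C, H, W) float image
    in [0, 1]. Higher intensity washes out contrast, roughly mimicking
    dense fog. A hypothetical stand-in for real weather simulation."""
    fog = torch.full_like(image, 0.8)  # light-gray fog color
    return (1 - intensity) * image + intensity * fog

# Example: fog intensity drawn at random per sample during training.
image = torch.rand(3, 224, 224)
foggy = add_fog(image, intensity=float(torch.rand(1)) * 0.5)
```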