Mixup: Linear Interpolation for Regularization
WHAT MIXUP DOES
Mixup blends pairs of training images and their labels using linear interpolation. If image A is a cat (label [1,0]) and image B is a dog (label [0,1]), with mixing coefficient λ=0.7, the blended image is 0.7×A + 0.3×B and the label becomes [0.7, 0.3]. This soft labeling acts as a regularizer, preventing the model from being overconfident.
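The blend described above is a single weighted average of pixels and of one-hot labels. A minimal NumPy sketch (function and variable names here are illustrative, not from any library):

```python
import numpy as np

def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Blend two examples and their one-hot labels with coefficient lam."""
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y

# The cat/dog example from the text: lam = 0.7
x_cat = np.ones((2, 2))    # stand-in "image" A (all-ones for illustration)
x_dog = np.zeros((2, 2))   # stand-in "image" B
y_cat = np.array([1.0, 0.0])
y_dog = np.array([0.0, 1.0])

x_mixed, y_mixed = mixup_pair(x_cat, y_cat, x_dog, y_dog, lam=0.7)
# y_mixed is [0.7, 0.3]; every pixel of x_mixed is 0.7
```

The same two lines apply unchanged to real image tensors of any shape, since the operation is elementwise.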
THE MIXING COEFFICIENT
Sample λ from a Beta distribution: λ ~ Beta(α, α). The hyperparameter α controls mixing strength. α=0.2 produces λ values concentrated near 0 or 1, so most images are nearly unmixed. α=0.4 allows more blending. Higher α (>0.5) creates stronger regularization but risks underfitting. Start with α between 0.2 and 0.4 for most classification tasks.
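The effect of α is easy to see empirically by sampling. This sketch (the helper name is ours) compares α=0.2, where the Beta(α, α) density is U-shaped, against α=1.0, where it is uniform:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lambda(alpha, n=10000):
    """Draw n mixing coefficients lam ~ Beta(alpha, alpha)."""
    return rng.beta(alpha, alpha, size=n)

# alpha = 0.2: U-shaped density, most mass near 0 or 1 (nearly unmixed pairs)
lam_02 = sample_lambda(0.2)
frac_extreme = np.mean((lam_02 < 0.1) | (lam_02 > 0.9))
# Roughly two thirds of draws land within 0.1 of an endpoint.

# alpha = 1.0: Beta(1, 1) is uniform on [0, 1] -- much heavier blending
lam_10 = sample_lambda(1.0)
```

This is why small α behaves like a mild regularizer: most training batches are close to the unmixed baseline, with occasional strongly blended pairs.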
CALIBRATION BENEFITS
Beyond accuracy, Mixup improves model calibration. Standard training produces overconfident predictions: the model outputs 0.99 probability for classes it only gets right 85% of the time. Mixup reduces Expected Calibration Error (ECE) by 2-5 percentage points, making confidence scores more reliable for downstream decision systems.
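Expected Calibration Error bins predictions by confidence and averages the gap between per-bin accuracy and per-bin confidence, weighted by bin size. A minimal sketch, assuming equal-width bins (the function is illustrative, not a library API):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted |accuracy - confidence| gap over confidence bins."""
    conf = np.asarray(confidences, dtype=float)
    corr = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(corr[mask].mean() - conf[mask].mean())
    return ece

# The overconfident case from the text: 0.99 confidence, 85% accuracy
conf = np.full(1000, 0.99)
correct = np.zeros(1000)
correct[:850] = 1.0
# expected_calibration_error(conf, correct) is about |0.85 - 0.99| = 0.14
```

A well-calibrated model drives every bin's gap, and therefore the ECE, toward zero.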
COMPUTATIONAL OVERHEAD
Mixup adds negligible overhead: less than 0.5 milliseconds per image for blending on CPU. The operation is a simple weighted average of pixel values. With proper pipelining, GPU utilization is unaffected. Typical accuracy improvement: 0.5-2 percentage points of top-1 accuracy for models without heavy baseline regularization.
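In practice the blend is done per batch rather than per pair: each example is mixed with a randomly permuted partner from the same batch, so the whole step is two vectorized weighted averages. A framework-agnostic NumPy sketch (names are ours, not from any library):

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Blend each example in a batch with a randomly paired batch-mate."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # one coefficient per batch
    perm = rng.permutation(len(x))        # random partner for each example
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed

# Tiny usage example: 4 "images" of 2 pixels, 2 classes
x = np.arange(8.0).reshape(4, 2)
y = np.eye(2)[[0, 0, 1, 1]]
x_mixed, y_mixed = mixup_batch(x, y, alpha=0.2, rng=np.random.default_rng(0))
# Each mixed label row still sums to 1: lam + (1 - lam)
```

Because the permuted copy reuses data already loaded for the batch, this adds no extra I/O, which is why GPU utilization is unaffected when the blend runs inside the input pipeline.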