
Loss Balancing and Gradient Interference

Loss balancing determines how much each task influences the shared backbone during training. The naive approach sums task losses with equal weights, but this fails when loss magnitudes differ by orders of magnitude. A CTR task trained with binary cross-entropy might produce loss values around 0.3 to 0.5, while a regression task predicting dwell time in seconds might have a mean squared error (MSE) around 100 to 500. The dwell-time task then dominates gradient updates, and the CTR task effectively stops learning.

Static normalization fixes the initial scales by dividing each task loss by its first-epoch average, equalizing magnitudes; this is a good starting point. Dynamic methods adapt during training: uncertainty weighting scales each task by learned noise parameters that represent task confidence, and GradNorm adjusts weights to equalize gradient norms across tasks, preventing any single task from dominating. Conflict-aware methods such as PCGrad (projecting conflicting gradients) detect when task gradients point in opposite directions and project out the conflicting components. These techniques reduced training instability in Meta's multi-objective ranking models. Minimal sketches of these methods follow below.

Gradient interference happens when tasks push shared parameters in opposite directions. Imagine a shared layer learning a user embedding: the CTR task wants to increase a weight to capture clickbait signals, while the safety task wants to decrease it to penalize sensational content. Each gradient update partially cancels the other. The result is slow convergence or, worse, oscillation where neither task improves. At scale, this shows up as flat or declining validation metrics for minority tasks even as the dominant task improves.

Production solutions combine several approaches. Start with static loss normalization. Add per-task learning-rate modulation based on validation performance, increasing rates for stalled tasks. Introduce GradNorm or similar dynamic balancing after the first few epochs. Monitor per-task gradient norms and the cosine similarity between task gradients to detect interference early. For tasks with fundamentally conflicting objectives, consider separate models composed in a downstream policy layer instead of forcing shared optimization.
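A minimal sketch of static loss normalization, assuming each task produces a scalar loss tensor per step and using a hypothetical fixed calibration window to stand in for "the first epoch":

```python
import torch

class StaticLossNormalizer:
    """Divides each task loss by its average over a calibration window
    (standing in for the first-epoch average), equalizing loss scales."""

    def __init__(self, task_names, calibration_steps=1000):
        self.calibration_steps = calibration_steps
        self.step = 0
        self.sums = {t: 0.0 for t in task_names}
        self.scales = {t: 1.0 for t in task_names}  # identity until calibrated

    def __call__(self, losses):
        # losses: dict mapping task name -> scalar loss tensor
        if self.step < self.calibration_steps:
            for t, loss in losses.items():
                self.sums[t] += loss.detach().item()
            self.step += 1
            if self.step == self.calibration_steps:
                self.scales = {t: s / self.calibration_steps
                               for t, s in self.sums.items()}
        # After calibration, each task contributes at roughly unit scale.
        return sum(loss / max(self.scales[t], 1e-8)
                   for t, loss in losses.items())

normalizer = StaticLossNormalizer(["ctr", "dwell_time"])
total_loss = normalizer({"ctr": torch.tensor(0.4),
                         "dwell_time": torch.tensor(250.0)})
```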
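Uncertainty weighting can be sketched as a small module holding one learned log-variance per task, in the form popularized by Kendall et al.; the exact term differs between classification and regression heads, so treat this as a simplified common variant:

```python
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    """One learned log-variance per task. exp(-log_var) down-weights noisy
    tasks; the +log_var term keeps variances from growing without bound."""

    def __init__(self, num_tasks):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = torch.zeros(())
        for i, loss in enumerate(task_losses):
            total = total + torch.exp(-self.log_vars[i]) * loss + self.log_vars[i]
        return total

weighter = UncertaintyWeighting(num_tasks=2)
# The log-variances train jointly with the model parameters, e.g.:
# optimizer = torch.optim.Adam([*model.parameters(), *weighter.parameters()])
```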
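Full GradNorm learns the task weights with its own objective that also balances relative training rates; the simpler norm-equalizing variant below captures the core idea by setting each weight inversely proportional to that task's gradient norm on the shared parameters (the `backbone` name in the usage comment is illustrative):

```python
import torch

def equalize_gradient_norms(task_losses, shared_params, eps=1e-8):
    """Returns weights w_i proportional to 1 / ||grad of L_i w.r.t. shared
    params||, so each weighted task contributes a comparable gradient
    magnitude to the shared backbone."""
    norms = []
    for loss in task_losses:
        grads = torch.autograd.grad(loss, shared_params, retain_graph=True)
        norms.append(torch.sqrt(sum((g ** 2).sum() for g in grads)))
    norms = torch.stack(norms)
    weights = 1.0 / (norms + eps)
    weights = weights / weights.sum() * len(task_losses)  # mean weight of 1
    return weights.detach()

# Usage inside a training step, after computing per-task losses:
# weights = equalize_gradient_norms([loss_ctr, loss_dwell],
#                                   list(backbone.parameters()))
# total_loss = sum(w * l for w, l in zip(weights, [loss_ctr, loss_dwell]))
# total_loss.backward()
```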
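Conflict-aware projection in the style of PCGrad, plus the cosine-similarity monitoring signal, fits in a few lines; the flattened toy gradients below are illustrative, not from any production model:

```python
import torch

def cosine_similarity(g_a, g_b):
    """Monitoring signal: persistently negative values indicate interference."""
    return torch.dot(g_a, g_b) / (g_a.norm() * g_b.norm() + 1e-12)

def project_out_conflict(g_a, g_b):
    """PCGrad-style step: if g_a conflicts with g_b (negative cosine
    similarity), remove g_a's component along g_b; otherwise leave it."""
    dot = torch.dot(g_a, g_b)
    if dot < 0:
        g_a = g_a - (dot / (g_b.norm() ** 2 + 1e-12)) * g_b
    return g_a

# Toy flattened task gradients pointing in roughly opposite directions:
g_task_a = torch.tensor([1.0, 0.5, -0.2])
g_task_b = torch.tensor([-0.8, 0.6, 0.3])
print(cosine_similarity(g_task_a, g_task_b))     # about -0.47: conflict
print(project_out_conflict(g_task_a, g_task_b))  # component along g_b removed
```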
💡 Key Takeaways
Naive equal weighting fails when task losses differ by 10 to 100 times in magnitude, causing one task to dominate all gradient updates
Static normalization divides each loss by its initial average to equalize scales, providing a strong baseline before dynamic methods
GradNorm dynamically adjusts loss weights to maintain equal gradient norms across tasks, preventing domination and improving convergence speed
Gradient interference occurs when task gradients have negative cosine similarity, causing updates to cancel and stalling learning
Production monitoring must track per-task gradient norms, loss curves, and gradient cosine similarity to detect and diagnose interference
📌 Examples
Meta ad ranking: GradNorm improved minority task AUC by 0.3% by preventing CTR task (80% of labels) from dominating conversion task (2% of labels)
Gradient conflict example: CTR gradient [0.5, 0.3, 0.2] and safety gradient [0.1, 0.4, -0.6] conflict on the third parameter; the per-component products 0.05, 0.12, and -0.12 largely cancel, pulling the overall cosine similarity down to roughly 0.11
Uber demand prediction: Uncertainty weighting learned to downweight noisy cancellation task (30% label noise) relative to clean trip completion task