
Loss Balancing and Gradient Interference

The Loss Balancing Problem

Each task has its own loss function. Classification uses cross-entropy. Regression uses mean squared error. Detection uses a combination of localization and classification losses. These losses have different scales and produce gradients of different magnitudes.

The problem: If detection loss is 100x larger than classification loss, the model optimizes almost entirely for detection. Classification performance suffers because its gradients get overwhelmed.

Manual Loss Weighting

The simplest approach: multiply each loss by a weight. Total loss = w1 × loss1 + w2 × loss2 + w3 × loss3. Tune weights manually until all tasks perform acceptably.

Practical approach: Start with weights that normalize loss magnitudes. If one loss averages 10 and another averages 0.1, use weights of 0.01 and 1.0 respectively. Then adjust based on validation performance.
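A minimal sketch of this normalization step (the task names and average loss magnitudes below are made up for illustration):

```python
# Hypothetical running-average loss magnitudes observed early in training.
avg_losses = {"detection": 10.0, "classification": 0.1, "depth": 1.0}

# Normalize so each weighted loss starts near the same scale (here, 1.0).
weights = {task: 1.0 / avg for task, avg in avg_losses.items()}

def total_loss(losses, weights):
    """Weighted sum: total = w1*loss1 + w2*loss2 + w3*loss3."""
    return sum(weights[t] * losses[t] for t in losses)

# weights["detection"] == 0.1, weights["classification"] == 10.0,
# so each task initially contributes ~1.0 to the total of 3.0.
```

These normalized weights are only a starting point; the weights are then tuned further against validation metrics, as described above.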

Gradient Interference

Even with balanced losses, task gradients can conflict. Task A wants to increase a weight; Task B wants to decrease it. The net gradient is small, but both tasks suffer. This is called destructive interference.
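The effect is easy to see with two hypothetical per-task gradients on a shared parameter vector (the numbers are invented for illustration):

```python
import numpy as np

# Hypothetical gradients from two tasks w.r.t. the same shared weights.
grad_a = np.array([1.0, 0.5])    # Task A pushes dimension 0 up
grad_b = np.array([-0.9, 0.4])   # Task B pushes dimension 0 down

net = grad_a + grad_b            # the update the optimizer actually applies

# Negative cosine similarity is a common signal of gradient conflict.
cos = grad_a @ grad_b / (np.linalg.norm(grad_a) * np.linalg.norm(grad_b))

# net == [0.1, 0.9]: dimension 0 nearly cancels, so neither task
# makes much progress on it, even though both sent strong gradients.
```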

Detection: Monitor individual task losses during training. If one task improves while another degrades, gradient interference is likely occurring in shared layers.

Mitigation: Gradient surgery techniques modify conflicting gradients before applying them. Project each task gradient to remove components that conflict with other tasks. This preserves beneficial updates while eliminating destructive ones.
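A sketch of this projection idea in the style of PCGrad (one published gradient-surgery method); the gradients are toy values, and a real implementation would operate on flattened model gradients per batch:

```python
import numpy as np

def project_conflicts(grads):
    """For each task gradient, subtract the component that points
    against any other task's gradient (i.e., when their dot product
    is negative), then sum the projected gradients."""
    projected = []
    for i, g in enumerate(grads):
        g = g.astype(float).copy()
        for j, other in enumerate(grads):
            if i == j:
                continue
            dot = g @ other
            if dot < 0:  # conflicting component: project it out
                g -= (dot / (other @ other)) * other
        projected.append(g)
    return sum(projected)  # combined update for the shared parameters

grad_a = np.array([1.0, 0.5])
grad_b = np.array([-0.9, 0.4])
update = project_conflicts([grad_a, grad_b])
# The conflicting dimension no longer cancels destructively, while the
# agreeing dimension's update is preserved (and here, reinforced).
```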

Dynamic Loss Weighting

Instead of fixed weights, adjust weights during training based on task difficulty or progress. Tasks that are learning slowly get higher weights; tasks that have converged get lower weights. This keeps all tasks improving throughout training.
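One concrete scheme from the multi-task literature is Dynamic Weight Average, which raises the weight of tasks whose loss is shrinking slowly. A simplified sketch, with made-up loss values:

```python
import numpy as np

def dwa_weights(prev_losses, curr_losses, temperature=2.0):
    """Simplified Dynamic Weight Average: a loss ratio near (or above)
    1.0 means slow progress, which earns a higher weight via softmax."""
    ratios = np.asarray(curr_losses) / np.asarray(prev_losses)
    exp = np.exp(ratios / temperature)
    return len(ratios) * exp / exp.sum()  # weights sum to the task count

# Task 0's loss is barely moving; task 1's loss halved last epoch.
w = dwa_weights(prev_losses=[1.0, 1.0], curr_losses=[0.99, 0.5])
# w[0] > w[1]: the slow task gets more weight on the next epoch.
```

The temperature controls how sharply weights respond to progress differences; a higher temperature flattens the weights toward uniform.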

💡 Key Takeaways
- Different loss scales cause imbalanced optimization: larger losses dominate gradient updates
- Manual weighting normalizes loss magnitudes: if losses differ 100x, weights should differ 100x inversely
- Gradient interference: conflicting task gradients cancel out, harming both tasks despite balanced losses
- Dynamic weighting adjusts task importance during training based on learning progress
📌 Interview Tips
1. Explain loss balancing as a practical first step: normalize magnitudes, then tune on validation
2. Mention gradient interference as a deeper issue that loss balancing alone cannot solve