Failure Modes: Negative Transfer and Data Drift
Negative Transfer
Negative transfer occurs when adding a task hurts performance on existing tasks. Instead of helping each other, tasks compete for shared capacity. The multi-task model performs worse than separate single-task models.
Why it happens: Tasks may require conflicting features. Texture classification benefits from high-frequency detail, while shape classification benefits from smoothed, abstract features. Forcing both through the same backbone produces a compromise representation that serves neither well.
Detection: Compare multi-task model performance against single-task baselines trained on the same data. If any task performs noticeably worse in the multi-task setting (a common rule of thumb is an absolute drop of 2 points or more), negative transfer is the likely cause.
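The detection check can be sketched as a simple comparison of per-task metrics. This is a minimal illustration: the 2-point threshold, task names, and accuracy values are all hypothetical.

```python
def detect_negative_transfer(single_task_acc, multi_task_acc, threshold=0.02):
    """Flag tasks whose multi-task accuracy falls `threshold` or more
    (absolute) below the single-task baseline."""
    flagged = []
    for task, baseline in single_task_acc.items():
        drop = baseline - multi_task_acc[task]
        if drop >= threshold:
            flagged.append((task, round(drop, 4)))
    return flagged

# Hypothetical numbers: shape drops 3 points in the multi-task model,
# texture actually improves slightly.
baselines = {"texture": 0.91, "shape": 0.88}
multi = {"texture": 0.92, "shape": 0.85}
print(detect_negative_transfer(baselines, multi))  # [('shape', 0.03)]
```

Running this check after every training run, against frozen single-task baselines, turns negative transfer from a surprise into a monitored regression.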
Mitigation: Increase backbone capacity. Use soft parameter sharing instead of hard sharing. Add task-specific layers earlier in the network. In severe cases, remove the conflicting task from the multi-task setup.
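To make the hard-versus-soft sharing distinction concrete, here is a toy sketch of soft parameter sharing on two related regression tasks: each task keeps its own weight vector, and an L2 coupling penalty pulls the two toward each other instead of forcing them to be identical. The data, coupling strength, and learning rate are illustrative, not a recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Two related tasks whose true weights agree except in one feature.
w_true_a = np.array([1.0, 2.0, 0.0, 0.0, 1.0])
w_true_b = np.array([1.0, 2.0, 0.5, 0.0, 1.0])
y_a, y_b = X @ w_true_a, X @ w_true_b

w_a, w_b = np.zeros(5), np.zeros(5)
lam, lr = 0.1, 0.01  # coupling strength and learning rate (illustrative)

for _ in range(2000):
    # Per-task least-squares gradient plus the soft-sharing coupling term.
    grad_a = X.T @ (X @ w_a - y_a) / len(X) + lam * (w_a - w_b)
    grad_b = X.T @ (X @ w_b - y_b) / len(X) + lam * (w_b - w_a)
    w_a -= lr * grad_a
    w_b -= lr * grad_b
```

After training, each task's weights sit close to its own optimum while remaining similar to the other's; under hard sharing both tasks would be forced onto a single compromise vector.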
Uneven Data Drift
Production data changes over time, and in multi-task settings tasks may drift at different rates: shifts in user behavior affect click prediction immediately, while seasonal patterns affect image classification only gradually.
The problem: When you retrain on new data, one task improves dramatically while another barely changes or even degrades. The optimal retraining frequency differs per task, but multi-task models must be retrained as a unit.
Mitigation: Monitor per-task accuracy drift independently. If tasks drift at very different rates, consider decoupling them into separate models. Use task-specific calibration layers that can be updated independently.
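Per-task drift monitoring can be sketched as a periodic comparison against a reference snapshot taken at deployment time. The task names, tolerance, and accuracy values below are hypothetical.

```python
def drift_report(reference_acc, current_acc, tolerance=0.01):
    """Compare current per-task accuracy against a reference snapshot
    and report the drop for each task degraded beyond `tolerance`."""
    return {
        task: round(reference_acc[task] - acc, 4)
        for task, acc in current_acc.items()
        if reference_acc[task] - acc > tolerance
    }

# Hypothetical snapshot: click prediction drifts quickly,
# image classification barely moves.
reference = {"click": 0.81, "image": 0.94}
current = {"click": 0.74, "image": 0.935}
print(drift_report(reference, current))  # {'click': 0.07}
```

If one task repeatedly dominates this report while the others stay flat, that is the signal to decouple it into its own model or give it an independently updatable calibration layer.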
Task Imbalance
When one task has 10x more training data than another, the model optimizes primarily for the data-rich task. The data-poor task gets insufficient gradient signal and underperforms.
Mitigation: Oversample minority tasks. Use loss weighting inversely proportional to data volume. Apply curriculum learning: start with balanced sampling, gradually shift toward natural distribution.
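The loss-weighting and curriculum ideas above can be combined in one knob. The sketch below computes weights inversely proportional to data volume, with an exponent `alpha` that a curriculum can anneal from 1 (fully balanced) toward 0 (natural distribution). The function name, task counts, and normalization choice are illustrative assumptions.

```python
def task_loss_weights(task_counts, alpha=1.0):
    """Per-task loss weights ~ (1 / data volume) ** alpha, normalized so
    the weights sum to the number of tasks. alpha=1 fully compensates
    for imbalance; alpha=0 weights all tasks equally, letting the
    natural data distribution dominate. A curriculum can anneal alpha
    from 1 toward 0 over training."""
    inv = {t: (1.0 / n) ** alpha for t, n in task_counts.items()}
    total = sum(inv.values())
    k = len(task_counts)
    return {t: k * v / total for t, v in inv.items()}

# Hypothetical 10x imbalance between two tasks.
counts = {"clicks": 1_000_000, "purchases": 100_000}
print(task_loss_weights(counts, alpha=1.0))
```

At `alpha=1.0` the data-poor task receives 10x the loss weight of the data-rich one, exactly offsetting the 10x volume gap; decaying `alpha` over training implements the balanced-to-natural curriculum schedule.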