
Failure Modes: Negative Transfer and Data Drift

Negative Transfer

Negative transfer occurs when adding a task hurts performance on existing tasks. Instead of helping each other, tasks compete for shared capacity. The multi-task model performs worse than separate single-task models.

Why it happens: Tasks may require conflicting features. Texture classification benefits from high-frequency details; shape classification benefits from smoothed, abstract features. Forcing both through the same backbone creates a compromise that serves neither well.

Detection: Compare multi-task model performance against single-task baselines trained on the same data. If any task scores 2 or more percentage points worse in the multi-task setting, negative transfer is the likely cause.
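This check is easy to automate. A minimal sketch (the helper name, threshold default, and accuracy values are illustrative, not from the text) flags any task whose multi-task accuracy trails its single-task baseline by the 2-point margin:

```python
def detect_negative_transfer(single_task, multi_task, threshold=0.02):
    """Flag tasks whose multi-task accuracy trails the single-task baseline.

    single_task / multi_task: dicts mapping task name -> accuracy in [0, 1].
    Returns a dict of {task: degradation} for tasks at or above the threshold.
    """
    flagged = {}
    for task, baseline in single_task.items():
        gap = baseline - multi_task[task]
        if gap >= threshold:
            flagged[task] = round(gap, 4)
    return flagged

# Toy numbers: texture improved slightly, shape degraded by 3 points.
single = {"texture": 0.91, "shape": 0.88}
multi = {"texture": 0.92, "shape": 0.85}
print(detect_negative_transfer(single, multi))  # → {'shape': 0.03}
```

In practice you would run this per evaluation cycle and alert on any non-empty result, since a degradation that appears only after adding a new task points directly at that task as the source of conflict.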

Mitigation: Increase backbone capacity. Use soft parameter sharing instead of hard sharing. Add task-specific layers earlier in the network. In severe cases, remove the conflicting task from the multi-task setup.
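Soft parameter sharing can be sketched as giving each task its own copy of the backbone weights and adding a coupling penalty that pulls the copies toward each other. The function name, coupling value, and toy 3-parameter "backbones" below are illustrative assumptions, not from the text:

```python
def soft_sharing_penalty(weights_a, weights_b, coupling=0.1):
    """L2 penalty pulling two tasks' parameter vectors together.

    coupling -> infinity approaches hard sharing (effectively one set of
    weights); coupling = 0 recovers fully independent single-task models.
    """
    return coupling * sum((a - b) ** 2 for a, b in zip(weights_a, weights_b))

# Each task keeps its own copy of the (toy) backbone parameters.
w_texture = [0.9, -0.2, 0.4]
w_shape = [0.5, 0.1, 0.4]

penalty = soft_sharing_penalty(w_texture, w_shape)
# total_loss = texture_loss + shape_loss + penalty
```

The design point is that conflicting tasks can drift apart where they must (paying the penalty) instead of being forced through identical weights, which is exactly the failure mode hard sharing creates.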

Uneven Data Drift

Production data changes over time. In multi-task settings, tasks may drift at different rates. User behavior changes affect click prediction immediately. Seasonal patterns affect image classification gradually.

The problem: When you retrain on new data, one task improves dramatically while another barely changes or even degrades. The optimal retraining frequency differs per task, but multi-task models must be retrained as a unit.

Mitigation: Monitor per-task accuracy drift independently. If tasks drift at very different rates, consider decoupling them into separate models. Use task-specific calibration layers that can be updated independently.
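Per-task drift monitoring amounts to comparing each task's current accuracy against a frozen baseline with its own tolerance. The task names, accuracy values, and tolerances below are hypothetical placeholders:

```python
# Frozen accuracies recorded at the last retrain.
baseline = {"click": 0.80, "image": 0.93}
# Accuracies measured on a recent window of production data.
current = {"click": 0.71, "image": 0.92}
# Per-task tolerance before a task counts as drifted.
tolerance = {"click": 0.05, "image": 0.05}

drifted = [
    task for task in baseline
    if baseline[task] - current[task] > tolerance[task]
]
print(drifted)  # → ['click']  (dropped 9 points; image is stable)
```

Here the click task has drifted well past its tolerance while the image task is stable, which is the signal to retrain early, decouple the tasks into separate models, or update only a task-specific calibration layer.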

Task Imbalance

When one task has 10x more training data than another, the model optimizes primarily for the data-rich task. The data-poor task gets insufficient gradient signal and underperforms.

Mitigation: Oversample minority tasks. Use loss weighting inversely proportional to data volume. Apply curriculum learning: start with balanced sampling, gradually shift toward natural distribution.
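Inverse-volume loss weighting can be sketched in a few lines; the function name, normalization choice (weights summing to the number of tasks), and example counts are assumptions for illustration:

```python
def inverse_volume_weights(counts):
    """Per-task loss weights inversely proportional to example counts,
    scaled so the weights sum to the number of tasks (uniform data
    would give every task a weight of 1.0)."""
    inv = {task: 1.0 / n for task, n in counts.items()}
    scale = len(counts) / sum(inv.values())
    return {task: v * scale for task, v in inv.items()}

# A 10x data imbalance gives the data-poor task 10x the per-example weight.
weights = inverse_volume_weights({"rich_task": 100_000, "poor_task": 10_000})
# total_loss = sum(weights[t] * loss[t] for t in tasks)
```

The normalization keeps the overall loss magnitude comparable to unweighted training, so learning-rate settings carry over when the weighting is introduced.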

💡 Key Takeaways
- Negative transfer: the multi-task model performs worse than single-task baselines due to feature conflicts
- Detect negative transfer by comparing against single-task baselines; a 2+ percentage point degradation indicates problems
- Uneven data drift forces suboptimal retraining schedules when tasks change at different rates
- Task imbalance from 10x data differences causes the model to neglect minority tasks
📌 Interview Tips
1. Explain negative transfer as task competition for shared capacity; not all tasks benefit from sharing
2. Mention monitoring per-task drift separately as a production best practice for multi-task systems