What is Multi-Task Learning?
Why Multi-Task Learning Works
Related tasks share underlying structure. Detecting objects and estimating their depth both require understanding scene geometry. When tasks share a backbone network, features learned for one task help the others. This is called positive transfer.
Efficiency gain: Three separate models might use 300MB each (900MB total). A multi-task model uses 100MB for shared layers plus 20MB per task head (160MB total). That is roughly a 5.6x memory savings while maintaining or improving accuracy.
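The memory arithmetic can be checked directly; the sizes below are the illustrative figures from the text, not measurements of any real model:

```python
# Three separate models vs. one multi-task model (illustrative figures).
separate_mb = 3 * 300                 # three standalone models: 900 MB total
multitask_mb = 100 + 3 * 20           # shared layers + three task heads: 160 MB
savings = separate_mb / multitask_mb  # 900 / 160 = 5.625
print(separate_mb, multitask_mb, round(savings, 1))  # → 900 160 5.6
```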
Architecture Overview
Shared backbone: Convolutional or transformer layers that process raw input. These layers learn features useful for all tasks.
Task-specific heads: Small networks branching from the backbone. Each head specializes in one task: classification, detection, or segmentation.
Joint training: All tasks train together. Gradients from each task flow back through the shared backbone, creating representations that balance all task requirements.
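The three components above can be sketched in a few lines of PyTorch. This is a minimal illustration, not a production design: the layer sizes, the two example tasks (10-way classification and scalar regression), and the unweighted loss sum are all assumptions for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared backbone: layers that process raw input and learn
# features useful for all tasks.
backbone = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
)

# Task-specific heads: small networks branching from the backbone,
# each specializing in one task (sizes are illustrative).
cls_head = nn.Linear(128, 10)  # e.g. 10-way classification
reg_head = nn.Linear(128, 1)   # e.g. scalar depth regression

x = torch.randn(8, 64)         # a batch of 8 inputs
features = backbone(x)         # one forward pass through the shared layers
cls_out = cls_head(features)
reg_out = reg_head(features)

# Joint training: sum the per-task losses so gradients from every
# task flow back through the shared backbone together.
cls_targets = torch.randint(0, 10, (8,))
reg_targets = torch.randn(8)
loss = (F.cross_entropy(cls_out, cls_targets)
        + F.mse_loss(reg_out.squeeze(-1), reg_targets))
loss.backward()
```

After `loss.backward()`, the shared layers hold gradients contributed by both heads, which is what produces representations balancing all task requirements. In practice the per-task losses are often weighted rather than summed equally.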
When Multi-task Makes Sense
Multi-task learning helps when tasks are related and data is limited. If you have abundant data for each task, separate models may perform better. The sweet spot: related tasks where some have limited data that benefits from transfer.