
When to Choose Multi-Task vs Separate Models

Choose multi-task learning when tasks share the same input modality, have aligned feature needs, and face strict serving latency budgets. Vision tasks operating on the same camera frames are ideal candidates: object detection, depth estimation, and lane segmentation all benefit from the low-level edge and texture features learned in early convolutional layers. Tesla uses a shared vision backbone to produce all perception outputs within a 30 to 50 millisecond budget on embedded GPU accelerators, which would be impossible with separate models. A minimal sketch of this shared-backbone pattern appears at the end of this section.

Recommendation and ranking systems with multiple business objectives are another strong fit. Predicting CTR, CVR, dwell time, and engagement from the same user and item features lets dense tasks help sparse tasks. Meta and Google report using multi-objective or multi-task architectures because serving four separate models would exceed latency SLOs and require two to three times more infrastructure. The key is that the objectives are measured on the same user action, so features and data distributions align; the shared-bottom sketch below shows how a dense click task can regularize a sparse conversion task.

Avoid multi-task learning when tasks have different domains, different latency requirements, or fundamentally conflicting objectives that loss balancing cannot resolve. If one task needs 5 millisecond p99 latency while another can tolerate 100 milliseconds and benefits from a much larger model, separating them simplifies operations. If legal or policy constraints require strict control over objective weights, or if tasks are owned by different teams with independent release cycles, separate models with a downstream policy combiner provide better isolation and velocity.

For cases where you want shared learning but operational simplicity, consider knowledge distillation. Train a large multi-task teacher model offline to capture positive transfer and consistency across tasks, then distill it into separate per-task student models that are smaller and decoupled for serving. This keeps the sample efficiency and regularization benefits of multi-task learning while decoupling serving, SLAs, and team ownership. The distillation overhead is paid once offline, and serving stays simple; the distillation-loss sketch below illustrates the teacher-to-student training step.
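Below is a minimal sketch of the shared-backbone pattern, assuming PyTorch (the original does not name a framework). The layer sizes, head shapes, and loss weights are illustrative placeholders, not any production architecture; real detection and segmentation heads would be spatial rather than the pooled linear layers shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedBackboneModel(nn.Module):
    """Hard parameter sharing: one backbone, one lightweight head per task."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared early layers learn the low-level edge/texture features once.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific heads branch off the shared features.
        self.detection_head = nn.Linear(64, num_classes)  # object class logits
        self.depth_head = nn.Linear(64, 1)                # scalar depth estimate
        self.lane_head = nn.Linear(64, 2)                 # lane / no-lane logits

    def forward(self, frames: torch.Tensor) -> dict:
        features = self.backbone(frames)  # one forward pass serves all tasks
        return {
            "detection": self.detection_head(features),
            "depth": self.depth_head(features),
            "lane": self.lane_head(features),
        }

def multi_task_loss(outputs: dict, targets: dict) -> torch.Tensor:
    # Static per-task weights: tuning these is the "loss balancing" lever.
    weights = {"detection": 1.0, "depth": 0.5, "lane": 1.0}
    return (
        weights["detection"] * F.cross_entropy(outputs["detection"], targets["detection"])
        + weights["depth"] * F.mse_loss(outputs["depth"], targets["depth"])
        + weights["lane"] * F.cross_entropy(outputs["lane"], targets["lane"])
    )
```

The single backbone forward pass is what makes the shared latency budget feasible: the per-task heads add only marginal compute on top of the shared features.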
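And a hedged sketch of the shared-bottom ranking case: dense click labels and sparse conversion labels train the same bottom MLP, so the dense task regularizes the sparse one. Feature dimensions and the masking scheme are assumptions for illustration, not a description of Meta's or Google's systems.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedBottomRanker(nn.Module):
    """Shared bottom MLP with one sigmoid head per business objective."""

    def __init__(self, feature_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU())
        self.ctr_head = nn.Linear(hidden_dim, 1)  # click-through rate
        self.cvr_head = nn.Linear(hidden_dim, 1)  # conversion rate

    def forward(self, features: torch.Tensor):
        h = self.shared(features)  # one pass over the shared user/item features
        return self.ctr_head(h).squeeze(-1), self.cvr_head(h).squeeze(-1)

def ranking_loss(ctr_logits, cvr_logits, clicks, conversions):
    # Dense task: every impression carries a click label.
    ctr_loss = F.binary_cross_entropy_with_logits(ctr_logits, clicks)
    # Sparse task: conversions are only observed on clicked impressions,
    # so the CVR loss is masked to that subset.
    mask = clicks > 0
    if mask.any():
        cvr_loss = F.binary_cross_entropy_with_logits(cvr_logits[mask], conversions[mask])
    else:
        cvr_loss = ctr_logits.new_zeros(())
    return ctr_loss + cvr_loss
```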
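Finally, a sketch of the distillation hybrid's training step, using the standard temperature-scaled distillation loss; the temperature and mixing weight alpha are illustrative choices, not values from the source. Each per-task student is trained against the corresponding head of the multi-task teacher.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      hard_targets: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend soft teacher supervision with ground-truth labels."""
    # Soft targets: match the teacher's temperature-smoothed distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary supervised loss on the task's own labels.
    hard = F.cross_entropy(student_logits, hard_targets)
    return alpha * soft + (1 - alpha) * hard
```

This cost is paid once offline; the students served online are small, independent, and owned per team.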
💡 Key Takeaways
Multi-task learning excels when tasks share input modality and low-level features, like vision tasks on the same camera frames within a 30 to 50 ms budget
Ranking multiple objectives on the same user action (CTR, CVR, dwell) benefits from positive transfer; Meta and Google use this at scale
Use separate models when tasks have different latency SLOs (5 ms vs 100 ms), different domains, or independent team ownership and release cycles
The knowledge-distillation hybrid trains a multi-task teacher offline, then distills it into separate per-task student models for serving simplicity
Conflicting objectives that loss balancing cannot resolve are better handled with separate models and a downstream policy layer
📌 Examples
Tesla perception: Shared vision backbone for detection, depth, and segmentation on the same frames; meets a 30 ms latency budget on an embedded GPU
Uber pricing and dispatch: Separate models, because pricing needs 100 ms for complex optimization while dispatch needs 10 ms for driver matching
Google Search and Ads: Separate models with different latency needs and team ownership, combined in the serving layer with an explicit policy