Warm Start, Transfer Learning, and Multi-Objective HPO
Warm Starting
Warm starting from prior experiments on similar data or models can cut the number of required trials by a factor of two to five in practice. The idea is to initialize the search with historical knowledge: set priors on hyperparameter ranges based on the distributions of past successful configurations, anchor the initial design around known-good regions, or directly transfer the best configuration from a related task as the starting point. The risk is bias when task drift is significant: if the data distribution or architecture has changed substantially, the prior can trap the search in a suboptimal basin.
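One way to anchor an initial design around a known-good region is to mix perturbations of a prior best configuration with uniform exploration samples. The sketch below is illustrative, not any particular library's API; the function names, the 60/40 anchor-to-exploration split, and the jitter scale are all assumptions.

```python
import random

def warm_start_design(prior_best, bounds, n_total=10, n_anchor=6, jitter=0.1, seed=0):
    """Initial design: Gaussian perturbations around a prior best config,
    plus uniform exploration samples. Illustrative sketch only."""
    rng = random.Random(seed)
    design = []
    for _ in range(n_anchor):
        cfg = {}
        for name, (lo, hi) in bounds.items():
            span = hi - lo
            val = prior_best[name] + rng.gauss(0, jitter * span)
            cfg[name] = min(hi, max(lo, val))  # clamp back into the search bounds
        design.append(cfg)
    for _ in range(n_total - n_anchor):
        # pure exploration guards against a stale prior trapping the search
        design.append({name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()})
    return design

bounds = {"lr": (1e-4, 1e-1), "dropout": (0.0, 0.5)}
prior_best = {"lr": 3e-3, "dropout": 0.2}  # hypothetical best from a related task
design = warm_start_design(prior_best, bounds)
```

Keeping some fraction of the budget for uniform samples is what limits the bias risk described above: if the prior is stale, the exploration points can still discover a better basin.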
Transfer Learning for HPO
Transfer learning for HPO often uses meta-features such as dataset statistics (number of examples, feature dimensionality, class balance) or model characteristics (layer count, parameter count) to gate whether a prior should be applied. Systems such as Google's Vizier and Meta's Ax maintain registries of prior studies along with their configurations and outcomes. When a new study starts, they compute similarity scores and blend the most relevant priors with exploration.
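A minimal version of similarity gating is to embed each study as a meta-feature vector and pick the closest prior by distance. This sketch does not reflect Vizier's or Ax's actual internals; the registry layout, the log-scaling of counts, and plain Euclidean distance are all simplifying assumptions.

```python
import math

def meta_features(n_examples, n_features, class_balance):
    # log-scale the counts so similarity is not dominated by raw magnitudes
    return (math.log10(n_examples), math.log10(n_features), class_balance)

def most_similar_study(new_task, registry):
    """Return (study_name, distance) for the closest prior study.
    A real system would also threshold the distance before applying a prior."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(((name, dist(new_task, feats)) for name, feats in registry.items()),
               key=lambda kv: kv[1])

# Hypothetical registry of prior studies keyed by name
registry = {
    "study_a": meta_features(1_000_000, 300, 0.5),
    "study_b": meta_features(20_000, 50, 0.9),
}
new_task = meta_features(800_000, 200, 0.55)
name, distance = most_similar_study(new_task, registry)
```

In practice the distance would be compared against a threshold: below it, transfer the prior; above it, fall back to a cold start rather than risk negative transfer.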
Multi-Objective Optimization
Multi-objective and constrained optimization handles real production requirements, such as maximizing accuracy subject to inference latency under 100 milliseconds, or keeping model size under 500 megabytes for mobile deployment. Constrained Bayesian optimization models both the objective and constraint surfaces, selecting candidates that maximize expected improvement while remaining feasible. Pareto optimization instead maintains a frontier of non-dominated solutions, deferring the choice of trade-off.
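The two building blocks above, a feasibility filter and a non-dominated frontier, can be sketched in a few lines. This assumes a two-objective setting (maximize accuracy, minimize latency) with hypothetical candidate values; it is a selection step only, not a full BO loop.

```python
def feasible(points, latency_budget_ms=100):
    """Keep only candidates that satisfy the latency constraint."""
    return [(acc, lat) for acc, lat in points if lat <= latency_budget_ms]

def pareto_frontier(points):
    """Keep non-dominated (accuracy, latency_ms) points:
    a point is dominated if another is at least as good on both
    objectives and strictly better on one."""
    frontier = []
    for acc, lat in points:
        dominated = any(a >= acc and l <= lat and (a > acc or l < lat)
                        for a, l in points)
        if not dominated:
            frontier.append((acc, lat))
    return frontier

# Hypothetical (accuracy, latency_ms) evaluations
candidates = [(0.91, 80), (0.93, 120), (0.90, 60), (0.89, 95)]
frontier = pareto_frontier(candidates)
within_sla = feasible(candidates)
```

Here (0.89, 95) drops out of the frontier because (0.91, 80) beats it on both objectives, while (0.93, 120) stays on the frontier but fails the 100 ms constraint, which is exactly the distinction between Pareto optimality and feasibility.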
Production Requirements
In practice, multi-objective search needs larger initial designs (50 to 100 quasi-random seeds) to sample the feasible region adequately. Netflix and Uber commonly use constrained BO to tune models that must meet SLAs on latency percentiles and throughput while maximizing model-quality metrics.
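A quasi-random initial design can be generated with any low-discrepancy sequence; production systems typically use Sobol points, but a Halton sequence is easy to show dependency-free. This is a minimal sketch, and the choice of bases and bounds is illustrative.

```python
def halton(index, base):
    """One coordinate of the Halton low-discrepancy sequence (index >= 1)."""
    f, result = 1.0, 0.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def quasi_random_design(n, bounds):
    """n points spread over the (lo, hi) bounds, one prime base per dimension.
    Low-discrepancy points cover the space more evenly than i.i.d. uniform draws."""
    bases = [2, 3, 5, 7, 11][:len(bounds)]
    return [[lo + halton(i, b) * (hi - lo)
             for (lo, hi), b in zip(bounds, bases)]
            for i in range(1, n + 1)]

# e.g. 64 seed points over a learning-rate and dropout search space
seeds = quasi_random_design(64, [(1e-4, 1e-1), (0.0, 0.5)])
```

The even coverage matters most when the feasible region is small: with 50 to 100 such seeds, the constraint model starts with observations on both sides of the feasibility boundary.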