Core HPO Algorithms: Random vs Bayesian vs Multi-Fidelity
Random and Grid Search
Random and grid search serve as robust baselines: embarrassingly parallel but sample-inefficient. Grid search scales exponentially with dimensionality (10 values across 5 hyperparameters requires 10^5 = 100,000 trials), making it impractical beyond 3 to 4 dimensions. Random search finds good regions faster because it covers each individual dimension with many distinct values rather than a coarse grid, but without pruning it still wastes budget evaluating unpromising configurations at full fidelity.
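The scaling gap is easy to see in code. This is a minimal sketch with a hypothetical 5-dimensional space (`hp0` through `hp4` and the helper `sample_random` are illustrative, not from any library): the grid enumerates every combination, while random search draws a fixed number of independent samples.

```python
import itertools
import random

# Hypothetical search space: 5 hyperparameters, 10 candidate values each.
space = {f"hp{i}": [v / 10 for v in range(10)] for i in range(5)}

# Grid search enumerates the full Cartesian product: 10^5 configurations.
grid = list(itertools.product(*space.values()))
print(len(grid))  # 100000

def sample_random(space, n_trials, seed=0):
    """Random search: draw each hyperparameter independently and uniformly."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_trials)]

# A fixed budget of 100 random trials, each hitting a fresh value
# in every dimension, covers the space far more cheaply.
trials = sample_random(space, n_trials=100)
print(len(trials))  # 100
```

Note that the random budget is chosen independently of dimensionality, which is exactly why random search remains usable where the grid explodes.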
Bayesian Optimization
Bayesian Optimization (BO) builds a surrogate model of the objective using Gaussian Processes, Tree-structured Parzen Estimators, or random-forest surrogates, then selects new points via acquisition functions such as Expected Improvement. BO shines when each evaluation is expensive (minutes to hours) and parallelism is moderate (roughly 8 to 64 concurrent trials). In practice, BO requires 3 to 10 times fewer full-budget trials than random search to reach similar quality. The tradeoff is that BO requires careful handling of categorical and conditional search spaces.
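The loop above can be sketched end to end in a few dozen lines. This is a toy, assumption-laden illustration, not a production implementation: a zero-mean Gaussian Process with an RBF kernel as the surrogate, Expected Improvement (minimization form) as the acquisition, and a cheap quadratic `objective` standing in for an expensive training run. All names (`rbf_kernel`, `gp_posterior`, etc.) are invented for this sketch.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel; a and b are 1-D arrays of inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    # Posterior mean and std of a zero-mean GP at x_new, given observations.
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_obs
    var = 1.0 - np.sum(K_s * (K_inv @ K_s), axis=0)  # prior variance is 1.0
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best_y, xi=0.01):
    # EI for minimization: expected margin by which a point beats best_y.
    imp = best_y - mu - xi
    z = imp / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf

def objective(x):
    # Stand-in for an expensive training run; true optimum at x = 0.65.
    return (x - 0.65) ** 2

x_obs = np.array([0.1, 0.5, 0.9])       # initial design
y_obs = objective(x_obs)
candidates = np.linspace(0.0, 1.0, 201)  # acquisition maximized on a grid

for _ in range(10):
    mu, sigma = gp_posterior(x_obs, y_obs, candidates)
    ei = expected_improvement(mu, sigma, y_obs.min())
    x_next = candidates[int(np.argmax(ei))]  # most promising next trial
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[int(np.argmin(y_obs))]
```

The same structure carries over to real tools: only the surrogate (TPE, random forest) and the acquisition change, while the propose-evaluate-update loop stays the same.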
Multi-Fidelity Methods
Bandit-based multi-fidelity methods like Successive Halving, Hyperband, and ASHA allocate small budgets (a fraction of epochs or a subset of data) to many configurations, then promote only the top quantiles to larger budgets. ASHA typically defines 3 to 5 rungs with a reduction factor of 3 to 5 per rung, promoting the top 20 to 30 percent at each checkpoint. This approach prunes 70 to 95 percent of trials after they consume only 10 to 30 percent of their full budget.
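The rung-and-promote mechanic can be shown with synchronous Successive Halving (ASHA's asynchronous variant removes the per-rung synchronization barrier but keeps the same promotion rule). This sketch uses a hypothetical `train_step` whose loss shrinks with both configuration quality and budget; all names are invented for illustration.

```python
import random

def train_step(config, budget):
    # Hypothetical stand-in for training `config` for `budget` epochs and
    # returning a validation loss; higher "quality" converges to lower loss.
    return 1.0 / (config["quality"] * budget)

def successive_halving(configs, min_budget=1, eta=3, rungs=3):
    """Keep the top 1/eta of survivors at each rung; promoted trials get
    eta times the budget at the next rung."""
    survivors = list(configs)
    budget = min_budget
    for _ in range(rungs):
        scores = [(train_step(c, budget), c) for c in survivors]
        scores.sort(key=lambda t: t[0])       # lower loss is better
        keep = max(1, len(scores) // eta)     # promote the top 1/eta
        survivors = [c for _, c in scores[:keep]]
        budget *= eta                         # larger budget at next rung
    return survivors

random.seed(0)
configs = [{"id": i, "quality": random.uniform(0.1, 1.0)} for i in range(27)]
winners = successive_halving(configs)  # 27 -> 9 -> 3 -> 1 survivors
```

With eta = 3, the 27 starting configurations shrink to 9, then 3, then 1, while budgets grow 1, 3, 9: most trials are stopped after spending only the cheapest rung's budget, which is the source of the pruning figures quoted above.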
Trade-offs
Multi-fidelity methods scale to hundreds of workers and achieve near-linear wall-clock speedup. The limitation is that they require a meaningful fidelity axis (epochs, data subset size, resolution) and can occasionally prune late bloomers that start slow but would converge better.