
Multi-Fidelity Evaluation Strategy in NAS

Multi-fidelity evaluation reduces NAS cost by training candidates at different levels of investment, progressively filtering to identify top performers without fully training every architecture. The strategy uses a staged pipeline in which each level spends more compute on fewer candidates.

Stage 1 trains all candidates for 1 to 3 epochs on downsampled 160-pixel images, with aggressive early stopping for poor performers after just 1 epoch. At roughly 400 images per second at 160 pixels with mixed precision, a single Nvidia A100 covers ImageNet's 1.28 million images in about 3,200 seconds per epoch, so a three-epoch Stage 1 pass takes about 2.7 GPU-hours per candidate. With 1000 GPUs, the search cycles through 350 to 400 candidates per hour at this low fidelity. Stage 2 promotes only the top 5 to 10 percent to train for 10 to 30 epochs at full 224-pixel resolution, filtering on validation accuracy and latency estimated from lookup tables. Stage 3 fully trains the top 3 to 5 finalists to convergence, then applies quantization and measures real latency on physical devices. This pyramid structure invests maximum compute only on the most promising candidates. Learning-curve extrapolation sharpens the filter further by predicting final performance from early training curves, stopping doomed runs even earlier.

The critical trade-off is evaluation bias versus cost. Supernet weight sharing, where all architectures share a single set of weights, can misrank candidates because a subnet that performs well under shared weights may underperform when trained from scratch. Google's research on NAS evaluation found Kendall tau correlations of around 0.4 to 0.6 between supernet rankings and standalone-training rankings; since the concordant-pair fraction is (1 + tau)/2, that corresponds to roughly 70 to 80 percent of architecture pairs being ordered consistently, with 20 to 30 percent inverted. Fair path sampling, path-dropout rates around 0.2, and re-ranking the top K with partial independent training mitigate this. Production systems track both proxy metrics (low-fidelity accuracy) and final metrics (full training plus device measurement), accepting that some promising candidates will be missed to keep search cost practical.
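To make the staged filtering concrete, here is a minimal Python sketch of the pipeline. The names `Candidate`, `train_and_eval`, and `multi_fidelity_search` are hypothetical, and the evaluator is stubbed with a random score, so only the control flow (evaluate everyone cheaply, promote the top fraction, fully train a handful of finalists) is real.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    arch_id: int
    score: float = 0.0  # most recent validation accuracy at the current fidelity


def train_and_eval(cand: Candidate, epochs: int, resolution: int) -> float:
    """Stand-in for a real training job at the given epoch budget and input
    resolution; a production system would return measured validation accuracy."""
    return random.random()  # placeholder only


def multi_fidelity_search(candidates, promote_frac=0.05, finalists=5):
    # Stage 1: cheap proxy -- ~3 epochs at 160 px for every candidate.
    # At ~400 img/s on an A100, 3 epochs over 1.28M images is ~2.7 GPU-hours,
    # so 1000 GPUs clear roughly 350-400 candidates per hour.
    for c in candidates:
        c.score = train_and_eval(c, epochs=3, resolution=160)
    survivors = sorted(candidates, key=lambda c: c.score, reverse=True)
    survivors = survivors[: max(1, int(len(candidates) * promote_frac))]

    # Stage 2: medium fidelity -- 10-30 epochs at full 224 px resolution,
    # only for the promoted top 5-10%.
    for c in survivors:
        c.score = train_and_eval(c, epochs=20, resolution=224)
    survivors = sorted(survivors, key=lambda c: c.score, reverse=True)[:finalists]

    # Stage 3: train the 3-5 finalists to convergence; quantization and
    # on-device latency measurement would follow here (not shown).
    for c in survivors:
        c.score = train_and_eval(c, epochs=300, resolution=224)
    return survivors


if __name__ == "__main__":
    pool = [Candidate(arch_id=i) for i in range(1000)]
    top = multi_fidelity_search(pool)
    print("finalists:", [c.arch_id for c in top])
```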
💡 Key Takeaways
Three-stage pipeline: Stage 1 evaluates all 1000 candidates for 3 epochs at 160 pixels (about 2.7 hours each), Stage 2 trains the top 5% for 10 to 30 epochs at 224 pixels, Stage 3 fully trains the top 5 to convergence
Cost reduction: With 1000 A100 GPUs processing 400 images per second at low fidelity, the system cycles through 350 to 400 candidates per hour versus weeks for full training
Supernet weight sharing creates ranking bias: a Kendall tau correlation of 0.4 to 0.6 between shared-weight rankings and standalone training means roughly 20 to 30% of pairwise orderings are inverted (the concordant-pair fraction is (1 + tau)/2)
Mitigation strategies include fair path sampling with uniform selection, path dropout around 0.2 to reduce co-adaptation, and re-ranking the top K with independent partial training
Device measurement uses 32 physical phones per target, 30 inference runs per architecture with 10 warmup discards, logging median and 95th percentile latency to handle thermal throttling and background noise
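A minimal host-side sketch of the device-measurement protocol in the last takeaway (10 warmup runs discarded, 30 timed runs, median and 95th-percentile latency logged). Here `run_inference` is an assumed callable standing in for whatever benchmarking harness drives a single forward pass on the phone; in practice this would run once per device across the 32-phone fleet and the results would be aggregated.

```python
import statistics
import time


def measure_latency_ms(run_inference, runs: int = 30, warmup: int = 10):
    """Report median and 95th-percentile latency (ms) for one architecture
    on one device. `run_inference` is an assumed zero-argument callable."""
    for _ in range(warmup):
        run_inference()  # discard: cache warming, JIT compilation, clock ramp-up

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    median = statistics.median(samples_ms)
    p95 = statistics.quantiles(samples_ms, n=20)[-1]  # 95th percentile
    return median, p95


# Example usage (hypothetical model call):
#   median_ms, p95_ms = measure_latency_ms(lambda: model(sample_input))
```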
📌 Examples
ImageNet search with A100: 1.28M images at 400 img/sec and 160px mixed precision is about 3,200 seconds per epoch, so 3 epochs ≈ 2.7 hours per candidate; 1000 GPUs evaluate 350 to 400 candidates/hour
Supernet ranking correlation: research shows Kendall tau of 0.4 to 0.6 between supernet shared weights and standalone training, requiring re-ranking of top candidates (see the sketch after this list)
Progressive filtering: Start with 1000 candidates in Stage 1, promote top 50 (5%) to Stage 2 with 10x more training, select top 5 (0.5%) for Stage 3 full training and quantization
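To illustrate the ranking-bias check and the re-ranking mitigation, here is a small sketch assuming SciPy is available. The scores are made-up illustrative values chosen so the computed tau lands in the 0.4 to 0.6 range discussed above; they are not real measurements.

```python
from scipy.stats import kendalltau

# Proxy scores from the supernet (shared weights) and accuracies from
# standalone training, indexed by architecture. Illustrative values only.
supernet_scores = [0.71, 0.68, 0.62, 0.60, 0.58, 0.55]
standalone_scores = [0.76, 0.78, 0.75, 0.77, 0.73, 0.74]

tau, p_value = kendalltau(supernet_scores, standalone_scores)
print(f"Kendall tau = {tau:.2f}")  # ~0.47 for these values

# With no ties, the fraction of concordant pairs is (1 + tau) / 2, so a tau
# of 0.4-0.6 means roughly 70-80% of architecture pairs agree in order.
concordant_fraction = (1 + tau) / 2
print(f"concordant pair fraction ≈ {concordant_fraction:.2f}")

# Mitigation: re-rank the top-K supernet candidates using a short round of
# independent (standalone) training before committing to full training.
K = 3
top_k = sorted(range(len(supernet_scores)),
               key=lambda i: supernet_scores[i], reverse=True)[:K]
reranked = sorted(top_k, key=lambda i: standalone_scores[i], reverse=True)
print("supernet top-K:", top_k, "-> after re-ranking:", reranked)
```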