ML Infrastructure & MLOps · Cost Optimization (Spot Instances, Autoscaling) · Hard · ⏱️ ~3 min

Production Pattern: On-Demand Baseline Plus Spot Burst Capacity

Running 100 percent Spot is risky for latency-sensitive services. A better production pattern uses a stable on-demand baseline for typical load, with Spot instances providing burst capacity during peaks. This balances cost savings against Service Level Objective (SLO) protection.

Size the on-demand baseline to handle normal traffic with margin for small spikes. For a service averaging 50,000 requests per second with a 100,000 requests per second peak, provision 30 on-demand replicas to serve 60,000 requests per second comfortably, about 20 percent above average. When load climbs toward peak, autoscaling adds Spot replicas to handle the remaining 40,000 requests per second. During the daily trough at 30,000 requests per second, only the on-demand baseline runs. The result is that roughly 70 percent of capacity hours come from Spot, which costs about 80 percent less, delivering roughly a 56 percent total cost reduction (0.7 × 0.8), while the on-demand baseline ensures the service never drops below minimum capacity.

Implementation details matter. Use capacity rebalancing to launch replacement Spot instances early when pools show elevated interruption risk, so new capacity arrives before old instances terminate and total capacity stays stable. Apply connection draining with a 90-second timeout so in-flight requests finish gracefully before instances terminate. Pre-warm caches and connections on new Spot replicas before adding them to the load balancer pool to avoid cold-start latency spikes.

Meta and Netflix use variants of this pattern extensively, with on-demand or reserved capacity for control planes and SLO-critical paths, and large Spot fleets for batch processing, encoding, and bursty serving traffic.
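To make the sizing arithmetic concrete, here is a minimal Python sketch of the capacity and cost math described above. The traffic figures, headroom, and Spot discount come from the worked example; the per-replica throughput and the helper names are illustrative assumptions.

```python
# Capacity-planning sketch for the on-demand baseline + Spot burst pattern.
# Numbers mirror the worked example above; function names are illustrative.
import math

AVG_RPS = 50_000            # average traffic
PEAK_RPS = 100_000          # evening peak
REPLICA_RPS = 2_000         # load one replica serves comfortably (assumed)
BASELINE_HEADROOM = 1.2     # baseline sized ~20% above average
SPOT_DISCOUNT = 0.80        # Spot priced ~80% below on-demand
SPOT_CAPACITY_SHARE = 0.70  # fraction of capacity-hours served from Spot


def baseline_replicas(avg_rps: int) -> int:
    """On-demand replicas sized ~20% above average load."""
    return math.ceil(avg_rps * BASELINE_HEADROOM / REPLICA_RPS)


def blended_savings(spot_share: float, spot_discount: float) -> float:
    """Cost reduction versus running 100% of capacity on-demand."""
    return spot_share * spot_discount


if __name__ == "__main__":
    base = baseline_replicas(AVG_RPS)                              # -> 30
    savings = blended_savings(SPOT_CAPACITY_SHARE, SPOT_DISCOUNT)  # -> 0.56
    print(f"on-demand baseline: {base} replicas "
          f"({base * REPLICA_RPS:,} rps of {PEAK_RPS:,} rps peak)")
    print(f"estimated blended cost reduction: {savings:.0%}")
```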
💡 Key Takeaways
Size the on-demand baseline at about 20 percent above average load to handle normal traffic plus small spikes, ensuring minimum capacity even if Spot is unavailable
The burst layer scales from zero to multiple times the baseline during peaks, with about 70 percent of total capacity hours coming from cheaper Spot instances
For a service with 50,000 requests per second average and 100,000 requests per second peak, 30 on-demand replicas plus 70 Spot replicas at peak deliver roughly a 56 percent cost reduction
Capacity rebalancing launches replacement Spot instances early, before termination, maintaining total capacity and preventing SLO violations during interruptions (see the configuration sketch after this list)
Pre-warm caches and connections on new Spot replicas before routing traffic to avoid cold-start p95 latency spikes of 500 milliseconds or more
Meta and Netflix use on-demand or reserved capacity for control planes and SLO-critical components and Spot for batch and bursty workloads, isolating interruption risk
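The baseline-plus-burst split and capacity rebalancing can be expressed directly in autoscaling configuration. Below is a sketch using AWS EC2 Auto Scaling via boto3, one common way to implement the pattern; the group name, launch template ID, subnets, and instance types are placeholders, and equivalent settings exist in other clouds and in Kubernetes-based autoscalers.

```python
# Sketch: on-demand baseline + Spot burst with capacity rebalancing, as an
# AWS EC2 Auto Scaling group. Resource identifiers are placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="recsys-serving",        # placeholder name
    MinSize=30,                                   # never below the baseline
    MaxSize=100,                                  # headroom for peak traffic
    DesiredCapacity=30,
    VPCZoneIdentifier="subnet-aaa,subnet-bbb",    # placeholder subnets
    # Launch replacement Spot capacity early when interruption risk rises,
    # so new instances are in service before old ones terminate.
    CapacityRebalance=True,
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-0123456789abcdef0",  # placeholder
                "Version": "$Latest",
            },
            # Diversify instance types so Spot pools are deeper.
            "Overrides": [
                {"InstanceType": "c6i.2xlarge"},
                {"InstanceType": "c6a.2xlarge"},
                {"InstanceType": "c5.2xlarge"},
            ],
        },
        "InstancesDistribution": {
            # The first 30 instances are always on-demand (the SLO baseline);
            # everything scaled out beyond that comes from Spot.
            "OnDemandBaseCapacity": 30,
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    },
)
```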
📌 Examples
Recommendation service: 30 on-demand replicas handle the 60K req/sec baseline, autoscale to 100 replicas with Spot during the evening peak at 100K req/sec, and scale down to 30 overnight at 30K req/sec
Feature serving: On-demand baseline sized for p95 latency of 50 ms at average load, a Spot burst layer adds capacity when the request rate climbs, and connection draining ensures graceful termination within 90 seconds (see the lifecycle sketch after these examples)
Search ranking: Reserved instances for index serving (latency critical), Spot for index building and model training (batch, interruptible), separating concerns to protect user-facing SLOs
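To show how the pre-warming and connection-draining steps fit together on a single replica, here is a minimal asyncio sketch of the lifecycle: warm up before joining the load balancer pool, serve traffic, and on SIGTERM stop accepting work and wait up to 90 seconds for in-flight requests. The request handling and load balancer registration are simulated stand-ins, not a specific framework's API.

```python
# Sketch: Spot replica lifecycle — pre-warm before taking traffic, then drain
# in-flight requests within a 90-second budget when termination begins.
import asyncio
import signal

DRAIN_TIMEOUT_S = 90   # matches the connection-draining window above
inflight = set()       # tasks for requests currently being served


async def prewarm() -> None:
    """Stand-in for warming local caches and downstream connection pools."""
    await asyncio.sleep(0.1)


async def handle_request(i: int) -> None:
    """Stand-in for serving one request."""
    await asyncio.sleep(0.05)


async def drain(stop: asyncio.Event, drained: asyncio.Event) -> None:
    """Stop taking traffic, then wait (up to the budget) for in-flight work."""
    print("SIGTERM: deregistering from load balancer, draining connections")
    stop.set()  # the real LB stops sending new connections at this point
    pending = [t for t in inflight if not t.done()]
    if pending:
        await asyncio.wait(pending, timeout=DRAIN_TIMEOUT_S)
    drained.set()


async def main() -> None:
    stop, drained = asyncio.Event(), asyncio.Event()

    # 1. Pre-warm before the replica is added to the load balancer pool.
    await prewarm()
    print("pre-warm done, registering with load balancer")

    # 2. Begin draining when the instance receives a termination signal.
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(
        signal.SIGTERM, lambda: asyncio.ensure_future(drain(stop, drained))
    )

    # 3. Serve until asked to stop (simulated request stream), tracking
    #    every in-flight task so draining can wait for it.
    i = 0
    while not stop.is_set():
        task = asyncio.ensure_future(handle_request(i))
        inflight.add(task)
        task.add_done_callback(inflight.discard)
        i += 1
        await asyncio.sleep(0.01)

    await drained.wait()
    print("drain complete, safe to terminate")


if __name__ == "__main__":
    asyncio.run(main())
```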