Cost Control: On Demand vs Spot, Scale to Zero, and Fractional Allocation
The Cost Scale
GPU compute cost dominates ML infrastructure budgets. A single NVIDIA A100 on-demand instance costs approximately $3 to $4 per hour on major cloud providers, which comes to roughly $2,200 to $3,000 per month for always-on capacity. At a scale of dozens or hundreds of GPUs, monthly bills reach hundreds of thousands of dollars. Effective cost optimization requires combining multiple strategies that trade off reliability, availability, and operational complexity.
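The monthly figures above follow from simple arithmetic. The sketch below reproduces them under the assumption of a 730-hour month (365 × 24 / 12); the function name and the flat hourly rate are illustrative, and real bills also include storage, networking, and any committed-use discounts.

```python
# Assumption: an average month has 365 * 24 / 12 = 730 billable hours.
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate: float, gpu_count: int = 1,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running `gpu_count` GPUs for `hours` at a flat hourly rate."""
    return hourly_rate * gpu_count * hours


if __name__ == "__main__":
    # One always-on A100 at the low and high ends of the quoted price range.
    print(monthly_cost(3.0), monthly_cost(4.0))
    # A hypothetical fleet of 100 always-on A100s at $3/hour.
    print(monthly_cost(3.0, gpu_count=100))
```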
On Demand vs Spot
On-demand instances provide guaranteed availability and stable pricing, which suits latency-critical inference that serves production traffic under strict SLOs. Spot or preemptible instances offer 60% to 80% discounts (reducing A100 cost from $3/hour to between $0.60 and $1.20/hour) but can be interrupted with only 30 to 120 seconds' notice. This makes spot capacity a good fit for batch training, fine-tuning jobs, and opportunistic inference overflow traffic, provided the workload tolerates interruptions through checkpointing and retry logic.
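The checkpoint-and-retry pattern that makes spot interruptions tolerable can be sketched as follows. This is a minimal simulation, not a provider integration: the `Preempted` exception, the `preempt_at` parameter, and the JSON checkpoint file are all hypothetical stand-ins for a real interruption notice and real model state.

```python
import json
import os


class Preempted(Exception):
    """Hypothetical signal that the spot instance is being reclaimed."""


def train(total_steps, ckpt_path, checkpoint_every=10, preempt_at=None):
    """Run (or resume) a job, checkpointing every `checkpoint_every` steps."""
    step = 0
    if os.path.exists(ckpt_path):
        # Resume from the last durable checkpoint instead of from scratch.
        with open(ckpt_path) as f:
            step = json.load(f)["step"]
    while step < total_steps:
        if preempt_at is not None and step == preempt_at:
            raise Preempted(f"interrupted at step {step}")
        step += 1  # stand-in for one real training step
        if step % checkpoint_every == 0 or step == total_steps:
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"step": step}, f)
            # Atomic rename so an interruption never leaves a partial file.
            os.replace(tmp_path, ckpt_path)
    return step


def run_with_retries(total_steps, ckpt_path, preemptions):
    """Retry until the job finishes; returns the number of attempts.

    `preemptions` lists the steps at which to simulate an interruption,
    one per attempt.
    """
    pending = list(preemptions)
    attempts = 0
    while True:
        attempts += 1
        preempt_at = pending.pop(0) if pending else None
        try:
            train(total_steps, ckpt_path, preempt_at=preempt_at)
            return attempts
        except Preempted:
            continue  # a real system would wait for replacement capacity
```

Work done since the last checkpoint is lost on each interruption, so the checkpoint interval is a trade-off between checkpoint overhead and repeated work.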
Scale to Zero
Scale to zero eliminates idle costs by shutting down GPU node groups when no workloads are running. A development cluster with sporadic usage can reduce monthly costs from $15,000 (always on) to $2,000 (actual usage hours) through aggressive scale-to-zero policies. The trade-off is cold-start latency: the first request after scaling to zero waits 240 seconds or more for node provisioning and model loading. Production systems therefore use hybrid approaches: scale to zero for batch and development workloads, and small warm pools (one to two replicas) for latency-critical inference.
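The savings and the warm-pool decision can be expressed in a few lines. This sketch assumes cost scales linearly with active hours and uses the 240-second cold start quoted above as a default; the function names are illustrative.

```python
def active_fraction(always_on_cost: float, usage_cost: float) -> float:
    """Share of the month the cluster is actually in use, implied by the two bills."""
    return usage_cost / always_on_cost


def scale_to_zero_cost(always_on_cost: float, fraction_active: float) -> float:
    """Monthly cost when nodes are billed only while workloads are running."""
    return always_on_cost * fraction_active


def needs_warm_pool(latency_slo_seconds: float,
                    cold_start_seconds: float = 240.0) -> bool:
    """A workload needs warm replicas if its SLO is tighter than a cold start."""
    return latency_slo_seconds < cold_start_seconds


if __name__ == "__main__":
    frac = active_fraction(15_000, 2_000)
    print(f"implied utilization: {frac:.1%}")
    print(f"scale-to-zero bill: ${scale_to_zero_cost(15_000, frac):,.0f}")
    print("inference with 1 s SLO needs warm pool:", needs_warm_pool(1.0))
    print("batch job with 1 h deadline needs warm pool:", needs_warm_pool(3600.0))
```

In the development cluster example, the $2,000 bill implies the GPUs are busy only about 13% of the time, which is why scale to zero pays off so strongly there.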
Combined Strategy Example
A production system might use on-demand full GPUs for critical inference (30% of capacity), spot fractional GPUs for batch jobs (50% of capacity), and scale to zero for development (20% of capacity), reducing total costs by 40% to 60% compared to naive always-on, on-demand allocation. Fractional GPU allocation through NVIDIA Multi-Instance GPU (MIG) partitions one A100 into up to seven isolated instances, so seven small models can share one A100 at $3/hour instead of occupying seven separate V100s at a combined $17.50/hour (about $2.50/hour each).
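A rough check of the combined figure, as a sketch: the blended cost is the capacity-weighted sum of each tier's price relative to always-on on-demand. The 70% spot discount and 13% development utilization used below are assumptions chosen from within the ranges discussed earlier, and the $2.50/hour V100 rate is simply the $17.50 total divided by seven.

```python
def blended_relative_cost(on_demand_share: float, spot_share: float,
                          dev_share: float, spot_discount: float,
                          dev_active_fraction: float) -> float:
    """Cost as a fraction of running all capacity always-on at on-demand rates."""
    return (on_demand_share * 1.0                  # full price, always on
            + spot_share * (1.0 - spot_discount)   # discounted spot capacity
            + dev_share * dev_active_fraction)     # billed only while active


def mig_vs_dedicated(models: int = 7, a100_hourly: float = 3.0,
                     v100_hourly: float = 2.5) -> tuple[float, float]:
    """Hourly cost of one shared A100 vs one dedicated V100 per model."""
    return a100_hourly, models * v100_hourly


if __name__ == "__main__":
    rel = blended_relative_cost(0.3, 0.5, 0.2,
                                spot_discount=0.7, dev_active_fraction=0.13)
    print(f"relative cost: {rel:.3f}, savings: {1 - rel:.1%}")
    shared, dedicated = mig_vs_dedicated()
    print(f"MIG: ${shared:.2f}/hour vs dedicated: ${dedicated:.2f}/hour")
```

Under these assumptions the blend costs a little under half of the naive baseline, which falls inside the 40% to 60% savings range; a smaller spot discount or busier development cluster moves the result toward the low end.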