Implementation Patterns: Two-Level Scheduling and Profiling-Based Co-location
Two-Level Scheduler Architecture
Production GPU schedulers implement a two-level architecture: global admission control with topology-aware placement, and local device management that enforces spatial or temporal sharing. The global layer runs gang scheduling, decides which jobs to admit based on priority tiers, and solves the topology-constrained bin-packing problem. It models the hardware hierarchy (GPU, PCIe switch, CPU socket, node) and prefers packing within NVLink islands before spanning nodes.
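The placement preference can be sketched as a greedy heuristic. The `Island` and `place_gang` names below are illustrative, not a real scheduler API; the sketch assumes each NVLink island is described only by its host node and free GPU count.

```python
# Hypothetical sketch of topology-aware gang placement: fill one NVLink
# island (GPUs under the same switch) before spanning islands or nodes.
from dataclasses import dataclass

@dataclass
class Island:
    node: str        # host the island lives on
    free_gpus: int   # GPUs still unallocated in this NVLink domain

def place_gang(islands, gpus_needed):
    """Return islands chosen for a gang of `gpus_needed` GPUs, or None.

    Greedy heuristic: try a single island first (best locality), then
    greedily spill across islands, same-node islands first.
    """
    # 1. One island that fits the whole gang.
    fits = [i for i in islands if i.free_gpus >= gpus_needed]
    if fits:
        # Tightest fit minimizes fragmentation of large islands.
        return [min(fits, key=lambda i: i.free_gpus)]
    # 2. Otherwise take the fullest islands, grouped by node.
    chosen, remaining = [], gpus_needed
    for isl in sorted(islands, key=lambda i: (i.node, -i.free_gpus)):
        if remaining <= 0:
            break
        if isl.free_gpus > 0:
            chosen.append(isl)
            remaining -= isl.free_gpus
    # Gang scheduling is all-or-nothing: no partial admission.
    return chosen if remaining <= 0 else None
```

A real scheduler would also weigh priority tiers and preemption; this sketch only shows the locality preference.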
Local Device Management
The local layer enforces sharing policies: whole-GPU or MIG-slice allocation for spatial isolation, or time slicing aligned to mini-batch boundaries for temporal sharing. It manages CUDA context lifecycle to amortize creation cost (hundreds of milliseconds) and applies MPS when beneficial for many small kernels. For AoT execution, it records a warm-up iteration once to build the operator DAG and stream assignments, then replays that schedule every iteration.
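The record-once, replay-many pattern can be sketched without any GPU machinery. The `Recorder` class below is illustrative; a real implementation (e.g., CUDA graph capture) would also bake in stream assignments and reuse device buffers rather than captured arguments.

```python
# Minimal sketch of AoT record-and-replay execution, assuming operators
# are plain callables. Names here are illustrative, not a real API.
class Recorder:
    """First iteration: run ops for real while capturing the schedule.
    Later iterations: replay the captured (op, args) list with no
    graph-building or dispatch overhead per step."""
    def __init__(self):
        self.schedule = []    # captured (fn, args) pairs, in issue order
        self.recorded = False

    def run(self, fn, *args):
        # During warm-up, record each op as it executes.
        if not self.recorded:
            self.schedule.append((fn, args))
        return fn(*args)

    def finalize(self):
        # Freeze the schedule after the warm-up iteration.
        self.recorded = True

    def replay(self):
        # Replay the fixed schedule; in a real system stream assignment
        # and memory plan are already baked into each entry.
        return [fn(*args) for fn, args in self.schedule]
```

Usage: run one warm-up iteration through `run`, call `finalize()`, then call `replay()` on every subsequent iteration.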
Co-location and Interference Management
Co-location and interference management require profiling. Systems build an interference matrix by running representative job pairs or triples under different allocation shares, measuring compute utilization, memory bandwidth saturation, and latency. Profiling reveals which workloads are complementary: pairing a memory-bound data preprocessing task with a compute-bound training iteration can increase aggregate throughput by 30 to 50 percent. A greedy or ILP-based bin packer then uses the matrix to maximize throughput subject to QoS constraints.
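The greedy variant of that packer can be sketched as follows. The matrix values and job names are made up for illustration; the convention assumed here is that an entry of 1.0 means co-location performs no better than running the pair serially, and 2.0 would be perfect scaling.

```python
# Illustrative greedy co-location packer over a measured interference
# matrix. interference[(a, b)] (keys sorted) = aggregate throughput of
# a and b sharing one GPU, normalized so 1.0 == serial execution.
from itertools import combinations

def pack_pairs(jobs, interference, min_qos=1.0):
    """Co-locate the most complementary pairs first; jobs whose best
    pairing falls below the QoS floor run alone on their own GPU."""
    pairs = sorted(combinations(jobs, 2),
                   key=lambda p: interference.get(tuple(sorted(p)), 0.0),
                   reverse=True)
    placed, assignment = set(), []
    for a, b in pairs:
        score = interference.get(tuple(sorted((a, b))), 0.0)
        if a in placed or b in placed or score < min_qos:
            continue  # already placed, or co-location would violate QoS
        assignment.append(((a, b), score))
        placed.update((a, b))
    for j in jobs:
        if j not in placed:
            assignment.append(((j,), 1.0))  # runs alone
    return assignment
```

An ILP formulation would replace the greedy loop with an exact matching objective, at higher solve cost; the greedy version is usually adequate when the matrix is small.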
Elastic Scaling and Orchestration
Elastic scaling and workflow orchestration complete the system. Elastic jobs grow or shrink their worker count at safe synchronization points, such as step boundaries, to defragment capacity without destabilizing training. DAG schedulers predict task durations from historical runs to pre-warm capacity and reduce idle gaps. Critical path analysis prioritizes tasks that unblock many downstream operations.
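Critical-path prioritization over a task DAG can be sketched in a few lines. The toy DAG and predicted durations below are illustrative; a real scheduler would draw durations from its history store and re-rank as predictions are revised.

```python
# Sketch of critical-path prioritization: a task's priority is the
# longest predicted path it heads, so tasks that unblock long downstream
# chains are scheduled (and their capacity pre-warmed) first.
from functools import lru_cache

def critical_path_priority(dag, duration):
    """dag: task -> list of downstream tasks it unblocks.
    duration: task -> predicted runtime in seconds (from history)."""
    @lru_cache(maxsize=None)
    def longest(task):
        # Predicted duration of this task plus its longest downstream chain.
        return duration[task] + max(
            (longest(d) for d in dag.get(task, [])), default=0.0)
    return {t: longest(t) for t in duration}
```

Usage: with `dag = {"extract": ["transform"], "transform": ["train", "eval"]}` and durations of 5, 10, 60, and 5 seconds, `extract` heads the 75-second critical path and is ranked highest.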