Learn→ML Infrastructure & MLOps→Model Packaging (Docker, ONNX, SavedModel)→6 of 6

ML Infrastructure & MLOps • Model Packaging (Docker, ONNX, SavedModel)Hard⏱️ ~3 min

Building Lean Inference Containers: Multi Stage Builds and Optimization Patterns

Multi-Stage Builds: A Docker pattern where the Dockerfile has multiple FROM statements, each creating an intermediate image. Build-time dependencies stay in early stages; only runtime artifacts are copied to the final image. This separates build environment from runtime environment.
Why Multi-Stage Matters
Building a model package requires compilers, development headers, and build tools. Running inference requires only the runtime and model. Without multi-stage builds, all build dependencies end up in the final image. Example: compiling a Python package with C extensions needs gcc, make, and python-dev (adds 500MB). The compiled .so file is under 1MB. Multi-stage builds: stage 1 has build tools and compiles the package; stage 2 copies only the compiled artifact. Final image is 500MB smaller.
Optimization Patterns
Minimize layers: Combine related commands into single RUN statements. Each layer adds overhead; fewer layers mean smaller images. Order by change frequency: Put rarely-changing layers first (base image, system packages), frequently-changing layers last (model files). This maximizes cache hits during rebuilds. Clean up in same layer: If you install packages and then delete cache files, do both in the same RUN command. Docker layers are additive—deleting in a later layer does not reduce image size.
Practical Size Targets
CPU inference images: target under 500MB. GPU inference images: target under 2GB (CUDA libraries add significant size). If your image exceeds these targets, audit dependencies. Common bloat sources: full ML framework instead of inference-only build, test dependencies left in production image, unnecessary Python packages from requirements.txt copy-paste, cached package downloads not cleaned up. Use tools like dive to inspect layer contents and identify waste.
Dockerfile Example Pattern: Stage 1 (builder): install build tools, compile dependencies. Stage 2 (runtime): start from slim base, copy compiled artifacts from builder, copy model, set entrypoint.

💡 Key Takeaways

✓Multi-stage builds separate build dependencies from runtime, reducing image size

✓Order Dockerfile layers by change frequency for maximum cache efficiency

✓Target under 500MB for CPU images, under 2GB for GPU images

📌 Interview Tips

1Build tools (gcc, make) add 500MB but compiled artifacts are under 1MB

2Use dive tool to inspect layer contents and identify bloat sources

← Back to Model Packaging (Docker, ONNX, SavedModel) Overview