ML Infrastructure & MLOpsModel Packaging (Docker, ONNX, SavedModel)Hard⏱️ ~3 min

Model Packaging Failure Modes: Conversion Pitfalls and Production Gotchas

Model packaging fails in subtle ways that escape testing and surface in production. Conversion pitfalls are the most common. Exporting PyTorch to Open Neural Network Exchange (ONNX) can fail on unsupported operators, custom layers, or dynamic control flow. Even when export succeeds, numerical differences between framework kernels cause accuracy drift: a ranking model at Meta saw a 1.2% accuracy drop after ONNX conversion due to different layer normalization implementations, enough to reduce click-through rate by 0.3% and cost millions in revenue. Dynamic shapes are another trap: if you export without marking the batch dimension as dynamic, the ONNX graph hardcodes batch size 1 and silently fails or crashes when the serving layer attempts dynamic batching with size 16.

Environment mismatches cause hard runtime failures. A model compiled with CUDA 11.8 and TensorRT 8.5 may fail to load if the container runs CUDA 11.7 or mismatched cuDNN versions, producing cryptic errors like "symbol not found" or silent segmentation faults. Central Processing Unit (CPU) optimized builds using AVX-512 instructions crash with illegal-instruction errors on older instance types that lack the instruction set. These failures are hard to catch in staging if your testing cluster runs newer hardware than production. At Uber, a model deployment failed at 3 AM when autoscaling launched pods on older c5 instances without AVX512 support, while testing had only used c6i instances.

Throughput tuning creates its own edge cases. Aggressive dynamic batching with 50 millisecond collection windows improves throughput but causes head-of-line blocking: a single slow request delays an entire batch, spiking p99 latency from 30 milliseconds to 200 milliseconds. Instance groups that over-allocate Graphics Processing Unit (GPU) memory (for example, loading 3 large models on a 16 GB GPU) trigger out-of-memory errors under peak load, causing restart loops that cascade across the fleet. If preprocessing lives inside the inference container, CPU-bound operations like JPEG decoding at 20 to 40 milliseconds can dominate a roughly 60 millisecond total latency, hiding the fact that the GPU is idle 60% of the time and leading to massive over-provisioning of expensive accelerators.
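Both the dynamic-shape trap and conversion drift can be caught before deployment with an export-and-compare step. The sketch below is a minimal illustration, not a production pipeline: the tiny model, the 256-wide input, the opset version, and the 1e-4 tolerance are all assumptions chosen for the example.

```python
# Sketch: export a PyTorch model to ONNX with a dynamic batch axis, then
# compare outputs against the original model to catch numerical drift.
# Model, shapes, opset, and tolerance are illustrative assumptions.
import numpy as np
import torch
import onnxruntime as ort

model = torch.nn.Sequential(          # stand-in for a real ranking model
    torch.nn.Linear(256, 128),
    torch.nn.LayerNorm(128),          # layer norm is a common drift source
    torch.nn.ReLU(),
    torch.nn.Linear(128, 1),
).eval()
dummy = torch.randn(1, 256)           # assumed feature width of 256

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["features"],
    output_names=["scores"],
    # Without this, the graph hardcodes batch size 1 and dynamic batching
    # at serving time fails or silently degrades.
    dynamic_axes={"features": {0: "batch"}, "scores": {0: "batch"}},
    opset_version=17,
)

# Parity check at a batch size larger than the export dummy.
batch = torch.randn(16, 256)
with torch.no_grad():
    ref = model(batch).numpy()

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
out = sess.run(None, {"features": batch.numpy()})[0]

max_diff = float(np.max(np.abs(ref - out)))
print(f"max abs diff: {max_diff:.6f}")
assert max_diff < 1e-4, "conversion drift exceeds tolerance"  # assumed threshold
```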
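Environment skew can be turned into a fast, explicit startup failure instead of a 3 AM segfault. A minimal sketch follows, assuming a Linux host, a model built with AVX-512 kernels, and a CUDA 11.8 build (the required flags and versions are illustrative assumptions for your own image):

```python
# Sketch: fail fast at container startup if the host lacks the instruction
# set or CUDA runtime the packaged model was built against.
# Required flags and versions below are illustrative assumptions.
import sys
import torch

REQUIRED_CPU_FLAGS = {"avx512f"}   # assumed: CPU build uses AVX-512 kernels
REQUIRED_CUDA = "11.8"             # assumed: compiled against CUDA 11.8

def cpu_flags() -> set:
    # Linux-only: parse the flags line from /proc/cpuinfo.
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

missing = REQUIRED_CPU_FLAGS - cpu_flags()
if missing:
    sys.exit(f"host CPU missing required instruction sets: {missing}")

if not torch.cuda.is_available():
    sys.exit("CUDA runtime not available in this container/host combination")

if torch.version.cuda != REQUIRED_CUDA:
    sys.exit(f"CUDA version skew: built for {REQUIRED_CUDA}, found {torch.version.cuda}")

print("environment checks passed; safe to load the model")
```

Running this before loading any weights converts the "symbol not found" or illegal-instruction crash into a readable error that autoscaling and alerting can act on.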
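A toy simulation makes the head-of-line-blocking tradeoff concrete. The window length, batch size, slow-request fraction, and timings below are assumptions chosen to mirror the numbers above, not measurements from a real serving stack:

```python
# Sketch: toy model of head-of-line blocking under dynamic batching.
# All constants are illustrative assumptions.
import random

random.seed(0)

WINDOW_MS = 50          # batch collection window
INFER_MS = 10           # GPU inference time per batch (assumed constant)
FAST_PREPROC_MS = 5     # typical request preprocessing
SLOW_PREPROC_MS = 200   # occasional slow request (e.g. oversized image)
SLOW_FRACTION = 0.02    # assumed 2% of requests are slow
BATCH_SIZE = 16

latencies = []
for _ in range(10_000 // BATCH_SIZE):
    batch = [
        SLOW_PREPROC_MS if random.random() < SLOW_FRACTION else FAST_PREPROC_MS
        for _ in range(BATCH_SIZE)
    ]
    # Every request in the batch waits for the window, the slowest member,
    # and the shared inference call.
    batch_latency = WINDOW_MS + max(batch) + INFER_MS
    latencies.extend([batch_latency] * BATCH_SIZE)

latencies.sort()
p50 = latencies[len(latencies) // 2]
p99 = latencies[int(len(latencies) * 0.99)]
print(f"p50={p50} ms  p99={p99} ms")  # a few slow requests inflate the whole tail
```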
💡 Key Takeaways
Conversion accuracy drift occurs when framework-specific kernel implementations differ: layer normalization, batch normalization, and activation functions can vary by 0.001 to 0.01 in output values, accumulating to 0.5 to 2% accuracy loss in deep models with 50+ layers
Dynamic shape export requires explicit axis marking during ONNX export or in TensorFlow SavedModel signatures: forgetting to mark the batch dimension as dynamic hardcodes batch size 1 and breaks dynamic batching at serving time, silently degrading throughput by 5 to 10x
Environment version mismatches between container and runtime cause failures: CUDA version skew (11.7 versus 11.8), cuDNN library mismatches (8.2 versus 8.6), or Application Binary Interface (ABI) incompatibilities produce symbol lookup errors or segfaults that are flaky and hard to reproduce
Dynamic batching head-of-line blocking: a 50 millisecond batch window improves throughput by 8x, but one slow request (200 milliseconds of preprocessing) delays the entire batch, spiking p99 latency from 30 milliseconds to 250 milliseconds and violating Service Level Objectives (SLOs)
CPU-bound preprocessing hidden inside inference containers causes GPU underutilization: 30 milliseconds of image decode plus 10 milliseconds of model inference is 40 milliseconds total, but the GPU is idle 75% of the time, leading to 4x over-provisioning of GPU instances costing $20K per month instead of $5K (see the back-of-envelope sketch after this list)
Large model artifacts serialized with protocol buffers hit 2 GB message size limits in some systems, causing silent truncation or upload failures to model registries and requiring chunking or split-storage strategies
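The GPU-underutilization point above reduces to simple arithmetic. The sketch below uses the 30 ms / 10 ms split from the takeaway and an assumed 2,000 QPS of traffic, with one request in flight per GPU worker (a deliberately simplified Little's-law estimate):

```python
# Sketch: back-of-envelope for how in-container preprocessing inflates GPU
# counts. Traffic level and the one-request-per-worker model are assumptions.
DECODE_MS = 30        # CPU-bound JPEG decode
INFER_MS = 10         # GPU inference
TARGET_QPS = 2_000    # assumed traffic

# Preprocessing inside the inference container: each GPU worker is occupied
# for the full 40 ms per request even though the GPU only computes for 10 ms.
per_request_ms = DECODE_MS + INFER_MS
workers_coupled = TARGET_QPS * per_request_ms / 1000
gpu_idle_fraction = 1 - INFER_MS / per_request_ms   # 0.75

# Preprocessing moved to cheap CPU workers: GPUs only pay the 10 ms.
workers_decoupled = TARGET_QPS * INFER_MS / 1000

print(f"GPU idle fraction with coupled preprocessing: {gpu_idle_fraction:.0%}")
print(f"GPU workers needed: {workers_coupled:.0f} coupled vs "
      f"{workers_decoupled:.0f} decoupled (4x)")
```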
📌 Examples
Meta ad ranking model: ONNX conversion caused a 1.2% precision drop due to layer norm kernel differences, detected only after an A/B test showed a 0.3% click-through rate decrease; the conversion was rolled back and serving stayed on PyTorch with TorchScript instead
Netflix model deployment: staging tests passed on g4dn instances with T4 GPUs and CUDA 11.8, but production failed at scale when autoscaling launched g3 instances with older Tesla M60 GPUs lacking TensorRT 8 support; fixed by constraining instance types with Kubernetes node affinity