What is Model Versioning in Production ML Systems?
Why Full Lineage Matters
Without full lineage, rollback becomes dangerous. Suppose you roll back only the model binary after the feature preprocessing has changed: the old model now receives inputs it was never trained on, and accuracy silently degrades by 10 to 20 percent while infrastructure metrics look fine. The model expects normalized values between 0 and 1, but the new preprocessing emits unnormalized floats. Latency is healthy and the error rate is zero, yet the predictions are garbage.
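One way to guard against this failure is to refuse a rollback when the target model's recorded feature schema does not match what the live preprocessing emits. The following is a minimal sketch under stated assumptions: ModelManifest, SchemaMismatchError, check_rollback, and the version strings are hypothetical names chosen for illustration, not part of any particular registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelManifest:
    """Hypothetical manifest: records the schema a model was trained against."""
    model_version: str
    feature_schema_version: str


class SchemaMismatchError(RuntimeError):
    """Raised when a rollback would pair a model with the wrong preprocessing."""


def check_rollback(target: ModelManifest, live_schema_version: str) -> None:
    """Refuse the rollback if live preprocessing emits a different schema."""
    if target.feature_schema_version != live_schema_version:
        raise SchemaMismatchError(
            f"model {target.model_version} expects feature schema "
            f"{target.feature_schema_version}, but preprocessing emits "
            f"{live_schema_version}"
        )


# Illustrative values only: the old model was trained on schema "features-v4".
old_model = ModelManifest("v2.3.0", "features-v4")
check_rollback(old_model, "features-v4")  # schemas match, rollback may proceed
```

A check like this turns a silent accuracy drop into a loud deployment failure, which is the point of recording the schema version alongside the model.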
Model Registry Architecture
Netflix, Uber, LinkedIn, and Airbnb all maintain central model registries that track complete lineage with explicit lifecycle states: Staging, Canary, Production, and Archived. Each artifact is stored under a content-addressable identifier (a cryptographic hash of its bytes) for immutability and a semantic version (such as v2.3.1) for human readability. The accompanying manifest records the exact code commit, hyperparameters, feature schema versions, and dataset snapshot.
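The registry idea can be sketched in a few lines. This is an illustrative in-memory toy, not the design of any company named above: the class names, the "sha256:" prefix, and the allowed-transition policy (promote one step at a time, archive from any live state) are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    STAGING = "Staging"
    CANARY = "Canary"
    PRODUCTION = "Production"
    ARCHIVED = "Archived"


# Assumed policy: promote one step at a time; archive from any live state.
ALLOWED = {
    Stage.STAGING: {Stage.CANARY, Stage.ARCHIVED},
    Stage.CANARY: {Stage.PRODUCTION, Stage.ARCHIVED},
    Stage.PRODUCTION: {Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}


@dataclass
class RegistryEntry:
    semantic_version: str  # human-readable, e.g. "v2.3.1"
    content_id: str        # hash of the artifact bytes
    stage: Stage = Stage.STAGING


class ModelRegistry:
    """Toy in-memory registry keyed by semantic version."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, semantic_version: str, artifact: bytes) -> RegistryEntry:
        # A registered version is never overwritten, which keeps it immutable.
        if semantic_version in self._entries:
            raise ValueError(f"{semantic_version} is already registered")
        content_id = "sha256:" + hashlib.sha256(artifact).hexdigest()
        entry = RegistryEntry(semantic_version, content_id)
        self._entries[semantic_version] = entry
        return entry

    def transition(self, semantic_version: str, new_stage: Stage) -> None:
        entry = self._entries[semantic_version]
        if new_stage not in ALLOWED[entry.stage]:
            raise ValueError(
                f"cannot move {semantic_version} from "
                f"{entry.stage.value} to {new_stage.value}"
            )
        entry.stage = new_stage
```

Because the identifier is derived from the artifact bytes, two uploads with the same content always share an identifier, and any change to the weights produces a different one.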
Time Travel Debugging
At Uber's scale (millions of predictions per second), full lineage enables forensic debugging: engineers can travel back in time to reproduce the exact model that served a request three weeks ago, including the feature values it saw. When a customer complains about a bad ETA prediction, engineers can reconstruct the entire inference context and verify whether the model behaved correctly given its inputs.
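The first step of such a reconstruction is finding out which model version was live when the request arrived. The sketch below assumes an append-only deployment log kept in time order; DeploymentLog and its methods are hypothetical names, and a real system would also need the per-request feature values to have been logged, which this example does not cover.

```python
from bisect import bisect_right
from datetime import datetime, timezone


class DeploymentLog:
    """Append-only record of which model version went live at which time."""

    def __init__(self) -> None:
        self._times = []     # deployment timestamps, ascending
        self._versions = []  # content identifiers, parallel to _times

    def record(self, went_live_at: datetime, content_id: str) -> None:
        if self._times and went_live_at < self._times[-1]:
            raise ValueError("deployments must be recorded in time order")
        self._times.append(went_live_at)
        self._versions.append(content_id)

    def version_at(self, request_time: datetime) -> str:
        """Return the version that was serving at request_time."""
        # Index of the last deployment at or before request_time.
        idx = bisect_right(self._times, request_time) - 1
        if idx < 0:
            raise LookupError("no model was deployed at that time")
        return self._versions[idx]


# Illustrative timestamps and identifiers only.
log = DeploymentLog()
log.record(datetime(2024, 1, 1, tzinfo=timezone.utc), "sha256:aaa")
log.record(datetime(2024, 1, 15, tzinfo=timezone.utc), "sha256:bbb")
served_by = log.version_at(datetime(2024, 1, 10, tzinfo=timezone.utc))
```

With the content identifier in hand, the immutable artifact can be fetched from the registry and replayed against the logged inputs.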
Version Tuple Components
A complete version tuple contains six components: the model artifact (weights, architecture), the training code version (git SHA), the feature definitions and transformations (schema version), a dataset snapshot pointer (timestamp or hash), the runtime environment (container image, library versions), and the configuration (hyperparameters, thresholds). If any one component is missing, the model can no longer be reproduced exactly.
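The six components map naturally onto an immutable record that rejects incomplete entries. This is a sketch only: the field names, the choice to store each component as a string reference, and the fingerprint method are assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class VersionTuple:
    model_artifact: str       # content hash of weights and architecture
    training_code: str        # git SHA
    feature_schema: str       # schema version of feature transformations
    dataset_snapshot: str     # timestamp or hash of the training data
    runtime_environment: str  # container image and library versions
    configuration: str        # reference to hyperparameters and thresholds

    def __post_init__(self) -> None:
        # Reject the tuple outright if any component is empty.
        missing = [f.name for f in fields(self) if not getattr(self, f.name)]
        if missing:
            raise ValueError(f"incomplete version tuple, missing: {missing}")

    def fingerprint(self) -> str:
        """Deterministic hash over all six components."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()
```

Changing any single component, even just the container image, yields a different fingerprint, so two deployments can be compared for exact equivalence with one string comparison.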