ML Infrastructure & MLOpsModel RegistryMedium⏱️ ~2 min

Model Versioning and Lineage Tracking

Definition
Model versioning tracks every iteration with unique identifiers, while lineage tracking records data, code, and parameters that produced each version—enabling reproducibility.

VERSION NUMBERING SCHEMES

Semantic: major.minor.patch (2.1.3). Major = breaking, minor = features, patch = fixes. Timestamp: 20240115-143052, automatic and sortable. Hash-based: Git SHA or content hash, guarantees uniqueness and reproducibility.

LINEAGE COMPONENTS

Data: Dataset version, preprocessing, features. Code: Git commit, script hash, library versions. Parameters: Hyperparameters, seeds, hardware. Experiment: Which tests validated this model.

💡 Insight: Lineage enables time travel debugging. When v2.3 regresses, lineage shows it trained on data v1.5 while v2.2 used v1.4—pinpointing the data change as cause.

IMMUTABLE ARTIFACTS

Every version must be immutable. Once registered, weights and metadata cannot change—updates create new versions. This prevents "worked yesterday" bugs. Use content-addressable storage.

VERSION LIFECYCLE

Registered: Available for testing. Staged: Passed validation. Production: Serving. Archived: Retained for rollback. Deprecated: Scheduled for deletion. Track transitions with timestamps.

⚠️ Trade-off: Full lineage adds storage overhead. Capture essentials: data version, code commit, key hyperparameters.
💡 Key Takeaways
Version schemes: semantic (human judgment), timestamp (automatic), hash (reproducible)
Lineage tracks data, code, parameters, and experiments that produced each version
Immutable artifacts prevent worked yesterday bugs—updates create new versions
📌 Interview Tips
1Explain lineage as time travel debugging—trace regressions to data or code changes
2Describe version lifecycle: registered → staged → production → archived
← Back to Model Registry Overview