
Model Registry Core Entities and Immutability Design

The registry is built around three core entities. A Model represents a logical grouping such as fraud detection or recommendation ranking. A Model Version is an immutable artifact from a specific training run, identified by a content hash computed over the binary and signature. A Stage or Environment such as dev, staging, or prod acts as a pointer to a specific version. This separation between identity and labels is critical: the version itself never changes, but the prod pointer can move from v1.23 to v1.24 during promotion.

Each version references an artifact in object storage, typically 300 MB to 5 GB, and includes a model signature that defines the interface contract. The signature specifies input feature names and types, output schema, required preprocessing versions, and supported content types. Checking a service against the signature prevents a common failure mode where a service upgrades its feature extraction code but loads an older model trained on different features, causing silent accuracy degradation.

The registry also stores training metadata such as the data snapshot identifier, feature set version, hyperparameters, and the git commit of the training code. Evaluation metrics are recorded both offline, computed on held-out test sets, and online, measured during canary analysis on live traffic.

Governance data is essential for regulated domains. The registry tracks approval status with approver identity and timestamp, model card documentation describing intended use and known limitations, risk tier tags such as high risk or low risk, and retention policies. Many organizations also record a Deployment binding that links an application release or git commit to the exact model version it must load. This closes the loop and prevents model-code skew, a failure mode where different service instances load different model versions due to inconsistent pointers or race conditions.

Immutability trades agility for reliability. Making versions immutable with content addressing simplifies audit and enables reliable rollbacks, but forces a new version for every small tweak and increases storage costs. A typical retention policy keeps the last 10 production versions and all staging versions from the past 90 days, archiving older versions but retaining metadata indefinitely for compliance. Mutable pointers such as latest are faster to work with but create ambiguity and race conditions during rollouts.
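The split between immutable versions and movable stage pointers can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real registry API: the names Registry, ModelVersion, compute_content_hash, promote, and rollback are all hypothetical, and the choice to hash a length-prefixed binary followed by canonical JSON of the signature is an assumption about how "a content hash over the binary and signature" might be built.

```python
import hashlib
import json
from dataclasses import dataclass


def compute_content_hash(artifact: bytes, signature: dict) -> str:
    """Hash the model binary together with a canonical form of its signature."""
    # Sorted keys and fixed separators make the JSON byte-for-byte reproducible.
    canonical = json.dumps(signature, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    # Length prefix keeps the boundary between binary and signature unambiguous.
    digest.update(len(artifact).to_bytes(8, "big"))
    digest.update(artifact)
    digest.update(canonical.encode("utf-8"))
    return "sha256:" + digest.hexdigest()


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ModelVersion:
    model_name: str
    content_hash: str
    artifact_uri: str


class Registry:
    """Hypothetical in-memory registry: versions are immutable, stages are pointers."""

    def __init__(self):
        self._versions = {}  # (model_name, content_hash) -> ModelVersion
        self._stages = {}    # (model_name, stage) -> content_hash
        self._history = {}   # (model_name, stage) -> earlier content hashes

    def register(self, model_name, artifact, signature, artifact_uri):
        content_hash = compute_content_hash(artifact, signature)
        key = (model_name, content_hash)
        # Registering identical content again returns the existing record.
        if key not in self._versions:
            self._versions[key] = ModelVersion(model_name, content_hash, artifact_uri)
        return self._versions[key]

    def promote(self, model_name, stage, content_hash):
        if (model_name, content_hash) not in self._versions:
            raise KeyError(f"unknown version {content_hash} for {model_name}")
        key = (model_name, stage)
        if key in self._stages:
            self._history.setdefault(key, []).append(self._stages[key])
        self._stages[key] = content_hash  # only the pointer moves

    def rollback(self, model_name, stage):
        key = (model_name, stage)
        if not self._history.get(key):
            raise LookupError(f"no earlier version recorded for {key}")
        self._stages[key] = self._history[key].pop()
        return self._stages[key]

    def resolve(self, model_name, stage):
        return self._versions[(model_name, self._stages[(model_name, stage)])]
```

Promotion and rollback touch only the stage table, so an audit can always map a stage at a given time back to one unchanging artifact.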
💡 Key Takeaways
Model Version is immutable and identified by a content hash over artifact and signature; stage labels like prod are just pointers that can move, for example from v1.23 to v1.24
Model signature enforces the interface contract, including input features, output schema, and preprocessing versions, to prevent training-serving skew
Training metadata links version to data snapshot identifier, feature set version, hyperparameters, and git commit for full reproducibility
Deployment binding records mapping from application release to exact model version, preventing race conditions where instances load different versions
Governance data includes approval status with approver identity, model card documentation, risk tier tags, and retention policies for compliance
Immutability simplifies audit and rollback but forces new versions for small changes; a typical policy retains the last 10 production versions
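The retention policy mentioned above (last 10 production versions, staging versions from the past 90 days) could be expressed roughly as follows. This is a sketch under stated assumptions: VersionRecord and select_for_archive are hypothetical names, each record is assumed to carry the stage it served in and a creation timestamp, and only artifacts are selected for archiving since metadata is kept indefinitely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class VersionRecord:
    content_hash: str
    stage: str            # "prod" or "staging" (assumed field)
    created_at: datetime


def select_for_archive(records, now, keep_prod=10, staging_window_days=90):
    """Return the records whose artifacts may be archived under the policy."""
    cutoff = now - timedelta(days=staging_window_days)

    # Keep the most recent `keep_prod` production versions.
    prod = sorted(
        (r for r in records if r.stage == "prod"),
        key=lambda r: r.created_at,
        reverse=True,
    )
    keep = {r.content_hash for r in prod[:keep_prod]}

    # Keep every staging version created inside the window.
    keep |= {
        r.content_hash
        for r in records
        if r.stage == "staging" and r.created_at >= cutoff
    }

    # Everything not explicitly kept is a candidate for archiving.
    return [r for r in records if r.content_hash not in keep]
```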
📌 Examples
Content hash: sha256:a3f2b1c4... computed over model binary and signature, identifies the same artifact even if its semantic label changes
Model signature: {"inputs": [{"name": "transaction_amount", "type": "float"}, {"name": "merchant_id", "type": "int64"}], "preprocessing_version": "v2.1"}
Deployment binding: Service fraud_api version 1.45 must load model fraud_detection version sha256:a3f2... to prevent skew during rollout
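The signature and deployment binding examples above can be turned into load-time checks. The sketch below is illustrative only: check_compatibility and verify_binding are hypothetical helpers, the service is assumed to describe its own features as a name-to-type mapping, and the truncated hash sha256:a3f2... is carried over verbatim as a placeholder string rather than a real digest.

```python
# Signature taken from the example above.
SIGNATURE = {
    "inputs": [
        {"name": "transaction_amount", "type": "float"},
        {"name": "merchant_id", "type": "int64"},
    ],
    "preprocessing_version": "v2.1",
}

# (service, release) -> (model name, content hash); placeholder hash from the example.
BINDINGS = {("fraud_api", "1.45"): ("fraud_detection", "sha256:a3f2...")}


def check_compatibility(signature, service_features, service_preprocessing_version):
    """Return a list of mismatches; an empty list means the service may load the model."""
    problems = []
    if signature["preprocessing_version"] != service_preprocessing_version:
        problems.append(
            "preprocessing version: model expects "
            f"{signature['preprocessing_version']}, service has {service_preprocessing_version}"
        )
    expected = {f["name"]: f["type"] for f in signature["inputs"]}
    for name, type_name in expected.items():
        if name not in service_features:
            problems.append(f"missing feature: {name}")
        elif service_features[name] != type_name:
            problems.append(
                f"type mismatch for {name}: expected {type_name}, got {service_features[name]}"
            )
    # Features the service produces but the model was never trained on.
    for name in sorted(service_features.keys() - expected.keys()):
        problems.append(f"unexpected feature: {name}")
    return problems


def verify_binding(bindings, service, release, model_name, loaded_hash):
    """True only if this release is bound to exactly the model version that was loaded."""
    return bindings.get((service, release)) == (model_name, loaded_hash)
```

A service would run both checks at startup and refuse to serve traffic if either fails, which turns silent accuracy degradation into a visible deployment error.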