
What Are Transformation Layers (Bronze/Silver/Gold)?

Definition
Bronze/Silver/Gold layers (also called Medallion architecture) are a data organization pattern that separates raw ingestion, cleaning, and business metric computation into three distinct layers with different quality guarantees and ownership models.
The Core Problem: When you ingest data from tens or hundreds of sources (microservices, SaaS tools, event streams, databases), each has different schemas, quality issues, and update behaviors. A large ecommerce platform might pull order data from transaction services, clickstream events from web clients, CRM data from Salesforce, and financial records from an ERP system. Without structure, this chaos becomes unmaintainable.

The three-layer approach provides separation of concerns. Think of it like software architecture: you don't mix data access, business logic, and presentation. Similarly, you separate data capture, data cleaning, and business metrics.

Bronze Layer (Raw Landing Zone): This is your source of truth for "what exactly did the system emit?" Data lands here in its original form, including all the messiness: duplicates, nulls, schema inconsistencies, even corrupted records. Storage is append-only and lossless. If a source system sends malformed JSON at 3am, Bronze captures it exactly as received. This gives you forensic capability and the ability to replay history if you discover a bug months later.

Silver Layer (Cleaned and Conformed): This refines Bronze into trusted, standardized datasets. It removes duplicates, fixes data types, validates schemas, and aligns naming conventions across sources. For example, if three source systems call the same field customer_id, cust_id, and userId, Silver normalizes these to a single standard. This becomes your default analytical source of record.

Gold Layer (Business Metrics): This contains curated, denormalized tables optimized for specific business questions. Examples include daily revenue by region, customer 360 views, or churn prediction features. These are what BI dashboards and ML models query directly. Gold sacrifices generality for performance and business alignment.
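The Bronze and Silver responsibilities described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the in-memory list standing in for Bronze storage, the helper names land_in_bronze and build_silver, the order_id and amount fields, and the choice to deduplicate on (source, order_id) are all hypothetical. Only the three customer ID aliases come from the text.

```python
import json

# The three spellings of the same logical field mentioned in the text.
CUSTOMER_ID_ALIASES = ("customer_id", "cust_id", "userId")

# Hypothetical stand-in for append-only raw storage.
bronze = []


def land_in_bronze(source, raw_payload):
    """Append the payload exactly as received; no parsing, no rejection."""
    bronze.append({"source": source, "raw": raw_payload})


def build_silver(bronze_rows):
    """Parse, conform, and deduplicate Bronze rows.

    Rows that cannot be cleaned are returned separately; they are never
    deleted from Bronze, so they can be replayed after a fix.
    """
    silver, rejected, seen = [], [], set()
    for row in bronze_rows:
        try:
            record = json.loads(row["raw"])
            # Normalize the customer ID aliases to one standard name.
            customer_id = next(
                (record[k] for k in CUSTOMER_ID_ALIASES if k in record), None
            )
            if customer_id is None:
                raise KeyError("no customer id field found")
            clean = {
                "order_id": str(record["order_id"]),
                "customer_id": str(customer_id),
                "amount": float(record["amount"]),  # fix data types
                "source": row["source"],
            }
        except (ValueError, KeyError, TypeError):
            # json.JSONDecodeError is a subclass of ValueError.
            rejected.append(row)
            continue
        key = (clean["source"], clean["order_id"])  # assumed dedup key
        if key in seen:
            continue
        seen.add(key)
        silver.append(clean)
    return silver, rejected


land_in_bronze("orders", '{"order_id": 1, "customer_id": "c9", "amount": "19.99"}')
land_in_bronze("orders", '{"order_id": 1, "customer_id": "c9", "amount": "19.99"}')
land_in_bronze("crm", '{"order_id": 2, "cust_id": "c7", "amount": 5}')
land_in_bronze("web", '{"order_id": 3, "userId": ')  # malformed JSON

silver, rejected = build_silver(bronze)
```

In this sketch Bronze keeps all four payloads, including the duplicate and the malformed one, while Silver ends up with two conformed rows and one rejected row set aside for inspection.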
✓ In Practice: At typical FAANG scale, you might ingest 10 to 100 TB daily from hundreds of sources. Bronze stores all of it raw. Silver applies quality rules, with 5 to 30 minutes of latency for streaming tables. Gold powers user-facing dashboards with sub-second query response times.
💡 Key Takeaways
Bronze stores raw, immutable copies of source data exactly as received, enabling forensic analysis and historical replay when bugs are discovered
Silver applies business-agnostic cleaning rules like deduplication, schema validation, and naming standardization to create trusted analytical datasets
Gold contains denormalized, pre-aggregated tables optimized for specific business use cases like dashboards and ML models
Each layer has different owners: platform teams typically manage Bronze and Silver, while domain teams own their Gold datasets
The separation allows independent evolution: source systems can change without breaking business metrics, and metric definitions can evolve without touching raw data
📌 Examples
1. An ecommerce platform ingests order events to Bronze as raw JSON with all original fields. Silver transforms these into a standardized orders table with validated customer IDs, product IDs, and timestamps. Gold aggregates this into daily_revenue_by_region for executive dashboards.
2. When a source system changes a field from order_date to orderTimestamp, Bronze captures both versions over time. Silver normalizes both to a standard order_timestamp field, shielding downstream Gold tables from the change.
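The two examples above can be combined into one minimal sketch, assuming already-parsed Bronze records held as Python dicts. The field names order_date, orderTimestamp, order_timestamp, and the table name daily_revenue_by_region come from the text; the region and amount fields, the helper to_silver_order, and the sample rows are hypothetical, and timestamps are assumed to be ISO 8601 strings without a timezone suffix.

```python
from collections import defaultdict
from datetime import datetime

# Names the source has used for the same field, old and new.
TIMESTAMP_ALIASES = ("order_date", "orderTimestamp")


def to_silver_order(raw):
    """Conform one Bronze record to the standard Silver schema."""
    ts = next((raw[k] for k in TIMESTAMP_ALIASES if k in raw), None)
    if ts is None:
        raise ValueError("record has no recognized timestamp field")
    return {
        "order_id": raw["order_id"],
        "region": raw["region"],
        "amount": float(raw["amount"]),
        # Single standard field, whichever alias the source used.
        "order_timestamp": datetime.fromisoformat(ts),
    }


def daily_revenue_by_region(silver_orders):
    """Gold: pre-aggregate revenue per (day, region)."""
    totals = defaultdict(float)
    for order in silver_orders:
        day = order["order_timestamp"].date().isoformat()
        totals[(day, order["region"])] += order["amount"]
    return dict(totals)


# Bronze holds both schema versions, exactly as they arrived over time.
bronze_orders = [
    {"order_id": "o1", "region": "EU", "amount": "20.00",
     "order_date": "2024-05-01"},
    {"order_id": "o2", "region": "EU", "amount": "5.50",
     "orderTimestamp": "2024-05-01T09:30:00"},
    {"order_id": "o3", "region": "US", "amount": "12.00",
     "orderTimestamp": "2024-05-02T18:00:00"},
]

silver_orders = [to_silver_order(r) for r in bronze_orders]
gold = daily_revenue_by_region(silver_orders)
```

The Gold function reads only order_timestamp, so the upstream rename is absorbed entirely in to_silver_order; a further rename would mean adding one more alias there rather than touching any Gold logic.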