
Silver Layer: Building Trusted Analytical Datasets

The Transformation Workflow: Silver jobs consume Bronze incrementally, often through streaming or micro-batch processing. They apply a specific set of transformations: schema validation, deduplication, type casting, referential integrity checks, and naming standardization. The goal is clean, conformed datasets that analysts and data scientists can trust without needing to understand source-system quirks. Consider a common scenario: three different source systems store customer information, each with its own schema. System A uses customer_id (integer), System B uses custId (string), and System C uses user_id (UUID). Silver transformations map all three to a standardized customer_uuid column with consistent type and format.
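A minimal PySpark sketch of that mapping, assuming illustrative Bronze table names, shared columns, and a hash-based surrogate key (all are assumptions for illustration, not fixtures of any particular platform):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

def surrogate_uuid(source: str, key_col: str):
    # Deterministic surrogate: hash the source-qualified business key so
    # the same source record always yields the same customer_uuid.
    return F.sha2(
        F.concat_ws(":", F.lit(source), F.col(key_col).cast("string")), 256
    )

sys_a = (spark.table("bronze.sys_a_customers")      # customer_id: integer
         .withColumn("customer_uuid", surrogate_uuid("sys_a", "customer_id")))
sys_b = (spark.table("bronze.sys_b_customers")      # custId: string
         .withColumn("customer_uuid", surrogate_uuid("sys_b", "custId")))
sys_c = (spark.table("bronze.sys_c_customers")      # user_id: already a UUID
         .withColumn("customer_uuid", F.lower(F.col("user_id"))))

common = ["customer_uuid", "email", "country", "updated_at"]  # assumed columns
silver_customers = (sys_a.select(common)
                    .unionByName(sys_b.select(common))
                    .unionByName(sys_c.select(common)))
```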
1. Schema Validation: Check that required fields exist, types match expectations, and values fall within valid ranges. Records failing validation are quarantined with error details.
2. Deduplication: Apply business keys plus event timestamps to identify and remove duplicate records, keeping the most recent or authoritative version.
3. Standardization: Normalize field names, date formats, and reference data across sources. Map "US", "USA", and "United States" to a single canonical value.
4. SCD Type 2 Handling: For Slowly Changing Dimensions, maintain versioned records with valid_from, valid_to, and is_current flags. Steps 2 and 4 are sketched in code after this list.
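Steps 2 and 4 can be expressed with window functions over the silver_customers frame from the earlier example. This is a batch-style illustration under assumed column names; an incremental pipeline would typically maintain the same columns with a Delta MERGE rather than recomputing history:

```python
from pyspark.sql import Window
import pyspark.sql.functions as F

# Step 2 -- Deduplication: retry storms emit identical copies of the same
# record, so drop rows sharing the business key and event timestamp.
# Distinct timestamps survive as distinct versions for step 4.
deduped = silver_customers.dropDuplicates(["customer_uuid", "updated_at"])

# Step 4 -- SCD Type 2: each version is valid from its own timestamp until
# the next version's timestamp; the newest version has no successor and is
# flagged as current.
history = Window.partitionBy("customer_uuid").orderBy("updated_at")
scd2 = (deduped
        .withColumn("valid_from", F.col("updated_at"))
        .withColumn("valid_to", F.lead("updated_at").over(history))
        .withColumn("is_current", F.col("valid_to").isNull()))
```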
Performance and Freshness: Databricks customers commonly achieve end-to-end Silver latency under 15 minutes at p99 for streaming-refreshed tables, meaning 99% of events that land in Bronze are cleaned and available in Silver within 15 minutes. For batch-oriented Silver tables updated hourly or daily, latency extends accordingly but remains predictable.
Silver Layer Latency Targets
Streaming: 5-15 min
Batch: 1-24 hrs
Ownership and Stability: Central data platform teams typically own most Silver tables because they represent source-aligned, business-agnostic transformations. This centralization is intentional: it prevents different teams from applying conflicting cleaning rules to the same source data. If Marketing and Finance both need customer data, they both consume the same Silver customer table rather than creating two differently cleaned versions.

The risk is Silver becoming a dumping ground. If teams start encoding business-specific logic in Silver ("only include customers who made a purchase in the last 90 days"), every change to that logic breaks other consumers. Silver should answer "what is the clean version of this source entity," not "what subset does my team need right now." Business-specific filters and joins belong in downstream Gold or Platinum layers.
"Silver is your system of record for analytical work. It should be stable, well documented, and resistant to frequent breaking changes. If Silver schemas change weekly, downstream teams cannot build reliable pipelines."
💡 Key Takeaways
Silver applies business-agnostic transformations: schema validation, deduplication, type casting, and naming standardization across sources
Typical streaming Silver latency is 5 to 15 minutes at p99, while batch refreshes occur hourly to daily depending on use case
Central platform teams own most Silver tables to ensure consistent cleaning rules and prevent conflicting interpretations of source data
Slowly Changing Dimensions in Silver use Type 2 with valid_from, valid_to, and is_current flags to support point-in-time history queries
Silver becomes the default analytical source of record, so schema stability is critical; frequent breaking changes cascade to all downstream consumers
📌 Examples
1. A Silver customer table consolidates three Bronze sources with inconsistent schemas. It maps customer_id, custId, and user_id to a standard customer_uuid, handles 15 different country code formats (US, USA, United States), and deduplicates based on the most recent updated_at timestamp.
2. When a source system starts emitting duplicate events due to a retry bug, Bronze stores all copies. Silver deduplication logic uses a composite key of event_id and event_timestamp to keep only unique events, preventing downstream Gold metrics from double-counting transactions.
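A possible Structured Streaming version of the dedup in example 2, assuming a Bronze Delta table with event_id and event_timestamp columns (table names and the checkpoint path are assumptions; the watermark bounds how long Spark retains duplicate-tracking state):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read Bronze incrementally; drop any rows sharing the composite key.
# Keeping event_timestamp in the key lets the watermark expire old state.
events = (spark.readStream.table("bronze.events")
          .withWatermark("event_timestamp", "10 minutes")
          .dropDuplicates(["event_id", "event_timestamp"]))

(events.writeStream
       .option("checkpointLocation", "/tmp/checkpoints/silver_events")
       .toTable("silver.events"))
```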