ETL/ELT Patterns • Transformation Layers (Bronze/Silver/Gold)
Silver Layer: Building Trusted Analytical Datasets
The Transformation Workflow: Silver jobs consume Bronze incrementally, often through streaming or micro-batch processing. They apply a specific set of transformations: schema validation, deduplication, type casting, referential integrity checks, and naming standardization. The goal is to create clean, conformed datasets that analysts and data scientists can trust without needing to understand source system quirks.
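As a rough illustration of the incremental consumption pattern, the PySpark sketch below reads a Bronze Delta table as a stream and writes a lightly cleaned Silver table. The table names, column names, and checkpoint path are placeholders for illustration, not conventions taken from this text.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Read new Bronze records incrementally (placeholder table name).
bronze = spark.readStream.table("bronze.customer_events")

# Light, business-agnostic cleaning: standardize a column name, cast a type,
# and drop records missing the business key.
silver = (
    bronze
    .withColumnRenamed("custId", "customer_uuid")
    .withColumn("event_timestamp", F.col("event_timestamp").cast("timestamp"))
    .filter(F.col("customer_uuid").isNotNull())
)

# availableNow processes everything that has arrived since the last run and
# then stops -- a common way to get micro-batch behavior on a schedule.
(
    silver.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/silver_customer_events")
    .trigger(availableNow=True)
    .toTable("silver.customer_events")
)
```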
Consider a common scenario: three different source systems store customer information, each with its own schema. System A uses customer_id (integer), System B uses custId (string), and System C uses user_id (UUID). Silver transformations map all three to a standardized customer_uuid column with consistent type and format.
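A minimal sketch of that mapping, assuming hypothetical Bronze table and column names (in practice the integer and string keys would usually be translated to a shared UUID through a key-mapping table rather than a plain cast):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder Bronze tables, one per source system.
sys_a = spark.read.table("bronze.system_a_customers")  # customer_id: integer
sys_b = spark.read.table("bronze.system_b_customers")  # custId: string
sys_c = spark.read.table("bronze.system_c_customers")  # user_id: UUID string

# Conform all three keys to one customer_uuid column with a consistent type.
a = sys_a.select(F.col("customer_id").cast("string").alias("customer_uuid"), "email", "country")
b = sys_b.select(F.col("custId").alias("customer_uuid"), "email", "country")
c = sys_c.select(F.col("user_id").alias("customer_uuid"), "email", "country")

conformed = a.unionByName(b).unionByName(c)
conformed.write.format("delta").mode("overwrite").saveAsTable("silver.customers")
```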
Performance and Freshness: Databricks customers commonly achieve end-to-end Silver latency under 15 minutes at p99 for streaming-refreshed tables. That is, 99% of events that land in Bronze are cleaned and available in Silver within 15 minutes. For batch-oriented Silver tables updated hourly or daily, latency is longer but remains predictable.
Ownership and Stability: Central data platform teams typically own most Silver tables because they represent source-aligned, business-agnostic transformations. This centralization is intentional: it prevents different teams from applying conflicting cleaning rules to the same source data. If Marketing and Finance both need customer data, they both consume the same Silver customer table rather than creating two different cleaned versions.
The risk here is Silver becoming a dumping ground. If teams start encoding business-specific logic in Silver ("only include customers who made a purchase in the last 90 days"), changes to that logic break other consumers. Silver should answer "what is the clean version of this source entity," not "what subset does my team need right now." Business-specific filters and joins belong in downstream Gold or Platinum layers.
1. Schema Validation: Check that required fields exist, types match expectations, and values fall within valid ranges. Records failing validation are quarantined with error details.
2. Deduplication: Apply business keys plus event timestamps to identify and remove duplicate records, keeping the most recent or authoritative version (see the sketch after this list).
3. Standardization: Normalize field names, date formats, and reference data across sources. Map US, USA, and United States to a single value.
4. SCD Type 2 Handling: For Slowly Changing Dimensions, maintain versioned records with valid_from, valid_to, and is_current flags (a sketch follows the latency targets below).
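Steps 1 and 2 might look roughly like the following in PySpark. The table names, the event_id and amount columns, and the quarantine destination are illustrative assumptions rather than anything prescribed here; the dedup rule is the one described above, business key plus event timestamp, keeping the most recent record.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder Bronze source with an event_id business key and an amount field.
events = spark.read.table("bronze.order_events")

# Step 1: schema validation -- quarantine records that are missing the business
# key or carry an out-of-range amount, keeping an error reason for triage.
is_valid = F.col("event_id").isNotNull() & (F.col("amount") >= 0)
quarantined = (
    events.filter(~F.coalesce(is_valid, F.lit(False)))
    .withColumn("_error", F.lit("missing event_id or negative amount"))
)
quarantined.write.format("delta").mode("append").saveAsTable("silver.quarantine_order_events")

valid = events.filter(F.coalesce(is_valid, F.lit(False)))

# Step 2: deduplication -- keep the most recent record per business key.
w = Window.partitionBy("event_id").orderBy(F.col("event_timestamp").desc())
deduped = (
    valid.withColumn("_rn", F.row_number().over(w))
    .filter(F.col("_rn") == 1)
    .drop("_rn")
)
deduped.write.format("delta").mode("overwrite").saveAsTable("silver.order_events")
```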
Silver Layer Latency Targets: streaming 5-15 minutes; batch 1-24 hours.
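Step 4 above, SCD Type 2 handling, is typically a two-phase upsert on a Delta dimension table: expire the current row for any key whose tracked attributes changed, then append the new version. A minimal sketch, assuming hypothetical table and column names (and ignoring brand-new keys, which would simply be appended):

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder names: the SCD Type 2 target and the latest cleaned records.
dim = DeltaTable.forName(spark, "silver.dim_customer")
incoming = spark.read.table("silver.customer_staging")

current = dim.toDF().filter("is_current = true")

# Records whose tracked attributes actually changed for an existing key.
changed = (
    incoming.alias("u")
    .join(current.alias("c"), F.col("u.customer_uuid") == F.col("c.customer_uuid"))
    .filter((F.col("u.email") != F.col("c.email")) | (F.col("u.country") != F.col("c.country")))
    .select("u.*")
)

# Phase 1: expire the old versions by closing valid_to and clearing is_current.
(
    dim.alias("t")
    .merge(changed.alias("s"),
           "t.customer_uuid = s.customer_uuid AND t.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false", "valid_to": "s.updated_at"})
    .execute()
)

# Phase 2: append the new versions as the current rows.
(
    changed
    .withColumn("valid_from", F.col("updated_at"))
    .withColumn("valid_to", F.lit(None).cast("timestamp"))
    .withColumn("is_current", F.lit(True))
    .write.format("delta").mode("append").saveAsTable("silver.dim_customer")
)
```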
"Silver is your system of record for analytical work. It should be stable, well documented, and resistant to frequent breaking changes. If Silver schemas change weekly, downstream teams cannot build reliable pipelines."
💡 Key Takeaways
✓ Silver applies business-agnostic transformations: schema validation, deduplication, type casting, and naming standardization across sources
✓ Typical streaming Silver latency is 5 to 15 minutes at p99, while batch refreshes occur hourly to daily depending on use case
✓ Central platform teams own most Silver tables to ensure consistent cleaning rules and prevent conflicting interpretations of source data
✓ Slowly Changing Dimensions in Silver use Type 2 with valid_from, valid_to, and is_current flags to support time travel queries
✓ Silver becomes the default analytical source of record, so schema stability is critical; frequent breaking changes cascade to all downstream consumers
📌 Examples
1. A Silver customer table consolidates three Bronze sources with inconsistent schemas. It maps customer_id, custId, and user_id to a standard customer_uuid, handles 15 different country code formats (US, USA, United States), and deduplicates based on the most recent updated_at timestamp.
2. When a source system starts emitting duplicate events due to a retry bug, Bronze stores all copies. Silver deduplication logic uses a composite key of event_id and event_timestamp to keep only unique events, preventing downstream Gold metrics from double-counting transactions.