Data Modeling & Schema DesignSlowly Changing Dimensions (SCD)Medium⏱️ ~3 min

SCD Type 1 vs Type 2: The Storage vs History Tradeoff

Type 1: Overwrite in Place Type 1 is the simplest approach. When an attribute changes, you simply update the existing row. If customer Alice moves from Seattle to Portland, you update her single Customer row with the new city. Done. This keeps tables small and queries trivial. Your 100 million customer dimension stays at 100 million rows forever. Joins are straightforward, no filtering by date ranges or flags. At Walmart scale processing 50,000 orders per second, simpler joins mean faster queries and lower compute costs. The brutal tradeoff: you lose all history. Every historical fact now points to current attribute values. All of Alice's past Seattle orders will now appear to have come from Portland in your reports. For truly corrective changes like fixing a misspelled name or invalid phone number, this is perfect. For analytically meaningful changes, it's a disaster waiting to happen. Type 2: Full Version History Type 2 creates a new row for every change. Each version gets a surrogate key, an effective start timestamp, an effective end timestamp, and a current flag. When Alice moves cities, you don't touch her old row. Instead, you close it by setting its effective end to today and current flag to false, then insert a new row with a new surrogate key starting today. Facts capture the surrogate key that was current when the event happened. Alice's Seattle order from yesterday points to surrogate key 12345, which forever shows Seattle. Her Portland order tomorrow points to new surrogate key 12346, which shows Portland. Perfect historical accuracy.
❗ Remember: A dimension that would hold 200 million products might grow to 4 billion rows over several years if products average 20 versions each. This isn't theoretical – this is actual production scale at large retailers.
Queries become more complex. To get current customers, you filter WHERE current_flag = true. For historical analysis, you join facts to dimensions WHERE fact.event_time BETWEEN dim.effective_start AND dim.effective_end. These temporal joins are expensive and require careful indexing. When to Choose Each: Use Type 1 for purely corrective fields where history doesn't matter: data quality fixes, standardizing formats, correcting errors. Also use it when you genuinely never need historical values, though be careful assuming this. Use Type 2 when time based questions are common: churn analysis by past subscription plan at Netflix, revenue trends by customer segment at time of purchase, understanding how product categorization changes affected sales. Accept the storage and complexity cost as the price of analytical accuracy.
💡 Key Takeaways
Type 1 keeps dimension tables at their natural size but destroys all history, suitable only for corrections or truly non analytical attributes
Type 2 can multiply table size by 10 to 50 times over several years, with a 200 million product dimension growing to 4 billion rows at 20 versions per product
Type 2 requires surrogate keys, temporal attributes like effective start and effective end, and current flag for efficient querying
Temporal joins in Type 2 require indexing on effective dates and careful query planning to avoid table scans on billion row dimensions
The choice is irreversible in practice: switching from Type 1 to Type 2 after you've lost history means you can never recover that analytical capability
📌 Examples
Type 1 use case: correcting a misspelled customer name from 'Alise' to 'Alice' where the typo has no analytical meaning
Type 2 use case: tracking customer loyalty tier changes to analyze 'What tier were customers in when they made their largest purchase?'
Production reality at Target: Customer dimension with 50 million customers averaging 3 versions over 2 years reaches 150 million rows
Query cost difference: Type 1 simple join on customer_id takes 2 seconds, Type 2 temporal join with date range can take 30 seconds without proper indexing
← Back to Slowly Changing Dimensions (SCD) Overview