Data Modeling & Schema Design • Slowly Changing Dimensions (SCD)Easy⏱️ ~2 min
What Are Slowly Changing Dimensions (SCD)?
The Core Problem:
Imagine you run analytics for an ecommerce platform with 50 million customers. Yesterday, customer Alice made a purchase while she was in the Gold loyalty tier and lived in Seattle. Today, she moved to Portland and got upgraded to Platinum. When you analyze her historical orders next week, should her yesterday's order show Gold or Platinum? Seattle or Portland?
Slowly Changing Dimensions (SCD) are design patterns that solve exactly this problem. A dimension is an entity you analyze by, such as Customer, Product, Store, or Subscription Plan. Facts are the events you're measuring, like orders, page views, or payments. When dimension attributes change over time, you need a strategy for preserving the correct historical meaning of your facts.
Why This Matters:
At scale, these decisions affect billions of rows and millions of dollars in business decisions. Retailers like Walmart process tens of thousands of orders per second. Customer profiles change at maybe 1% of rows per day, but those changes are analytically critical. If you overwrite Alice's old address, every historical analysis will incorrectly show all her past orders came from Portland.
⚠️ Common Pitfall: Many teams default to overwriting in place because it's simpler. This works until someone asks "What was our revenue by customer segment last quarter using the segments they actually had then?" and you realize you've lost that history forever.
SCD patterns define a contract between your data ingestion, storage, and reporting layers about how change is modeled and what "historical truth" means. The pattern you choose determines storage costs, query complexity, and what questions your analysts can answer.💡 Key Takeaways
•Dimensions are entities like Customer or Product that facts reference via keys, while facts are events like orders or payments
•Changes to dimension attributes create a choice: preserve history or overwrite, affecting all historical analysis downstream
•At enterprise scale with 1% daily change rate on 50 million customers, you're handling 500,000 dimension updates per day
•The SCD pattern you choose is a fundamental contract that affects storage size, query complexity, and analytical capabilities
•Wrong choice can mean losing critical history or exploding storage costs from billions of unnecessary version rows
📌 Examples
A customer moves from Seattle to Portland: do historical orders show old city or new city?
A product changes from category Electronics to Home Goods: does last year's revenue analysis use old or new category?
A subscription plan price increases from $9.99 to $14.99: what price do you show for renewals from last month?