Data Modeling & Schema Design • Slowly Changing Dimensions (SCD)Medium⏱️ ~3 min
Alternative SCD Types: Type 3, Type 4, and Hybrid Patterns
Type 3: Limited Column History
Type 3 adds extra columns to store a small amount of history, typically just the previous value. Instead of creating new rows, you have columns like current_city, previous_city, and city_change_date. When a customer moves from Seattle to Portland, you shift Seattle into previous_city, set Portland as current_city, and record the change date.
This keeps tables narrow and queries simple. Your 100 million customer dimension stays at 100 million rows. You can answer limited historical questions like "How many customers changed cities this month?" without complex temporal joins. The brutal limitation is you can only track one prior value, or maybe two or three if you add more columns.
Type 3 works when you have specific, narrow questions about recent changes. A subscription business might track current_plan, previous_plan, and downgrade_date to analyze recent downgrades for retention campaigns. But if you later want to know "What plan did this customer have 18 months ago?" you're out of luck unless that happens to be the previous value.
Type 4: Separate History Table
Type 4 splits current and historical data into two tables. The main dimension table holds only current rows, staying lean and fast. A separate history table stores all versions using a Type 2 structure with surrogate keys and effective dates.
This reduces contention on the main table. Queries that only need current state, which might be 80% of your workload, avoid scanning billions of historical rows. Historical analysis queries the history table. Some implementations also keep a small current snapshot in the main table for fast lookups.
The cost is complexity. You're managing two tables that must stay synchronized. When a dimension entity changes, you must insert a new current row in the main table (or update the existing one) and simultaneously insert a closed version into the history table. This requires coordinated writes and makes the ETL logic more fragile. If synchronization breaks, your current and historical views diverge.
Type 6: Hybrid Current and Historical
Type 6 combines Type 1, Type 2, and Type 3 behaviors into a single table structure. Each Type 2 version row includes both historical attributes frozen at that point in time and current attributes that get updated in place across all versions of the business key.
For example, product rows might have historical_category frozen per version but current_category updated across all versions when a product recategorization happens. This supports both "What category was this product in when sold?" and "Given today's category structure, how would we classify all historical sales?" questions.
This is powerful but confusing. Analysts must understand which columns are historical and which are current. ETL must update current columns across multiple rows when those attributes change. This pattern appears in advanced analytics where business users need to restate history under current taxonomies, common in financial reporting or organizational hierarchy changes.
Event Sourced Alternative:
Instead of maintaining SCD dimensions, some systems store all state changes as immutable events and reconstruct dimension state by replaying events. A customer_address_changed event at timestamp T becomes a version in your dimension by querying all events up to T.
This shifts complexity from write time to read time. Writes are simple append only inserts. Reads must aggregate events, which can be expensive but can be accelerated with materialized views or caches. This pattern fits well with event streaming architectures like those at Netflix but requires sophisticated query engines.
💡 Key Takeaways
•Type 3 keeps table size constant by storing only previous value in additional columns, suitable for narrow questions like recent plan downgrades but loses deep history
•Type 4 splits current and historical rows into separate tables, optimizing 80% of queries that only need current state but requiring synchronized writes across two tables
•Type 6 hybrid stores both historical attributes frozen per version and current attributes updated across all versions, supporting restated history under current taxonomies
•Event sourced alternative makes writes simple with append only events but shifts complexity to reads that must aggregate events, requiring materialized views for performance
•Choice depends on query patterns: if most queries need only current state, Type 4 or Type 3 reduce scan costs; if deep temporal analysis is common, pure Type 2 is simpler
📌 Examples
Type 3 use case: subscription service tracks current_plan, previous_plan, and plan_change_date to power retention dashboards for recent downgrades
Type 4 at scale: Dimension with 100 million current customers and 500 million historical versions, 80% of queries hit only the 100 million row current table
Type 6 financial example: Product has historical_category frozen per sale and current_category updated globally, enabling 'Restate last year sales under new category structure' reports
Event sourced pattern: Customer dimension materialized view aggregates address_changed, tier_upgraded, and profile_updated events, refreshed every 5 minutes for near real time queries