
Normalization vs Denormalization: Core Definitions and Trade-offs

Normalization organizes data into minimal, logically independent entities linked by keys, so each fact is stored exactly once. Think of it as breaking data into clean, separate tables: users, posts, comments, each with its own ID. When you need to display a post with author info, you join them. This optimizes for correctness and write efficiency because updates touch one place: change a username and you update one row in the users table, and it is reflected everywhere.

Denormalization takes the opposite approach: duplicate or embed the subset of fields a query needs directly where that query reads. Instead of joining three tables to show a feed item, you store the author name, post text, and comment count together in a precomputed feed row. This eliminates joins at request time, turning what would be multiple network calls into a single lookup.

The fundamental trade-off is read latency versus write complexity. Normalized models keep writes cheap (typically under 5 milliseconds for a single-row update), but reads may require multiple joins. If each cross-shard lookup adds 5 to 10 milliseconds, five joins can push your p50 latency to 25 to 50 milliseconds and p99 above 200 milliseconds under load. Denormalized reads are often single-digit milliseconds from cache, but writes fan out: if a user has 300 followers, one post triggers 300 feed-row writes.

In production, most large systems mix both. The normalized store is the source of truth; denormalized projections are derived views you regenerate via change streams. Meta maintains normalized social-graph entities but serves feeds from denormalized per-user timelines. Pinterest stores normalized pins and boards but materializes per-user homefeed indexes to avoid joins on the hot read path. The choice depends on measured access patterns, latency Service Level Objectives (SLOs), and acceptable staleness budgets.
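To make the two shapes concrete, here is a minimal sketch using SQLite, with illustrative table and function names (users, posts, follows, feed_items) that are assumptions for this example rather than anything taken from the systems above. The normalized read joins author data in at query time; the denormalized publish fans one post out into a feed row per follower.

```python
import sqlite3

# In-memory schema contrasting the two models. Names are illustrative only.
db = sqlite3.connect(":memory:")
db.executescript("""
    -- Normalized source of truth: each fact stored once, linked by keys.
    CREATE TABLE users   (user_id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts   (post_id INTEGER PRIMARY KEY,
                          author_id INTEGER REFERENCES users(user_id),
                          body TEXT);
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);

    -- Denormalized projection: author name and post text copied into a
    -- per-follower feed row so the hot read path needs no joins.
    CREATE TABLE feed_items (owner_id INTEGER, post_id INTEGER,
                             author_name TEXT, body TEXT);
""")

def read_post_normalized(post_id: int):
    """Read path on the normalized model: one join per related entity."""
    return db.execute(
        "SELECT p.body, u.username FROM posts p "
        "JOIN users u ON u.user_id = p.author_id WHERE p.post_id = ?",
        (post_id,),
    ).fetchone()

def publish_post_denormalized(author_id: int, post_id: int, body: str):
    """Write path with fan-out: one post becomes one feed row per follower."""
    author_name = db.execute(
        "SELECT username FROM users WHERE user_id = ?", (author_id,)
    ).fetchone()[0]
    db.execute("INSERT INTO posts VALUES (?, ?, ?)", (post_id, author_id, body))
    followers = db.execute(
        "SELECT follower_id FROM follows WHERE followee_id = ?", (author_id,)
    ).fetchall()
    # 300 followers => 300 feed rows written for a single post.
    db.executemany(
        "INSERT INTO feed_items VALUES (?, ?, ?, ?)",
        [(f_id, post_id, author_name, body) for (f_id,) in followers],
    )

def read_feed_denormalized(owner_id: int):
    """Read path on the projection: a single indexed lookup, no joins."""
    return db.execute(
        "SELECT author_name, body FROM feed_items WHERE owner_id = ?",
        (owner_id,),
    ).fetchall()
```

Seeding follows with 300 rows for one author and calling publish_post_denormalized once writes a single posts row but 300 feed_items rows, which is exactly the write amplification described above.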
💡 Key Takeaways
Normalization stores each fact once, optimizing for write efficiency (a single row update in roughly 5 milliseconds) and correctness, but reads requiring joins across shards can add 5 to 10 milliseconds per hop, pushing p99 latency above 200 milliseconds
Denormalization duplicates data along access paths to eliminate joins, achieving single-digit-millisecond read latency from cache, but write amplification scales with fan-out (300 followers means 300 writes per post)
Storage cost multiplier for denormalization is typically 3 to 10 times the normalized size once you include replicas and indexes; at $20 to $50 per terabyte per month for SSD, a 50 terabyte denormalized store costs $1,000 to $2,500 monthly per replica (the short calculation after this list replays these numbers)
Production systems treat normalized data as source of truth and denormalized projections as derived products regenerated via change data capture streams with eventual consistency
The decision hinges on access pattern measurements: normalize for write-heavy workloads with strict invariants (financial transactions), denormalize for read-heavy endpoints with tight latency SLOs (feeds serving 90%+ reads with p99 under 1 second)
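The figures in these takeaways are easy to replay. The sketch below simply multiplies out the stated assumptions (5 to 10 milliseconds per cross-shard hop, five joins, 300 followers, $20 to $50 per terabyte-month); it is back-of-the-envelope arithmetic, not a benchmark.

```python
# Back-of-the-envelope check of the latency and cost figures above.
# Every input is an assumption stated in the takeaways, not a measurement.

JOIN_HOP_MS = (5, 10)          # added latency per cross-shard lookup
JOINS_ON_READ_PATH = 5
p50_low = JOINS_ON_READ_PATH * JOIN_HOP_MS[0]
p50_high = JOINS_ON_READ_PATH * JOIN_HOP_MS[1]
print(f"normalized read, 5 joins: ~{p50_low}-{p50_high} ms at p50")   # 25-50 ms

FOLLOWERS = 300
print(f"denormalized write amplification: {FOLLOWERS} feed rows per post")

STORE_TB = 50
SSD_PER_TB_MONTH = (20, 50)    # dollars per terabyte per month
cost_low = STORE_TB * SSD_PER_TB_MONTH[0]
cost_high = STORE_TB * SSD_PER_TB_MONTH[1]
print(f"50 TB denormalized store: ${cost_low:,}-${cost_high:,} per month per replica")
# $1,000-$2,500, matching the takeaway above
```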
📌 Examples
Meta social graph: normalized users, posts, edges as source of truth; denormalized per-user feed rows with embedded author name, post snippet, and ranking features to serve 250 million users with 500 items each (25 terabytes per replica), achieving p50 under 100 to 200 milliseconds; a sizing check follows these examples
Pinterest homefeed: normalized pin, board, user entities; denormalized homefeed index per user with precomputed ranking features serves 400 million monthly active users with 72 billion daily reads at 98% cache hit rate, keeping origin queries under 14,000 per second globally
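As a sanity check on the first example, the quoted user and item counts imply roughly 200 bytes per denormalized feed row. The short calculation below only derives that implied figure from the numbers given; it is not a published per-row size.

```python
# Implied per-row size for the per-user feed example above. The user and
# item counts are the figures quoted in the example; the bytes-per-row
# result is just what those figures imply.
USERS = 250_000_000
ITEMS_PER_USER = 500
REPLICA_BYTES = 25 * 10**12    # 25 TB per replica

rows = USERS * ITEMS_PER_USER              # 125 billion feed rows
bytes_per_row = REPLICA_BYTES / rows
print(f"{rows:,} feed rows, ~{bytes_per_row:.0f} bytes per row")   # ~200 bytes
```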