Data Modeling & Schema Design • Normalization vs Denormalization Trade-offs
The CQRS Pattern: Bridging Normalized Writes and Denormalized Reads
The Architecture Pattern:
Command Query Responsibility Segregation (CQRS) separates your write model from your read model. The write model is normalized and enforces all business invariants. It is the source of truth. The read model is a set of denormalized projections optimized for specific query patterns. You get the integrity benefits of normalization where it matters most (writes) and the performance benefits of denormalization where it matters most (reads).
Here is how it works in practice. Your order service writes to a normalized PostgreSQL database with tables for customers, orders, order_items, products, and inventory. Each write goes through strict validation and transactional logic. Simultaneously, the database emits change events to a streaming platform like Kafka through Change Data Capture. Stream processors consume these events and update denormalized read stores: a product_details cache in Redis, an orders_by_customer view in Cassandra, and a search index in Elasticsearch.
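To make the projection step concrete, here is a minimal Python sketch of a stream processor applying CDC change events to an orders_by_customer read model. The event shape, field names, and the in-memory store are illustrative assumptions standing in for a Debezium/Kafka pipeline and a real read store such as Cassandra or Redis, not any specific tool's API.

```python
# Minimal sketch (illustrative only): apply CDC change events from the
# normalized order tables to a denormalized orders_by_customer read model.
from collections import defaultdict

# read model: customer_id -> {order_id -> denormalized order document}
orders_by_customer: dict[str, dict[str, dict]] = defaultdict(dict)

def apply_change_event(event: dict) -> None:
    """Apply one CDC event ({"table", "op", "after": row}) to the read model."""
    table, op, row = event["table"], event["op"], event["after"]
    if table == "orders" and op in ("insert", "update"):
        doc = orders_by_customer[row["customer_id"]].setdefault(row["order_id"], {"items": []})
        doc.update(status=row["status"], total=row["total"], placed_at=row["placed_at"])
    elif table == "order_items" and op == "insert":
        # Product fields and customer_id are assumed to be enriched upstream,
        # so the read path never performs a join at query time.
        doc = orders_by_customer[row["customer_id"]].setdefault(row["order_id"], {"items": []})
        doc["items"].append({"sku": row["sku"], "name": row["product_name"],
                             "qty": row["qty"], "price": row["price"]})

# Events as they might arrive from the CDC pipeline:
apply_change_event({"table": "orders", "op": "insert",
                    "after": {"order_id": "o-1", "customer_id": "c-42", "status": "PLACED",
                              "total": 59.90, "placed_at": "2024-05-01T12:00:00Z"}})
apply_change_event({"table": "order_items", "op": "insert",
                    "after": {"order_id": "o-1", "customer_id": "c-42", "sku": "sku-9",
                              "product_name": "Mug", "qty": 2, "price": 29.95}})
```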
The Real-World Flow:
Consider LinkedIn's member profiles. The canonical profile data (name, headline, experience, skills) lives in a normalized relational store. Updates go through this write path with full validation. But when you view someone's profile or search for members, those queries hit denormalized stores. The profile view service reads from a key-value store with embedded, prejoined data. The search service queries an Elasticsearch index with flattened member documents. Both are kept in sync with the normalized source through streaming pipelines.
This architecture enables LinkedIn to handle millions of profile reads per second with p95 latencies under 100 milliseconds, while maintaining strong consistency for writes. The streaming layer processes updates with typical lag under 5 seconds, though it can spike to minutes during heavy load or partial outages.
⚠️ Common Pitfall: Teams often underestimate the operational complexity of CQRS. You now have multiple databases, streaming pipelines, schema evolution across stores, and lag monitoring. This pays off at scale but adds significant overhead for smaller systems.
Implementation Details:
Successful CQRS implementations monitor replication lag religiously. Metrics like "percentage of read models updated within 60 seconds" or "maximum CDC pipeline delay" are critical SLOs. When lag exceeds thresholds, systems may fall back to the normalized store for reads, trading latency for correctness, or show users a "data may be slightly stale" indicator.
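A minimal sketch of that lag-based fallback follows. The threshold, the lag metric, and the store interfaces are assumptions for illustration; a real system would read lag from CDC or consumer-offset metrics and use actual database clients.

```python
# Minimal sketch of a lag-aware read path with fallback to the source of truth.
STALE_READ_THRESHOLD_SECONDS = 60.0

def read_profile(member_id: str, read_store, primary_db, replication_lag_seconds: float) -> dict:
    """Prefer the denormalized read store; fall back to the normalized store when lag breaches the SLO."""
    if replication_lag_seconds <= STALE_READ_THRESHOLD_SECONDS:
        return {"profile": read_store(member_id),             # fast, pre-joined document
                "stale_hint": replication_lag_seconds > 5.0}  # drives a "may be stale" banner
    # Lag exceeded the threshold: accept higher latency to read the normalized store.
    return {"profile": primary_db(member_id), "stale_hint": False}

# Usage with stand-in stores (plain functions here, real clients in practice):
def cached_profile(member_id): return {"member_id": member_id, "headline": "Engineer"}
def canonical_profile(member_id): return {"member_id": member_id, "headline": "Engineer"}

print(read_profile("m-1", cached_profile, canonical_profile, replication_lag_seconds=3.2))
print(read_profile("m-1", cached_profile, canonical_profile, replication_lag_seconds=300.0))
```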
The write side remains simple: a single normalized database with ACID transactions. The complexity lives in the read side and in the pipelines connecting the two. Stream processors must handle idempotent upserts, out-of-order events, schema changes, and backfills when new read models are introduced. Tools like Debezium for CDC, Kafka for streaming, and Flink or Kafka Streams for transformation are common.
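One of those concerns, idempotent handling of duplicated or out-of-order events, can be sketched with a simple version check. The version field and in-memory dict below are illustrative stand-ins for the state a Kafka Streams or Flink job would maintain.

```python
# Minimal sketch of an idempotent, out-of-order-tolerant upsert into a read model.
read_model: dict[str, dict] = {}  # key -> {"version": int, "doc": dict}

def upsert(key: str, version: int, doc: dict) -> bool:
    """Apply an event only if it is newer than what the read model already holds.

    Replays (same version) and late-arriving older events are skipped, so the
    projection converges to the same state regardless of delivery order."""
    current = read_model.get(key)
    if current is not None and version <= current["version"]:
        return False  # duplicate or out-of-order event: no-op
    read_model[key] = {"version": version, "doc": doc}
    return True

# Events arriving duplicated and out of order still converge:
upsert("product:42", 3, {"name": "Mug", "price": 12.0})
upsert("product:42", 2, {"name": "Mug", "price": 10.0})  # late event, ignored
upsert("product:42", 3, {"name": "Mug", "price": 12.0})  # replay, ignored
assert read_model["product:42"]["doc"]["price"] == 12.0
```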
When to Use CQRS:
CQRS makes sense when your read and write workloads have fundamentally different characteristics. If writes are infrequent but require strong guarantees, while reads are high volume and latency sensitive, CQRS gives you the best of both worlds. It is overkill for simple CRUD applications where a normalized schema with caching is sufficient.
💡 Key Takeaways
• Separates normalized write model (source of truth with ACID guarantees) from denormalized read models (optimized for specific query patterns and high throughput)
• Uses Change Data Capture and streaming platforms like Kafka to propagate changes from write to read stores, typically with lag under 5 seconds in healthy systems
• Enables handling millions of reads per second with p95 latencies under 100 milliseconds while keeping writes fast and transactional at 10,000+ operations per second
• Requires operational investment in monitoring replication lag, handling schema evolution across multiple stores, and managing idempotent stream processing logic
• Best suited for systems with asymmetric workloads where read and write patterns differ significantly in volume, latency requirements, or access patterns
📌 Examples
LinkedIn member profiles: Normalized relational store for writes with full validation. Denormalized key-value store and Elasticsearch index for reads. Updates flow through Kafka with sub-5-second lag.
Ecommerce order history: Write orders to normalized PostgreSQL. Stream changes to a Cassandra table keyed by customer_id with embedded order and product details. Serve the 'my orders' page in 20ms without joins.
Financial transactions: Append to immutable transaction log (normalized). Materialize account balances and analytics aggregates in denormalized stores for dashboards and reporting.
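As a rough sketch of that last example, the append-only log and the materialized balance view might look like the following. Names are hypothetical, and a real system would update the balance view asynchronously through the streaming layer rather than inline as done here.

```python
# Rough sketch: append-only transaction log (write model) plus account
# balances materialized as a denormalized read model for dashboards.
from collections import defaultdict

transaction_log: list[dict] = []                          # immutable source of truth
account_balances: dict[str, float] = defaultdict(float)   # denormalized read model

def record_transaction(txn_id: str, account_id: str, amount: float) -> None:
    """Append to the log (write side) and incrementally maintain the balance view (read side)."""
    transaction_log.append({"txn_id": txn_id, "account_id": account_id, "amount": amount})
    account_balances[account_id] += amount                # dashboards read this without scanning the log

record_transaction("t-1", "acct-7", 250.00)
record_transaction("t-2", "acct-7", -40.00)
print(account_balances["acct-7"])                         # 210.0, no aggregation over the log needed
```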