Database Design • Choosing Databases by Use CaseHard⏱️ ~3 min
Polyglot Persistence: When and How to Use Multiple Databases
Most large scale systems use multiple databases optimized for different use cases, a pattern called polyglot persistence. The challenge is balancing performance optimization against operational complexity and maintaining data consistency across databases.
Netflix architecture demonstrates strategic polyglot persistence: Cassandra stores 2.5 trillion viewing history records daily (write optimized), MySQL handles billing and subscriptions (ACID required), Elasticsearch powers content search (full text queries), Redis caches user sessions (sub millisecond reads), and S3 stores video files (petabyte scale objects). Each database choice optimizes specific workloads, but Netflix employs 50+ engineers just for data infrastructure. The trade-off is clear: performance and scalability versus operational burden.
Data consistency across databases becomes the critical challenge. When a user cancels their Netflix subscription, that change must propagate to the billing database (MySQL), update cached session state (Redis), and potentially affect content recommendations (Cassandra). Netflix uses event driven architecture with Kafka: subscription changes publish events that multiple services consume, updating their respective databases. This eventual consistency means a canceled user might see content recommendations for 1 to 2 seconds until all systems synchronize. The alternative, distributed transactions across databases using two phase commit, would add 100+ milliseconds of latency and create a single point of failure.
Start with one database and add others only when clear performance or scale requirements justify operational complexity. Uber began with PostgreSQL for everything, added Redis when caching became critical for ride pricing latency, then added Cassandra when trip history exceeded PostgreSQL write throughput. Each addition required new monitoring, backup strategies, and team expertise. A startup using five databases from day one likely over engineered their solution and will spend more time on database operations than building features.
💡 Key Takeaways
•Operational complexity scales exponentially: two databases require understanding interaction patterns, five databases mean separate monitoring, backup, scaling strategies, and on call expertise for each technology
•Data consistency requires architectural patterns: event driven architecture via Kafka enables eventual consistency across databases in 1 to 2 seconds, two phase commit ensures immediate consistency but adds 100ms+ latency and failure coordination
•Cost of multiple databases includes hidden expenses: Netflix employs 50+ data infrastructure engineers, licensing fees for multiple systems, increased cloud egress between databases in different availability zones ($0.01 per GB adds up at petabyte scale)
•Migration timing matters: start with PostgreSQL plus Redis for caching, add specialized databases only when specific pain points emerge (write throughput ceiling, search quality issues), premature optimization wastes engineering time
•Each database requires separate disaster recovery: backing up PostgreSQL differs from Cassandra snapshots, coordinating point in time recovery across five databases during outage adds 30 to 60 minutes versus single database restore
📌 Examples
Uber evolution shows measured approach: started with PostgreSQL for all data (2010 to 2012), added Redis when real time pricing required sub 5ms cache lookups (2012), introduced Cassandra when trip history writes exceeded 100,000 per second (2014), each addition solved specific bottleneck
Airbnb polyglot architecture by feature: PostgreSQL for reservations and payments (ACID critical for money movement), MySQL for user profiles (familiar operational model), DynamoDB for session storage (global scale, managed service), S3 for photos (petabyte scale), Elasticsearch for location search
Pinterest changed course: initially used multiple databases (MySQL, Redis, Cassandra, HBase), consolidated to fewer systems (MySQL, Redis, S3) because operational complexity exceeded benefits, now focuses on scaling databases they understand deeply