
Database Selection Framework: Core Decision Factors

Definition
Choosing the right database requires evaluating four critical dimensions: data model, scale requirements, consistency needs, and operational complexity. This framework helps you move beyond technology hype to make data-driven decisions.
Data Model Analysis
First, analyze your data model. PostgreSQL excels when you need complex queries across multiple tables, strong consistency, and relational integrity. MongoDB fits when your data is naturally hierarchical and access patterns favor document retrieval. Redis serves as the caching layer when you need sub-millisecond access to frequently read data. Each database type has a sweet spot that matches a specific data shape.

Scale Assessment
Second, assess scale requirements honestly. A single PostgreSQL instance handles 10,000 to 50,000 queries per second for typical Online Transaction Processing (OLTP) workloads, and many applications never exceed this. Horizontal scaling adds complexity; pay that cost only when single-node capacity is genuinely insufficient. If your traffic projections show 100,000+ QPS or petabyte-scale data, distributed systems become necessary.

Total Cost of Ownership
Third, consider total cost of ownership. Managed services (Amazon RDS, Cloud SQL) trade higher dollar costs for lower operational burden, while self-hosted databases require expertise in backup, replication, failover, and security. Team capabilities matter: a team experienced with PostgreSQL will be more productive than one learning Cassandra, even if Cassandra is theoretically better suited to the workload.
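The four-factor evaluation above can be sketched as a small decision helper. This is a minimal illustrative sketch, not a definitive tool: the `Workload` fields and the 50,000 QPS single-node threshold are assumptions drawn from the figures in this section.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    # Hypothetical workload profile; field names are illustrative assumptions.
    needs_acid: bool            # financial or relational-integrity requirements
    hierarchical_docs: bool     # naturally nested, document-shaped data
    peak_qps: int               # projected peak queries per second
    hot_read_latency_ms: float  # latency budget for frequently read data

def suggest_databases(w: Workload) -> list[str]:
    """Apply the data-model and scale factors from the framework above."""
    candidates = []
    if w.needs_acid:
        candidates.append("PostgreSQL")    # strong consistency, complex queries
    elif w.hierarchical_docs:
        candidates.append("MongoDB")       # document-retrieval access patterns
    if w.hot_read_latency_ms < 1:
        candidates.append("Redis (cache)") # sub-millisecond hot reads
    if w.peak_qps > 50_000:                # beyond typical single-node OLTP
        candidates.append("evaluate horizontal scaling / distributed store")
    return candidates or ["PostgreSQL"]    # sensible relational default

# Example: a checkout service with ACID needs and modest traffic
print(suggest_databases(Workload(True, False, 8_000, 20)))  # → ['PostgreSQL']
```

The point is not the specific thresholds but the order of evaluation: data model first, then scale, so that distributed-system complexity is only considered once a single node is genuinely insufficient.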
💡 Key Takeaways
Data model mismatch causes technical debt: forcing unstructured content into PostgreSQL requires expensive migrations later, while using MongoDB for financial transactions loses the ACID guarantees you need
Performance numbers vary dramatically: Redis < 1ms latency versus PostgreSQL 5 to 50ms versus BigQuery 1 to 30 seconds; choosing the wrong database adds latency you cannot optimize away
Consistency trade-offs are permanent: strong consistency adds 50ms+ for cross-region coordination, while eventual consistency risks showing stale data for seconds after writes; no configuration fixes this fundamental difference
Operational complexity scales exponentially: two databases require understanding their interaction patterns; five databases mean your team spends more time on database operations than feature development
Cloud-managed services cost 2x to 3x more than self-hosted but eliminate on-call burden: Amazon Aurora costs $200 monthly versus $100 for self-managed PostgreSQL on EC2, but includes automated backups, failover, and patching
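The managed-versus-self-hosted takeaway becomes concrete once you price in saved operational time. A minimal sketch, using the $200 versus $100 figures from the takeaway above; the 5 ops-hours per month saved and the $80 hourly engineer rate are assumptions for illustration.

```python
def managed_net_cost(managed_monthly: float,
                     self_hosted_monthly: float,
                     ops_hours_saved: float,
                     engineer_hourly_rate: float) -> float:
    """Monthly net cost of choosing managed over self-hosted.
    Negative means the managed service wins once saved ops time is valued."""
    premium = managed_monthly - self_hosted_monthly      # extra dollar cost
    ops_value = ops_hours_saved * engineer_hourly_rate   # value of time saved
    return premium - ops_value

# Aurora at $200/mo vs self-managed PostgreSQL on EC2 at $100/mo (from the
# takeaway); 5 ops-hours/month at $80/hour are assumed for illustration.
print(managed_net_cost(200, 100, 5, 80))  # → -300.0
```

Even a modest estimate of operational hours saved can outweigh a 2x dollar premium, which is why the comparison should never stop at the instance price.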
📌 Interview Tips
1. Uber evaluates databases per use case: MySQL for ride transactions (ACID required), Cassandra for trip history (high write throughput), Redis for real-time pricing (sub-millisecond latency), Elasticsearch for location search (geospatial queries)
2. Discord migrated from MongoDB to Cassandra when message history exceeded 100 million users: MongoDB sharding became operationally complex, while Cassandra's write-optimized architecture handles append-only messages at petabyte scale
3. Segment migrated from MongoDB to PostgreSQL despite scaling challenges because data-integrity bugs from eventual consistency cost more than the scaling effort, showing that consistency requirements trump performance