Database Migration Strategies and Avoiding Common Anti-Patterns

Migration Risk and Strategy:

Database migrations are high-risk operations that can cause extended outages if executed poorly. Knowing the proven migration patterns, and recognizing the anti-patterns, prevents the costly mistakes that plague production systems.

Strangler Pattern:

The strangler pattern gradually shifts traffic from the old database to the new one, which limits the risk of any single step. Discord's message store is a commonly cited example: it moved from MongoDB to Cassandra and later from Cassandra to ScyllaDB (Cassandra compatible) while serving on the order of 100 million users. A migration of this kind proceeds in phases: first dual write new messages to both databases while keeping reads on the old one, then slowly shift read traffic to the new database partition by partition (over 3 months in this account), and finally backfill historical messages. Each step allows data consistency and performance to be validated, with a rollback if issues emerge. The alternative, a big bang migration that switches everything overnight, works only for small datasets or during extended maintenance windows that large scale services cannot afford.
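The dual write and gradual read shift described above can be sketched as follows. This is a minimal illustration, not how any particular company implemented it: the `DualWriteRouter` class, its method names, and the use of plain dictionaries as stand-ins for the two databases are all assumptions made for the example.

```python
import hashlib


class DualWriteRouter:
    """Writes go to both stores; a configurable share of reads goes to the new one.

    Hypothetical sketch: old_store and new_store are dict-like stand-ins
    for real database clients.
    """

    def __init__(self, old_store, new_store, read_percent_new=0):
        self.old_store = old_store
        self.new_store = new_store
        self.read_percent_new = read_percent_new  # 0 = all reads on the old store

    def _bucket(self, key):
        # Stable hash so a given key always falls in the same bucket (0-99),
        # which keeps routing consistent as the percentage is raised.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % 100

    def write(self, key, value):
        # The old store remains the source of truth; the new store gets a copy.
        self.old_store[key] = value
        self.new_store[key] = value

    def read(self, key):
        if self._bucket(key) < self.read_percent_new:
            return self.new_store.get(key)
        return self.old_store.get(key)

    def rollback(self):
        # Instant fallback: send every read back to the old store.
        self.read_percent_new = 0


old_db, new_db = {}, {}
router = DualWriteRouter(old_db, new_db)
router.write("msg:1", "hello")
router.read_percent_new = 25   # shift a quarter of the key space to the new store
print(router.read("msg:1"))    # same value from either store while both are in sync
```

Raising `read_percent_new` in small steps mirrors the partition by partition shift, and `rollback()` works only because writes keep flowing to the old store for the whole transition. A real system would also need to handle the case where one of the two writes fails, which this sketch ignores.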

Change Data Capture (CDC):

CDC tools like Debezium stream database changes in real time during a migration. Segment migrated from MongoDB to PostgreSQL for data integrity, using CDC to replicate every write from MongoDB to PostgreSQL and running both databases in parallel for 2 weeks while comparing results. Once 99.99% of query results matched, they cut over to PostgreSQL. CDC adds operational complexity (typically a Kafka cluster for event streaming) but provides a safety net: the new database is tested under production load before any commitment is made.
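The replicate-then-compare loop can be sketched in a few lines. The event shape below (`op`, `key`, `value`) is a simplified, hypothetical format chosen for the example and is not Debezium's actual event envelope; the function names and the dictionary stores are likewise assumptions.

```python
def apply_change(target, event):
    """Apply one change event to the target store (a dict keyed by primary key)."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        target[key] = event["value"]   # upsert, so replaying an event is harmless
    elif op == "delete":
        target.pop(key, None)          # deleting a missing key is a no-op
    else:
        raise ValueError(f"unknown operation: {op}")


def match_rate(source, target):
    """Fraction of keys (from either store) whose values agree in both stores."""
    keys = set(source) | set(target)
    if not keys:
        return 1.0
    matching = sum(
        1 for k in keys
        if k in source and k in target and source[k] == target[k]
    )
    return matching / len(keys)


def ready_for_cutover(source, target, threshold=0.9999):
    """True once the match rate reaches the chosen threshold."""
    return match_rate(source, target) >= threshold


source = {"u1": "alice", "u2": "bob"}
target = {}
for event in [
    {"op": "insert", "key": "u1", "value": "alice"},
    {"op": "insert", "key": "u2", "value": "bobby"},
    {"op": "update", "key": "u2", "value": "bob"},
]:
    apply_change(target, event)

print(ready_for_cutover(source, target))
```

Making `apply_change` idempotent matters because streaming pipelines commonly deliver an event more than once; replaying the same events should leave the target unchanged. The 0.9999 default simply echoes the 99.99% figure above.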

Common Anti-Patterns:

Common anti-patterns cause repeated failures. Using a relational database for everything seems safe but eventually hits scaling limits: once write throughput on a single primary reaches its ceiling (roughly 10,000 to 100,000 operations per second, depending on hardware and workload), the options are expensive vertical scaling or complex sharding logic that PostgreSQL does not handle natively. The opposite extreme, using NoSQL for everything, gives up ACID guarantees when you need them. Segment learned this when migrating from MongoDB to PostgreSQL: eventual consistency caused data integrity bugs that cost more than the scaling challenges it avoided. Over-engineering for scale you do not have wastes resources: Cassandra makes no sense for 1,000 users and 100GB of data when managed PostgreSQL costs about $50 monthly versus a Cassandra cluster needing 3+ nodes at $300+ monthly just for basic redundancy.

💡 Key Takeaways

✓ Migration risk increases with data volume and traffic: migrating 1TB at 10,000 QPS (queries per second) leaves room to test in shadow mode, while migrating 100TB at 1 million QPS requires months of planning and an incremental rollout to avoid outages

✓ The dual write phase catches inconsistencies early: running the old and new databases in parallel for 1 to 4 weeks exposes edge cases such as different transaction semantics or query result ordering before users are affected

✓ A rollback strategy is mandatory: every migration needs an instant fallback to the old database if the new system fails. That means keeping the old database operational until the new one has proven stable over weeks, which effectively doubles infrastructure cost during the transition

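✓ Anti-pattern timing matters: using a relational database for everything works until the write ceiling hits (100,000 writes per second is used here as a rough PostgreSQL limit; the real figure depends on hardware and workload). A migration started at that point takes 6+ months while scaling problems compound, so it is better to plan the migration at about half the ceiling, for example 50,000 writes per second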

✓ Over-engineering costs compound: a startup choosing Cassandra for 1,000 users spends $300+ monthly on an unnecessary 3 node cluster versus about $50 for managed PostgreSQL, plus months of team learning curve that could have gone into building features
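The timing advice above comes down to simple compound growth arithmetic: the migration must finish before write load reaches the ceiling. The sketch below is a rough planning aid under stated assumptions: steady monthly growth, a fixed ceiling, and hypothetical function names chosen for the example.

```python
import math


def months_until_ceiling(current_wps, ceiling_wps, monthly_growth):
    """Months of compound growth before write load reaches the ceiling."""
    if current_wps >= ceiling_wps:
        return 0.0
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking load never reaches the ceiling
    return math.log(ceiling_wps / current_wps) / math.log(1 + monthly_growth)


def should_start_migration(current_wps, ceiling_wps, monthly_growth,
                           migration_months=6):
    """True when the remaining runway is no longer than the migration itself."""
    runway = months_until_ceiling(current_wps, ceiling_wps, monthly_growth)
    return runway <= migration_months


# 50,000 writes/sec today, 100,000 ceiling, 10% monthly growth
print(round(months_until_ceiling(50_000, 100_000, 0.10), 1))  # 7.3
print(should_start_migration(50_000, 100_000, 0.15))          # True
```

At 10% monthly growth, starting at half the ceiling leaves about 7.3 months of runway, barely enough for a 6 month migration. At 15% growth the runway drops to about 5 months, so by the time load reaches 50,000 writes per second the migration should already be under way.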

📌 Interview Tips

1. GitHub migrated from MySQL to Vitess (a MySQL sharding layer) using the strangler pattern: it added a Vitess proxy in front of MySQL that routed some tables to new shards, gradually moved tables over 18 months, and kept the ability to roll back any table instantly if issues emerged

2. Uber migrated from PostgreSQL to MySQL cluster by cluster: it kept PostgreSQL for some regions while testing MySQL in others, and ran both databases simultaneously for 6 months, comparing query performance and data consistency before the full migration

3. Segment anti-pattern example: it used MongoDB for an analytics data pipeline because schema flexibility seemed beneficial, but eventual consistency caused data integrity bugs that cost more than the PostgreSQL scaling effort would have. It spent 4 months migrating to PostgreSQL and learned that ACID matters for financial data
