Distributed Systems Primitives • Distributed Transactions (2PC, Saga)Easy⏱️ ~3 min
What Are Distributed Transactions and Why Are They Hard?
Definition
A distributed transaction spans multiple databases, services, or partitions while guaranteeing atomicity: either all operations succeed together or all fail together. The challenge is achieving this guarantee when any participant can crash independently and network partitions can occur mid-transaction.
💡 Key Takeaways
✓Commit latency equals approximately two network round-trips plus disk flush time at the slowest participant; cross-region deployments typically see 100+ milliseconds end to end latency due to 50 to 150 milliseconds inter-region round-trip times.
✓Blocking failure mode: coordinator crash after prepare leaves participants in-doubt, holding locks indefinitely until recovery or timeout policies fire, causing cascading latency and reduced availability.
✓Overall availability is the product of participant availabilities; two services at 99.5% each yield roughly 99.0% combined, making 2PC unsuitable for systems requiring five nines availability.
✓Google Spanner uses 2PC with Paxos replication and TrueTime to achieve global consistency with typical single-region commits in tens of milliseconds, demonstrating viability when combined with quorum replication and synchronized clocks.
✓Tail latency amplification: with N participants, the 99th percentile commit time is bounded by the slowest participant's disk flush and network delays, often reaching hundreds of milliseconds across zones or regions.
✓Best suited for few participants, short lived operations in single-region or low-latency environments where invariants cannot be violated even temporarily, such as financial ledger transfers.
📌 Interview Tips
1Google Spanner executing a global 2PC: within a single region, commits take tens of milliseconds; multi-region commits add 50 to 150 milliseconds inter-region round-trip time plus a few milliseconds commit wait for TrueTime uncertainty, totaling 100+ milliseconds for cross-continental transactions.
2Amazon DynamoDB Transactions: regional ACID writes across up to dozens of items, adding single-digit to tens of milliseconds under normal load, but rising significantly under hot partition contention; no cross-region transactional atomicity is provided.
3Bank ledger transfer: moving $100 from account A (database X) to account B (database Y) using 2PC ensures either both debits and credits succeed or neither does, preventing money creation or loss even if network or process failures occur mid-transaction.