Distributed Transactions (2PC, Saga)

What Are Distributed Transactions and Why Are They Hard?

Definition
A distributed transaction spans multiple databases, services, or partitions while guaranteeing atomicity: either all operations succeed together or all fail together. The challenge is achieving this guarantee when any participant can crash independently and network partitions can occur mid-transaction.
Why Distributed Transactions Are Hard

In a single database, transactions use local locks and a write-ahead log to guarantee atomicity. If the process crashes, recovery replays the log. But when a transaction spans multiple independent systems, there is no shared log or lock manager. Each participant can fail independently, and the network can partition between any two nodes. The fundamental problem: how do you ensure all participants commit or all abort when you cannot trust that messages will arrive or that nodes will stay alive?

Two-Phase Commit (2PC): The Classic Solution

Two-Phase Commit solves this by using a coordinator to orchestrate two rounds of communication. In the prepare phase, the coordinator asks each participant to vote on whether it can commit. Participants write their vote to stable storage (0.5 to 5 milliseconds on SSDs) and acquire all necessary locks. If all vote yes, the coordinator enters the commit phase and instructs everyone to finalize; otherwise, it instructs an abort. This provides full ACID atomicity across distributed resources.

The Blocking Problem

The major downside is that 2PC is a blocking protocol. If the coordinator crashes after participants have prepared but before issuing the commit decision, participants remain in an in-doubt state, holding locks and waiting for recovery. Latency depends on the slowest participant: single-region commits typically take tens of milliseconds, but cross-region round-trips of 50 to 150 milliseconds push commit latency above 100 milliseconds. Google Spanner demonstrates 2PC at planetary scale by combining it with Paxos-replicated participants and TrueTime for bounded clock uncertainty.
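The two phases can be sketched as a minimal in-memory simulation. This is illustrative only: the class and function names are made up for this example, and a real implementation would persist votes to disk, hold locks, and handle coordinator failure and recovery.

```python
# Minimal in-memory sketch of two-phase commit (2PC).
# Names are illustrative, not any real library's API.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"          # init -> prepared -> committed/aborted

    def prepare(self):
        # Phase 1: in a real system, write the vote to stable storage and
        # acquire all locks before replying to the coordinator.
        if self.can_commit:
            self.state = "prepared"  # in-doubt until the coordinator decides
            return True
        self.state = "aborted"
        return False

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): every participant must vote yes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/abort): the decision is all-or-nothing.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        if p.state == "prepared":
            p.abort()                # release locks held since prepare
    return "aborted"
```

A single "no" vote in phase one forces every prepared participant to abort, which is exactly the all-or-nothing guarantee; the blocking problem arises in the window where a participant is `prepared` but the coordinator's decision never arrives.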
💡 Key Takeaways
Commit latency is approximately two network round-trips plus the disk flush time at the slowest participant; cross-region deployments typically see end-to-end latencies above 100 milliseconds due to 50 to 150 millisecond inter-region round-trip times.
Blocking failure mode: coordinator crash after prepare leaves participants in-doubt, holding locks indefinitely until recovery or timeout policies fire, causing cascading latency and reduced availability.
Overall availability is the product of participant availabilities; two services at 99.5% each yield roughly 99.0% combined, making 2PC unsuitable for systems requiring five nines availability.
Google Spanner uses 2PC with Paxos replication and TrueTime to achieve global consistency with typical single-region commits in tens of milliseconds, demonstrating viability when combined with quorum replication and synchronized clocks.
Tail latency amplification: with N participants, the 99th-percentile commit time is gated by the slowest participant's disk flush and network delays, often reaching hundreds of milliseconds across zones or regions.
Best suited for a small number of participants and short-lived operations in single-region or low-latency environments where invariants cannot be violated even temporarily, such as financial ledger transfers.
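The availability takeaway above is simple arithmetic: every participant must be up for a commit to succeed, so availabilities multiply. A quick sketch (the helper name is hypothetical):

```python
# Combined availability of a 2PC transaction is the product of participant
# availabilities, since every participant must be reachable to commit.

def combined_availability(availabilities):
    product = 1.0
    for a in availabilities:
        product *= a
    return product

# Two services at 99.5% each yield roughly 99.0% combined:
print(round(combined_availability([0.995, 0.995]), 4))  # 0.99
```

Adding a third 99.5% participant drops the product to about 98.5%, which is why 2PC fan-out is kept small in systems with strict availability targets.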
📌 Interview Tips
1. Google Spanner executing a global 2PC: within a single region, commits take tens of milliseconds; multi-region commits add 50 to 150 milliseconds inter-region round-trip time plus a few milliseconds commit wait for TrueTime uncertainty, totaling 100+ milliseconds for cross-continental transactions.
2. Amazon DynamoDB Transactions: regional ACID writes across up to dozens of items, adding single-digit to tens of milliseconds under normal load, but rising significantly under hot partition contention; no cross-region transactional atomicity is provided.
3. Bank ledger transfer: moving $100 from account A (database X) to account B (database Y) using 2PC ensures either both debit and credit succeed or neither does, preventing money creation or loss even if network or process failures occur mid-transaction.
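The latency figures quoted in these examples follow from a simple model: roughly two network round-trips plus the slowest participant's disk flush. A back-of-envelope sketch, with an illustrative function name and inputs, not a measured benchmark:

```python
# Back-of-envelope 2PC commit latency: about two network round-trips
# (prepare + commit messages) plus the slowest participant's disk flush.

def estimated_commit_latency_ms(round_trip_ms, flush_times_ms):
    return 2 * round_trip_ms + max(flush_times_ms)

# Cross-region example: 75 ms RTT, SSD flushes of 0.5 to 5 ms per participant.
print(estimated_commit_latency_ms(75, [0.5, 2.0, 5.0]))  # 155.0
```

With a 1 ms intra-region RTT the same model gives single-digit milliseconds, matching the "tens of milliseconds" single-region figure once replication and queuing overheads are layered on.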