# Distributed Transactions: When One Machine Isn't Enough

> **Series:** System Design · Distributed Systems — Pillar 8 of 8

## Systems Design

| # | Post | What it covers |
|---|------|----------------|
| 00 | [Distributed Systems: What Happens When Machines Disagree](/distributed-systems-what-happens-when-machines-disagree) | Twenty concepts covering network partitions, consensus, clocks, distributed transactions, CDC, erasure coding, and observability. The final pillar. |
| 01 | [Network Partitions: The Failure Mode You Can't Design Away](/network-partitions-the-failure-mode-you-cant-design-away) | Network partitions are inevitable. Learn what happens when nodes can't communicate, how systems choose between availability and consistency, and what that means in practice. |
| 02 | [Split-Brain: When Two Nodes Both Think They're the Leader](/split-brain-when-two-nodes-both-think-theyre-the-leader) | Split-brain occurs when two nodes both believe they're the primary. Learn how it happens, why it causes data corruption, and how STONITH and fencing prevent it. |
| 03 | [Heartbeats: How Nodes Know Their Peers Are Alive](/heartbeats-how-nodes-know-their-peers-are-alive) | Heartbeats let nodes detect peer failures. Learn how timeouts, phi accrual failure detectors, and the tradeoff between false positives and detection speed work. |
| 04 | [Leader Election: Agreeing on Who's in Charge](/leader-election-agreeing-on-whos-in-charge) | Leader election coordinates which node acts as primary. Learn the bully algorithm, Raft-based election, and why exactly-one-leader guarantees are hard to achieve. |
| 05 | [Consensus Algorithms: Agreeing on a Value Across Failures](/consensus-algorithms-agreeing-on-a-value-across-failures) | Consensus lets distributed nodes agree on a value despite failures. Learn what FLP impossibility means, what Paxos and Raft provide, and where consensus is used. |
| 06 | [Quorum: How Many Nodes Must Agree?](/quorum-how-many-nodes-must-agree) | Quorum determines how many nodes must agree for an operation to succeed. Learn how R + W > N ensures consistency in distributed databases like Cassandra and DynamoDB. |
| 07 | [Paxos: The Algorithm That Started It All](/paxos-the-algorithm-that-started-it-all) | Paxos is the foundational distributed consensus algorithm. Learn how its two phases work, why it's hard to implement, and what systems use it in production. |
| 08 | [Raft: Consensus Made Understandable](/raft-consensus-made-understandable) | Raft makes distributed consensus understandable. Learn how leader election, log replication, and safety work in the algorithm that powers etcd, CockroachDB, and TiKV. |
| 09 | [Gossip Protocol: Decentralised Cluster Communication](/gossip-protocol-decentralised-cluster-communication) | Gossip protocols propagate information across a cluster without a central coordinator. Learn how epidemic spreading works and where it's used in production. |
| 10 | [Logical Clocks: When Physical Time Isn't Enough](/logical-clocks-when-physical-time-isnt-enough) | Physical clocks drift and can't establish event order in distributed systems. Logical clocks track causality instead. Learn why this matters and how it works. |
| 11 | [Lamport Timestamps: Ordering Events Without a Global Clock](/lamport-timestamps-ordering-events-without-a-global-clock) | Lamport timestamps assign logical counters to events to establish causal order in distributed systems. Learn how they work and what they can and can't tell you. |
| 12 | [Vector Clocks: Knowing When Events Are Truly Concurrent](/vector-clocks-knowing-when-events-are-truly-concurrent) | Vector clocks detect causality and concurrency in distributed systems. Learn how they work, how Dynamo uses them for conflict detection, and their limitations. |
| 13 | **Distributed Transactions: When One Machine Isn't Enough** ← you are here | Distributed transactions are hard. Learn why cross-service atomicity is expensive, when to use it, and when eventual consistency is the right alternative. |
| 14 | [Two-Phase Commit: Coordinating a Distributed Decision](/two-phase-commit-coordinating-a-distributed-decision) | 2PC ensures distributed atomicity through prepare and commit phases. Learn how it works, the coordinator failure problem, and why it's rarely used in modern systems. |
| 15 | [Three-Phase Commit: Solving 2PC's Blocking Problem](/three-phase-commit-solving-2pcs-blocking-problem) | 3PC adds a pre-commit phase to eliminate 2PC's blocking problem. Learn how it works, what assumptions it requires, and why it's rarely used in production. |
| 16 | [Delivery Semantics: What Does "Delivered" Actually Mean?](/delivery-semantics-what-does-delivered-actually-mean) | Message delivery guarantees define system reliability. Learn what at-most-once, at-least-once, and exactly-once mean, what they cost, and when each is appropriate. |
| 17 | [Change Data Capture: Streaming Your Database in Real Time](/change-data-capture-streaming-your-database-in-real-time) | CDC streams database changes in real time by reading the write-ahead log. Learn how Debezium works, what CDC enables, and when to use it. |
| 18 | [Erasure Coding: Fault Tolerance Without Full Replication](/erasure-coding-fault-tolerance-without-full-replication) | Erasure coding stores data across nodes using math, not full replication. Learn how Reed-Solomon works, how S3 uses it, and when it beats 3x replication. |
| 19 | [Merkle Trees: Efficiently Finding What's Different](/merkle-trees-efficiently-finding-whats-different) | Merkle trees efficiently detect which parts of a large dataset differ between nodes. Learn how Bitcoin, Cassandra, and Git use them for verification and anti-entropy. |
| 20 | [Observability: Understanding Your System at Runtime](/observability-understanding-your-system-at-runtime) | Logs, metrics, and distributed traces are how you understand a system at runtime. Learn what each pillar provides, the tools involved, and how they work together. |
| 21 | [Distributed Systems: Wrap-Up](/distributed-systems-wrap-up) | A recap of all 20 distributed systems concepts and the complete URL shortener architecture spanning all 8 pillars. The final post in the series. |

---

# Distributed Transactions: When One Machine Isn't Enough

## The problem

A user upgrades their URL shortener plan from Free to Pro. This must atomically:
1. Charge their credit card (Stripe API)
2. Update their plan in the User Service database
3. Increase their link quota in the Link Service database

In a single-database monolith, this is one transaction — either all three succeed or all three roll back. The database guarantees atomicity.

In a microservices architecture with three separate systems (Stripe, User DB, Link DB), there is no single transaction manager. If step 2 succeeds and step 3 fails, the user has paid for Pro but their quota wasn't increased. If step 1 succeeds and step 2 fails, the user was charged but their account shows Free tier. Both are bad.

This is the distributed transaction problem: ensuring atomicity (all-or-nothing) across multiple independent data stores or services.

---

## The core idea

A distributed transaction provides ACID guarantees across multiple nodes or services. It's much harder than a local transaction because it requires coordinating the commit or abort decision across participants that can each independently fail or become unreachable. The fundamental difficulty: how do you commit atomically when the network might drop your commit message?

---

## The analogy: signing a contract with multiple parties

A business deal requires signatures from Company A, Company B, and a lawyer as witness. The rule: the deal is finalised only when all three sign; if any refuses or is unreachable, the deal is null and void.

In person, this is easy — everyone signs in the same room simultaneously. Over distance, it's hard: Company A signs and sends to Company B, which signs and sends to the lawyer. What if Company B's courier is delayed? Company A has signed; Company B hasn't. The deal is in limbo. Who decides whether to proceed?

A distributed transaction coordinator is the resolution: it collects "ready to sign" confirmations from all parties before telling anyone to finalise.

---

## When distributed transactions are needed vs avoidable

**Genuinely needed:**
- Financial operations that must be consistent across multiple stores (charge + grant access must be atomic)
- Cross-service invariants that must be maintained (inventory count in one service and order record in another)
- Database migrations that span shards (moving a record from shard A to shard B must not leave it in both or neither)

**Avoidable with eventual consistency:**
- Most microservices communication (Saga pattern + Outbox — Pillar 7, posts 09–10)
- Analytics, notifications, search index updates
- Any operation where "we'll fix inconsistency later" is acceptable

**The key insight:** most operations that seem to require distributed transactions actually don't, if you're willing to accept brief eventual inconsistency and design idempotent recovery. The Saga pattern (Pillar 7, post 09) handles most cross-service coordination without distributed transactions.

---

## The approaches

### 2PC (Two-Phase Commit)

The classical protocol. A coordinator node:
1. **Prepare phase:** asks all participants "can you commit?" Each votes yes or no and locks its resources
2. **Commit phase:** if all vote yes, coordinator sends commit to all; if any votes no, sends abort to all

Guarantees atomicity. Problems: blocking on coordinator failure (covered in post 14), lock contention (participants hold locks during the prepare phase, waiting for the coordinator's decision).

### Saga pattern (Pillar 7, post 09)

Not a true distributed transaction — uses local transactions with compensating actions. Eventual consistency: the system may be inconsistent for a brief period. The right choice for most microservices cross-service operations.

### Distributed databases with native support (CockroachDB, Spanner)

CockroachDB and Google Spanner provide distributed ACID transactions natively. They use consensus (Raft/Paxos) to ensure all shards agree on the transaction outcome. This is the correct tool when you need true distributed atomicity without implementing it yourself.

CockroachDB executes transactions across multiple shards using a protocol similar to 2PC, but with Raft providing the durability guarantee at each shard — eliminating the "coordinator crashes, transaction is stuck" problem of classic 2PC.

### XA transactions (distributed database standard)

XA is a standard for distributed transactions used by traditional enterprise databases. Supported by MySQL, PostgreSQL, Oracle. Allows transactions that span multiple database servers with a two-phase commit.

In practice, XA is rarely used in modern microservices — the coordination overhead is high and it requires all participants to be XA-aware. Saga patterns are almost universally preferred.

---

## The real cost of distributed transactions

**Lock contention.** In 2PC, all participants hold locks on affected rows from the prepare phase until the commit message arrives. If the coordinator is slow or crashes, locks are held indefinitely — blocking all other reads and writes to those rows.

**Coordinator availability.** The transaction coordinator is a single point of failure. If it crashes after participants voted "yes" but before sending commit, the transaction is stuck in "prepared" state until the coordinator recovers. This is the 2PC blocking problem (covered in post 14).

**Latency.** At minimum, a distributed transaction requires two round trips (prepare + commit) between the coordinator and all participants. For cross-region transactions (London, Singapore, São Paulo), this is 3 × cross-continent RTT = 300–600ms per transaction.

**Operational complexity.** Stuck transactions ("in-doubt" transactions) require manual intervention or automated recovery. Every DBA who has worked with distributed databases has encountered stuck 2PC transactions that had to be manually resolved.

---

## The one thing to remember

> **Distributed transactions are expensive and fragile: they require coordination across multiple independent systems, hold locks during the coordination period, and block if the coordinator fails.** Most cross-service operations that appear to need distributed transactions can be handled with the Saga pattern (eventual consistency with compensating actions) at lower cost and higher availability. Reserve distributed transactions for cases where eventual consistency genuinely isn't acceptable — and prefer databases (CockroachDB, Spanner) that implement them natively over rolling your own 2PC coordinator.

---

*← Previous: **[Vector Clocks](/vector-clocks-knowing-when-events-are-truly-concurrent)** — extending Lamport timestamps to detect concurrency: when two events are causally independent, vector clocks make that explicit.*

*→ Next: **[Two-Phase Commit](/two-phase-commit-coordinating-a-distributed-decision)** — the classical distributed transaction protocol in detail.*

