Series: System Design · Distributed Systems — Pillar 8 of 8

Systems Design

#	Post	What it covers
00	Distributed Systems: What Happens When Machines Disagree	Twenty concepts covering network partitions, consensus, clocks, distributed transactions, CDC, erasure coding, and observability. The final pillar.
01	Network Partitions: The Failure Mode You Can't Design Away	Network partitions are inevitable. Learn what happens when nodes can't communicate, how systems choose between availability and consistency, and what that means in practice.
02	Split-Brain: When Two Nodes Both Think They're the Leader	Split-brain occurs when two nodes both believe they're the primary. Learn how it happens, why it causes data corruption, and how STONITH and fencing prevent it.
03	Heartbeats: How Nodes Know Their Peers Are Alive	Heartbeats let nodes detect peer failures. Learn how timeouts, phi accrual failure detectors, and the tradeoff between false positives and detection speed work.
04	Leader Election: Agreeing on Who's in Charge	Leader election coordinates which node acts as primary. Learn the bully algorithm, Raft-based election, and why exactly-one-leader guarantees are hard to achieve.
05	Consensus Algorithms: Agreeing on a Value Across Failures	Consensus lets distributed nodes agree on a value despite failures. Learn what FLP impossibility means, what Paxos and Raft provide, and where consensus is used.
06	Quorum: How Many Nodes Must Agree?	Quorum determines how many nodes must agree for an operation to succeed. Learn how R + W > N ensures consistency in distributed databases like Cassandra and DynamoDB.
07	Paxos: The Algorithm That Started It All	Paxos is the foundational distributed consensus algorithm. Learn how its two phases work, why it's hard to implement, and what systems use it in production.
08	Raft: Consensus Made Understandable	Raft makes distributed consensus understandable. Learn how leader election, log replication, and safety work in the algorithm that powers etcd, CockroachDB, and TiKV.
09	Gossip Protocol: Decentralised Cluster Communication	Gossip protocols propagate information across a cluster without a central coordinator. Learn how epidemic spreading works and where it's used in production.
10	Logical Clocks: When Physical Time Isn't Enough	Physical clocks drift and can't establish event order in distributed systems. Logical clocks track causality instead. Learn why this matters and how it works.
11	Lamport Timestamps: Ordering Events Without a Global Clock	Lamport timestamps assign logical counters to events to establish causal order in distributed systems. Learn how they work and what they can and can't tell you.
12	Vector Clocks: Knowing When Events Are Truly Concurrent	Vector clocks detect causality and concurrency in distributed systems. Learn how they work, how Dynamo uses them for conflict detection, and their limitations.
13	Distributed Transactions: When One Machine Isn't Enough	Distributed transactions are hard. Learn why cross-service atomicity is expensive, when to use it, and when eventual consistency is the right alternative.
14	Two-Phase Commit: Coordinating a Distributed Decision	2PC ensures distributed atomicity through prepare and commit phases. Learn how it works, the coordinator failure problem, and why it's rarely used in modern systems.
15	Three-Phase Commit: Solving 2PC's Blocking Problem ← you are here	3PC adds a pre-commit phase to eliminate 2PC's blocking problem. Learn how it works, what assumptions it requires, and why it's rarely used in production.
16	Delivery Semantics: What Does "Delivered" Actually Mean?	Message delivery guarantees define system reliability. Learn what at-most-once, at-least-once, and exactly-once mean, what they cost, and when each is appropriate.
17	Change Data Capture: Streaming Your Database in Real Time	CDC streams database changes in real time by reading the write-ahead log. Learn how Debezium works, what CDC enables, and when to use it.
18	Erasure Coding: Fault Tolerance Without Full Replication	Erasure coding stores data across nodes using math, not full replication. Learn how Reed-Solomon works, how S3 uses it, and when it beats 3x replication.
19	Merkle Trees: Efficiently Finding What's Different	Merkle trees efficiently detect which parts of a large dataset differ between nodes. Learn how Bitcoin, Cassandra, and Git use them for verification and anti-entropy.
20	Observability: Understanding Your System at Runtime	Logs, metrics, and distributed traces are how you understand a system at runtime. Learn what each pillar provides, the tools involved, and how they work together.
21	Distributed Systems: Wrap-Up	A recap of all 20 distributed systems concepts and the complete URL shortener architecture spanning all 8 pillars. The final post in the series.

Three-Phase Commit: Solving 2PC's Blocking Problem

The problem

Two-Phase Commit blocks when the coordinator crashes after participants vote Yes but before sending the commit message. Participants hold locks indefinitely, waiting for a coordinator that might take minutes to restart.

The root cause: in 2PC, once a participant votes Yes, it has no way to make a safe independent decision. It doesn't know whether the coordinator committed or aborted — and different independent decisions by different participants would violate atomicity.

Three-Phase Commit (3PC) adds an intermediate phase that gives participants enough information to make a safe independent decision if the coordinator fails — eliminating the blocking problem.

The core idea

3PC adds a pre-commit phase between prepare and commit. After collecting all Yes votes, the coordinator sends a Pre-Commit message before the final Commit. This gives participants two crucial properties: (1) they know all other participants have voted Yes, and (2) if the coordinator fails, they can safely complete the commit independently without risking disagreement.

The analogy: the wedding with a dress rehearsal

2PC: the officiant asks "do you?" privately, collects answers, then announces the result. Problem: if the officiant collapses mid-announcement, parties don't know if they're married.

3PC: adds a rehearsal. Before the public ceremony, the officiant gathers both parties and says "I've confirmed you both said yes, and we're about to proceed — are you both still ready?" Only if both confirm does the final ceremony happen. Now if the officiant collapses mid-ceremony, either party knows: (1) we both confirmed readiness, (2) we can safely complete the ceremony ourselves.

The pre-commit is the rehearsal confirmation.

How 3PC works

Phase 1: Prepare (same as 2PC)

Coordinator sends Prepare to all participants. Each votes Yes or No.

If any votes No → Coordinator sends Abort to all (same as 2PC).

Phase 2: Pre-Commit (new phase)

If all voted Yes → Coordinator sends Pre-Commit to all participants.

Each participant that receives Pre-Commit:

Records it durably (survives crash)
Acknowledges to coordinator
Now knows: all participants voted Yes

The key property: a participant in Pre-Commit state knows it's safe to commit if the coordinator fails — because it knows every other participant also voted Yes.

Phase 3: Commit

Coordinator receives acknowledgements from all participants → sends Commit.

Each participant commits and releases locks.

Handling coordinator failure

Coordinator fails in Phase 1 (before Pre-Commit): participants that haven't received Pre-Commit can safely abort — no Pre-Commit means not all votes were Yes, so aborting is safe.

Coordinator fails in Phase 2 (after Pre-Commit to some, before all):

Participants that received Pre-Commit know all voted Yes → they elect a new coordinator from among themselves and complete the commit
Participants that didn't receive Pre-Commit abort
Wait — these decisions must agree! They don't, if the network partitioned the Pre-Commit messages inconsistently

This is where 3PC's limitation emerges.

The key limitation: network partitions

3PC assumes fail-stop failures (nodes crash cleanly) and no network partitions. In reality, network partitions are common (post 01). A partition can cause 3PC to violate atomicity:

Coordinator sends Pre-Commit to A and B, but not C (partition)
Coordinator then crashes

A and B elect a new coordinator among themselves
Both are in Pre-Commit state → they commit
C times out in Prepared state → aborts

Result: A and B committed; C aborted → INCONSISTENT

Under a network partition, 3PC is unsafe: participants that received Pre-Commit commit, while participants that didn't abort — and they're making these independent decisions without knowing about each other's state.

This is why 3PC is almost never used in practice. The real world has network partitions. Under partitions, 3PC provides atomicity guarantees that are weaker than 2PC's (which at least blocks rather than making inconsistent decisions).

Why 3PC is rarely used

Network partitions make it unsafe. As described above — partition during Phase 2 can cause inconsistent commit/abort decisions.

Paxos-based commit is better. A coordinator that uses consensus (Paxos/Raft) to durably record its decision before sending it to participants solves the blocking problem correctly, even under partitions. This is what CockroachDB and Spanner do — they don't implement 3PC.

Saga patterns avoid the problem entirely. Most microservices don't need distributed atomicity at all if the system is designed with Sagas.

The one thing to remember

3PC eliminates 2PC's blocking problem by adding a pre-commit phase that tells participants "all have voted yes" before the final commit — allowing them to complete the commit independently if the coordinator fails. The protocol works under crash failures, but breaks under network partitions (the more realistic failure mode), making inconsistent commit/abort decisions when Pre-Commit is delivered to some participants but not others. This is why 3PC is a useful theoretical concept but is almost never implemented in production systems, which use consensus-based commit (CockroachDB, Spanner) or Saga patterns instead.

← Previous: Two-Phase Commit — the classical distributed transaction protocol in detail.

→ Next: Delivery Semantics — at-most-once, at-least-once, and exactly-once: what each guarantees and what it costs.

Three-Phase Commit: Solving 2PC's Blocking Problem

Systems Design

Three-Phase Commit: Solving 2PC's Blocking Problem

The problem

The core idea

The analogy: the wedding with a dress rehearsal