# Vector Clocks: Knowing When Events Are Truly Concurrent

> **Series:** System Design · Distributed Systems — Pillar 8 of 8

## Systems Design

| # | Post | What it covers |
|---|------|----------------|
| 00 | [Distributed Systems: What Happens When Machines Disagree](/distributed-systems-what-happens-when-machines-disagree) | Twenty concepts covering network partitions, consensus, clocks, distributed transactions, CDC, erasure coding, and observability. The final pillar. |
| 01 | [Network Partitions: The Failure Mode You Can't Design Away](/network-partitions-the-failure-mode-you-cant-design-away) | Network partitions are inevitable. Learn what happens when nodes can't communicate, how systems choose between availability and consistency, and what that means in practice. |
| 02 | [Split-Brain: When Two Nodes Both Think They're the Leader](/split-brain-when-two-nodes-both-think-theyre-the-leader) | Split-brain occurs when two nodes both believe they're the primary. Learn how it happens, why it causes data corruption, and how STONITH and fencing prevent it. |
| 03 | [Heartbeats: How Nodes Know Their Peers Are Alive](/heartbeats-how-nodes-know-their-peers-are-alive) | Heartbeats let nodes detect peer failures. Learn how timeouts, phi accrual failure detectors, and the tradeoff between false positives and detection speed work. |
| 04 | [Leader Election: Agreeing on Who's in Charge](/leader-election-agreeing-on-whos-in-charge) | Leader election coordinates which node acts as primary. Learn the bully algorithm, Raft-based election, and why exactly-one-leader guarantees are hard to achieve. |
| 05 | [Consensus Algorithms: Agreeing on a Value Across Failures](/consensus-algorithms-agreeing-on-a-value-across-failures) | Consensus lets distributed nodes agree on a value despite failures. Learn what FLP impossibility means, what Paxos and Raft provide, and where consensus is used. |
| 06 | [Quorum: How Many Nodes Must Agree?](/quorum-how-many-nodes-must-agree) | Quorum determines how many nodes must agree for an operation to succeed. Learn how R + W > N ensures consistency in distributed databases like Cassandra and DynamoDB. |
| 07 | [Paxos: The Algorithm That Started It All](/paxos-the-algorithm-that-started-it-all) | Paxos is the foundational distributed consensus algorithm. Learn how its two phases work, why it's hard to implement, and what systems use it in production. |
| 08 | [Raft: Consensus Made Understandable](/raft-consensus-made-understandable) | Raft makes distributed consensus understandable. Learn how leader election, log replication, and safety work in the algorithm that powers etcd, CockroachDB, and TiKV. |
| 09 | [Gossip Protocol: Decentralised Cluster Communication](/gossip-protocol-decentralised-cluster-communication) | Gossip protocols propagate information across a cluster without a central coordinator. Learn how epidemic spreading works and where it's used in production. |
| 10 | [Logical Clocks: When Physical Time Isn't Enough](/logical-clocks-when-physical-time-isnt-enough) | Physical clocks drift and can't establish event order in distributed systems. Logical clocks track causality instead. Learn why this matters and how it works. |
| 11 | [Lamport Timestamps: Ordering Events Without a Global Clock](/lamport-timestamps-ordering-events-without-a-global-clock) | Lamport timestamps assign logical counters to events to establish causal order in distributed systems. Learn how they work and what they can and can't tell you. |
| 12 | **Vector Clocks: Knowing When Events Are Truly Concurrent** ← you are here | Vector clocks detect causality and concurrency in distributed systems. Learn how they work, how Dynamo uses them for conflict detection, and their limitations. |
| 13 | [Distributed Transactions: When One Machine Isn't Enough](/distributed-transactions-when-one-machine-isnt-enough) | Distributed transactions are hard. Learn why cross-service atomicity is expensive, when to use it, and when eventual consistency is the right alternative. |
| 14 | [Two-Phase Commit: Coordinating a Distributed Decision](/two-phase-commit-coordinating-a-distributed-decision) | 2PC ensures distributed atomicity through prepare and commit phases. Learn how it works, the coordinator failure problem, and why it's rarely used in modern systems. |
| 15 | [Three-Phase Commit: Solving 2PC's Blocking Problem](/three-phase-commit-solving-2pcs-blocking-problem) | 3PC adds a pre-commit phase to eliminate 2PC's blocking problem. Learn how it works, what assumptions it requires, and why it's rarely used in production. |
| 16 | [Delivery Semantics: What Does "Delivered" Actually Mean?](/delivery-semantics-what-does-delivered-actually-mean) | Message delivery guarantees define system reliability. Learn what at-most-once, at-least-once, and exactly-once mean, what they cost, and when each is appropriate. |
| 17 | [Change Data Capture: Streaming Your Database in Real Time](/change-data-capture-streaming-your-database-in-real-time) | CDC streams database changes in real time by reading the write-ahead log. Learn how Debezium works, what CDC enables, and when to use it. |
| 18 | [Erasure Coding: Fault Tolerance Without Full Replication](/erasure-coding-fault-tolerance-without-full-replication) | Erasure coding stores data across nodes using math, not full replication. Learn how Reed-Solomon works, how S3 uses it, and when it beats 3x replication. |
| 19 | [Merkle Trees: Efficiently Finding What's Different](/merkle-trees-efficiently-finding-whats-different) | Merkle trees efficiently detect which parts of a large dataset differ between nodes. Learn how Bitcoin, Cassandra, and Git use them for verification and anti-entropy. |
| 20 | [Observability: Understanding Your System at Runtime](/observability-understanding-your-system-at-runtime) | Logs, metrics, and distributed traces are how you understand a system at runtime. Learn what each pillar provides, the tools involved, and how they work together. |
| 21 | [Distributed Systems: Wrap-Up](/distributed-systems-wrap-up) | A recap of all 20 distributed systems concepts and the complete URL shortener architecture spanning all 8 pillars. The final post in the series. |

---

# Vector Clocks: Knowing When Events Are Truly Concurrent

## The problem

Lamport timestamps tell you: if A → B, then ts(A) < ts(B). What they don't tell you: if ts(A) < ts(B), does that mean A caused B, or were they concurrent?

Consider two users simultaneously editing the same link's destination URL on different servers. Lamport timestamps will order them (one will have a lower timestamp than the other), but this ordering is arbitrary — neither write caused the other. They're concurrent. Silently picking the "higher timestamp" write and discarding the other loses data without the application knowing a conflict occurred.

Vector clocks solve this: they can distinguish between "A happened-before B" and "A and B are concurrent and both happened independently." This distinction is the basis of conflict detection in distributed databases.

---

## The core idea

A vector clock is a list of counters, one per process, that tracks each process's knowledge of every other process's logical time. Comparing two vector clocks tells you definitively: A happened-before B, B happened-before A, or A and B are concurrent.

---

## The analogy: tracking who heard what from whom

Three gossips — Alice, Bob, and Carol — each keep a tally of how many things each person has told them:

- Alice's tally: [Alice: 3, Bob: 2, Carol: 1] — "I've made 3 statements, heard 2 from Bob, 1 from Carol"
- Bob's tally: [Alice: 2, Bob: 4, Carol: 2]

When Alice shares her tally with Bob, Bob updates each entry to the maximum of their two tallies. Bob now knows everything Alice knows.

If Bob's tally for a dimension is higher than Alice's for the same dimension, Bob has information Alice doesn't. If both are lower in some dimensions, their states are incomparable — they've heard different things and both have newer information in some respects. That's concurrency.

---

## How vector clocks work

Each process `i` maintains a vector `V` of length N (one entry per process). `V[j]` = the number of events process `i` knows about from process `j`.

**On a local event at process i:** `V[i] += 1`

**On sending a message from process i:** `V[i] += 1`, send message with current `V`

**On receiving a message at process i with vector `W`:** `V[j] = max(V[j], W[j])` for all j, then `V[i] += 1`

### Comparing vector clocks

Vector clock A **happened-before** B (A → B) iff: `A[i] ≤ B[i]` for all i, and `A[j] < B[j]` for at least one j.

Vectors A and B are **concurrent** (A ‖ B) iff: neither A → B nor B → A. This means A has a higher counter in some dimension and B has a higher counter in another.

```
Process A    Process B    Process C
V_A=[0,0,0]  V_B=[0,0,0]  V_C=[0,0,0]

A has event: V_A=[1,0,0]
A sends to B with V=[1,0,0]

B receives:  V_B = max([0,0,0],[1,0,0]) + B++ = [1,1,0]
B has event: V_B=[1,2,0]

A has event (concurrent, no communication): V_A=[2,0,0]

Now compare V_A=[2,0,0] and V_B=[1,2,0]:
  V_A[0]=2 > V_B[0]=1 → A has something B doesn't
  V_A[1]=0 < V_B[1]=2 → B has something A doesn't
→ CONCURRENT: neither happened-before the other ✓
```

### Application: Dynamo and conflict detection

Amazon Dynamo (and Riak, which is an open-source Dynamo implementation) uses version vectors (a variant of vector clocks) for conflict detection on writes.

When a client reads a key, Dynamo returns the value along with its version vector (a "context"). When the client writes back, it includes this context. Dynamo uses the context to determine if the write is a successor to the current value (no conflict) or concurrent with it (conflict).

```
Initial: key="x7Kp2", value="https://old.com", VC=[A:1, B:0]

Client 1 reads (gets VC=[A:1, B:0]), updates to "https://v2.com"
Writes with context VC=[A:1, B:0] → stored on Server A
  Server A: value="https://v2.com", VC=[A:2, B:0]

Client 2 (concurrently, read the old value) updates to "https://v3.com"
Writes with context VC=[A:1, B:0] → stored on Server B
  Server B: value="https://v3.com", VC=[A:1, B:1]

Reconciliation:
  [A:2, B:0] vs [A:1, B:1]: concurrent (A has A:2 > B:0, B has B:1 > A:0)
  → CONFLICT: surface both values to the application for resolution
```

In Dynamo, conflicting versions are returned to the next reader as a list ("siblings"). The application (or the client library) resolves the conflict and writes back a merged version.

### The vector clock size problem

A vector clock has one entry per process. With 100 processes, each vector clock is 100 entries. At scale, this becomes expensive to store and transmit.

Dynamo addresses this with **dotted version vectors** (a more compact representation) or **version vectors** (track client IDs rather than server IDs). Riak and newer Dynamo implementations use these to avoid unbounded clock growth.

---

## Vector clocks vs Lamport timestamps

| | Lamport Timestamps | Vector Clocks |
|---|---|---|
| **Detects A → B** | Yes | Yes |
| **Detects concurrency** | No | Yes |
| **Size** | 1 integer | N integers (one per process) |
| **Use case** | Total order, simple causality tracking | Conflict detection, concurrent write detection |

---

## The one thing to remember

> **Vector clocks track per-process knowledge so that comparing two events tells you definitively whether one happened-before the other or they're concurrent.** A happened-before B means A's every counter ≤ B's corresponding counter (and at least one is strictly less). Concurrent means each has a higher counter in at least one dimension — neither subsumes the other. This is the mechanism behind conflict detection in Dynamo and Riak: concurrent writes are surfaced to the application for resolution, not silently discarded.

---

*← Previous: **[Lamport Timestamps](/lamport-timestamps-ordering-events-without-a-global-clock)** — the simplest logical clock: assigning monotonic integers to events to capture causal order.*

*→ Next: **[Distributed Transactions](/distributed-transactions-when-one-machine-isnt-enough)** — ensuring atomicity across multiple nodes or services when a single ACID transaction isn't possible.*

