# Distributed Cache: Spreading Cache Across a Cluster

> **Series:** System Design · Caching — Pillar 5 of 8

## Systems Design

| # | Post | What it covers |
|---|------|----------------|
| 00 | [Caching: The Fastest Database Query Is the One You Don't Make](/caching-the-fastest-database-query-is-the-one-you-dont-make) | Caching is one of the most impactful and error-prone tools in system design. Six concepts covering the full lifecycle of a production cache layer. |
| 01 | [Caching: Storing Results Closer to Where They're Needed](/caching-storing-results-closer-to-where-theyre-needed) | Caching stores expensive results closer to the reader. Learn how it works, the main patterns, and when it hurts more than it helps. |
| 02 | [Cache Invalidation: Knowing When the Copy Is Wrong](/cache-invalidation-knowing-when-the-copy-is-wrong) | Cache invalidation is notoriously difficult. Learn the main strategies, when each applies, and how to avoid serving stale data at scale. |
| 03 | **Distributed Cache: Spreading Cache Across a Cluster** ← you are here | A single cache node is a bottleneck and a SPOF. Learn how distributed caches partition data, replicate for availability, and handle node failures. |
| 04 | [Cache Eviction Policies: What Gets Thrown Out When the Cache Is Full](/cache-eviction-policies-what-gets-thrown-out-when-the-cache-is-full) | When a cache fills up, something must go. Learn how LRU, LFU, FIFO, and TTL-based eviction work and how to choose the right policy for your data. |
| 05 | [Cache Stampede: When Expiry Triggers a Database Avalanche](/cache-stampede-when-expiry-triggers-a-database-avalanche) | When a hot cache entry expires, hundreds of servers query the database simultaneously. Learn how cache stampedes happen and how to prevent them. |
| 06 | [Cache Warming: Starting Hot Instead of Cold](/cache-warming-starting-hot-instead-of-cold) | A cold cache causes database overload on startup. Learn how to warm caches proactively using predictive loading, lazy warming, and scheduled jobs. |
| 07 | [Caching: Wrap-Up](/caching-wrap-up) | A recap of all 6 caching concepts: what caching is, invalidation strategies, distributed caches, eviction policies, stampedes, and warming. How they connect. |

---

# Distributed Cache: Spreading Cache Across a Cluster

## The problem

Your URL shortener's Redis instance handles two hundred thousand reads per second. It's running on a large memory-optimised VM with 64 GB of RAM and the latency is excellent — sub-millisecond for most operations.

Then three problems arrive together. First, the dataset grows beyond 64 GB — you'd need to evict hot entries to make room. Second, the Redis instance becomes a single point of failure — when it restarts for a maintenance window, every app server hits the database simultaneously and the database falls over. Third, the Redis instance's network interface saturates — it can't push data fast enough to serve all the app servers even though CPU and memory are fine.

You can't scale a single cache node vertically forever. At some point you need multiple nodes, data spread across them, and enough redundancy that no single node failure takes down your cache layer.

That's a distributed cache.

---

## The core idea

A distributed cache is a cache layer spread across multiple nodes, where the dataset is too large or too hot for a single node. Data is partitioned across nodes so each node holds a subset. Nodes are typically replicated so that a standby can take over when a primary fails. The system appears to the application as a single cache, even though requests are routed to different nodes depending on the key.

---

## The analogy: a library system across multiple branches

A single library branch can only hold so many books. A city library system solves this by spreading the collection across branches — the central library, the west branch, the university branch. When you need a book, the catalogue tells you which branch holds it. If you want a popular title, the central library might have three copies at different locations so one is always available even if one branch is closed.

The distributed cache is the library system. Each branch is a cache node. The consistent hashing ring (or slot table) is the catalogue. Replication is the multiple copies of popular titles. The city's single catalogue hides the fact that the books live in different buildings.

---

## How distributed caching works

### Partitioning: spreading data across nodes

The core challenge: given a key like `url:x7Kp2`, which node holds it?

**Naive modular hashing:**
```
node_index = hash(key) % num_nodes
```
Simple, but catastrophic during scaling: adding or removing a node changes `num_nodes`, remapping nearly every key to a different node. The cache is effectively invalidated — everything is a miss until re-populated.

**Consistent hashing** (covered in detail in Pillar 4, post 08) solves this. Keys and nodes are placed on a hash ring. A key maps to the first node clockwise from its position on the ring. Adding or removing a node only remaps the keys that neighbour that node — typically 1/N of the total keys, where N is the number of nodes.

```
Hash ring (simplified):
  Node A: handles keys hashing to 0–30%
  Node B: handles keys hashing to 30–60%
  Node C: handles keys hashing to 60–100%

  url:x7Kp2 → hash 42% → Node B
  url:aB3cD → hash 71% → Node C
  user:9821  → hash 18% → Node A
```

**Redis Cluster's slot-based approach:** Redis Cluster divides the keyspace into 16,384 hash slots. Each node owns a contiguous range of slots. `CLUSTER KEYSLOT key` returns the slot for any key. When you add a node, slots are migrated from existing nodes to the new one — only the migrated slots see a miss period.

```
Node 1: slots 0–5460
Node 2: slots 5461–10922
Node 3: slots 10923–16383

hash_slot = CRC16(key) % 16384
```

### Replication: surviving node failures

Partitioning gives you scale. Replication gives you availability. Each primary node has one or more replicas that receive a copy of every write.

```
Write flow (Redis Cluster):
  Client → Node B (primary for slot 7,234)
  Node B → writes to memory
  Node B → replicates async to Node B-replica
  Node B → returns OK to client

Read flow:
  Client → Node B or Node B-replica (reads can be served by replicas)
```

Replication in Redis is asynchronous by default — the primary acknowledges the write before the replica confirms receipt. This means a small window of data loss is possible if the primary fails before replication completes. For cache data (which can be re-populated from the source), this is usually acceptable.

**Automatic failover:** Redis Cluster nodes gossip with each other. When a primary is unreachable for a configurable timeout, the cluster promotes a replica to primary automatically. During the failover window (typically one to five seconds), requests to that partition fail or return stale data.

### Client routing

Application code doesn't need to know which node holds a key. The Redis Cluster client handles routing:

```python
# Client transparently routes to the correct node
redis_cluster.get("url:x7Kp2")   # → internally routes to Node B
redis_cluster.get("user:9821")    # → internally routes to Node A
```

If a key has moved (due to slot migration or failover), the node returns a MOVED or ASK redirect, and the client retries against the correct node. Smart clients cache the slot map locally for efficiency.

### Memcached's approach

Memcached distributes differently — it has no built-in clustering. Instead, client-side consistent hashing distributes keys across a pool of independent Memcached nodes. There's no replication, no failover, and no coordination between nodes. If a node fails, all its keys are cache misses until re-populated. Simpler operationally, but no high availability.

---

## Redis Sentinel vs Redis Cluster

Two common Redis multi-node setups with different goals:

**Redis Sentinel** — single primary, one or more replicas, Sentinel processes monitor and handle automatic failover. The full dataset fits on one node; replicas are for HA, not scale-out. Read replicas can serve read traffic.

```
Sentinel setup:
  Primary (full dataset) ← replicates to → Replica 1
                         ← replicates to → Replica 2
  Sentinel 1, 2, 3 monitor primaries and vote on failover
```

**Redis Cluster** — dataset sharded across multiple primaries, each with replicas. Scales horizontally both for memory and throughput. More complex to operate.

```
Cluster setup:
  Primary A (slots 0–5460)    + Replica A
  Primary B (slots 5461–10922) + Replica B
  Primary C (slots 10923–16383) + Replica C
```

Use Sentinel when your dataset fits on one node and you need HA. Use Cluster when your dataset or throughput exceeds what a single node can handle.

---

## Tradeoffs

**Consistency during partitioning.** A distributed cache is eventually consistent — replica lag means reads from replicas can be slightly behind the primary. For cache data, this is usually fine. For data where consistency matters (e.g., a rate limiter's counter), reads should go to the primary.

**Operational complexity.** A distributed cache cluster is significantly more complex to operate than a single node. Slot migrations, failover testing, monitoring each node, handling split-brain scenarios — the ops burden is real.

**Cross-slot operations.** Redis multi-key operations (MGET, MSET, transactions, Lua scripts) only work if all keys hash to the same slot. Hash tags (`{user:9821}:links` and `{user:9821}:profile` hash the same because of `{}`) force multiple keys to the same slot, but must be used carefully to avoid hotspots.

**Network bandwidth.** Each cache node has its own network interface. Distributing keys across nodes also distributes network load — a distributed cache naturally avoids the single-node bandwidth saturation problem.

---

## When to use it / when not to

**Use a distributed cache when:**
- Dataset exceeds single-node memory
- Request throughput exceeds single-node capacity
- Single-node failure is unacceptable (high availability requirement)

**Use a single node (with sentinel for HA) when:**
- Dataset fits comfortably on one node
- You want simpler operations
- Cross-key atomic operations are frequent (easier on single node)

**In the URL shortener at scale:** a Redis Cluster with 6 nodes (3 primaries + 3 replicas) handles destination URL caching. The three primaries share the keyspace and each absorb roughly one-third of the read load. If any primary fails, its replica promotes in under five seconds — users experience at most a brief redirect failure, not a total outage.

---

## The one thing to remember

> **A distributed cache partitions data across nodes for scale and replicates nodes for availability — but it adds coordination complexity that a single node doesn't have.** Cross-slot operations break, failover takes seconds, replica lag means reads can be stale, and slot migrations require careful planning. Use a distributed cache when you've genuinely outgrown a single node; use Sentinel for HA without the full cluster complexity until you need it.

---

*← Previous: **[Cache Invalidation](/cache-invalidation-knowing-when-the-copy-is-wrong)** — storing data in a cache is the easy part; knowing when to remove or update it is where most cache bugs live.*

*→ Next: **[Cache Eviction Policies](/cache-eviction-policies-what-gets-thrown-out-when-the-cache-is-full)** — a cache has finite memory; when it fills, something has to be removed — the policy determines what.*

