# Cache Stampede: When Expiry Triggers a Database Avalanche

> **Series:** System Design · Caching — Pillar 5 of 8

## Systems Design

| # | Post | What it covers |
|---|------|----------------|
| 00 | [Caching: The Fastest Database Query Is the One You Don't Make](/caching-the-fastest-database-query-is-the-one-you-dont-make) | Caching is one of the most impactful and error-prone tools in system design. Six concepts covering the full lifecycle of a production cache layer. |
| 01 | [Caching: Storing Results Closer to Where They're Needed](/caching-storing-results-closer-to-where-theyre-needed) | Caching stores expensive results closer to the reader. Learn how it works, the main patterns, and when it hurts more than it helps. |
| 02 | [Cache Invalidation: Knowing When the Copy Is Wrong](/cache-invalidation-knowing-when-the-copy-is-wrong) | Cache invalidation is notoriously difficult. Learn the main strategies, when each applies, and how to avoid serving stale data at scale. |
| 03 | [Distributed Cache: Spreading Cache Across a Cluster](/distributed-cache-spreading-cache-across-a-cluster) | A single cache node is a bottleneck and a SPOF. Learn how distributed caches partition data, replicate for availability, and handle node failures. |
| 04 | [Cache Eviction Policies: What Gets Thrown Out When the Cache Is Full](/cache-eviction-policies-what-gets-thrown-out-when-the-cache-is-full) | When a cache fills up, something must go. Learn how LRU, LFU, FIFO, and TTL-based eviction work and how to choose the right policy for your data. |
| 05 | **Cache Stampede: When Expiry Triggers a Database Avalanche** ← you are here | When a hot cache entry expires, hundreds of servers query the database simultaneously. Learn how cache stampedes happen and how to prevent them. |
| 06 | [Cache Warming: Starting Hot Instead of Cold](/cache-warming-starting-hot-instead-of-cold) | A cold cache causes database overload on startup. Learn how to warm caches proactively using predictive loading, lazy warming, and scheduled jobs. |
| 07 | [Caching: Wrap-Up](/caching-wrap-up) | A recap of all 6 caching concepts: what caching is, invalidation strategies, distributed caches, eviction policies, stampedes, and warming. How they connect. |

---

# Cache Stampede: When Expiry Triggers a Database Avalanche

## The problem

A short link goes viral — a million users share it in an hour. Your Redis cache handles it easily. The TTL is set to one hour.

Exactly one hour after the link first went viral, the cache entry expires. In the milliseconds after expiry, ten thousand app servers simultaneously try to resolve `sho.rt/x7Kp2`. Every one of them checks the cache. Every one of them gets a miss. Every one of them queries the database for the destination URL. Ten thousand database queries fire at once for the same key.

Your database, which normally handles this key by serving it from cache, is suddenly asked to answer the same query ten thousand times simultaneously. CPU spikes. Connection pool exhausts. Query latency skyrockets. Other queries time out. The platform degrades under the weight of a single cache entry expiring.

Then — ten seconds later — all ten thousand servers have populated their local copy. The cache is hot again. The crisis is over. Until the next expiry.

This is a cache stampede, also called the thundering herd problem.

---

## The core idea

A cache stampede occurs when a popular cache entry expires (or is invalidated) and multiple concurrent requests, all missing in the cache simultaneously, rush to the origin system to recompute or refetch the data. The origin system receives a burst of identical requests it wouldn't normally see — potentially many times its steady-state load.

The problem is self-reinforcing: the database is overwhelmed, queries are slow, the cache takes longer to repopulate, more requests accumulate behind the miss, and the database is under sustained load until the cache recovers.

---

## The analogy: a sold-out concert ticket going back on sale

A popular concert sells out. Ten thousand fans are watching the ticket site, refreshing every second. One ticket gets returned. The moment it appears available, all ten thousand fans try to buy it simultaneously. The ticket site's payment backend, normally handling one transaction per second, suddenly receives ten thousand concurrent checkout requests. It falls over.

That single ticket re-listing is the cache expiry. The ten thousand fans are the app servers. The overwhelmed payment backend is the database. None of the fans are being unreasonable — each is independently doing the sensible thing. The problem is coordination, or rather the absence of it.

---

## How a cache stampede unfolds

```
t=0:    url:x7Kp2 added to cache, TTL=3600s
t=3600: TTL expires. Entry removed from cache.

t=3600.001:
  Server 1:  GET url:x7Kp2 → MISS
  Server 2:  GET url:x7Kp2 → MISS
  Server 3:  GET url:x7Kp2 → MISS
  ...
  Server N:  GET url:x7Kp2 → MISS

  All N servers query the database simultaneously.
  DB receives N identical queries.

t=3600.050:
  Server 1 gets DB response, writes to cache
  Servers 2..N also get DB response (query already in flight)
  Servers 2..N also write to cache (redundant)

Result: N database queries for 1 key. Cache repopulated.
        During t=3600.001–3600.050: database load spike,
        other queries delayed.
```

The stampede is worst when:
- The expired entry was extremely hot (many concurrent readers)
- Recomputation is slow (the database query takes 100ms, not 1ms)
- Many app server instances are running

---

## Mitigation strategies

### 1. Mutex / lock (cache lock)

When a cache miss occurs, only one server is allowed to fetch from the database. Others wait for the result.

```python
def get_destination(short_code):
    cached = redis.get(f"url:{short_code}")
    if cached:
        return cached

    # Cache miss — acquire a lock
    lock_key = f"lock:url:{short_code}"
    acquired = redis.set(lock_key, "1", nx=True, ex=5)  # NX = only if not exists

    if acquired:
        # This server fetches from DB
        try:
            url = db.query("SELECT destination FROM links WHERE code = ?", short_code)
            redis.setex(f"url:{short_code}", 3600, url)
            return url
        finally:
            redis.delete(lock_key)
    else:
        # Another server is fetching — wait briefly and retry
        time.sleep(0.05)
        return get_destination(short_code)  # retry — will likely hit cache
```

**Strengths:** Only one database query fires per miss, no matter how many concurrent waiters.

**Weaknesses:** Waiters are blocked during the fetch. If the fetch is slow, all waiters queue up. If the lock holder crashes before releasing the lock, waiters are stuck until the lock TTL expires. Adds latency for the first request.

---

### 2. Probabilistic early expiry (PER)

Before the TTL expires, proactively recompute the value with increasing probability as expiry approaches. No single server holds a lock — the cache is refreshed slightly before it expires.

The XFetch algorithm:
```python
def get_destination_xfetch(short_code, beta=1.0):
    result = redis.get(f"url:{short_code}")
    if not result:
        return fetch_and_cache(short_code)

    ttl = redis.ttl(f"url:{short_code}")
    delta = compute_time_for_last_refresh(short_code)

    # Decide whether to proactively refresh
    if -delta * beta * math.log(random.random()) > ttl:
        # Probabilistically recompute before expiry
        return fetch_and_cache(short_code)

    return result
```

As TTL approaches zero, the probability of proactive recomputation increases. One server eventually triggers a refresh while the old value is still in cache — so all other servers continue to get cache hits during the recomputation. When the entry expires, it has already been refreshed.

**Strengths:** No lock contention. No blocked waiters. Cache never truly misses for hot keys.

**Weaknesses:** Slightly more complex. Requires storing the time the entry was last fetched. `beta` is a tuning parameter. Can cause unnecessary recomputation if the TTL is very long.

---

### 3. Stale-while-revalidate

Return the stale cached value immediately, and trigger a background recomputation. The entry is never truly "expired" from the client's perspective — it returns instantly even when stale.

```python
def get_destination_swr(short_code):
    cached = cache.get(f"url:{short_code}")

    if cached:
        if cached.is_stale() and not currently_refreshing(short_code):
            # Return stale immediately, refresh in background
            trigger_background_refresh(short_code)
        return cached.value

    # Hard miss — no choice but to block
    return fetch_and_cache(short_code)
```

HTTP caches use this as `Cache-Control: stale-while-revalidate=<seconds>` — serve the stale response for up to N seconds while fetching a fresh one in the background.

**Strengths:** Zero latency for stale reads. No blocked requests. Transparent to the caller.

**Weaknesses:** Callers receive stale data. Requires a background refresh mechanism. Hard misses (no value at all) still block.

---

### 4. TTL jitter

If many cache entries have the same TTL and were populated at the same time (e.g., during a deployment or warm-up), they all expire simultaneously. A stampede hits many keys at once.

Add randomness to TTL:
```python
base_ttl = 3600
jitter = random.randint(-300, 300)  # ±5 minutes
redis.setex(f"url:{short_code}", base_ttl + jitter, url)
```

This spreads expiry across a window, so misses are distributed over time rather than simultaneous.

**Strengths:** Trivial to implement. Prevents correlated expiry.

**Weaknesses:** Doesn't prevent stampede on a single very hot key — just prevents simultaneous stampedes across many keys.

---

### 5. Cache warming (proactive population)

The cleanest solution for predictable hot entries: populate the cache before expiry, not after. This is covered in detail in the next post — but the key insight is that a stampede on expiry is a symptom of reactive cache population. Proactive warming eliminates it.

---

## Tradeoffs

**Mutex vs stale-while-revalidate.** Mutex guarantees fresh data but adds latency for waiting requests. Stale-while-revalidate returns immediately but serves stale data. For URL redirects, stale-while-revalidate is usually preferable — a user getting a one-second-stale redirect is fine; a user waiting 200ms while a lock is held is not.

**Complexity vs resilience.** TTL jitter is trivial. Probabilistic early expiry is more complex but more robust. For very hot keys, invest in the more sophisticated approach.

---

## When to worry about it / when not to

**Cache stampedes matter when:**
- A small number of keys get extremely high traffic (power-law distribution)
- Recomputation from the origin is slow or expensive
- Many app server instances run concurrently
- TTLs are short relative to recomputation time

**Cache stampedes don't matter much when:**
- Traffic is evenly distributed across many keys
- Recomputation is fast (milliseconds) and the database handles it fine
- You have only a few app servers

**In the URL shortener:** the top 0.01% of links (viral links) are stampede candidates. Mitigation: stale-while-revalidate for all URL cache reads (serve stale, refresh in background); TTL jitter on initial population; mutex as a fallback for hard misses.

---

## The one thing to remember

> **A cache stampede is a coordination failure: many servers doing the sensible thing independently produces an outcome that would be catastrophic if anticipated.** The simplest mitigations — TTL jitter and stale-while-revalidate — handle most cases. For truly hot keys where freshness matters, combine a distributed lock (one server fetches, others wait) with probabilistic early recomputation (refresh before expiry). The worst thing you can do is nothing: a single viral link expiring at exactly the wrong moment can take down a database that has been serving millions of requests per second.

---

*← Previous: **[Cache Eviction Policies](/cache-eviction-policies-what-gets-thrown-out-when-the-cache-is-full)** — a cache has finite memory; when it fills, something has to be removed — the policy determines what.*

*→ Next: **[Cache Warming](/cache-warming-starting-hot-instead-of-cold)** — proactively populating the cache before traffic arrives so you never start cold.*