# Cache Warming: Starting Hot Instead of Cold

> **Series:** System Design · Caching — Pillar 5 of 8

## Systems Design

| # | Post | What it covers |
|---|------|----------------|
| 00 | [Caching: The Fastest Database Query Is the One You Don't Make](/caching-the-fastest-database-query-is-the-one-you-dont-make) | Caching is one of the most impactful and error-prone tools in system design. Six concepts covering the full lifecycle of a production cache layer. |
| 01 | [Caching: Storing Results Closer to Where They're Needed](/caching-storing-results-closer-to-where-theyre-needed) | Caching stores expensive results closer to the reader. Learn how it works, the main patterns, and when it hurts more than it helps. |
| 02 | [Cache Invalidation: Knowing When the Copy Is Wrong](/cache-invalidation-knowing-when-the-copy-is-wrong) | Cache invalidation is notoriously difficult. Learn the main strategies, when each applies, and how to avoid serving stale data at scale. |
| 03 | [Distributed Cache: Spreading Cache Across a Cluster](/distributed-cache-spreading-cache-across-a-cluster) | A single cache node is a bottleneck and a SPOF. Learn how distributed caches partition data, replicate for availability, and handle node failures. |
| 04 | [Cache Eviction Policies: What Gets Thrown Out When the Cache Is Full](/cache-eviction-policies-what-gets-thrown-out-when-the-cache-is-full) | When a cache fills up, something must go. Learn how LRU, LFU, FIFO, and TTL-based eviction work and how to choose the right policy for your data. |
| 05 | [Cache Stampede: When Expiry Triggers a Database Avalanche](/cache-stampede-when-expiry-triggers-a-database-avalanche) | When a hot cache entry expires, hundreds of servers query the database simultaneously. Learn how cache stampedes happen and how to prevent them. |
| 06 | **Cache Warming: Starting Hot Instead of Cold** ← you are here | A cold cache causes database overload on startup. Learn how to warm caches proactively using predictive loading, lazy warming, and scheduled jobs. |
| 07 | [Caching: Wrap-Up](/caching-wrap-up) | A recap of all 6 caching concepts: what caching is, invalidation strategies, distributed caches, eviction policies, stampedes, and warming. How they connect. |

---

## The problem

You deploy a new version of your URL shortener. The deployment restarts all app servers and replaces the Redis cluster. The cache is empty.

Traffic immediately hits at full volume — one hundred thousand requests per second. Every single request is a cache miss. Every request hits the database. PostgreSQL, which normally handles three thousand database queries per second (because the cache absorbs the other ninety-seven thousand), suddenly receives one hundred thousand queries per second. It falls over within thirty seconds.

You roll back. The database recovers. The cache repopulates. Everything is fine.

The cache wasn't buggy. The deployment wasn't wrong. The problem was simpler: you started cold. Your system was designed to operate with a warm cache, but you gave it a cold one and expected it to absorb full traffic instantly.

This is the cold start problem. Cache warming is the solution.

---

## The core idea

Cache warming is the practice of pre-populating a cache with the data most likely to be requested before exposing the system to production traffic. Instead of waiting for organic traffic to fill the cache (lazy population), you proactively load the cache with your working set — the hot data — so that when traffic arrives, the hit ratio is already high.

Cache warming can also mean preserving cache state across restarts, so the cache is never cold in the first place.
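
The arithmetic behind the cold start is worth making explicit. The sketch below assumes every cache miss becomes exactly one database query (no request coalescing); `db_qps` is a hypothetical helper name, and the figures are the ones from the URL shortener example.

```python
def db_qps(request_rate, hit_ratio):
    """Database queries per second, assuming one query per cache miss."""
    return round(request_rate * (1 - hit_ratio))

warm = db_qps(100_000, 0.97)  # steady state with a warm cache
cold = db_qps(100_000, 0.0)   # empty cache: every request misses

print(warm, cold, round(cold / warm, 1))
```

Under these assumptions, an empty cache multiplies database load by roughly 33. Warming is about closing that gap before traffic arrives.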

---

## The analogy: a restaurant mise en place

A professional kitchen doesn't wait for a dinner order before starting to prepare ingredients. Before service begins, the chef performs mise en place — chopping vegetables, reducing sauces, portioning proteins — so that when the first order comes in, the kitchen can plate a dish in minutes, not hours.

A cold cache is a kitchen at 6pm with nothing prepared. Cache warming is the mise en place. The kitchen is ready before the first guest orders, not trying to catch up while guests are already seated.

The warm kitchen doesn't prepare every dish on the menu — just the most frequently ordered ones. Similarly, cache warming doesn't load the entire dataset — just the working set.

---

## Warming strategies

### 1. Predictive pre-loading

Analyse historical access patterns to identify the hot working set, then load those entries into the cache before traffic is routed to the system.

```python
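# Warming script run before deployment completes.
# `analytics_db` and `redis` are assumed to be an already-configured
# analytics DB client and redis-py client.
import random

BATCH_SIZE = 1000   # commands sent per round trip
BASE_TTL = 3600     # seconds

def warm_url_cache():
    # Get the top 100k most-accessed links from the analytics DB
    hot_links = analytics_db.query("""
        SELECT short_code, destination
        FROM links
        ORDER BY click_count_24h DESC
        LIMIT 100000
    """)

    # transaction=False: plain batching, no MULTI/EXEC wrapper needed
    pipeline = redis.pipeline(transaction=False)
    for i, link in enumerate(hot_links, start=1):
        # Jitter the TTL so warmed entries don't all expire in the same second
        ttl = BASE_TTL + random.randint(0, 600)
        pipeline.setex(f"url:{link.short_code}", ttl, link.destination)
        if i % BATCH_SIZE == 0:
            pipeline.execute()  # flush in batches instead of buffering 100k commands

    pipeline.execute()  # flush the remainder
    print(f"Warmed {len(hot_links)} entries")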
```

**Strengths:** Cache is warm before the first production request. Hit ratio starts high and stays high.

**Weaknesses:** Requires knowing what the hot set is. Adds time to deployment (warming 100k entries takes seconds; warming 10M takes minutes). The hot set in history may not perfectly match the hot set at deployment time.

**When to use:** High-traffic systems where a cold start would overload the database; systems with predictable hot sets (popular links, product catalogue, landing pages).
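
If you don't have a ready-made `click_count_24h` column, the hot set can be derived from raw access logs. This is a minimal sketch, assuming the log is available as a flat list of requested keys; `hot_set` and `coverage` are hypothetical helper names, and the sample log is made up for illustration.

```python
from collections import Counter

def hot_set(access_log, size):
    """Return the `size` most frequently requested keys, most popular first."""
    counts = Counter(access_log)
    return [key for key, _ in counts.most_common(size)]

def coverage(access_log, keys):
    """Fraction of logged requests that the given keys would have served."""
    if not access_log:
        return 0.0
    wanted = set(keys)
    return sum(1 for key in access_log if key in wanted) / len(access_log)

log = ["a1", "b2", "a1", "c3", "a1", "b2", "d4", "a1"]
top = hot_set(log, 2)
print(top, coverage(log, top))
```

Checking coverage against historical logs gives a rough estimate of the hit ratio a given hot set size would buy, which helps decide how many entries are worth warming.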

---

### 2. Traffic shadowing / replay

Capture production traffic and replay it against the new deployment, cache included, before any real traffic is routed to it. The cache fills organically from real requests.

```
Production traffic → Load balancer → Production servers (live)
                                  ↘ New servers (dark/shadow)
                                    Cache warms from replayed traffic
                                    Once hit ratio reaches threshold → switch
```

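Service meshes, proxy-level mirroring, or application-level traffic mirroring can duplicate live requests to the shadow deployment. Packet-level tools like `tcpreplay` replay captured pcap files, but they do not negotiate real TCP sessions with a live server, so request-level mirroring is usually the better fit for warming an application cache.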

**Strengths:** Cache warms from real traffic patterns — the hot set that matters. No need to predict the working set.

**Weaknesses:** Complex infrastructure. Care needed to avoid double-processing side-effectful requests (writes, analytics events). Requires enough shadow traffic volume to warm the cache in a reasonable time.
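
One way to limit double-processing is to mirror only requests that are meant to be side-effect free. The sketch below assumes captured requests are available as `(method, path)` tuples and that delivery is handled by a callable you supply; `mirror_safe_requests` and `send_to_shadow` are hypothetical names, not part of any mirroring tool.

```python
SAFE_METHODS = {"GET", "HEAD"}

def mirror_safe_requests(requests, send_to_shadow):
    """Forward only read-style requests to the shadow deployment.

    `requests` is an iterable of (method, path) tuples; `send_to_shadow`
    is any callable that delivers one request. Returns the number mirrored.
    """
    mirrored = 0
    for method, path in requests:
        if method.upper() in SAFE_METHODS:
            send_to_shadow(method, path)
            mirrored += 1
    return mirrored

captured = [("GET", "/abc123"), ("POST", "/links"), ("get", "/xyz789")]
sent = []
count = mirror_safe_requests(captured, lambda m, p: sent.append((m, p)))
print(count, sent)
```

Filtering by method is only a first line of defence. In a URL shortener, a GET redirect may still record a click, so the shadow deployment should also have analytics writes disabled or pointed at a throwaway sink.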

---

### 3. Gradual traffic ramp (canary warm-up)

Route a small percentage of traffic to the new deployment. The cache warms gradually. As hit ratio improves, ramp traffic up.

```
t=0:   5% traffic to new deployment.  Cache: 0% warm.  DB load: manageable.
t=5m:  10% traffic to new deployment. Cache: ~40% warm. DB load: normal.
t=15m: 25% traffic.                    Cache: ~70% warm.
t=30m: 100% traffic.                   Cache: ~95% warm.
```

The database never sees a full cold-start load spike — it absorbs a gradually increasing fraction of real traffic as the cache warms.

**Strengths:** Safe. No complex infrastructure. Real traffic warms the cache naturally.

**Weaknesses:** Takes time. The warming period can be minutes to tens of minutes depending on traffic volume and hot set size. Users on the new deployment during warm-up see slightly higher latency.
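
The ramp can be driven by the observed hit ratio rather than by the clock. This is a minimal sketch: the thresholds mirror the illustrative timeline above, and `RAMP_STEPS` and `traffic_share` are hypothetical names. How the returned share is applied (load balancer weights, mesh routing rules) depends on your infrastructure.

```python
RAMP_STEPS = [
    # (minimum hit ratio required, traffic share to route)
    (0.00, 0.05),
    (0.40, 0.10),
    (0.70, 0.25),
    (0.95, 1.00),
]

def traffic_share(hit_ratio, steps=RAMP_STEPS):
    """Return the largest traffic share whose hit-ratio requirement is met."""
    share = steps[0][1]
    for required, candidate in steps:
        if hit_ratio >= required:
            share = candidate
    return share

for ratio in (0.0, 0.45, 0.72, 0.96):
    print(f"hit ratio {ratio:.0%} -> route {traffic_share(ratio):.0%}")
```

Gating on hit ratio means a slow warm-up simply holds the ramp at its current step instead of pushing more load onto the database on schedule.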

---

### 4. Cache snapshot and restore

Persist the cache state to disk (or object storage) periodically. On startup, restore from the latest snapshot before accepting traffic.

Redis supports this natively via RDB (Redis Database) snapshots and AOF (Append-Only File) persistence. On restart, Redis loads the persisted data before serving requests, so the cache is warm as soon as loading finishes.

```
Normal operation:
  Redis saves RDB snapshot every 60s to /var/lib/redis/dump.rdb

On restart:
  Redis loads dump.rdb → cache is populated with last checkpoint
  App servers start routing traffic → high hit ratio immediately

New cluster provisioning:
  Copy dump.rdb from old cluster to new cluster
  Start Redis → warm immediately
```

**Strengths:** The cache is as warm as it was at the last snapshot. No pre-loading script required. Deployments that restart in-place are fast.

**Weaknesses:** Snapshot load time grows with dataset size. A 10 GB snapshot can take on the order of 30–60 seconds to load, depending on disk and CPU speed. Writes made between the last snapshot and the restart are lost, so the cache is cold for that delta. For a new cluster replacing an old one, copying the snapshot adds operational complexity.

**When to use:** Restarts of existing cache nodes; as a safety net alongside other warming strategies.
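
App servers should not start sending traffic until the restore has finished, because Redis rejects most commands with a `LOADING` error while it reads the snapshot. The sketch below polls the `loading` field of the `INFO persistence` section. It assumes that section can be fetched during loading, and `wait_until_loaded` is a hypothetical helper that takes the fetch as a callable so it can be tried without a server.

```python
import time

def wait_until_loaded(get_info, timeout=120.0, poll=1.0, sleep=time.sleep):
    """Poll until Redis reports it has finished loading its dataset.

    `get_info` is any callable returning the INFO persistence section as a
    dict. Returns True once `loading` is 0, False if the timeout elapses.
    """
    waited = 0.0
    while waited <= timeout:
        if int(get_info()["loading"]) == 0:
            return True
        sleep(poll)
        waited += poll
    return False

# Stand-in for a real client: reports "loading" twice, then done.
responses = iter([{"loading": 1}, {"loading": 1}, {"loading": 0}])
ready = wait_until_loaded(lambda: next(responses), poll=0.01)
print(ready)
```

With redis-py, the callable would typically be `lambda: client.info("persistence")`. Treat connection errors during startup as "not ready yet" in a production version.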

---

### 5. Lazy warming with database-side protection

Accept that the cache starts cold, but protect the database during warm-up with explicit rate limiting or a circuit breaker on cache misses.

```python
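class CacheWarmingError(Exception):
    """Raised when a miss is shed to protect the database during warm-up."""

def get_destination(short_code):
    cached = redis.get(f"url:{short_code}")
    if cached is not None:  # an explicit None check keeps empty values as hits
        return cached

    # Rate-limit database queries during cold start.
    # `warmup_in_progress` is a flag and `db_query_limiter` a rate limiter
    # that the application is assumed to provide.
    if warmup_in_progress and not db_query_limiter.allow():
        # Return a "warming" response or serve from a secondary slower cache
        raise CacheWarmingError("Cache warming in progress, retry shortly")

    url = db.query(...)
    redis.setex(...)
    return url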
```

**Strengths:** Simple. No pre-loading infrastructure.

**Weaknesses:** Users experience errors or degraded responses during warm-up. Only appropriate for systems where brief degradation is tolerable.
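
A limiter with an `allow()` method can be as small as a token bucket. This is one possible sketch, not a prescribed implementation: `TokenBucket` is a hypothetical class, it is not thread-safe, and it limits a single process only, so a fleet of app servers would each need a share of the database budget.

```python
import time

class TokenBucket:
    """Allow up to `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        refill = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + refill)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Deterministic demo with a fake clock: 2 queries/second, burst of 2.
fake_now = [0.0]
limiter = TokenBucket(rate=2, capacity=2, clock=lambda: fake_now[0])
burst = [limiter.allow() for _ in range(3)]
fake_now[0] = 1.0
after_refill = limiter.allow()
print(burst, after_refill)
```

Set the rate from what the database can comfortably absorb on top of its normal load, not from the incoming request rate.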

---

## Measuring warming progress

Track warm-up progress via the cache hit ratio. A simple monitoring loop:

```python
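import time

def monitor_warmup(threshold=0.95, interval=10):
    # keyspace_hits / keyspace_misses are cumulative counters, so compare
    # successive samples to get the hit ratio over the last interval only.
    prev = redis.info("stats")
    while True:
        time.sleep(interval)
        info = redis.info("stats")
        hits = info["keyspace_hits"] - prev["keyspace_hits"]
        misses = info["keyspace_misses"] - prev["keyspace_misses"]
        prev = info
        total = hits + misses
        hit_ratio = hits / total if total > 0 else 0

        print(f"Hit ratio: {hit_ratio:.1%} ({hits} hits, {misses} misses)")

        if hit_ratio > threshold:
            print("Cache sufficiently warm, routing production traffic")
            break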
```

Don't route full production traffic until the hit ratio reaches an acceptable threshold (typically 90–95%).

---

## Tradeoffs

**Pre-loading time vs cold exposure.** Predictive pre-loading adds minutes to deployments but ensures the cache is warm before traffic arrives. Gradual ramp is safer but extends the warming window. Snapshot restore is fast but requires snapshot infrastructure.

**Accuracy of the hot set prediction.** Historical patterns predict future patterns well for stable workloads. For bursty or unpredictable workloads (a viral event, a product launch), the hot set may shift overnight and historical data misleads.

**Memory during warming.** Pre-loading scripts can populate entries that never get requested (predicting the wrong hot set). These entries waste memory and may evict genuinely hot entries if memory is tight. Prefer warming a smaller, higher-confidence hot set over loading everything.

---

## When it matters / when it doesn't

**Cache warming is critical when:**
- Cache miss cost is high (slow origin queries, expensive computations)
- Traffic is high-volume with a small hot set
- Deployments happen frequently (multiple times per day)
- Database cannot absorb even brief full-traffic cold load

**Cache warming is less important when:**
- Cache miss cost is low (fast origin, light database)
- Traffic volume is modest
- Deployments are infrequent and can happen during off-peak hours
- The database scales to handle full traffic without the cache (cache is an optimisation, not a load shield)

**In the URL shortener:** warm the top 500k links (by 24h click volume) before each deployment using a pre-loading script. Monitor hit ratio. Route canary traffic at 5% until the ratio exceeds 90%, then ramp to 100%. Redis persistence (RDB) is enabled so node restarts within the cluster recover from snapshot.

---

## The one thing to remember

> **A cache that starts empty will cause the same traffic it's designed to absorb to hit the database directly — possibly all at once.** Always have a warming strategy proportional to your cache dependency. For low-traffic systems, lazy warming (organic miss-and-populate) is fine. For high-traffic systems where the cache is a load shield, treat warm-up as part of the deployment process: load the hot set before cutting traffic over, or ramp traffic gradually while the cache builds. A cold cache on a high-traffic system isn't an edge case — it's a guaranteed incident waiting to happen.

---

*← Previous: **[Cache Stampede](/cache-stampede-when-expiry-triggers-a-database-avalanche)** — what happens when a hot cache entry expires and every server rushes to repopulate it at the same time.*

*→ Next: **[Pillar 5 Wrap-up](/caching-wrap-up)** — tying together everything in the caching pillar and bridging to scalability & infrastructure.*

