# Caching: The Fastest Database Query Is the One You Don't Make

> **Series:** System Design · Caching — Pillar 5 of 8

## Systems Design

| # | Post | What it covers |
|---|------|----------------|
| 00 | **Caching: The Fastest Database Query Is the One You Don't Make** ← you are here | Caching is one of the most impactful and error-prone tools in system design. Six concepts covering the full lifecycle of a production cache layer. |
| 01 | [Caching: Storing Results Closer to Where They're Needed](/caching-storing-results-closer-to-where-theyre-needed) | Caching stores expensive results closer to the reader. Learn how it works, the main patterns, and when it hurts more than it helps. |
| 02 | [Cache Invalidation: Knowing When the Copy Is Wrong](/cache-invalidation-knowing-when-the-copy-is-wrong) | Cache invalidation is notoriously difficult. Learn the main strategies, when each applies, and how to avoid serving stale data at scale. |
| 03 | [Distributed Cache: Spreading Cache Across a Cluster](/distributed-cache-spreading-cache-across-a-cluster) | A single cache node is a bottleneck and a SPOF. Learn how distributed caches partition data, replicate for availability, and handle node failures. |
| 04 | [Cache Eviction Policies: What Gets Thrown Out When the Cache Is Full](/cache-eviction-policies-what-gets-thrown-out-when-the-cache-is-full) | When a cache fills up, something must go. Learn how LRU, LFU, FIFO, and TTL-based eviction work and how to choose the right policy for your data. |
| 05 | [Cache Stampede: When Expiry Triggers a Database Avalanche](/cache-stampede-when-expiry-triggers-a-database-avalanche) | When a hot cache entry expires, hundreds of servers query the database simultaneously. Learn how cache stampedes happen and how to prevent them. |
| 06 | [Cache Warming: Starting Hot Instead of Cold](/cache-warming-starting-hot-instead-of-cold) | A cold cache causes database overload on startup. Learn how to warm caches proactively using predictive loading, lazy warming, and scheduled jobs. |
| 07 | [Caching: Wrap-Up](/caching-wrap-up) | A recap of all 6 caching concepts: what caching is, invalidation strategies, distributed caches, eviction policies, stampedes, and warming. How they connect. |

---


## The scenario

Your URL shortener is handling steady traffic. The redirect critical path — receive a request, look up the destination URL, return a 302 — takes about 12 milliseconds. That 12ms is mostly waiting: waiting for the database query to complete, waiting for the network round-trip, waiting for the connection pool to hand over a slot.

The destination URL for `sho.rt/x7Kp2` is `https://example.com/some-long-path`. It has been the same value since the link was created six months ago. It will probably be the same value six months from now. Every millisecond of those 12ms is paid on every single redirect — including the five million redirects for the same link over the past week.

You're doing the same database round-trip five million times to retrieve the same unchanging answer.

The observation that unlocks large performance gains is this: **in many systems, most data is read far more often than it is written, and most reads target data that was recently or frequently requested.** If you store the answer to a frequent question somewhere faster than the database, you can answer it without asking the database at all.

That's caching. It's among the highest-leverage performance tools in distributed systems, and it's also the source of some of the most common and costly production incidents. This pillar covers both sides.

Six concepts across the full lifecycle of a production cache:

- **What caching is** — the patterns, the hit ratio, and why it works
- **Cache invalidation** — knowing when the stored copy is wrong
- **Distributed caching** — scaling the cache itself across multiple nodes
- **Eviction policies** — what gets discarded when memory fills
- **Cache stampede** — what happens when many servers miss simultaneously
- **Cache warming** — ensuring the cache is never cold when traffic arrives

**TL;DR:** A cache stores copies of frequently read data closer to the reader, trading consistency for speed. The fundamental questions are always: how stale can the data get before it causes a problem, and how does the cache learn when to update? TTL is the simplest invalidation mechanism but is blunt; active invalidation keeps caches fresh but adds coupling. Distributed caches partition data for scale and replicate for availability, at the cost of operational complexity. Eviction policies determine what gets discarded under memory pressure; LFU tends to outperform LRU when a small, stable set of keys receives most of the traffic. Cache stampedes happen when popular entries expire under load and every server rushes to the database at once; TTL jitter, stale-while-revalidate, and distributed locks prevent them. Cache warming lets a new deployment start with a hot cache instead of overwhelming the database on a cold start.

---

## What this pillar covers

### Caching — the foundation

The cache-aside pattern, the difference between a cache hit and a cache miss, the hit ratio as a measure of cache health, and where caches live at each layer of the stack (browser, CDN, application, distributed cache, database). Plus the tradeoffs, consistency vs speed and memory vs hit ratio, that reappear in every subsequent post.

**Best mental model:** a notepad on your desk with a colleague's phone extension written on it. The directory binder two floors down is authoritative and current. The notepad is faster to check but might be out of date if the extension changed and you haven't crossed out the old number.
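As a rough illustration of cache-aside and hit-ratio tracking, here is a minimal in-process sketch. It assumes a plain dictionary stands in for Redis and that `load_from_db` (a hypothetical name) performs the PostgreSQL lookup; the class name `CacheAside` and the 300-second TTL are illustrative choices, not values taken from this series.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Hypothetical cache-aside (lazy loading) wrapper with a per-entry TTL and hit-ratio tracking."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        entry = self._store.get(key)
        if entry is not None and self._clock() < entry[1]:
            self.hits += 1
            return entry[0]
        # Miss (absent or expired): the application, not the cache, reads the database...
        self.misses += 1
        value = self._load_from_db(key)
        # ...and then populates the cache so the next reader gets a hit.
        self._store[key] = (value, self._clock() + self._ttl)
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With a fake loader, 100 lookups of `x7Kp2` should produce one database call and 99 hits, a hit ratio of 0.99. In production the same logic would sit in front of a networked cache, so the miss path pays a cache round-trip as well as the database query.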

---

### Cache Invalidation — the hard part

Phil Karlton's famous quip ("There are only two hard things in Computer Science: cache invalidation and naming things") exists because invalidation is genuinely difficult. The strategies (TTL, active purge on write, write-through, and event-driven invalidation via change data capture, or CDC) each trade consistency against complexity and coupling in different ways. The invalidation race condition that lets a stale value overwrite a fresh cache entry. Why TTL should always be a backstop even when you have active invalidation.

**Best mental model:** a printed train timetable. It was accurate on the day it was printed, and every subsequent service change makes it staler. Invalidation is the set of strategies for when and how to replace it: reprint quarterly (TTL), reprint only the changed pages whenever the schedule changes (active purge), or push notifications to commuters' phones (event-driven).
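One mitigation for that race, sketched here under stated assumptions, is to store a version number with each entry and refuse writes that carry an older version. Invalidation leaves a short-lived tombstone instead of deleting the key, so a slow reader holding a pre-update value cannot slip it back in. This assumes each row has a monotonically increasing version column (hypothetical), and `VersionedCache`, `set_if_newer`, and `invalidate` are illustrative names, not a Redis API. Against real Redis, the compare-and-set would have to be atomic, for example inside a Lua script. The TTL remains the backstop for anything the versioning misses.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Entry:
    value: Optional[str]  # None marks a tombstone left by invalidation
    version: int
    expires_at: float


class VersionedCache:
    """Hypothetical cache that rejects writes older than the version it already holds."""

    def __init__(self, backstop_ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._entries: Dict[str, Entry] = {}
        self._ttl = backstop_ttl
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None or self._clock() >= entry.expires_at:
            return None  # miss: absent, or expired by the TTL backstop
        return entry.value  # None for a tombstone, which also counts as a miss

    def set_if_newer(self, key: str, value: str, version: int) -> bool:
        current = self._entries.get(key)
        if current is not None and current.version > version:
            return False  # a newer version is already known; drop the stale write
        self._entries[key] = Entry(value, version, self._clock() + self._ttl)
        return True

    def invalidate(self, key: str, new_version: int) -> None:
        # Called after the database write commits: remember the new version
        # so late writers carrying older versions are rejected.
        self._entries[key] = Entry(None, new_version, self._clock() + self._ttl)
```

In the race this targets, a reader fetches version 1 from the database, a writer commits version 2 and calls `invalidate`, and then the reader tries to cache version 1. `set_if_newer` refuses that write, while a later reader that fetched version 2 succeeds.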

---

### Distributed Cache — scaling the cache layer

When a single cache node's memory or throughput isn't enough, you need a cluster. Consistent hashing and Redis Cluster's slot-based partitioning distribute keys across nodes so adding or removing a node doesn't invalidate the entire cache. Replication provides availability — a replica promotes to primary automatically when the primary fails. The difference between Redis Sentinel (HA without sharding) and Redis Cluster (both).

**Best mental model:** a city library system across multiple branches. The catalogue (consistent hashing / slot table) tells you which branch holds a given book. If a branch is closed, the reserve copy at another branch is available. No single branch failure shuts down the entire library.
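To see why consistent hashing limits reshuffling, here is a minimal hash-ring sketch with virtual nodes. Node names such as `cache-a`, the 100 virtual nodes per server, and the use of MD5 purely as a spreading function are assumptions for illustration. Redis Cluster itself does not use a ring: it maps each key to one of 16,384 fixed hash slots (CRC16 of the key modulo 16384) and assigns slots to nodes, which achieves the same goal of moving only part of the keyspace when membership changes.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    # MD5 is used only for an even, deterministic spread, not for security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100):
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns many points on the ring so load evens out.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash, wrapping at the end.
        h = _hash(key)
        idx = bisect.bisect_left(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

Adding a fourth node should move roughly a quarter of the keys, and only onto the new node. By comparison, naive `hash(key) % N` placement moves about three quarters of keys when N goes from 3 to 4, which for a cache means a wave of simultaneous misses.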

---

### Cache Eviction Policies — when memory fills

A cache has finite memory. When it fills, something must be discarded. LRU (least recently used) evicts the entry not accessed for the longest time, which suits recency-driven workloads. LFU (least frequently used) evicts the entry accessed least often, which suits stable hot sets and power-law distributions. FIFO is simple but ignores access patterns. How Redis exposes its eviction options (approximated LRU and LFU, random, and TTL-based eviction; Redis has no FIFO policy) through the `maxmemory-policy` setting, and why you should always set `maxmemory` explicitly.

**Best mental model:** books on a small shelf. LRU removes the one you haven't touched in the longest time. LFU removes the one you've referenced least overall. FIFO removes the one that's been on the shelf longest — which could be your most-read classic.
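The difference between LRU and LFU shows up most clearly when a one-off scan passes through a small cache that holds one hot key. Below is a toy sketch of both policies; the class names are hypothetical, and the LFU here is exact with no counter decay. Redis approximates both: LRU by sampling a few keys (`maxmemory-samples`), and LFU with a logarithmic counter that decays over time (`lfu-log-factor`, `lfu-decay-time`), which avoids this sketch's weakness where a once-popular key can linger indefinitely.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # least recently used first, most recent last

    def __contains__(self, key):
        return key in self._data

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = {}
        self._counts = {}  # key -> number of accesses (puts and gets)

    def __contains__(self, key):
        return key in self._data

    def get(self, key):
        if key not in self._data:
            return None
        self._counts[key] += 1
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Evict the key with the fewest accesses (O(n) scan; fine for a sketch).
            victim = min(self._counts, key=self._counts.get)
            del self._data[victim]
            del self._counts[victim]
        self._data[key] = value
        self._counts[key] = self._counts.get(key, 0) + 1
```

With capacity 2, reading `home` five times and then inserting two one-off keys evicts `home` under LRU (it is no longer the most recent) but keeps it under LFU (its count dwarfs the newcomers').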

---

### Cache Stampede — the thundering herd

When a hot cache entry expires, every server that was relying on it discovers the miss at about the same moment and rushes to the database. The database receives N identical queries for the same key, potentially enough to bring it down. Three common mitigations: TTL jitter (randomise expiry times so entries written together don't expire together), stale-while-revalidate (serve the stale value while refreshing in the background), and distributed locking (only one server fetches; the others wait). Probabilistic early expiry is a more sophisticated alternative.

**Best mental model:** a sold-out concert ticket going back on sale. The moment it reappears, ten thousand refreshing fans attempt to buy it simultaneously. Each is individually reasonable; together they overwhelm the payment system.
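Three of these techniques can be sketched together. `jittered_ttl` spreads expiry times; `SingleFlightCache` uses an in-process per-key lock as a stand-in for a distributed lock (across servers, a common approach is Redis `SET key value NX PX <milliseconds>`, where the lock's own expiry keeps a crashed holder from blocking everyone); `should_refresh_early` follows the "XFetch" rule from the paper "Optimal Probabilistic Cache Stampede Prevention". All function names, the 10% jitter, and `beta = 1.0` are illustrative assumptions.

```python
import math
import random
import threading
from typing import Callable, Dict


def jittered_ttl(base_ttl: float, jitter_fraction: float = 0.1) -> float:
    """Spread expiry uniformly within +/- jitter_fraction of the base TTL."""
    spread = base_ttl * jitter_fraction
    return base_ttl + random.uniform(-spread, spread)


class SingleFlightCache:
    """On a miss, only one caller per key runs the loader; concurrent callers wait for its result."""

    def __init__(self, loader: Callable[[str], str]):
        self._loader = loader
        self._data: Dict[str, str] = {}
        self._locks = {}  # key -> per-key lock
        self._locks_guard = threading.Lock()

    def _lock_for(self, key: str):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        if key in self._data:
            return self._data[key]
        with self._lock_for(key):
            # Re-check: the previous lock holder may already have filled the entry.
            if key in self._data:
                return self._data[key]
            value = self._loader(key)  # only this caller reaches the database
            self._data[key] = value
            return value


def should_refresh_early(now: float, expiry: float, recompute_time: float,
                         beta: float = 1.0,
                         rng: Callable[[], float] = lambda: 1.0 - random.random()) -> bool:
    """XFetch-style check: sometimes refresh before expiry, more often as expiry nears.

    rng must return a value in (0, 1], so -log(rng()) >= 0 acts as a random head
    start; slower recomputations (larger recompute_time) or a larger beta refresh
    earlier on average. Once now >= expiry the check always returns True.
    """
    return now - recompute_time * beta * math.log(rng()) >= expiry
```

In this sketch, twenty threads requesting the same missing key trigger exactly one loader call; the rest block on the per-key lock and then read the filled entry.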

---

### Cache Warming — starting hot

A cold cache causes the traffic it normally absorbs to hit the database directly. For high-traffic systems that depend on the cache as a load shield, a cold start can overwhelm the database in seconds. Warming strategies: predictive pre-loading (load the hot set before traffic arrives), traffic shadowing, gradual canary ramp, and Redis snapshot restore. How to measure warming progress via hit ratio and when to consider the cache sufficiently warm.

**Best mental model:** restaurant mise en place. A professional kitchen doesn't wait for the first order before preparing ingredients. Everything is prepped before service begins, so the kitchen is ready to plate immediately — not scrambling to catch up while guests are waiting.
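Predictive pre-loading and a "warm enough" check might look like the following sketch. It assumes an access log of recent keys is available (for example, the previous day's redirect codes); the function names, the 90% hit-ratio threshold, and the 1,000-request minimum are hypothetical values to tune, not recommendations from this series.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def pick_hot_keys(access_log: Iterable[str], top_n: int) -> List[str]:
    """Approximate the upcoming hot set as the top_n most requested keys in a recent log."""
    return [key for key, _ in Counter(access_log).most_common(top_n)]


def warm_cache(cache: Dict[str, str], keys: Iterable[str],
               loader: Callable[[str], str]) -> int:
    """Pre-load keys before traffic arrives; returns how many entries were loaded."""
    loaded = 0
    for key in keys:
        if key not in cache:
            cache[key] = loader(key)
            loaded += 1
    return loaded


def is_warm(hits: int, misses: int, threshold: float = 0.9,
            min_requests: int = 1000) -> bool:
    """Treat the cache as warm once the hit ratio clears a threshold over enough traffic."""
    total = hits + misses
    return total >= min_requests and hits / total >= threshold
```

The `min_requests` guard keeps a handful of lucky early hits from declaring the cache warm before it has seen representative traffic.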

---

## The URL shortener at this stage

Entering Pillar 5, the URL shortener looks like this:

- **PostgreSQL** — links, users, organisation data (relational core)
- **Cassandra** — click events (wide-column, high write throughput)
- **InfluxDB** — metrics and aggregate analytics (time series)
- **Elasticsearch** — full-text search over links and destinations
- **Pinecone** — semantic similarity for link recommendations
- **S3** — user-uploaded QR codes and profile images
- **Redis** — cache layer (treated as a black box until now)

Pillar 5 opens the Redis black box. By the end, you'll understand every decision behind that Redis layer — how it's partitioned, how it handles failures, what gets evicted under memory pressure, how it survives a viral link's cache entry expiring at the worst possible moment, and how deployments start without overloading the database.

---

*→ Next: **[Caching](/caching-storing-results-closer-to-where-theyre-needed)** — the fundamentals of storing results closer to the reader*

