Series: System Design · Caching — Pillar 5 of 8

#	Post	What it covers
00	Caching: The Fastest Database Query Is the One You Don't Make	Caching is one of the most impactful and error-prone tools in system design. Six concepts covering the full lifecycle of a production cache layer.
01	Caching: Storing Results Closer to Where They're Needed	Caching stores expensive results closer to the reader. Learn how it works, the main patterns, and when it hurts more than it helps.
02	Cache Invalidation: Knowing When the Copy Is Wrong	Cache invalidation is notoriously difficult. Learn the main strategies, when each applies, and how to avoid serving stale data at scale.
03	Distributed Cache: Spreading Cache Across a Cluster	A single cache node is a bottleneck and a SPOF. Learn how distributed caches partition data, replicate for availability, and handle node failures.
04	Cache Eviction Policies: What Gets Thrown Out When the Cache Is Full	When a cache fills up, something must go. Learn how LRU, LFU, FIFO, and TTL-based eviction work and how to choose the right policy for your data.
05	Cache Stampede: When Expiry Triggers a Database Avalanche	When a hot cache entry expires, hundreds of servers query the database simultaneously. Learn how cache stampedes happen and how to prevent them.
06	Cache Warming: Starting Hot Instead of Cold	A cold cache causes database overload on startup. Learn how to warm caches proactively using predictive loading, lazy warming, and scheduled jobs.
07	Caching: Wrap-Up ← you are here	A recap of all 6 caching concepts: what caching is, invalidation strategies, distributed caches, eviction policies, stampedes, and warming. How they connect.

Caching: Wrap-Up

Six concepts across one theme. This post ties them together — not as a list of definitions, but as the connected set of decisions you'll make every time you introduce or operate a cache layer in a real system.

The one thing to remember from each post

Caching — a cache is a faster copy of slower data. The fundamental tradeoff is consistency for speed. The questions that matter: how often does this data change, how stale can it get before it causes a problem, and how will the cache know when to update?

Cache Invalidation — cache invalidation is a distributed consistency problem. TTL is a blunt backstop that every cache should have. Active invalidation (purge on write) gives immediate consistency but couples the write path to the cache. Event-driven invalidation decouples them but adds infrastructure and eventual-consistency lag. Use TTL as a fallback even when you have active invalidation — it catches bugs in your invalidation logic.

Distributed Cache — a distributed cache partitions data across nodes for scale and replicates nodes for availability. Consistent hashing (or Redis Cluster's slot table) ensures adding or removing a node only remaps a fraction of keys. Use Sentinel for HA without full cluster complexity; use Cluster when your dataset or throughput genuinely exceeds one node.

Cache Eviction Policies — eviction policy matters most when your working set exceeds your cache size. LRU keeps recently-used entries; LFU keeps frequently-used ones. For power-law access distributions (a small number of hot keys dominating traffic), LFU outperforms LRU. Always set maxmemory explicitly on Redis.

Cache Stampede — a cache stampede is a coordination failure: many servers independently doing the sensible thing produces a database avalanche. TTL jitter prevents correlated expiry across many keys. Stale-while-revalidate prevents blocking during recomputation. A distributed lock prevents multiple servers from hitting the origin simultaneously. For critical hot keys, combine all three.

Cache Warming — a cold cache causes the traffic it's designed to absorb to hit the database directly. Pre-load the hot working set before routing traffic. Ramp traffic gradually if pre-loading is impractical. Enable Redis persistence so restarts recover from snapshot. The investment in warming is proportional to how much you depend on the cache as a load shield.
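The Distributed Cache point above claims that consistent hashing remaps only a fraction of keys when a node joins. A minimal sketch of that claim, assuming a hash ring with virtual nodes; the `HashRing` class, the node names, and the choice of MD5 as the ring hash are illustrative assumptions, not something the series prescribes. It also counts how many keys a naive `hash % node_count` scheme would move, for contrast.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each node owns `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"url:{i}" for i in range(10_000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
naive_moved = sum(_hash(k) % 3 != _hash(k) % 4 for k in keys)

print(f"ring:   {moved / len(keys):.0%} of keys moved")
print(f"modulo: {naive_moved / len(keys):.0%} of keys moved")
```

Going from three nodes to four, the ring should move roughly a quarter of the keys, all of them onto the new node, while the modulo scheme moves most of them. The exact percentages depend on the hash and the number of virtual nodes.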

How they connect

These concepts don't operate independently. Most caching incidents involve several of them interacting:

Caching + cache invalidation are inseparable. You cannot design a cache without deciding how staleness is managed. Every TTL decision is an invalidation decision. Post 01 introduces the tradeoff; post 02 is where you actually solve it.
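To make the pairing concrete, here is a minimal sketch of cache-aside reads with a TTL backstop and active invalidation on write. `TTLCache` is a hypothetical in-memory stand-in for Redis, the plain dict `db` stands in for PostgreSQL, and the function names are assumptions for illustration only.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache such as Redis (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at)
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # TTL backstop: an expired entry is a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._data.pop(key, None)


def read_url(cache, db, code, ttl=3600):
    """Cache-aside read: try the cache, fall back to the source of truth."""
    value = cache.get(code)
    if value is None:
        value = db[code]
        cache.set(code, value, ttl)
    return value


def update_url(cache, db, code, destination):
    """Active invalidation: write the database first, then purge the key."""
    db[code] = destination
    cache.delete(code)
```

A write that goes through `update_url` is visible on the next read. A write that bypasses it (a bug, or another service touching the table) stays stale only until the TTL runs out, which is the backstop role described above.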

Cache invalidation + cache stampede are the two most dangerous caching failure modes — and they're related. Aggressive invalidation (very short TTL, frequent purges) reduces staleness but increases miss rate, which increases stampede risk. Stale-while-revalidate mitigates both simultaneously: it reduces the effective miss rate (never hard-blocks on a stale read) while keeping data reasonably fresh.
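A minimal sketch of stale-while-revalidate, assuming each entry has a fresh window followed by a stale window. The `SWRCache` class and its `pending` queue are hypothetical; in a real service the queue would be drained by a background worker or task rather than by an explicit call.

```python
import time


class SWRCache:
    """Serve stale values immediately and refresh them out of band."""

    def __init__(self, loader, fresh_for, stale_for, clock=time.monotonic):
        self._loader = loader
        self._fresh_for = fresh_for
        self._stale_for = stale_for
        self._clock = clock
        self._entries = {}  # key -> (value, fresh_until, stale_until)
        self.pending = []   # keys awaiting a background refresh

    def _store(self, key):
        now = self._clock()
        value = self._loader(key)
        fresh_until = now + self._fresh_for
        self._entries[key] = (value, fresh_until, fresh_until + self._stale_for)
        return value

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is None or now >= entry[2]:
            return self._store(key)       # hard miss: the caller waits
        value, fresh_until, _ = entry
        if now >= fresh_until and key not in self.pending:
            self.pending.append(key)      # stale: serve now, refresh later
        return value

    def run_refreshes(self):
        """Stand-in for a background worker draining the refresh queue."""
        while self.pending:
            self._store(self.pending.pop())
```

Only the first stale read queues a refresh, so a burst of readers on an aging key produces one origin call instead of one per reader.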

Distributed cache + eviction policies interact at scale. In a distributed cache, eviction happens independently per node. Each key lives on exactly one shard, so memory pressure is local: node A may be evicting aggressively while node B still has headroom, and a cluster-wide average hides the difference. Monitor eviction metrics per node, not just globally. Hot keys (one key drawing most of a shard's traffic) are not fixed by the eviction policy; they need key splitting, replica reads, or a small in-process cache in front of Redis. Hash tags do the opposite of spreading load: they pin related keys to the same slot.
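Which shard a key lands on is deterministic. A sketch of the Redis Cluster key-to-slot mapping (CRC16 of the key, modulo 16,384), including the hash-tag rule where only the text inside the first `{...}` is hashed. The function name `key_slot` and the example keys are assumptions; the sketch relies on Python's `binascii.crc_hqx`, which with an initial value of 0 computes the CRC-16/XMODEM variant.

```python
import binascii

SLOTS = 16384


def key_slot(key: str) -> int:
    """Slot for a key, following the Redis Cluster hash-tag rule."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # an empty "{}" is ignored
            data = data[start + 1:end]      # hash only the tag contents
    return binascii.crc_hqx(data, 0) % SLOTS


# Two keys sharing a hash tag always map to the same slot, and so the same node.
print(key_slot("url:{x7Kp2}") == key_slot("stats:{x7Kp2}"))
# Without a tag, the whole key is hashed and the slots will usually differ.
print(key_slot("url:x7Kp2"), key_slot("stats:x7Kp2"))
```

Hash tags are useful when several keys must be touched in one multi-key command, but every key sharing a tag adds to the load on a single slot.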

Cache stampede + cache warming address the same underlying problem from different angles. A stampede is a reactive cold start — the cache was warm, an entry expired, it became cold. Cache warming is proactive cold start prevention — never allow the cache to start cold. For a new deployment, cache warming prevents the initial stampede. For ongoing operations, stale-while-revalidate and probabilistic early expiry prevent per-key stampedes.
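Probabilistic early expiry can be sketched in a few lines. The version below follows the commonly described rule (sometimes called XFetch): recompute when `now - delta * beta * ln(rand) >= expiry`, where `delta` is how long a recompute takes. The function and parameter names are assumptions for illustration.

```python
import math
import random


def should_refresh_early(now, expires_at, recompute_seconds, beta=1.0, rand=None):
    """Return True if this request should recompute before the hard expiry.

    `rand` must be in (0, 1]. Because -log(rand) is an exponential sample,
    the chance of refreshing rises sharply as `now` approaches `expires_at`.
    A larger `beta` makes early refreshes more likely.
    """
    if rand is None:
        rand = 1.0 - random.random()  # random() is in [0, 1), so this is in (0, 1]
    return now - recompute_seconds * beta * math.log(rand) >= expires_at


# Far from expiry, a typical draw does not trigger a refresh.
print(should_refresh_early(now=0, expires_at=100, recompute_seconds=2, rand=0.5))
# One second before expiry, the same draw does.
print(should_refresh_early(now=99, expires_at=100, recompute_seconds=2, rand=0.5))
```

Each reader makes this decision independently, so no coordination is needed. On a busy key, one reader is likely to refresh slightly early while the rest keep serving the cached value.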

Eviction policies + cache warming interact during warm-up. A pre-loading script populates a large number of entries. If the cache is memory-constrained, early entries may be evicted before they're ever read — particularly under LRU, where entries loaded at the start of the warm-up script are the "least recently used" by the time the script finishes. Use LFU during warm-up (entries accessed by traffic get frequency counters; unaccessed pre-loaded entries are evicted first). Or load only the highest-confidence hot set rather than an overly broad working set.
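A small replay shows the difference. The two classes below are deliberately simple, exact implementations written for this sketch. Redis itself uses sampled, approximate LRU and an approximate LFU with counter decay, so treat this as a model of the idea rather than of Redis internals.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, insert and evict if needed."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # drop the least recently used
        self.items[key] = True
        return False


class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # key -> access count for resident keys

    def access(self, key):
        if key in self.counts:
            self.counts[key] += 1
            return True
        if len(self.counts) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)  # lowest count
            del self.counts[victim]
        self.counts[key] = 1
        return False


def replay(cache):
    # "hot" is loaded and read by live traffic, then the warm-up script
    # keeps loading entries (w1..w3) that nobody has requested yet.
    for key in ["hot"] * 6 + ["w1", "w2", "w3"]:
        cache.access(key)
    return cache.access("hot")  # is the hot entry still cached?


print(replay(LRUCache(3)), replay(LFUCache(3)))  # False True
```

With room for three entries, LRU pushes out `hot` as soon as the script loads its third cold entry, while LFU sacrifices one of the unread pre-loaded entries instead.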

End-to-end: the URL shortener's caching architecture

The URL shortener started with PostgreSQL. By the end of Pillar 4, it had PostgreSQL, Redis, Cassandra, InfluxDB, Elasticsearch, Pinecone, and S3. Redis was referenced throughout as a cache — but treated as a black box. Pillar 5 opened that black box.

Here's what the full caching layer looks like:

Redirect critical path (sho.rt/x7Kp2 → 302 to destination):

  Request
    → App Server
    → Redis Cluster (url:{short_code})
        Hit (99%+): return destination → 302 redirect
        Miss: → PostgreSQL → populate Redis → 302 redirect

  Cache policy:
    - allkeys-lfu eviction (power-law access distribution)
    - TTL: base 3600s ± 300s jitter (prevent correlated expiry)
    - Stale-while-revalidate: serve stale, refresh in background
    - Distributed lock on hard miss (prevent stampede)

  Distribution:
    - Redis Cluster: 3 primaries + 3 replicas
    - Hash-slot partitioning: CRC16(key) mod 16,384 slots, mapped to nodes via the slot table
    - Automatic failover via cluster consensus

  Invalidation:
    - Active invalidation: on destination update, delete key immediately
    - TTL as backstop: stale entries expire within about 65 minutes regardless (3600s base + up to 300s jitter)
    - Event-driven (via CDC) for multi-service invalidation

  Warming:
    - Pre-deployment: load top 500k links by 24h click volume
    - Canary traffic: ramp from 5% → 100%, advancing only while hit ratio stays above 90%
    - Redis RDB persistence: node restarts recover from last snapshot
    - Target hit ratio: ≥ 98% under normal operation
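The jitter and lock lines in the cache policy above can be sketched as follows. This is an in-process model using threads: `SingleFlightCache` is a hypothetical class, and a per-key `threading.Lock` stands in for the distributed lock. Across servers the lock would typically be a Redis key set with `SET ... NX` and an expiry, which this sketch does not implement.

```python
import random
import threading
import time


def jittered_ttl(base=3600, jitter=300, rng=random):
    """Spread expiry times so keys written together do not expire together."""
    return base + rng.uniform(-jitter, jitter)


class SingleFlightCache:
    """One loader call per key; concurrent callers wait for its result."""

    def __init__(self, loader):
        self._loader = loader
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()

    def get(self, key):
        if key in self._values:              # fast path: cache hit
            return self._values[key]
        with self._guard:                    # create one lock object per key
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self._values:      # re-check after acquiring
                self._values[key] = self._loader(key)
            return self._values[key]


calls = []


def slow_loader(key):
    calls.append(key)    # record each trip to the "database"
    time.sleep(0.05)
    return f"https://example.com/{key}"


cache = SingleFlightCache(slow_loader)
threads = [threading.Thread(target=cache.get, args=("x7Kp2",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))  # prints 1: twenty concurrent readers, one origin query
```

The re-check inside the lock is what makes this work: readers that were waiting find the value already populated and return without calling the loader.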

Other caches in the platform:

User session / auth token cache:
  - allkeys-lru (recency matters — active users stay warm)
  - TTL: 15 minutes (matches access token expiry)
  - No persistence (sessions are soft state — regenerate on miss)
  - Single Redis Sentinel setup (dataset fits one node)

Analytics aggregates (click counts for dashboards):
  - TTL only: 5 minutes (brief staleness is acceptable)
  - No active invalidation (aggregates recomputed on schedule)
  - Read-through pattern (cache populates from InfluxDB on miss)

Rate limiting counters:
  - noeviction policy (keys expire naturally by window; evicting a live counter would silently reset its limit)
  - Primary only (no replica reads — counter accuracy matters)
  - Lua script for atomic increment + expiry check
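The rate-limiting counter logic can be modelled in memory. This is a sketch of a fixed-window limiter, not the Lua script itself: the `FixedWindowLimiter` class is hypothetical, and a `threading.Lock` stands in for the atomicity that a Redis script provides when it increments a counter and sets its expiry in one step.

```python
import threading
import time


class FixedWindowLimiter:
    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._counters = {}            # key -> (count, window_expires_at)
        self._lock = threading.Lock()  # stands in for the script's atomicity

    def allow(self, key):
        """Count one request and report whether it is within the limit."""
        with self._lock:
            now = self._clock()
            entry = self._counters.get(key)
            if entry is None or now >= entry[1]:
                # First request of a window: new counter, new expiry.
                count, expires_at = 0, now + self.window
            else:
                count, expires_at = entry
            count += 1
            self._counters[key] = (count, expires_at)
            return count <= self.limit
```

Doing the increment and the expiry check as one atomic unit matters: if they were separate steps, a crash or a race between them could leave a counter that never expires.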

The decision tree

Before adding a cache, work through this:

Is this data read much more often than it's written?
  No → caching may not help (high invalidation overhead)
  Yes → continue

Can brief staleness be tolerated?
  No → active invalidation required; consider write-through
  Yes → TTL alone may be sufficient

How often does the data change?
  Rarely → long TTL, cache-aside
  Frequently → short TTL + active invalidation + stale-while-revalidate

Is the dataset larger than one node?
  No → single Redis with Sentinel for HA
  Yes → Redis Cluster (hash-slot sharding across primaries)

What's the access pattern?
  Power-law (small hot set) → LFU eviction
  Recency-driven (recent = relevant) → LRU eviction

Is the cache a load shield (DB can't handle cold traffic)?
  No → lazy warming (organic miss-and-populate) is fine
  Yes → pre-load hot set before routing traffic; canary ramp

Are there concurrent high-traffic readers on the same key?
  No → standard TTL + cache-aside
  Yes → stale-while-revalidate + TTL jitter + distributed lock on miss
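The load-shield branch of the tree (pre-load the hot set, then ramp traffic) can be sketched as three small functions. The 5% starting share and the 90% hit-ratio gate come from the URL shortener example. The function names, the use of plain dicts for the cache and database, and the doubling step are assumptions for illustration.

```python
def hot_set(click_counts, limit):
    """Pick the `limit` most-clicked short codes to pre-load."""
    ranked = sorted(click_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [code for code, _ in ranked[:limit]]


def warm(cache, db, codes):
    """Copy the chosen entries from the database into the cache."""
    for code in codes:
        cache[code] = db[code]


def next_traffic_share(current, hit_ratio, threshold=0.90, step=2.0):
    """Raise the share of traffic only while the hit ratio holds up."""
    if hit_ratio < threshold:
        return current  # hold: the cache is not warm enough yet
    return min(1.0, current * step)


clicks = {"x7Kp2": 9000, "a1B2c": 120, "zZ9yY": 4500}
db = {"x7Kp2": "https://a.example", "a1B2c": "https://b.example", "zZ9yY": "https://c.example"}
cache = {}

warm(cache, db, hot_set(clicks, limit=2))
print(sorted(cache))  # ['x7Kp2', 'zZ9yY']

share = 0.05
for observed_hit_ratio in [0.93, 0.85, 0.95, 0.97, 0.98, 0.99]:
    share = next_traffic_share(share, observed_hit_ratio)
print(share)
```

The ramp holds at its current share whenever the observed hit ratio dips below the gate, so a cache that is colder than expected slows the rollout instead of exposing the database.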

What you can now do

After this pillar, you can:

Implement cache-aside, read-through, write-through, and write-behind patterns and choose among them for a given use case
Design a TTL strategy that balances hit ratio, staleness, and correlated expiry risk
Choose between Redis Sentinel and Redis Cluster based on dataset size and HA requirements
Configure eviction policies in Redis (maxmemory-policy) and explain why LFU outperforms LRU for power-law workloads
Recognise a cache stampede before it happens and apply stale-while-revalidate, TTL jitter, and distributed locks to prevent it
Design a cache warm-up strategy for deployments where the database cannot absorb cold-cache traffic

Up next: Pillar 6 — Scalability & Infrastructure

Caching is one of the fastest ways to reduce load on a system — but it's not the only one. Pillar 6 covers the infrastructure layer that sits above individual services: load balancers distributing traffic across servers, rate limiters protecting services from overload, reverse proxies abstracting backend topology, and the compression, checksums, and probabilistic data structures that make large-scale systems operationally tractable.

Nine concepts: Client-Server Architecture, Load Balancing, Load Balancing Algorithms, Rate Limiting, Proxy vs Reverse Proxy, Data Compression, Checksums, Bloom Filter, and HyperLogLog.

← Previous: Cache Warming (proactively populating the cache before traffic arrives so you never start cold).

This is the end of the Caching pillar. Continue to Pillar 6, Scalability & Infrastructure →
