Series: System Design · Scalability & Infrastructure — Pillar 6 of 8

Systems Design

#	Post	What it covers
00	Scalability & Infrastructure: The Layer Between Your Code and the Internet ← you are here	Nine concepts covering load balancing, rate limiting, proxies, compression, and probabilistic data structures that keep large systems fast and reliable.
01	Client-Server Architecture: The Model Everything Else Builds On	Client-server is the foundational model for distributed systems. Learn what clients and servers know, where state lives, and how the model scales.
02	Load Balancing: Distributing Traffic Across Servers	Load balancers distribute traffic across servers for scale and availability. Learn how they work, what types exist, and what they require of backend servers.
03	Load Balancing Algorithms: How Traffic Is Distributed	Round robin, least connections, IP hash, weighted — each algorithm makes different tradeoffs. Learn how to choose the right one for your workload.
04	Rate Limiting: Protecting Services from Overload	Rate limiting protects services from overload and abuse. Learn how token bucket, leaky bucket, and sliding window algorithms work and when to use each.
05	Proxy vs Reverse Proxy: Which Way Does It Face?	Forward proxies protect clients; reverse proxies protect servers. Learn how each works, what Nginx and Cloudflare do, and when you need which.
06	Data Compression: Smaller, Faster, Cheaper	Compression reduces bandwidth and storage costs. Learn how Gzip, Brotli, LZ4, and zstd work, where to apply them, and the CPU tradeoffs involved.
07	Checksums: Detecting Corruption Before It Becomes a Catastrophe	Checksums detect silent data corruption in transit and storage. Learn how CRC32, MD5, and SHA-256 work and where to apply them in distributed systems.
08	Bloom Filters: Answering "Have I Seen This?" Without Storing Everything	A Bloom filter answers "have I seen this?" in constant memory. Learn how they work, why false positives are acceptable, and where they're used in production.
09	HyperLogLog: Counting Distinct Items Without Storing Them	HyperLogLog counts distinct values in ~1.5 KB of memory with <2% error. Learn how it works and why Redis, BigQuery, and Postgres use it.
10	Scalability & Infrastructure: Wrap-Up	A recap of all 9 scalability concepts: load balancing, rate limiting, proxies, compression, checksums, Bloom filters, and HyperLogLog. How they fit together.

Scalability & Infrastructure: The Layer Between Your Code and the Internet

The scenario

Your URL shortener has users. Real ones. One morning a tech newsletter links to your platform. Traffic spikes to forty times normal volume in ten minutes. One IP address hammers the redirect API trying to scrape link destinations — five thousand requests per second from a single host. A misconfigured upstream service starts sending malformed payloads. And the response time for every API call is twenty milliseconds longer than it was last week, even though no application code has changed.

The application code is fine. The database is fine. The problem is the infrastructure layer between your users and your application: how traffic is distributed, how abuse is controlled, how requests are routed, and how data moves efficiently across the network.

This pillar covers the systems and mechanisms that handle these concerns — not inside your application, but around it. Nine concepts:

Client-Server Architecture — the foundational model all the other concepts build on
Load Balancing — distributing traffic across servers so no single instance is overwhelmed
Load Balancing Algorithms — the strategies that determine how traffic is distributed
Rate Limiting — protecting services from abuse and overload
Proxy vs Reverse Proxy — the intermediaries that sit between clients and servers
Data Compression — reducing the size of data in transit and at rest
Checksums — detecting data corruption in transit and storage
Bloom Filter — a probabilistic structure that answers "have I seen this before?" in constant space
HyperLogLog — a probabilistic structure that counts distinct values without storing them

TL;DR: Load balancing distributes traffic for scale and availability; the algorithm choice determines fairness, stickiness, and health-awareness. Rate limiting protects services from abusive or runaway clients without human intervention. Reverse proxies provide a clean separation between the public internet and backend services. Compression trades CPU for network bandwidth — almost always worth it for text payloads. Checksums detect data corruption silently introduced by storage, network, or software bugs. Bloom filters and HyperLogLog are probabilistic data structures that answer common questions (have I seen this? how many distinct items?) at constant memory cost, by accepting a small, bounded error rate.

What this pillar covers

Client-Server Architecture — the foundation

Before load balancers, proxies, and rate limiters, there is the basic model: clients make requests, servers respond. Understanding the model deeply — what clients know, what servers know, where state lives, how scale changes the model — is the foundation for everything else in this pillar.

Best mental model: a restaurant. Customers (clients) sit at tables and place orders. Waitstaff (the network and its intermediaries) carry requests to the kitchen (servers) and return with responses. The kitchen doesn't track which table ordered what; the waiter coordinates that. As the restaurant gets busier, you add more kitchen stations (servers) and more front-of-house staff to direct orders (load balancers and proxies).

Load Balancing — distributing traffic

A single server has a capacity ceiling. A load balancer sits in front of a pool of servers and distributes incoming requests across them. When traffic exceeds one server's capacity, add more servers — the load balancer handles the distribution. When a server fails, the load balancer stops sending traffic to it. Load balancing is the mechanism that makes horizontal scaling work.

Best mental model: a supermarket with multiple checkout lanes. A lane manager (load balancer) watches all lanes and directs each new shopper to the least-busy lane. No single cashier is overwhelmed while others are idle.
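To make the failover behaviour concrete, here is a minimal sketch. The `Pool` class, its method names, and the server names are all hypothetical, and a real load balancer would detect failures with its own health checks rather than manual `mark_down` calls.

```python
class Pool:
    """A toy server pool that rotates through backends and skips unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        # In practice a failed health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        if not self.healthy:
            raise RuntimeError("no healthy backends")
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server


pool = Pool(["app-1", "app-2", "app-3"])  # assumed backend names
first = pool.pick()
pool.mark_down("app-2")
after_failure = [pool.pick() for _ in range(3)]
```

Once `app-2` is marked down it simply stops receiving traffic; the clients never need to know.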

Load Balancing Algorithms — how traffic is distributed

Round robin, weighted round robin, least connections, least response time, IP hash, consistent hashing — each algorithm optimises for different properties. Round robin is simple and fair under uniform request costs. Least connections adapts to variable request duration. IP hash provides session stickiness. Consistent hashing minimises cache invalidation when servers are added or removed.

Best mental model: different lane assignment rules for the supermarket. "Each new shopper goes to the next lane in rotation" is round robin. "Lane with fewest shoppers" is least connections. "Always lane 3 for shoppers whose loyalty card ends in 7" is hash-based routing.
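Three of these strategies fit in a few lines each. This is a sketch under assumptions: the server names and function names are made up, and the IP hash uses plain modulo, which remaps most clients when the pool size changes (the problem consistent hashing is designed to reduce).

```python
import hashlib
import itertools

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical backend names


def round_robin(servers):
    """Return an iterator that yields servers in a fixed repeating order."""
    return itertools.cycle(servers)


def least_connections(active):
    """Pick the server with the fewest in-flight requests.

    `active` maps server name -> current connection count.
    """
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to the same server every time (session stickiness)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


rr = round_robin(SERVERS)
rr_order = [next(rr) for _ in range(4)]
lc_choice = least_connections({"app-1": 5, "app-2": 2, "app-3": 7})
sticky = ip_hash("203.0.113.9", SERVERS)
```

Round robin ignores how busy each server is, least connections needs live connection counts, and IP hash needs nothing but the client address.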

Rate Limiting — protecting services

A service that accepts any volume of requests from any client is a service that can be overwhelmed by accident or by malice. Rate limiting caps the request rate per client (or per API key, per IP, per endpoint) and rejects requests that exceed the limit. Token bucket, leaky bucket, fixed window, and sliding window are the algorithms behind rate limiters.

Best mental model: a nightclub with a doorman. The doorman allows a maximum of N people per minute regardless of who's asking. Those over the limit are turned away (or asked to wait). The doorman doesn't care why they're coming — just how many.
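Of the algorithms named above, token bucket is probably the easiest to sketch. The class below is illustrative only: the names and parameters are assumptions, it keeps state in memory for a single process, and it takes an injectable clock so the behaviour can be checked without sleeping.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would typically respond with HTTP 429


fake_now = [0.0]  # a controllable clock for demonstration
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]
fake_now[0] = 1.0  # one second later, two tokens have been refilled
later = [bucket.allow() for _ in range(3)]
```

To limit per client rather than globally, one option is a dictionary of buckets keyed by IP address or API key.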

Proxy vs Reverse Proxy — the intermediaries

A forward proxy sits on the client side: it makes requests on behalf of clients, hiding client identity from servers. A reverse proxy sits on the server side: it accepts requests on behalf of servers, hiding backend topology from clients. Reverse proxies are ubiquitous — Nginx, HAProxy, Cloudflare, and AWS ALB are all reverse proxies. They enable TLS termination, load balancing, caching, authentication, and request routing without the application code knowing.

Best mental model: a corporate mail room. An outgoing mail room (forward proxy) sends letters on behalf of employees — recipients see the mail room's return address. An incoming reception desk (reverse proxy) receives all mail addressed to the company and routes it to the right department — senders don't know which desk or floor handled it.
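Two things a reverse proxy does on every request can be sketched without any networking: choosing a backend by path and recording the original client address. The route table, the private IP addresses, and the function names below are hypothetical; `X-Forwarded-For` is the conventional header for passing the client IP along, and header-name case handling is simplified here.

```python
# Hypothetical route table: path prefix -> backend address.
ROUTES = [
    ("/api/", "http://10.0.0.10:8080"),
    ("/static/", "http://10.0.0.20:8080"),
    ("/", "http://10.0.0.30:8080"),
]


def choose_upstream(path, routes=ROUTES):
    """Return the backend for the longest matching path prefix."""
    matches = [(prefix, upstream) for prefix, upstream in routes
               if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no route for {path}")
    return max(matches, key=lambda match: len(match[0]))[1]


def forwarded_headers(headers, client_ip):
    """Copy the headers and append the client IP to X-Forwarded-For."""
    out = dict(headers)
    prior = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    return out


api_backend = choose_upstream("/api/links/abc123")
default_backend = choose_upstream("/about")
headers = forwarded_headers({"Host": "sho.rt"}, "198.51.100.7")
```

The client only ever sees the proxy's address; which backend answered stays an internal detail.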

Data Compression — smaller payloads, faster transfers

Network bandwidth is not free, and payload size directly affects latency and cost. Compression algorithms reduce the byte size of data in transit and at rest. Gzip and Brotli compress HTTP response bodies — most JSON, HTML, and CSS compress by 70–90%. zstd and LZ4 compress data at rest in databases and object stores. The tradeoff is CPU time for compression and decompression — almost always worth it for text data over any network.

Best mental model: packing a suitcase efficiently. The same clothes, compressed into a smaller case. You spend a bit of time folding and rolling (CPU), but the case fits in the overhead bin (bandwidth) rather than needing checked luggage (larger payload).
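You can measure the effect on a JSON payload yourself. This sketch uses gzip from the Python standard library, since Brotli and zstd need extra packages in most Python versions; the record shape is invented for illustration, and the savings you see will depend on how repetitive your own data is.

```python
import gzip
import json

# A made-up API response: 500 link records with repetitive keys and URLs.
records = [
    {"short_code": f"abc{i:04d}",
     "target": f"https://example.com/articles/{i}",
     "clicks": i * 3}
    for i in range(500)
]
payload = json.dumps(records).encode("utf-8")

# Level 6 is a middle setting between speed (1) and size (9).
compressed = gzip.compress(payload, compresslevel=6)
restored = gzip.decompress(compressed)

savings = 1 - len(compressed) / len(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes ({savings:.0%} smaller)")
```

Repeated keys and URL prefixes are what make text formats like JSON compress so well, while already-compressed data such as JPEG images gains little.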

Checksums — detecting silent corruption

Data gets corrupted. A bit flips in transit. A storage device has a silent write error. A network packet arrives with one byte wrong. None of these failures announce themselves, so the receiving system has no idea unless it checks. A checksum is a short value computed from the data; recomputing it after transmission or storage and comparing it to the stored value reveals corruption. CRC32, Adler-32, MD5, and SHA-256 are common algorithms that trade off speed, collision resistance, and output size.

Best mental model: a bank's routing number check digit. The last digit of a routing number is calculated from the other digits. If a single digit is entered incorrectly, the check digit will be wrong — a silent error is caught before a transfer goes to the wrong account.
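The compute-then-compare cycle looks like this with Python's standard library. The helper names and the sample URL are illustrative; the point is only that a single flipped bit changes both the CRC32 and the SHA-256 value.

```python
import hashlib
import zlib


def crc32_hex(data):
    """Fast 32-bit checksum, suited to catching accidental corruption."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def sha256_hex(data):
    """Slower 256-bit cryptographic hash, also resistant to deliberate tampering."""
    return hashlib.sha256(data).hexdigest()


original = b"https://example.com/some/long/destination"
stored_crc = crc32_hex(original)
stored_sha = sha256_hex(original)

# Simulate silent corruption: flip one bit in one byte.
damaged = bytearray(original)
damaged[10] ^= 0x01
damaged = bytes(damaged)

crc_detects = crc32_hex(damaged) != stored_crc
sha_detects = sha256_hex(damaged) != stored_sha
```

CRC32 costs 4 bytes per record and SHA-256 costs 32, which is part of why fast checksums are common on hot paths and cryptographic hashes on integrity-critical ones.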

Bloom Filter — constant-space membership testing

A Bloom filter answers the question "have I seen this item before?" in a fixed amount of memory and constant time per lookup, without storing the items themselves. The memory is sized up front for an expected number of items, and the false-positive rate climbs if far more items are added than planned. It can return false positives (saying "yes" when the answer is "no") but never false negatives (saying "no" when the answer is "yes"). Used by databases, CDNs, and deduplication systems to skip expensive lookups for items that are definitely not in a set.

Best mental model: a quick pre-screening before a thorough background check. The pre-screen is fast and cheap but occasionally flags innocent candidates (false positive). If the pre-screen clears someone, they are definitely innocent (no false negatives) — no background check needed. Only flagged candidates proceed to the full check.
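A small Bloom filter fits in about twenty lines. Treat this as a sketch: the class name, the default sizes, and the trick of deriving several bit positions from one SHA-256 digest are choices made here for illustration, not the way any particular database implements it.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array; each item sets `num_hashes` bits."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several positions from one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step odd
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter()
for code in ["abc123", "xyz789", "q1w2e3"]:  # made-up short codes
    seen.add(code)
```

With m bits, k hash functions, and n items added, the false-positive probability is roughly (1 - e^(-kn/m))^k, so sizing the filter is a matter of choosing m and k for the n you expect.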

HyperLogLog — counting distinct items at scale

How many distinct IP addresses hit the platform today? How many unique users clicked a given link? Answering these questions exactly requires storing every IP address or user ID seen, which is O(n) memory. HyperLogLog answers these questions approximately in a small, fixed amount of memory (~1.5 KB), with a typical error of about 2% even across billions of distinct values. Used by Redis, Google BigQuery, and analytics platforms to count distinct values in streams and large datasets.

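Best mental model: counting attendees at a massive festival without a guest list. Instead of recording every name, ask each attendee to flip a coin until it lands tails, and note only the longest run of heads anyone reports. A run of 13 heads is roughly a 1-in-8,000 event, so seeing one suggests attendance in the thousands. HyperLogLog does the same with hashed values: it tracks the longest run of leading zero bits and combines many such estimates to reduce the noise. Because the same value always hashes to the same bits, repeat visitors do not inflate the count.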
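The core of the algorithm is short enough to sketch. This version is simplified and makes several assumptions: the class name and defaults are invented, it uses 2,048 registers stored one per byte (packing them into 6 bits each would give the ~1.5 KB figure), and it applies only the small-range correction from the original algorithm. Production implementations such as Redis add further refinements.

```python
import hashlib
import math


class HyperLogLog:
    """Estimate distinct counts from the longest runs of leading zero bits."""

    def __init__(self, p=11):
        self.p = p                  # bits used to choose a register (p >= 7)
        self.m = 1 << p             # number of registers (2048 by default)
        self.registers = bytearray(self.m)

    def add(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")            # 64-bit hash
        idx = h >> (64 - self.p)                         # first p bits
        rest = h & ((1 << (64 - self.p)) - 1)            # remaining bits
        # Rank = position of the first 1 bit in `rest`, counting from 1.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)            # bias constant, m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(50_000):
    hll.add(f"user-{i}")        # 50,000 distinct made-up IDs
for i in range(10_000):
    hll.add(f"user-{i}")        # repeats do not change the registers
estimate = hll.count()
```

Two sketches built on different servers can be merged by taking the register-wise maximum, which is what makes the structure convenient for distributed analytics.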

The URL shortener at this stage

Entering Pillar 6, the URL shortener has a complete data layer. The infrastructure layer in front of it looks like this:

Internet
  ↓
DNS (Pillar 2) → routes sho.rt to load balancer IP
  ↓
Load balancer (Nginx / AWS ALB) → distributes to app servers
  ↓
Reverse proxy (terminates TLS, handles compression, routes by path)
  ↓
App servers (handle requests, check rate limits, call cache/DB)
  ↓
Redis ← Cache layer (Pillar 5)
  ↓
PostgreSQL + Cassandra + ... ← Data layer (Pillar 4)

By the end of this pillar, each of those infrastructure components will be fully described — what algorithm the load balancer uses, how rate limiting protects the redirect API, how Brotli compression reduces response size, how checksums protect stored link data, and how Bloom filters and HyperLogLog power the analytics pipeline efficiently.

→ Next: Client-Server Architecture — the foundational model
