Scalability & Infrastructure: Wrap-Up

Series: System Design · Scalability & Infrastructure — Pillar 6 of 8
| # | Post | What it covers |
|---|---|---|
| 00 | Scalability & Infrastructure: The Layer Between Your Code and the Internet | Nine concepts covering load balancing, rate limiting, proxies, compression, and probabilistic data structures that keep large systems fast and reliable. |
| 01 | Client-Server Architecture: The Model Everything Else Builds On | Client-server is the foundational model for distributed systems. Learn what clients and servers know, where state lives, and how the model scales. |
| 02 | Load Balancing: Distributing Traffic Across Servers | Load balancers distribute traffic across servers for scale and availability. Learn how they work, what types exist, and what they require of backend servers. |
| 03 | Load Balancing Algorithms: How Traffic Is Distributed | Round robin, least connections, IP hash, weighted — each algorithm makes different tradeoffs. Learn how to choose the right one for your workload. |
| 04 | Rate Limiting: Protecting Services from Overload | Rate limiting protects services from overload and abuse. Learn how token bucket, leaky bucket, and sliding window algorithms work and when to use each. |
| 05 | Proxy vs Reverse Proxy: Which Way Does It Face? | Forward proxies protect clients; reverse proxies protect servers. Learn how each works, what Nginx and Cloudflare do, and when you need which. |
| 06 | Data Compression: Smaller, Faster, Cheaper | Compression reduces bandwidth and storage costs. Learn how Gzip, Brotli, LZ4, and zstd work, where to apply them, and the CPU tradeoffs involved. |
| 07 | Checksums: Detecting Corruption Before It Becomes a Catastrophe | Checksums detect silent data corruption in transit and storage. Learn how CRC32, MD5, and SHA-256 work and where to apply them in distributed systems. |
| 08 | Bloom Filters: Answering "Have I Seen This?" Without Storing Everything | A Bloom filter answers "have I seen this?" in constant memory. Learn how they work, why false positives are acceptable, and where they're used in production. |
| 09 | HyperLogLog: Counting Distinct Items Without Storing Them | HyperLogLog counts distinct values in ~1.5 KB of memory with <2% error. Learn how it works and why Redis, BigQuery, and Postgres use it. |
| 10 | Scalability & Infrastructure: Wrap-Up ← you are here | A recap of all 9 scalability concepts: load balancing, rate limiting, proxies, compression, checksums, Bloom filters, and HyperLogLog. How they fit together. |
Nine concepts covering the infrastructure layer that sits between your code and the internet. This post ties them together — not as a glossary, but as a map of how they interact in a production system.
The one thing to remember from each post
Client-Server Architecture — clients initiate, servers respond. Stateless servers (all client state in the request or in an external store) are the foundation of horizontal scalability. Every other concept in this pillar exists to manage the client-server relationship at scale.
Load Balancing — a load balancer makes a pool of servers look like one, distributes requests, and removes unhealthy servers from rotation via health checks. Servers must be stateless — or the load balancer must maintain session stickiness, which trades resilience for state affinity.
Load Balancing Algorithms — round robin is simple and fair for uniform workloads; least connections adapts to variable request duration. Consistent hashing minimises cache invalidation when the server pool changes. Match the algorithm to your workload characteristics.
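To make the consistent-hashing claim concrete, here is a minimal sketch comparing a hash ring against plain `hash % pool_size` when one server joins a pool of four. The class and function names, the server names (`app-1` to `app-5`), and the choice of 100 virtual nodes per server are illustrative assumptions, not details from the earlier post.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit integer using the first 8 bytes of its MD5 digest."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (a sketch, not production code)."""

    def __init__(self, servers, vnodes=100):
        # Each server owns `vnodes` points on the ring to even out its share.
        self._points = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def lookup(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping past the end.
        idx = bisect.bisect_left(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


def modulo_lookup(key: str, servers) -> str:
    """Naive alternative: hash modulo pool size."""
    return servers[_hash(key) % len(servers)]


def moved_fraction(before, after, keys) -> float:
    """Fraction of keys whose assigned server differs between two lookups."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"url:{i}" for i in range(10_000)]
old_pool = ["app-1", "app-2", "app-3", "app-4"]   # hypothetical server names
new_pool = old_pool + ["app-5"]

ring_old, ring_new = HashRing(old_pool), HashRing(new_pool)
ring_moved = moved_fraction(ring_old.lookup, ring_new.lookup, keys)
mod_moved = moved_fraction(
    lambda k: modulo_lookup(k, old_pool),
    lambda k: modulo_lookup(k, new_pool),
    keys,
)
print(f"consistent hashing moved {ring_moved:.0%} of keys, modulo moved {mod_moved:.0%}")
```

With the ring, only the keys claimed by the new server change owner (roughly one fifth here), while modulo hashing reassigns most keys and would invalidate most per-server caches.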
Rate Limiting — caps request rate per client, protecting services from abuse and runaway callers. Token bucket allows short bursts while enforcing a sustained rate. For distributed rate limiting, keep the counters in a shared store such as Redis so every server enforces the same global limit. When rejecting a request, return HTTP 429 with a Retry-After header.
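The token bucket behaviour described above (bursts up to a capacity, then a sustained refill rate) can be sketched in a few lines. This is a single-process illustration under assumed names (`TokenBucket`, `FakeClock`); a distributed limiter would keep the same state in a shared store instead of in memory.

```python
import math
import time


class TokenBucket:
    """In-memory token bucket: `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock              # injectable so the demo is deterministic
        self._tokens = float(capacity)   # start full: an idle client may burst
        self._last = clock()

    def allow(self, cost=1.0):
        """Return (allowed, retry_after_seconds)."""
        now = self._clock()
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True, 0
        # Whole seconds until enough tokens exist; suitable for a Retry-After header.
        return False, math.ceil((cost - self._tokens) / self.rate)


class FakeClock:
    """Manually advanced clock, used only for the demonstration below."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
bucket = TokenBucket(rate=2, capacity=5, clock=clock)

burst = [bucket.allow()[0] for _ in range(5)]   # the initial burst drains the bucket
rejected = bucket.allow()                       # empty bucket: rejected with a wait time
clock.now += 1.0                                # one second later, 2 tokens are back
after_wait = [bucket.allow()[0] for _ in range(3)]
print(burst, rejected, after_wait)
```

The second element of the rejected result is the value a server could place in the Retry-After header of its 429 response.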
Proxy vs Reverse Proxy — a forward proxy represents clients to servers (hiding client identity). A reverse proxy represents servers to clients (hiding backend topology). Nginx, Cloudflare, and AWS ALB are reverse proxies — they handle TLS termination, load balancing, routing, and compression before requests reach application code.
Data Compression — Gzip and Brotli compress HTTP text responses by 70–85%. Enable at the reverse proxy for all text content; skip already-compressed binary content. zstd is the modern choice for data at rest. Compression is almost always worth it for text over any network.
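A quick way to see why text compresses well and already-compressed binary does not is to run both through Gzip. In this sketch the JSON records are made up for illustration, and random bytes stand in for content such as JPEG or ZIP data; actual ratios depend on the payload.

```python
import gzip
import json
import os

# Repetitive JSON, typical of API responses: the field names recur in every record.
records = [
    {"short_code": f"x{i:05d}", "target": f"https://example.com/page/{i}", "clicks": i % 97}
    for i in range(2_000)
]
text_payload = json.dumps(records).encode()

# Random bytes have no redundancy left, much like already-compressed media.
binary_payload = os.urandom(len(text_payload))


def gzip_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(data, compresslevel=level)) / len(data)


text_ratio = gzip_ratio(text_payload)
binary_ratio = gzip_ratio(binary_payload)
print(f"text:   {text_ratio:.0%} of original size")
print(f"binary: {binary_ratio:.0%} of original size")
```

The binary case spends CPU and can even come out slightly larger than the input, which is the reason to skip compression for images, video, and archives.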
Checksums — compute a fingerprint before storage or transmission; recompute and compare on retrieval to detect silent corruption. CRC32 for speed-critical paths. MD5 for file integrity when adversarial forgery isn't a concern. SHA-256 for tamper detection.
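The compute, store, recompute, compare cycle looks roughly like this with Python's standard library. The helper names (`fingerprints`, `flip_bit`, `verify`) and the sample payload are assumptions made for the sketch.

```python
import hashlib
import hmac
import zlib


def fingerprints(data: bytes) -> dict:
    """Compute the three checksums discussed above for one payload."""
    return {
        "crc32": f"{zlib.crc32(data):08x}",           # fast, non-cryptographic
        "md5": hashlib.md5(data).hexdigest(),         # integrity only, not forgery-resistant
        "sha256": hashlib.sha256(data).hexdigest(),   # suitable for tamper detection
    }


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate silent corruption by flipping a single bit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def verify(data: bytes, expected_sha256: str) -> bool:
    """Recompute on retrieval and compare against the stored fingerprint."""
    return hmac.compare_digest(hashlib.sha256(data).hexdigest(), expected_sha256)


original = b'{"short_code": "x7Kp2", "target": "https://example.com/"}'
stored = fingerprints(original)       # computed before storage or transmission

corrupted = flip_bit(original, 100)   # one flipped bit somewhere in the payload
print("intact payload verifies:   ", verify(original, stored["sha256"]))
print("corrupted payload verifies:", verify(corrupted, stored["sha256"]))
```

Any of the three fingerprints catches this accidental bit flip; the difference is cost and resistance to deliberate forgery, which is why the choice depends on the threat rather than on the data.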
Bloom Filter — a bit array with k hash functions that answers "definitely not present" or "probably present" in constant memory. No false negatives; controllable false positive rate. Used in LSM trees (skip SSTable reads), web crawlers (avoid re-crawling), and deduplication systems.
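A minimal Bloom filter fits in a short class. This sketch sizes the bit array with the standard formulas (m = -n ln p / (ln 2)^2 bits, k = (m/n) ln 2 hash functions) and derives the k positions from one SHA-256 digest by double hashing; the class name, the hashing scheme, and the example URLs are illustrative choices, not the approach of any particular library.

```python
import hashlib
import math


class BloomFilter:
    """Bit array plus k hash positions per item; sized for a target false positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # m bits and k hash functions from the standard sizing formulas.
        self.m = max(1, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self._bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: two 64-bit values from one digest generate all k positions.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # odd, so the stride is never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Any unset bit means "definitely not present"; all set means "probably present".
        return all(self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(expected_items=10_000, fp_rate=0.01)
for i in range(10_000):
    seen.add(f"https://example.com/page/{i}")

# URLs that were never added: any hit here is a false positive.
false_positives = sum(f"https://example.com/other/{i}" in seen for i in range(10_000))
print(f"{seen.m} bits, {seen.k} hashes, {false_positives} false positives out of 10,000")
```

Every added URL is always reported as present, and the share of never-added URLs reported as present should land near the configured 1%.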
HyperLogLog — estimates the count of distinct items in fixed memory (about 12 KB per key in Redis) with typical error under 2%; Redis documents a standard error of 0.81%. Redis's PFADD/PFCOUNT/PFMERGE implement it natively. The right tool for "how many unique visitors/users/events?" in analytics at any scale.
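The core of HyperLogLog is small enough to sketch: hash each item, use the first p bits to pick a register, and record the longest run of leading zeros seen in the remaining bits. The version below is a simplified illustration with assumed names; it uses one byte per register for readability, whereas 16,384 registers packed at 6 bits each is what yields the 12 KB figure. It is not a reproduction of the Redis implementation.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog with 2**p registers and a 64-bit hash."""

    def __init__(self, p: int = 14):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 for this sketch")
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128

    def add(self, item: str) -> None:
        x = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = x >> (64 - self.p)                  # first p bits choose the register
        rest = x & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> int:
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)

    def merged(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union of two sketches: the register-wise maximum."""
        if self.p != other.p:
            raise ValueError("precision mismatch")
        result = HyperLogLog(self.p)
        result.registers = bytearray(max(a, b) for a, b in zip(self.registers, other.registers))
        return result


today, yesterday = HyperLogLog(), HyperLogLog()
for i in range(100_000):
    today.add(f"visitor-{i}")
    today.add(f"visitor-{i // 2}")     # repeat visits do not inflate the estimate
for i in range(50_000, 150_000):
    yesterday.add(f"visitor-{i}")

both_days = today.merged(yesterday)    # true distinct count across both days: 150,000
print("today:", today.count(), "both days:", both_days.count())
```

The merge step is what makes the structure useful for analytics: per-day or per-server sketches can be combined later without revisiting the raw events.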
How they connect
Reverse proxy + load balancing + rate limiting are the three concentric rings of traffic management. The reverse proxy is the outer ring — it terminates TLS, compresses responses, and provides the single public endpoint. Load balancing distributes traffic behind it. Rate limiting sits at the proxy (or inside the application) to protect backends from any single client consuming a disproportionate share.
Load balancing + load balancing algorithms + client-server architecture are inseparable. The algorithm determines distribution quality; the architecture determines what the algorithm can assume. Stateless backends with any-server routing → round robin or least connections. Cache-aware backends → consistent hashing. Stateful backends → IP hash or cookie-based stickiness.
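Compression + checksums often appear together. HTTP responses are compressed to reduce bandwidth, while checksums in the layers underneath (TCP checksums, TLS record authentication) detect corruption in transit. Object storage (S3) pairs them too: the client compresses the object with Gzip or zstd before upload, and the ETag lets you verify integrity on download. Note that the ETag is the object's MD5 digest only for single-part uploads that are not encrypted with SSE-KMS or SSE-C.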
Bloom filter + HyperLogLog are the probabilistic pair. Bloom filter answers "have I seen this item?" — useful for deduplication (click uniqueness, URL crawl state). HyperLogLog answers "how many distinct items have I seen?" — useful for cardinality reporting (unique visitor counts, distinct query patterns). They answer different questions about the same kinds of streams.
End-to-end: the URL shortener's complete infrastructure layer
User's browser: sho.rt/x7Kp2
1. DNS resolution (Pillar 2):
sho.rt → 104.21.x.x (Cloudflare edge node)
2. Cloudflare edge (reverse proxy + edge cache):
- TLS termination (certificate held at edge)
- Brotli compression for all text responses
- DDoS protection, bot detection
- Rate limiting: 100 redirects/min per IP (token bucket via Cloudflare's edge)
- Cache: popular redirect responses cached at edge (TTL 60s)
- Cache hit (popular link): 302 returned from edge — 0 backend requests
- Cache miss: forward to origin
3. AWS ALB (reverse proxy + load balancer):
- Routes all requests to app server fleet
- Algorithm: least connections (mix of fast redirects and slow API calls)
- Health checks: GET /health every 10s, 3 failures → remove from rotation
- TLS terminated at Cloudflare; internal traffic over HTTP within VPC
4. App servers (stateless, horizontal fleet):
- Rate limit check: Redis, per API key (token bucket via an atomic Lua script; a plain INCR + EXPIRE gives a fixed-window counter instead)
- Redirect path:
a. BF.ADD bf:link:x7Kp2:today with ip → Bloom filter deduplication (returns 0 if this ip was probably seen already)
b. PFADD hll:unique:x7Kp2:today with ip → HyperLogLog unique visitor count
c. GET url:x7Kp2 → Redis cache (hit: return URL, miss: query PostgreSQL)
d. 302 redirect
5. PostgreSQL primary (on cache miss):
- Index scan on links(short_code) = 'x7Kp2'
- Returned via PgBouncer connection pool
- Data page checksums (if enabled for the cluster) verify integrity when a page is read from disk; WAL records are protected by CRC-32C
6. S3 (QR codes, thumbnails):
- All uploads include Content-MD5 header for integrity verification
- All objects served via CloudFront; JSON exports compressed with Gzip or Brotli at the edge (the encodings CloudFront applies automatically)
The decision tree
Before adding infrastructure components:
Do you need traffic to reach multiple backend servers?
→ Load balancer (with health checks)
→ Algorithm: round robin (uniform), least connections (variable duration),
consistent hashing (cache-aware or stateful affinity)
Do you need to protect services from client overload or abuse?
→ Rate limiter (token bucket for APIs; fixed window for coarse protection)
→ Place at: edge/CDN (DDoS), reverse proxy (per-service), app code (per-user-feature)
Do you need to hide backend topology from clients?
→ Reverse proxy (Nginx, Cloudflare, ALB)
→ It provides: TLS termination, routing, load balancing, caching, compression
Are you sending or storing text data over a network or to disk?
→ Enable compression (Brotli/Gzip for HTTP; zstd for storage)
→ Skip compression for already-compressed binary (images, video, archives)
Are you transmitting or storing data where silent corruption is a risk?
→ Add checksums (CRC32 for speed-critical; SHA-256 for tamper detection)
Do you need to answer "have I seen this item before?" at high volume?
→ Bloom filter (constant memory, no false negatives, tunable false positive rate)
Do you need to count distinct items in a stream or large dataset?
→ HyperLogLog (constant ~12KB memory, <2% error, merge-friendly)
What you can now do
After this pillar, you can:
- Design a load-balanced server fleet, select the appropriate algorithm, and explain the tradeoffs of each
- Implement rate limiting at the appropriate layer with the appropriate algorithm for the workload
- Explain what a reverse proxy does, what it hides, and why it's the right place to terminate TLS
- Enable and tune Gzip/Brotli compression in Nginx and explain the CPU-vs-bandwidth tradeoff
- Add checksums to data pipelines, explain CRC32 vs MD5 vs SHA-256, and articulate what corruption they can and can't detect
- Implement a Bloom filter (in Redis or a language library) for deduplication and explain false positive rate tradeoffs
- Use HyperLogLog (Redis PFADD/PFCOUNT) for approximate distinct counting and explain why it's preferred over exact counting at scale
Up next: Pillar 7 — Architecture Patterns
The infrastructure layer manages how traffic flows. Pillar 7 covers how the application itself is structured — the patterns for dividing a system into services, for handling events asynchronously, for making services resilient to partial failures, and for evolving a system over time without rewriting it.
Twenty concepts: Monolithic, Microservices, Serverless, Event-Driven Architecture, Message Queues, Pub/Sub, CQRS, Event Sourcing, Saga, Outbox, Circuit Breaker, Bulkhead, Sidecar, Service Mesh, Service Discovery, Strangler Fig, Backend for Frontend, ETL Pipelines, Batch vs Stream Processing, and MapReduce.
*← Previous: HyperLogLog — counting distinct items without storing them; the probabilistic algorithm behind "how many unique users visited today?"*
This is the end of the Scalability & Infrastructure pillar. Continue to Pillar 7 — Architecture Patterns →




