Serverless: Pay for What You Use, Not What You Provision

Series: System Design · Architecture Patterns — Pillar 7 of 8
| # | Post | What it covers |
|---|---|---|
| 00 | Architecture Patterns: How Systems Are Structured | Twenty patterns covering monoliths, microservices, events, resilience, deployment, and data processing. How to structure systems that survive growth. |
| 01 | Monolithic Architecture: The Default That Gets Abandoned Too Early | Monoliths are fast to build and easy to operate. Learn when they're the right choice, when they break down, and how to know the difference. |
| 02 | Microservices: The Architecture You Earn, Not Choose | Microservices enable independent scaling and team autonomy — but at significant cost. Learn what you actually get, what you pay, and when it's worth it. |
| 03 | Serverless: Pay for What You Use, Not What You Provision ← you are here | Serverless scales to zero and charges per invocation. Learn where it shines, where it fails, and how to design around cold starts and vendor lock-in. |
| 04 | Event-Driven Architecture: Decoupling Through Events | Event-driven systems communicate via events rather than direct calls. Learn how producers, consumers, and event brokers work — and the consistency tradeoffs involved. |
| 05 | Message Queues: Decoupling Produce from Consume | Message queues decouple producers and consumers, enable load levelling, and provide durability. Learn how they work and when to use Kafka vs SQS vs RabbitMQ. |
| 06 | Pub/Sub: Broadcasting Events to Multiple Consumers | Pub/sub decouples publishers from subscribers through topics. Learn how it differs from message queues and when to use Kafka, SNS, or Google Pub/Sub. |
| 07 | CQRS: When Reads and Writes Need Different Models | CQRS separates writes from reads so each can be optimised independently. Learn how it works, when it's worth the complexity, and when it isn't. |
| 08 | Event Sourcing: The Ledger, Not the Balance | Event sourcing stores state as a sequence of events. Learn how it works, what you get (audit log, time travel), and what it costs (complexity, schema evolution). |
| 09 | The Saga Pattern: Distributed Transactions Without Locks | The Saga pattern manages distributed transactions across services using compensating transactions. Learn choreography vs orchestration and when to use each. |
| 10 | The Outbox Pattern: Atomic Writes and Event Publishing | The Outbox pattern solves the dual-write problem — publishing an event and writing to a database atomically. Learn how it works using CDC or polling. |
| 11 | The Circuit Breaker: Stopping Cascading Failures | Circuit breakers prevent cascading failures by fast-failing calls to unhealthy dependencies. Learn the three states, how to configure them, and where to apply them. |
| 12 | The Bulkhead Pattern: Containing Failures Through Resource Isolation | Bulkheads isolate thread pools and connections per dependency so one failure can't exhaust resources needed by others. Learn how to apply them in practice. |
| 13 | The Sidecar Pattern: Cross-Cutting Concerns Without Code Changes | The sidecar pattern deploys a helper process alongside each service for logging, metrics, TLS, and service discovery — without modifying the service itself. |
| 14 | Service Mesh: A Programmable Network for Microservices | A service mesh handles service-to-service traffic, mTLS, circuit breaking, and observability via a fleet of sidecar proxies. Learn how it works and when to use it. |
| 15 | Service Discovery: Finding Services in a Dynamic Environment | Service discovery lets services find each other in dynamic environments. Learn client-side vs server-side discovery, health checks, and DNS vs registry approaches. |
| 16 | The Strangler Fig: Replacing a Legacy System Without Burning It Down | The Strangler Fig replaces a legacy system incrementally by routing specific functionality to new implementations while the old system keeps running. |
| 17 | Backend for Frontend: One API Per Client Type | BFF creates dedicated API backends per client type. Learn why one general API struggles to serve mobile and web well, and how BFF solves it. |
| 18 | ETL Pipelines: Moving Data from Operations to Analytics | ETL moves data from operational systems into analytical stores. Learn how pipelines work, what ELT is, and how to design reliable data movement at scale. |
| 19 | Batch vs Stream Processing: How Fresh Do Your Answers Need to Be? | Batch processing accumulates data, then processes it in bulk; stream processing handles each event as it arrives. Learn the tradeoffs and when each is right. |
| 20 | MapReduce: Processing Petabytes in Parallel | MapReduce processes massive datasets in parallel by splitting work into map and reduce phases. Learn how it works and why Spark has largely replaced it. |
| 21 | Architecture Patterns: Wrap-Up | A recap of all 20 architecture patterns across decomposition, async communication, data patterns, resilience, and data processing. How they connect. |
The problem
Your URL shortener's link expiry feature needs to check every link once a day and send expiry notifications to users. The logic is simple: query links expiring in the next 24 hours, send an email per user, update a notified_at timestamp. It runs once a day, takes three minutes, and does nothing the other 23 hours and 57 minutes.
Running a dedicated server for this is wasteful — you're paying for compute that's idle 99.8% of the time. You could add it to the monolith, but then a bug in the expiry job can crash the entire application. You could make it a microservice, but managing a persistent service for a three-minute daily task is operational overhead out of proportion to the task's importance.
What you want is: "run this code at 2am daily, for exactly as long as it takes, and charge me only for those three minutes."
That's serverless.
The core idea
Serverless is a deployment model where you provide a function and a trigger; the platform manages all infrastructure. The function is invoked on demand, scales automatically from zero to thousands of concurrent executions, and you're billed per execution rather than per hour of server uptime. When not invoked, you pay nothing.
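To make "a function and a trigger" concrete, here is a minimal sketch of the link expiry job from the problem above, written as a plain Python function. It assumes a hypothetical in-memory list of `Link` records in place of the database and a caller-supplied `send_email` callable in place of the email service. On AWS the trigger would be a scheduled rule such as `cron(0 2 * * ? *)` (02:00 UTC daily), with a thin handler calling `run_expiry_job`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

@dataclass
class Link:
    code: str
    owner_email: str
    expires_at: datetime
    notified_at: Optional[datetime] = None  # hypothetical schema

def run_expiry_job(links: list[Link],
                   send_email: Callable[[str, list[str]], None],
                   now: datetime) -> int:
    """Email each owner whose links expire in the next 24 hours.

    Returns the number of emails sent (one per user).
    """
    window_end = now + timedelta(hours=24)
    # 1. "Query" links expiring within the window that haven't been notified.
    due = [l for l in links
           if now <= l.expires_at < window_end and l.notified_at is None]
    # 2. Group by owner so each user gets a single email.
    by_user: dict[str, list[Link]] = {}
    for link in due:
        by_user.setdefault(link.owner_email, []).append(link)
    for email, user_links in by_user.items():
        send_email(email, [l.code for l in user_links])
        # 3. Stamp notified_at so a re-run skips links already handled.
        for l in user_links:
            l.notified_at = now
    return len(by_user)
```

Stamping `notified_at` per user as soon as the email goes out means a retried or repeated run only picks up links that were never handled, which matters because scheduled invocations can be retried after a failure.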
The analogy: a taxi vs a private car
Running a persistent server is like owning a car: you pay for it whether you drive it or not. Insurance, depreciation, parking — all fixed costs regardless of usage.
Serverless is like a taxi or ride-share: you pay only for trips taken. No standing cost. Scales instantly from 0 to however many trips you need. The downside: there's a delay when you hail a ride (cold start), you can't customise the car, and the fare per mile is higher than the per-mile cost of a car you already own and are using heavily.
For occasional, unpredictable trips, a taxi is cheaper. For daily commuting, a car wins.
How serverless works
The execution model
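The platform's side of the bargain can be modelled with a toy simulator: reuse an idle warm environment when one exists, start a new one (a cold start) when none is free, and reclaim environments that sit idle too long, which is how capacity scales back to zero. This is a simplified sketch under assumed rules, not how any particular provider schedules work; the `idle_timeout` value in particular is an assumption, since real platforms don't publish a fixed keep-warm period.

```python
from dataclasses import dataclass, field

@dataclass
class Environment:
    busy_until: float  # time at which this environment's current invocation ends

@dataclass
class ToyPlatform:
    idle_timeout: float  # seconds an idle environment stays warm (assumed value)
    envs: list[Environment] = field(default_factory=list)
    cold_starts: int = 0

    def invoke(self, t: float, duration: float) -> bool:
        """Run one invocation arriving at time t (calls must be in time order).

        Returns True if the invocation needed a cold start.
        """
        # Reclaim environments idle longer than idle_timeout (scaling toward zero).
        # Busy environments have busy_until > t, so they are always kept.
        self.envs = [e for e in self.envs if t - e.busy_until <= self.idle_timeout]
        # Warm start: reuse any environment that has finished its last invocation.
        for env in self.envs:
            if env.busy_until <= t:
                env.busy_until = t + duration
                return False
        # No free environment: scale out with a new one and pay a cold start.
        self.envs.append(Environment(busy_until=t + duration))
        self.cold_starts += 1
        return True

p = ToyPlatform(idle_timeout=600)
print([p.invoke(t, 1.0) for t in (0, 10, 10.5, 5000)])  # [True, False, True, True]
```

The first invocation is cold, the second reuses the warm environment, the third overlaps the second and forces a scale-out, and the fourth arrives after a long quiet period, finds the pool empty, and cold-starts again.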
Cold starts
Cold starts are the most significant serverless operational concern. When a function hasn't been invoked recently, the platform must spin up a new execution environment: download the function package, initialise the runtime, and run any global initialisation code. This adds roughly 100ms to a few seconds of latency before the function code runs, depending largely on the runtime and package size.
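Because global initialisation code runs as part of the cold start, the usual practice is to do expensive setup at module scope so warm invocations reuse it instead of repeating it. A minimal sketch: `_expensive_init` and `RESOURCES` are hypothetical stand-ins for loading config or opening database connections, and `handler(event, context)` follows the AWS Lambda Python handler convention.

```python
import time

INIT_COUNT = 0  # times this module's init code has run in this environment

def _expensive_init() -> dict:
    """Hypothetical stand-in for loading config or opening DB connections."""
    global INIT_COUNT
    INIT_COUNT += 1
    time.sleep(0.05)  # simulated setup cost, paid only during a cold start
    return {"db": "connected"}

# Module scope: runs once when the environment loads this module (the cold start).
RESOURCES = _expensive_init()

def handler(event, context):
    # Handler scope: runs on every invocation and reuses RESOURCES when warm.
    return {"db": RESOURCES["db"], "init_count": INIT_COUNT}
```

Calling `handler` repeatedly in the same process leaves `INIT_COUNT` at 1, which is exactly the saving a warm start buys; a brand-new environment pays the init cost again.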
Cold starts matter for:
Latency-sensitive endpoints: a user-facing API that occasionally cold-starts will have outlier latency spikes.
Functions with heavyweight runtimes: JVM (Java, Kotlin) cold starts are typically 500ms–3s, Node.js and Python 50–300ms, and compiled binaries (Go, Rust) are usually the fastest, often in the tens of milliseconds.
Mitigations:
Provisioned concurrency (AWS Lambda): keep N instances always warm. You pay for the reserved capacity, but requests served by those N instances never cold start; traffic beyond N still spills over to on-demand instances that can.
Lightweight runtimes: prefer Node.js, Python, or Go for latency-sensitive functions.
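Scheduled warm-up pings: invoke the function every few minutes to prevent it from going cold. Hacky, and only partly effective: each ping keeps one instance warm, so a burst that needs more concurrent instances still hits cold starts.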
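Warm-up pings only pay off if the function recognises them and returns early, so a ping costs a few milliseconds rather than a full request. A minimal sketch, assuming a hypothetical `{"warmup": true}` payload sent by the scheduled rule and an API Gateway proxy-style event (`body` in, `statusCode`/`body` out) for real requests:

```python
import json

def handler(event, context):
    # Hypothetical warm-up marker sent by the scheduled ping.
    if isinstance(event, dict) and event.get("warmup"):
        # Return immediately: the only goal is to keep this environment alive.
        return {"warmed": True}
    # Normal request path (proxy-style event with a JSON string body).
    body = json.loads(event.get("body") or "{}")
    return {"statusCode": 200, "body": json.dumps({"echo": body})}
```

Each ping lands on a single environment, so keeping several instances warm means sending several concurrent pings; when that matters, provisioned concurrency is the cleaner option.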
State and statelessness
Serverless functions are stateless by design. Each invocation may run on a different instance, with no memory of previous invocations. All persistent state must live in external storage: databases, caches, object stores.
The execution container may be reused across multiple invocations ("warm start"), but you cannot rely on this — write every function assuming it starts cold.
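```python
import redis

# ❌ Don't do this: counter is instance-local
counter = 0

def handle(event):
    global counter
    counter += 1  # each instance has its own copy, reset on every cold start
    return counter

# ✓ Do this: state lives in an external store shared by all instances
r = redis.Redis(host="localhost", port=6379)  # assumed Redis endpoint

def handle(event):
    return r.incr("counter")  # atomic increment, same value seen by every instance
```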
What serverless is great for
Event-driven workloads: process S3 uploads, respond to Kafka messages, handle webhooks. Every event triggers a function invocation; the platform scales concurrency to match event volume.
Scheduled jobs: daily reports, hourly data processing, nightly cleanup. Define a cron schedule in the platform's console — no EC2 instance required.
Variable/spiky traffic: a marketing email goes out and traffic spikes 100x for 30 minutes, then drops to baseline. Serverless scales up automatically; you pay for the spike, not for reserved capacity that would otherwise sit idle.
Glue code / data pipelines: transform records between systems, fan out to multiple destinations, enrich events with lookups. Each step is a small function invoked by the previous step's output.
Edge computing: Cloudflare Workers run in 300+ edge locations globally, executing code close to users with < 1ms cold start. Ideal for response manipulation, auth, personalisation, and A/B testing at the CDN layer.
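For the event-driven case, each invocation receives a batch of records describing what happened. A minimal sketch for the S3 upload example, assuming the standard S3 event notification layout (`Records[].eventName`, `Records[].s3.bucket.name`, `Records[].s3.object.key`); the actual processing step is left out.

```python
from urllib.parse import unquote_plus

def handle_s3_upload(event, context):
    """Collect (bucket, key, size) for each object-created record in an S3 event."""
    uploads = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletes and other event types
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Keys arrive URL-encoded in the notification (a space becomes "+").
        key = unquote_plus(s3["object"]["key"])
        uploads.append((bucket, key, s3["object"].get("size")))
    # Real work (thumbnailing, indexing, ...) would happen per upload here.
    return uploads
```

One notification can carry several records, so the handler loops rather than reading only the first; the platform handles the fan-out across invocations as upload volume grows.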
What serverless is poor for
Sustained high-throughput workloads: the redirect hot path in the URL shortener handles a million requests per second continuously. At Lambda's request price (~$0.0000002 per request, i.e. $0.20 per million), that's 2.592 trillion requests and roughly $518,000 per 30-day month in request fees alone, before any compute-duration charges, which is far more than dedicated instances cost at that scale. Long-running warm containers are cheaper for sustained load.
Long-running processes: Lambda has a 15-minute execution timeout. A function that streams results, maintains a WebSocket, or processes a large file in memory is a poor fit.
Latency-sensitive endpoints with cold start risk: without provisioned concurrency, a cold-start rate around 1% sits right at the p99 boundary, and anything higher pushes cold-start latency directly into the p99 that users see.
Complex local development: emulating Lambda locally (AWS SAM, Serverless Framework) is imperfect. Local debugging has friction. Integration testing requires infrastructure mocks or a live AWS account.
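When a long job can be split into independent steps, a common workaround for the 15-minute limit is to watch the remaining time and checkpoint before the deadline. A minimal sketch: `context.get_remaining_time_in_millis()` is the AWS Lambda Python context method, while the `items`/`cursor` event fields, the 30-second safety margin, and the caller that resumes from the returned cursor (a re-invocation or a workflow engine) are all assumptions.

```python
SAFETY_MARGIN_MS = 30_000  # assumed buffer to checkpoint before the hard timeout

def process_batch(event, context, process_item):
    """Process event["items"] starting at event.get("cursor", 0).

    Returns the index to resume from, or None when every item is done.
    """
    items = event["items"]
    i = event.get("cursor", 0)
    while i < len(items):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return i  # checkpoint: stop before the platform kills the invocation
        process_item(items[i])
        i += 1
    return None
```

This stretches batch work across several invocations; it does not help with streaming responses or long-lived connections, which remain a poor fit.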
The URL shortener's serverless uses
Tradeoffs
Operational simplicity vs platform dependency. Serverless eliminates server management entirely (no patching, no capacity planning, no autoscaling configuration). The cost is vendor lock-in: AWS Lambda functions aren't portable to GCP or Azure without rewriting the handler glue, trigger configuration, and permissions, and often replacing the managed services they call.
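One common way to limit lock-in is to keep business logic in plain functions and confine the platform to a thin adapter. A minimal sketch, assuming an API Gateway proxy-style event; `shorten` and `STORE` are hypothetical stand-ins for the URL shortener's real logic and database.

```python
import json

STORE: dict = {}  # hypothetical stand-in for a real database

# Platform-agnostic core: plain inputs and outputs, no cloud SDK types.
def shorten(url: str, store: dict) -> str:
    """Hypothetical business logic: assign the next short code to url."""
    code = format(len(store) + 1, "x")  # toy code generator, not production-grade
    store[code] = url
    return code

# Thin AWS Lambda adapter: translate the event, call the core, shape the response.
def lambda_handler(event, context):
    url = json.loads(event["body"])["url"]
    return {"statusCode": 201, "body": json.dumps({"code": shorten(url, STORE)})}
```

Porting to Google Cloud Functions or Azure Functions would mean writing a new adapter and new trigger configuration while `shorten` stays unchanged; the triggers, permissions, and managed services the function depends on are usually the harder part to move.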
Cost model inversion. Serverless is cheaper for infrequent or spiky workloads; more expensive for sustained high throughput. Calculate the crossover point for your specific workload — many teams don't and end up paying more than they would for EC2.
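The crossover calculation is simple arithmetic once you pick numbers. A sketch assuming AWS Lambda list prices of $0.20 per million requests and $0.0000166667 per GB-second (x86, us-east-1; check current pricing), a steady request rate, and a 30-day month. It ignores the free tier, API gateway fees, and data transfer, and the fleet cost is a number you supply.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month

# Assumed AWS Lambda list prices; verify against current pricing before use.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # $0.20 per million requests
PRICE_PER_GB_SECOND = 0.0000166667     # compute duration

def lambda_monthly_cost(requests_per_second, avg_duration_s, memory_gb):
    """Request fees plus duration fees for a steady request rate over a month."""
    requests = requests_per_second * SECONDS_PER_MONTH
    per_request = PRICE_PER_REQUEST + avg_duration_s * memory_gb * PRICE_PER_GB_SECOND
    return requests * per_request

def crossover_rps(fleet_monthly_cost, avg_duration_s, memory_gb):
    """Steady request rate above which a fixed-cost fleet becomes cheaper."""
    cost_per_rps = lambda_monthly_cost(1, avg_duration_s, memory_gb)
    return fleet_monthly_cost / cost_per_rps

# Redirect hot path from above: request fees alone at 1M requests/second.
print(round(lambda_monthly_cost(1_000_000, 0, 0)))  # 518400
```

With these assumed prices, a 20ms, 128MB function costs about $0.63 a month per steady request per second, so a hypothetical $1,000/month fleet breaks even at roughly 1,600 requests per second; above that rate the fleet is cheaper.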
Debugging and observability. Distributed traces across Lambda invocations require investment. Cold starts are invisible in application metrics unless you specifically instrument them. Sampling-based tracing may miss rare failures.
The one thing to remember
Serverless is the right deployment model for workloads that are infrequent, bursty, or event-triggered — where paying for idle capacity is wasteful and operational simplicity outweighs the per-invocation cost premium. It's the wrong choice for sustained high-throughput workloads where the per-invocation fee exceeds reserved capacity cost, and for latency-sensitive endpoints where cold starts are unacceptable without provisioned concurrency. The "no servers" promise is real — but the cost model and operational constraints must be understood before committing.
← Previous: Microservices — when the monolith genuinely constrains you, here's what decomposing actually involves and what you're buying.
→ Next: Event-Driven Architecture — instead of services calling each other, they publish events and subscribe to events; a model that decouples producers from consumers at the cost of eventual consistency.




