Series: System Design · Data & Storage — Pillar 4 of 8

Systems Design

#	Post	What it covers
00	Data & Storage: Where Everything Lives	Where data lives shapes everything about a system. Nineteen concepts covering databases, indexing, sharding, replication, and the data structures underneath. (161 chars)
01	SQL vs NoSQL: Choosing the Right Database	SQL vs NoSQL isn't a simple choice. Learn what each type optimises for, when to use relational databases, and when NoSQL is the right call.
02	Database Indexing: The Highest-Leverage Performance Tool	Indexes are the highest-leverage database performance tool. Learn how they work, what they cost, and how to decide when to add one.
03	B-Trees & B+ Trees: The Data Structure Behind Database Indexes	Almost every database index is built on a B-tree or B+ tree. Learn how they work, why they're fast, and what this means for your queries.
04	LSM Trees: Why Some Databases Are Built for Writes	LSM trees power Cassandra, RocksDB, and LevelDB. Learn how they achieve massive write throughput and what they trade off to get it.
05	Denormalisation: Trading Storage for Speed	Denormalisation trades storage for read speed by pre-computing joins. Learn when it helps, when it hurts, and how to do it safely.
06	Database Sharding: Scaling Beyond a Single Node	Sharding splits a database across multiple nodes. Learn how it works, the strategies available, and the significant tradeoffs it introduces.
07	Data Partitioning: Choosing How to Divide Your Data ← you are here	Range, hash, and list partitioning each make different tradeoffs. Learn how to divide data effectively for queries, maintenance, and scale.
08	Consistent Hashing: Minimising Resharding Pain	Consistent hashing minimises data movement when nodes are added or removed. Learn how it works and why it's fundamental to distributed systems.
09	Replication & Read Replicas: Scaling Reads and Surviving Failures	Replication copies data across nodes for fault tolerance and read scaling. Learn how primary-replica setups work and when to use them.
10	Object Storage: Unlimited Scale for Large Binary Data	Object storage handles large binary files at unlimited scale. Learn how it works, why it replaced file servers, and when to use it.
11	Block vs File vs Object Storage: Three Models, Three Use Cases	Three storage models, three different use cases. Learn what block, file, and object storage optimise for and how to choose between them.
12	Distributed File Systems: File Storage Across Many Machines	Distributed file systems spread file storage across many machines. Learn how HDFS, Ceph, and GlusterFS work and when to use them.
13	Time Series Databases: Built for Metrics and Events	Time series databases handle append-heavy metric data far better than SQL. Learn how they work and when to use InfluxDB, Prometheus, or TimescaleDB.
14	Vector Databases: Semantic Search and AI Memory	Vector databases power semantic search, recommendations, and LLM memory. Learn how embeddings work, what ANN search is, and when to use one.
15	Full-Text Search Engines: Beyond SQL LIKE	Full-text search needs more than SQL LIKE. Learn how inverted indexes, relevance ranking, and Elasticsearch make text search fast and powerful.
16	Materialized Views: Pre-Computing Expensive Queries	Materialized views cache expensive query results as physical tables. Learn how they work, when to refresh them, and when to use them vs other approaches.
17	Query Optimisation: From Slow to Fast	Slow queries aren't always fixed by adding indexes. Learn how to read EXPLAIN output, understand query plans, and systematically make queries fast.
18	Connection Pooling: Managing the Hidden Bottleneck	Opening a database connection per request doesn't scale. Learn how connection pooling works, what PgBouncer does, and how to size your pool correctly.
19	Data & Storage: Wrap-Up	A recap of all 19 data storage concepts: SQL, NoSQL, indexing, sharding, replication, specialised databases, and how they connect in a real system.

Data Partitioning: Choosing How to Divide Your Data

The problem

Sharding explained why you might split data across multiple nodes. But the previous post skimmed over a crucial question: exactly how do you decide which row goes to which shard?

This isn't just a sharding question. PostgreSQL's native table partitioning, Cassandra's partition keys, Kafka's topic partitioning, and S3's key prefix strategies all involve the same fundamental choice. The strategy you choose determines:

Which queries are fast (partition-local) vs slow (cross-partition)
Whether data distributes evenly or creates hot spots
How expensive it is to add new partitions as data grows
How easily you can drop old data (for retention policies)

Getting the partitioning strategy wrong in a sharded system means resharding. Getting it wrong in a time-series database means your most recent partition absorbs all writes while historical partitions sit idle. The stakes are high enough that partitioning deserves its own careful treatment.

The core idea

Data partitioning divides a dataset into discrete subsets (partitions) based on a partition key. The choice of partition key and partitioning strategy determines the data's distribution, which queries benefit from partition pruning, and how the system behaves under growth and deletion.

The analogy: a filing cabinet with different organisation schemes

You have 10,000 customer documents to file. Three ways to organise them:

Range (alphabetical by surname): all A–F customers in drawer 1, G–M in drawer 2, N–Z in drawer 3. Finding all customers with surname "Smith" means opening one drawer. Adding a new customer is easy — just file in the right drawer. Problem: surnames aren't evenly distributed. Much more data for "S" than for "X".

Hash (by customer number): hash the customer number, assign to a drawer based on the hash result. Distribution is perfectly even — each drawer holds roughly the same number of files. Finding all customers with surname "Smith" requires checking every drawer (no partition pruning). Adding a new drawer means rehashing and redistributing.

List (by region): drawer 1 holds Sydney customers, drawer 2 holds Melbourne, drawer 3 holds Brisbane. Queries for "all Sydney customers" go to one drawer. Doesn't work for values not in the list; drawers can be very unequal if regions have different population sizes.

Each approach organises the same data differently. The right choice depends on how you need to access it.

How it works

Range partitioning

Data is assigned to partitions based on value ranges of the partition key.

-- PostgreSQL declarative partitioning by date range
CREATE TABLE clicks (
  id          BIGSERIAL,
  link_id     BIGINT,
  clicked_at  TIMESTAMPTZ NOT NULL,
  country     TEXT,
  device      TEXT
) PARTITION BY RANGE (clicked_at);

CREATE TABLE clicks_2025_q1 PARTITION OF clicks
  FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');

CREATE TABLE clicks_2025_q2 PARTITION OF clicks
  FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');

CREATE TABLE clicks_2025_q3 PARTITION OF clicks
  FOR VALUES FROM ('2025-07-01') TO ('2025-10-01');

A query with WHERE clicked_at BETWEEN '2025-04-01' AND '2025-06-30' touches only the Q2 partition — the database prunes all others. This is partition pruning: the query planner eliminates partitions whose ranges don't overlap the filter.

Strengths:

Range queries are fast — partition pruning eliminates irrelevant partitions
Data retention is efficient — dropping old data means dropping an entire partition: DROP TABLE clicks_2024_q1, which is near-instant compared to a mass DELETE
New partitions are easy to add — just define the next range

Weaknesses:

Hotspot risk — in time-series data, all writes go to the current partition. Yesterday's partitions receive only reads; today's partition receives all writes
Uneven distribution — popular ranges accumulate more data. If January has 3x the clicks of July, the Q1 partition is 3x larger

Hash partitioning

Data is assigned to partitions based on a hash function applied to the partition key.

-- PostgreSQL hash partitioning on link_id
CREATE TABLE clicks (
  id          BIGSERIAL,
  link_id     BIGINT,
  clicked_at  TIMESTAMPTZ NOT NULL
) PARTITION BY HASH (link_id);

-- 4 partitions, modulus 4
CREATE TABLE clicks_p0 PARTITION OF clicks FOR VALUES WITH (modulus 4, remainder 0);
CREATE TABLE clicks_p1 PARTITION OF clicks FOR VALUES WITH (modulus 4, remainder 1);
CREATE TABLE clicks_p2 PARTITION OF clicks FOR VALUES WITH (modulus 4, remainder 2);
CREATE TABLE clicks_p3 PARTITION OF clicks FOR VALUES WITH (modulus 4, remainder 3);

link_id = 12345 goes to partition 12345 % 4 = 1. All writes for all links distribute evenly across partitions — no hotspots.

Strengths:

Even distribution — hash functions distribute keys uniformly across partitions
No write hotspots — new data distributes across all partitions simultaneously

Weaknesses:

No partition pruning for range queries — WHERE clicked_at > '2025-01-01' cannot be pruned because clicked_at isn't the partition key
Resharding is expensive — changing the number of partitions changes the modulus, which changes the assignment of most existing keys. Nearly every row must move
Data retention is complex — you can't drop "all clicks older than 90 days" by dropping a partition; you need a mass DELETE

List partitioning

Data is assigned to partitions based on specific values of the partition key.

-- Partition click events by region
CREATE TABLE clicks (
  id          BIGSERIAL,
  link_id     BIGINT,
  region      TEXT NOT NULL,
  clicked_at  TIMESTAMPTZ NOT NULL
) PARTITION BY LIST (region);

CREATE TABLE clicks_apac    PARTITION OF clicks FOR VALUES IN ('AU', 'JP', 'SG', 'IN');
CREATE TABLE clicks_emea    PARTITION OF clicks FOR VALUES IN ('GB', 'DE', 'FR', 'ZA');
CREATE TABLE clicks_americas PARTITION OF clicks FOR VALUES IN ('US', 'CA', 'BR', 'MX');

Queries filtered by region (WHERE region IN ('AU', 'JP')) touch only the APAC partition.

Strengths:

Logical grouping — data that naturally belongs together (by region, category, tenant) lives together
Partition pruning for equality and IN queries
Easy to add new regions without resharding

Weaknesses:

Manual maintenance — someone must add new partition values as new regions/categories emerge
Uneven distribution — regions with more data create larger partitions
Doesn't work for high-cardinality partition keys

Composite partitioning

Combine strategies for more control. A common pattern for time-series data:

-- First partition by time range (for efficient retention and range queries)
-- Then sub-partition by hash of link_id (to distribute write load within each time window)
CREATE TABLE clicks PARTITION BY RANGE (clicked_at);

CREATE TABLE clicks_2025_q3 PARTITION OF clicks
  FOR VALUES FROM ('2025-07-01') TO ('2025-10-01')
  PARTITION BY HASH (link_id);

CREATE TABLE clicks_2025_q3_p0 PARTITION OF clicks_2025_q3
  FOR VALUES WITH (modulus 4, remainder 0);
-- etc.

Range-then-hash gives you both retention efficiency (drop whole quarters) and write distribution (all four sub-partitions accept writes simultaneously within the current quarter).

Partitioning vs sharding

A common source of confusion: partitioning and sharding describe the same idea at different scales.

Partitioning (within a node): PostgreSQL's table partitioning divides a logical table into physical sub-tables on the same server. The benefit is partition pruning (faster queries) and partition-level maintenance (efficient DROP). There's no scaling of write throughput or storage — it's all on one machine.

Sharding (across nodes): distributes partitions across separate physical servers. Now each shard handles its own reads and writes independently. Write throughput and storage scale horizontally.

Sharding is partitioning distributed across a cluster. The partitioning strategies (range, hash, list) apply equally to both.

The tradeoffs

Query alignment: the partition key should match your most frequent query filter. If you partition by user_id but most queries filter by created_at, you lose partition pruning on your most important queries.

Write distribution vs query performance: hash partitioning distributes writes evenly but loses range query pruning. Range partitioning enables efficient range queries but creates write hotspots for time-series data. Pick the strategy that serves your dominant workload.

Data retention: range partitioning by time makes retention trivial — drop old partitions. Hash partitioning makes retention complex — mass DELETE across all partitions. If data has a TTL, range partitioning by time almost always wins.

Rebalancing cost: adding more partitions to a hash-partitioned table requires resharding (moving data). Range-partitioned tables simply add new ranges at the end. Plan for growth.

The one thing to remember

The partition key is a query-routing decision. Queries that include an equality or range filter on the partition key benefit from partition pruning — only the relevant partitions are read. Queries that don't include the partition key in their filter touch all partitions. Design your partitioning strategy around the filter your most important, most frequent queries actually use.

← Previous: Database Sharding: Scaling Beyond a Single Node — Sharding splits a database across multiple nodes. Learn how it works, the strategies available, and the significant t...

→ Next: Consistent Hashing: Minimising Resharding Pain — Consistent hashing minimises data movement when nodes are added or removed. Learn how it works and why it's fundament...

Data Partitioning: Choosing How to Divide Your Data

Systems Design

Data Partitioning: Choosing How to Divide Your Data

The problem

The core idea

The analogy: a filing cabinet with different organisation schemes

How it works

Range partitioning

Hash partitioning

List partitioning

Composite partitioning

Partitioning vs sharding

The tradeoffs

The one thing to remember

Comments

Systems Design

Understanding API Gateways: A Basic Overview

More from this blog

Docker & Kubernetes: What They Are, Why They Matter, and How to Get Started

Introduction to Rancher: Wrangling Kubernetes Clusters at Scale

Networking Fundamentals: A Beginner's Guide to How the Internet Actually Works

Distributed Systems: Wrap-Up

Observability: Understanding Your System at Runtime

Command Palette

Systems Design

Data Partitioning: Choosing How to Divide Your Data

The problem

The core idea

The analogy: a filing cabinet with different organisation schemes

How it works

Range partitioning

Hash partitioning

List partitioning

Composite partitioning

Partitioning vs sharding

The tradeoffs

The one thing to remember

Comments

Systems Design

Understanding API Gateways: A Basic Overview

More from this blog