# Block vs File vs Object Storage: Three Models, Three Use Cases

> **Series:** System Design · Data & Storage — Pillar 4 of 8

* * *

## Systems Design

| # | Post | What it covers |
| --- | --- | --- |
| 00 | [Data & Storage: Where Everything Lives](/data-storage-where-everything-lives) | Where data lives shapes everything about a system. Nineteen concepts covering databases, indexing, sharding, replication, and the data structures underneath. |
| 01 | [SQL vs NoSQL: Choosing the Right Database](/sql-vs-nosql-choosing-the-right-database) | SQL vs NoSQL isn't a simple choice. Learn what each type optimises for, when to use relational databases, and when NoSQL is the right call. |
| 02 | [Database Indexing: The Highest-Leverage Performance Tool](/database-indexing-the-highest-leverage-performance-tool) | Indexes are the highest-leverage database performance tool. Learn how they work, what they cost, and how to decide when to add one. |
| 03 | [B-Trees & B+ Trees: The Data Structure Behind Database Indexes](/b-trees-b-trees-the-data-structure-behind-database-indexes) | Almost every database index is built on a B-tree or B+ tree. Learn how they work, why they're fast, and what this means for your queries. |
| 04 | [LSM Trees: Why Some Databases Are Built for Writes](/lsm-trees-why-some-databases-are-built-for-writes) | LSM trees power Cassandra, RocksDB, and LevelDB. Learn how they achieve massive write throughput and what they trade off to get it. |
| 05 | [Denormalisation: Trading Storage for Speed](/denormalisation-trading-storage-for-speed) | Denormalisation trades storage for read speed by pre-computing joins. Learn when it helps, when it hurts, and how to do it safely. |
| 06 | [Database Sharding: Scaling Beyond a Single Node](/database-sharding-scaling-beyond-a-single-node) | Sharding splits a database across multiple nodes. Learn how it works, the strategies available, and the significant tradeoffs it introduces. |
| 07 | [Data Partitioning: Choosing How to Divide Your Data](/data-partitioning-choosing-how-to-divide-your-data) | Range, hash, and list partitioning each make different tradeoffs. Learn how to divide data effectively for queries, maintenance, and scale. |
| 08 | [Consistent Hashing: Minimising Resharding Pain](/consistent-hashing-minimising-resharding-pain) | Consistent hashing minimises data movement when nodes are added or removed. Learn how it works and why it's fundamental to distributed systems. |
| 09 | [Replication & Read Replicas: Scaling Reads and Surviving Failures](/replication-read-replicas-scaling-reads-and-surviving-failures) | Replication copies data across nodes for fault tolerance and read scaling. Learn how primary-replica setups work and when to use them. |
| 10 | [Object Storage: Unlimited Scale for Large Binary Data](/object-storage-unlimited-scale-for-large-binary-data) | Object storage handles large binary files at unlimited scale. Learn how it works, why it replaced file servers, and when to use it. |
| 11 | **Block vs File vs Object Storage: Three Models, Three Use Cases** ← you are here | Three storage models, three different use cases. Learn what block, file, and object storage optimise for and how to choose between them. |
| 12 | [Distributed File Systems: File Storage Across Many Machines](/distributed-file-systems-file-storage-across-many-machines) | Distributed file systems spread file storage across many machines. Learn how HDFS, Ceph, and GlusterFS work and when to use them. |
| 13 | [Time Series Databases: Built for Metrics and Events](/time-series-databases-built-for-metrics-and-events) | Time series databases handle append-heavy metric data far better than SQL. Learn how they work and when to use InfluxDB, Prometheus, or TimescaleDB. |
| 14 | [Vector Databases: Semantic Search and AI Memory](/vector-databases-semantic-search-and-ai-memory) | Vector databases power semantic search, recommendations, and LLM memory. Learn how embeddings work, what ANN search is, and when to use one. |
| 15 | [Full-Text Search Engines: Beyond SQL LIKE](/full-text-search-engines-beyond-sql-like) | Full-text search needs more than SQL LIKE. Learn how inverted indexes, relevance ranking, and Elasticsearch make text search fast and powerful. |
| 16 | [Materialized Views: Pre-Computing Expensive Queries](/materialized-views-pre-computing-expensive-queries) | Materialized views cache expensive query results as physical tables. Learn how they work, when to refresh them, and when to use them vs other approaches. |
| 17 | [Query Optimisation: From Slow to Fast](/query-optimisation-from-slow-to-fast) | Slow queries aren't always fixed by adding indexes. Learn how to read EXPLAIN output, understand query plans, and systematically make queries fast. |
| 18 | [Connection Pooling: Managing the Hidden Bottleneck](/connection-pooling-managing-the-hidden-bottleneck) | Opening a database connection per request doesn't scale. Learn how connection pooling works, what PgBouncer does, and how to size your pool correctly. |
| 19 | [Data & Storage: Wrap-Up](/data-storage-wrap-up) | A recap of all 19 data storage concepts: SQL, NoSQL, indexing, sharding, replication, specialised databases, and how they connect in a real system. |


## The problem

You're provisioning infrastructure for the URL shortener platform:

- The database server needs fast, low-latency storage for PostgreSQL's data files
- The development team needs a shared volume where multiple services can read and write configuration files
- The CDN pipeline needs to store terabytes of generated thumbnails and QR codes that are uploaded once and read many times

Three requirements. All involve "storing data." But they need fundamentally different storage models. Using the wrong model for any of these will either break functionality or create serious performance problems.

This post is about understanding why these three storage models exist, what each one is designed for, and how to identify which one you need.

---

## The core idea

There are three fundamental storage models, each exposing a different interface to the application:

- **Block storage:** raw storage volumes presented as disk devices, managed by the OS
- **File storage:** a filesystem hierarchy accessible over a network, shared between multiple machines
- **Object storage:** a flat key-value store for arbitrary binary objects, accessed via HTTP

They're not interchangeable. Each model is optimised for different access patterns, different scale characteristics, and different consistency semantics.

---

## The analogy: three ways to store physical goods

**Block storage is a blank warehouse floor.** You're given a large, empty concrete floor. How you organise what goes on it is entirely up to you — you build your own shelving systems, labelling, access rules. Maximum flexibility, maximum control. The warehouse doesn't know or care what's on the floor.

**File storage is a furnished archive room.** Pre-organised with shelves, folders, labels, and a sign-in sheet. Anyone with a key can walk in, navigate the folder hierarchy, pick up a folder, modify it, put it back. Multiple people can access it simultaneously using the shared organisation system.

**Object storage is a postal warehouse.** You package your items, hand them over, and receive a tracking number. The warehouse decides how to store them internally. You retrieve by tracking number. There's no browsing, no "go to shelf 3 and look for the blue folder." Fast, scalable, optimised for parcels, not browsing.

---

## How each model works

### Block storage

Block storage presents raw storage volumes as "block devices" — the equivalent of virtual hard drives. The operating system sees a disk and manages it directly: creating filesystems, managing inodes, handling reads and writes at the block level.

```
Application → OS filesystem → Block device (EBS, persistent disk, SAN)
                ↑ The OS manages everything above this line
                ↓ Block storage only exposes raw read/write of fixed-size blocks
```

**How it works:**
- Storage is divided into fixed-size blocks (commonly 512 bytes or 4 KB)
- The OS reads and writes blocks directly via a device driver
- The filesystem (ext4, XFS, NTFS) built on top manages files, directories, inodes
- In the common case, only one host mounts a block volume read-write at a time (some systems offer multi-attach, which is only safe read-only or with a cluster-aware filesystem)
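
The block interface described above can be sketched with an ordinary file standing in for the device. This is a toy model under stated assumptions, not how EBS or a SAN is implemented: `FileBackedBlockDevice`, `BLOCK_SIZE`, and `volume.img` are hypothetical names, and a real block device is driven by the kernel rather than by application code.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size for this sketch


class FileBackedBlockDevice:
    """Toy block device: a flat array of fixed-size blocks stored in one file."""

    def __init__(self, path: str, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        # Pre-size the backing file, mirroring a pre-provisioned volume.
        with open(path, "wb") as f:
            f.truncate(num_blocks * BLOCK_SIZE)
        self._f = open(path, "r+b")

    def _check(self, index: int) -> None:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")

    def write_block(self, index: int, data: bytes) -> None:
        self._check(index)
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self._f.seek(index * BLOCK_SIZE)
        self._f.write(data)
        self._f.flush()

    def read_block(self, index: int) -> bytes:
        self._check(index)
        self._f.seek(index * BLOCK_SIZE)
        return self._f.read(BLOCK_SIZE)

    def close(self) -> None:
        self._f.close()


def demo_block_device() -> tuple[bytes, bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        dev = FileBackedBlockDevice(os.path.join(tmp, "volume.img"), num_blocks=8)
        payload = b"pg-page".ljust(BLOCK_SIZE, b"\x00")
        dev.write_block(5, payload)      # random write straight to block 5
        written = dev.read_block(5)
        untouched = dev.read_block(0)    # a block that was never written
        dev.close()
        return written, untouched
```

Note that the device has no notion of files or directories: it only knows block numbers. Everything else (names, permissions, directory trees) is the job of the filesystem layered on top.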

**Performance characteristics:**
- Lowest latency of the three models: the OS issues block I/O directly, with no network filesystem protocol or HTTP layer in the path
- Predictable IOPS: cloud block storage (AWS EBS, GCP Persistent Disk) lets you provision a target I/O rate for a volume
- Supports random read/write patterns that databases require
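
To make the IOPS figure concrete, it helps to convert it into throughput for a given I/O size. The arithmetic below is a rough sketch: the 3,000 IOPS volume is a hypothetical example, and real volumes usually have a separate throughput cap that this calculation ignores.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Upper bound on throughput when IOPS is the only limit and every I/O has the same size."""
    return iops * io_size_kib / 1024


# Hypothetical volume provisioned at 3,000 IOPS.
small_io = throughput_mib_per_s(3000, 8)    # 8 KiB I/Os, PostgreSQL's default page size
large_io = throughput_mib_per_s(3000, 64)   # 64 KiB I/Os, e.g. a sequential scan
```

The same IOPS budget moves eight times more data with 64 KiB requests than with 8 KiB ones, which is why random small-page workloads such as databases tend to be IOPS-bound rather than bandwidth-bound.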

**Cloud examples:** AWS EBS (Elastic Block Store), GCP Persistent Disk, Azure Managed Disks

**Best for:**
- Database storage (PostgreSQL, MySQL need block storage for their data files — the database manages file layout)
- Virtual machine OS disks
- Any workload requiring low-latency random I/O

**Not for:** sharing between multiple hosts, large-scale object storage, web-accessible content delivery

---

### File storage

File storage presents a filesystem over a network protocol. Multiple servers can mount the same file system share simultaneously and access a shared hierarchy of files and directories.

```
Server A ──┐
Server B ──┼── NFS/SMB → File storage server → Physical disks
Server C ──┘
           ↑ All servers see the same directory tree
```

**Protocols:**
- **NFS (Network File System):** Unix/Linux standard for shared storage. Most common in Linux-based cloud infrastructure.
- **SMB/CIFS:** Windows-native file sharing protocol, available on AWS as FSx for Windows File Server.
- **AWS EFS (Elastic File System):** a managed service rather than a protocol: NFS run on AWS infrastructure, with capacity that scales automatically.

**How it works:**
- File storage appears as a regular filesystem directory
- Read/write operations go over the network to the storage server
- Multiple hosts can read and write to the same paths simultaneously
- POSIX semantics: `open()`, `read()`, `write()`, `stat()`, directory traversal
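
The POSIX calls listed above look the same whether the directory is local or a mounted share. The sketch below runs them against a temporary directory that stands in for a mount point; `demo_posix_share`, `config/app.conf`, and the `max_links` setting are hypothetical names chosen for illustration.

```python
import os
import tempfile


def demo_posix_share(mount_point: str) -> dict:
    """Exercise open/write/read/stat/traversal under mount_point.

    mount_point stands in for a mounted NFS or SMB share (an assumption of this sketch).
    """
    conf_dir = os.path.join(mount_point, "config")
    os.makedirs(conf_dir, exist_ok=True)
    path = os.path.join(conf_dir, "app.conf")

    # open() + write(): create the file.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    os.write(fd, b"max_links=100\n")
    os.close(fd)

    # In-place write: overwrite three bytes at offset 10 without rewriting the file.
    fd = os.open(path, os.O_WRONLY)
    os.lseek(fd, 10, os.SEEK_SET)
    os.write(fd, b"250")
    os.close(fd)

    # open() + read()
    fd = os.open(path, os.O_RDONLY)
    content = os.read(fd, 4096)
    os.close(fd)

    # stat() and directory traversal
    size = os.stat(path).st_size
    tree = sorted(
        os.path.relpath(os.path.join(root, name), mount_point)
        for root, _dirs, files in os.walk(mount_point)
        for name in files
    )
    return {"content": content, "size": size, "tree": tree}


with tempfile.TemporaryDirectory() as tmp:
    result = demo_posix_share(tmp)
```

The in-place write in the middle is the operation that object storage cannot offer, and it is a large part of why legacy applications need a real filesystem.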

**Performance characteristics:**
- Higher latency than block storage (network round-trip per operation)
- Shared throughput — all clients compete for the same bandwidth
- Scales well for concurrent read-heavy workloads

**Best for:**
- Configuration files and scripts shared across application servers
- Home directories in multi-server environments
- Shared workspaces for compute jobs that need to read/write a common dataset
- Legacy applications that assume POSIX filesystem semantics

**Not for:** database storage (NFS latency and locking semantics are a poor fit for most databases), very high-throughput random I/O, large-scale binary storage

**When NFS breaks:** file locking over NFS has a long history of being unreliable. SQLite and other embedded databases that rely on file locking for concurrency control can corrupt data when the NFS lock implementation misbehaves. PostgreSQL belongs on block storage: it assumes local-disk durability and latency, and running it on NFS is risky unless the mount is configured very carefully.
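
For context, this is the kind of advisory lock an embedded database depends on. The sketch below uses `flock` on a local temporary file because its behaviour is easy to show in one process; SQLite's Unix backend uses POSIX `fcntl` locks by default, so treat this as an illustration of the idea rather than SQLite's mechanism. `try_exclusive_lock` and `data.db` are hypothetical names, and the code is Unix-only.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path: str):
    """Return an open file holding an exclusive advisory lock, or None if it is already held."""
    f = open(path, "a+b")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def demo_locking() -> tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.db")
        first = try_exclusive_lock(path)    # writer 1 takes the lock
        second = try_exclusive_lock(path)   # writer 2 is refused while it is held
        first.close()                       # closing the file releases the lock
        third = try_exclusive_lock(path)    # the lock is available again
        outcome = (first is not None, second is not None, third is not None)
        third.close()
        return outcome
```

On a local filesystem the second attempt is reliably refused. The concern with NFS is that this guarantee depends on the client, server, and mount options all cooperating; if any of them grants the lock to two writers, both proceed and the file can be corrupted.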

---

### Object storage

Covered in depth in the previous post. The key contrast with block and file storage:

```
Application → HTTP PUT/GET → Object storage (S3, GCS, R2)
              ↑ No filesystem, no directories, no POSIX
              ↓ Just key → blob, at unlimited scale
```
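
The "key → blob" model can be sketched as a small in-memory class. This is a toy stand-in, not the S3 or GCS API: `InMemoryObjectStore` and its method names are hypothetical, and a real store is reached over HTTP rather than through method calls.

```python
import hashlib


class InMemoryObjectStore:
    """Toy object store: a flat mapping of key -> immutable blob."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> str:
        # A PUT replaces the whole object; there is no partial update.
        self._objects[key] = bytes(body)
        return hashlib.sha256(body).hexdigest()  # content fingerprint

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Directories" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = InMemoryObjectStore()
store.put("thumbnails/abc123.png", b"\x89PNG...v1")
store.put("qr/abc123.svg", b"<svg/>")
store.put("thumbnails/abc123.png", b"\x89PNG...v2")  # full overwrite, not an edit
```

The slash in `thumbnails/abc123.png` is just a character in the key. There is no directory to create, rename, or traverse, which is what lets the store spread keys across as many machines as it likes.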

**Performance characteristics:**
- Higher latency than block or file for individual small reads (HTTP overhead)
- Extremely high aggregate throughput — many parallel GETs/PUTs to different objects scale horizontally
- Optimised for large objects (MB to GB range) rather than small files
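
The gap between per-request latency and aggregate throughput can be sketched with simulated requests. Nothing here talks to a real object store: `fake_get` is a hypothetical stand-in for an HTTP GET, and the 50 ms latency is an assumed figure chosen only to make the effect visible.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY_S = 0.05  # assumed per-request latency, for illustration only


def fake_get(key: str) -> bytes:
    """Stand-in for an HTTP GET against an object store (hypothetical)."""
    time.sleep(SIMULATED_LATENCY_S)
    return f"body-of-{key}".encode()


def fetch_all(keys: list[str], workers: int) -> dict[str, bytes]:
    # pool.map preserves input order, so zip pairs each key with its own body.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(keys, pool.map(fake_get, keys)))


keys = [f"thumbnails/{i}.png" for i in range(16)]

start = time.perf_counter()
serial = fetch_all(keys, workers=1)      # one request at a time
serial_s = time.perf_counter() - start

start = time.perf_counter()
parallel = fetch_all(keys, workers=16)   # all requests in flight at once
parallel_s = time.perf_counter() - start
```

Each individual request is no faster in the parallel run, but the batch finishes in roughly the time of one request instead of sixteen. That is the sense in which object storage trades per-operation latency for horizontal throughput.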

**Best for:**
- Large binary objects: images, video, audio, documents
- Static web assets served via CDN
- Backups, archives, data exports
- Machine learning datasets and model weights
- Any content that's written once and read many times

**Not for:** databases, shared mutable files, applications requiring POSIX semantics, frequently-updated small files

---

## Choosing between them

| Requirement | Block | File | Object |
|---|---|---|---|
| Database storage | ✓ Best | ✗ Avoid | ✗ |
| VM OS disk | ✓ | ✗ | ✗ |
| Shared config across servers | — | ✓ | — |
| User-uploaded images/video | ✗ | — | ✓ Best |
| Build artifacts, backups | ✗ | — | ✓ |
| HPC shared datasets | — | ✓ | ✓ |
| Content delivery (CDN origin) | ✗ | ✗ | ✓ Best |
| Multi-host read-write access | ✗ | ✓ | ✓ (via API) |


**Decision heuristic:**

```plaintext
Does it need to be a database volume or VM disk?
  → Block storage

Does it need to be accessed by multiple servers simultaneously
  with POSIX filesystem semantics (directories, in-place writes)?
  → File storage (NFS/EFS)

Is it large binary content, uploaded or created once,
  read many times, needs HTTP access or CDN?
  → Object storage

When in doubt about large binary content: object storage.
```
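
The heuristic translates almost directly into code. The sketch below is one possible encoding, applied to the three workloads from the opening problem; the `Workload` fields and `choose_storage` are hypothetical names, and real decisions will weigh cost and operational factors that three booleans cannot capture.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    is_database_or_vm_disk: bool = False
    needs_shared_posix: bool = False        # multi-host access, directories, in-place writes
    is_large_write_once_blob: bool = False  # uploaded once, read many times


def choose_storage(w: Workload) -> str:
    # The checks run in the same order as the questions in the heuristic.
    if w.is_database_or_vm_disk:
        return "block"
    if w.needs_shared_posix:
        return "file"
    if w.is_large_write_once_blob:
        return "object"
    return "undecided"


workloads = [
    Workload("PostgreSQL data files", is_database_or_vm_disk=True),
    Workload("shared service configuration", needs_shared_posix=True),
    Workload("thumbnails and QR codes", is_large_write_once_blob=True),
]
choices = {w.name: choose_storage(w) for w in workloads}
```

The order of the checks matters: a database volume is sent to block storage even if someone also flags it as shared, which matches the priority the heuristic gives to the first question.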

* * *

## The tradeoffs

**Block:** lowest latency, highest performance for random I/O. Typically a single writer, and not shareable across hosts without specialised software. Capacity must be provisioned up front (`df` will never report "unlimited").

**File:** shareable and familiar (POSIX semantics). Network latency is real, especially under many concurrent writers. File locking semantics over NFS are notoriously problematic. Not designed for large binary content.

**Object:** massively scalable, globally accessible, no provisioning required. HTTP overhead makes it slower for small files. No POSIX semantics — applications built for a filesystem can't use object storage without adaptation. No in-place modification.

* * *

## The one thing to remember

> **Block storage is for workloads that need a disk (databases, VM storage). File storage is for workloads that need a shared filesystem across multiple servers. Object storage is for workloads that need to store large binary content at scale, accessible via HTTP.** Using the wrong model doesn't just hurt performance: databases on NFS can corrupt data, filesystems struggle to scale to billions of files, and databases storing binary blobs become enormous and slow. Match the storage model to the access pattern.

* * *

*← Previous:* [***Object Storage: Unlimited Scale for Large Binary Data***](/object-storage-unlimited-scale-for-large-binary-data) *— Object storage handles large binary files at unlimited scale. Learn how it works, why it replaced file servers, and w...*

*→ Next:* [***Distributed File Systems: File Storage Across Many Machines***](/distributed-file-systems-file-storage-across-many-machines) *— Distributed file systems spread file storage across many machines. Learn how HDFS, Ceph, and GlusterFS work and when...*