CONCEPT · LEVEL 07

Shard the Database

When one database can't keep up, slice the data across many.

01 When the database becomes the ceiling

A single database server has a finite write throughput, no matter how powerful the machine. Read replicas spread read load across copies of the data, but every write still lands on the primary. If write volume keeps growing, vertical scaling (buying a bigger machine) eventually runs out of headroom, and you hit a wall.

Sharding is horizontal scaling for databases. Think of it like splitting a large dictionary into two volumes: A through M and N through Z. Each volume is smaller and faster to search, and the two can be consulted in parallel. Every shard holds a non-overlapping slice of the data, so aggregate throughput grows roughly linearly with the number of shards, as long as load is spread evenly and most queries touch only one shard.
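The dictionary analogy describes range partitioning: each shard owns a contiguous range of keys. Here is a minimal Python sketch of that two-volume split. `RANGE_STARTS` and `range_shard` are hypothetical names chosen for illustration, and routing on the first letter is an assumption made to mirror the analogy.

```python
from bisect import bisect_right

# Hypothetical two-volume split from the analogy: each entry is the first
# letter a shard's range starts at, sorted. Shard 0 = A-M, shard 1 = N-Z.
RANGE_STARTS = ["A", "N"]


def range_shard(key: str) -> int:
    """Return the index of the shard whose letter range contains the key."""
    first = key[:1].upper()
    # Count range starts <= first letter; the last such start owns the key.
    # Keys sorting before "A" (digits, empty string) fall back to shard 0.
    return max(bisect_right(RANGE_STARTS, first) - 1, 0)


for name in ["alice", "mallory", "nina", "zed"]:
    print(name, "->", range_shard(name))
```

Range splits keep neighbouring keys together, which helps range scans, but they can become uneven when many keys share the same starting letters. That is one reason the router in the next section hashes the key instead.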

02 How a shard router works

A shard router (sometimes called a coordinator) sits in front of the shards. It takes the request's shard key (typically a user ID or account ID), hashes it, computes which shard owns that hash, and forwards the request there. The application calls the router as though it were a single database; the routing is transparent.

request(key) → i = hash(key) % N → shard[i] → response
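Below is a minimal sketch of that flow. The names (`ShardRouter`, `stable_hash`) are hypothetical, and in-memory dicts stand in for real databases. One assumption is worth calling out: the hash must give the same result in every process and after every restart. For that reason the sketch uses SHA-256 from `hashlib` rather than Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Map a key to a non-negative int that is identical in every process.

    Python's built-in hash() is randomized per process for strings, so a router
    must not rely on it. Taking the first 8 bytes of SHA-256 is an assumption;
    any stable hash would do.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ShardRouter:
    """Modulo routing: request(key) -> i = hash(key) % N -> shard[i]."""

    def __init__(self, shards):
        self.shards = shards  # shard handles; plain dicts stand in for databases

    def shard_for(self, key: str) -> dict:
        i = stable_hash(key) % len(self.shards)
        return self.shards[i]

    def put(self, key: str, value) -> None:
        self.shard_for(key)[key] = value

    def get(self, key: str):
        return self.shard_for(key).get(key)


# Usage: four in-memory "databases" behind one router.
router = ShardRouter([{} for _ in range(4)])
router.put("user:42", {"name": "Ada"})
print(router.get("user:42"))  # {'name': 'Ada'}
print(sum(len(s) for s in router.shards))  # 1: the key lives on exactly one shard
```

Routing is pure arithmetic on the key. As a result, any number of router instances sharing the same shard list agree on placement without coordinating. The catch is that changing the number of shards reshuffles almost every key.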

03 Consistent hashing

Naive modular hashing (hash(key) % N) has a nasty property: change N and almost every key maps to a different shard, forcing a near-full data migration. Consistent hashing instead hashes both shards and keys onto the same virtual ring, and each key belongs to the first shard found moving clockwise from the key's position. Adding or removing a shard only displaces the keys in the arc next to that shard (about 1/N of all keys on average), leaving the rest untouched.
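The sketch below puts a hash ring next to plain modulo hashing. For each scheme it measures how many of 10,000 hypothetical keys change shard when a fifth shard joins four. It assumes SHA-256 as the hash. It also places 100 virtual points per shard (`vnodes`), a common refinement that evens out arc sizes but is not described above. All names are illustrative.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # First 8 bytes of SHA-256: the same value in every process, unlike hash().
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing: shards and keys are hashed onto one ring, and a key
    belongs to the first shard point clockwise from the key's own position."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes  # points per shard; more points spread load more evenly
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> shard name
        for shard in shards:
            self.add(shard)

    def add(self, shard: str) -> None:
        for v in range(self.vnodes):
            point = stable_hash(f"{shard}#{v}")
            bisect.insort(self._points, point)
            self._owner[point] = shard

    def shard_for(self, key: str) -> str:
        # First point strictly after the key's hash; wrap to index 0 past the end.
        i = bisect.bisect_right(self._points, stable_hash(key))
        return self._owner[self._points[i % len(self._points)]]


keys = [f"user:{n}" for n in range(10_000)]

# Modulo hashing: count keys whose shard changes when N goes from 4 to 5.
mod_moved = sum(stable_hash(k) % 4 != stable_hash(k) % 5 for k in keys)

# Consistent hashing: the same change, adding shard "s4" to the ring.
ring = HashRing(["s0", "s1", "s2", "s3"])
before = {k: ring.shard_for(k) for k in keys}
ring.add("s4")
ring_moved = sum(before[k] != ring.shard_for(k) for k in keys)

print(f"modulo moved {mod_moved / len(keys):.0%}, ring moved {ring_moved / len(keys):.0%}")
```

Under modulo hashing, a key keeps its shard only when hash % 4 equals hash % 5, which holds for about 1 key in 5, so roughly 80% of keys move. On the ring, only keys in the arcs claimed by the new shard move (roughly 20% here), and every one of them moves to `s4`.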

💡 Why consistent hashing matters in production

Without it, adding a shard means moving nearly all your data, which can take hours and may require downtime. With it, adding a shard moves only a small fraction. This makes scaling out far less disruptive, which is why many distributed databases, Cassandra among them, use consistent hashing or a close variant.

04 Trade-offs

Cross-shard queries (aggregations that need data from multiple shards) are expensive: the router must fan out to every shard and merge the results. Hot keys are another pitfall: if one user generates disproportionate traffic, their shard saturates while the others sit mostly idle. A shard key with high cardinality and even distribution avoids hot shards. A key that matches your most common access pattern (for example, the user ID when most queries are per-user) keeps those queries on a single shard.
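Here is a minimal scatter-gather sketch of a cross-shard query. It finds the top-k users by order total when users are spread over three hypothetical in-memory shards. The thread pool stands in for parallel network calls, and the shard contents and function names are invented for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards: each maps user_id -> order total.
SHARDS = [
    {"u1": 120, "u5": 40},
    {"u2": 300, "u6": 15},
    {"u3": 75, "u7": 220},
]


def query_shard(shard: dict, k: int):
    """Per-shard part of the query: this shard's top-k (user, total) pairs."""
    return heapq.nlargest(k, shard.items(), key=lambda kv: kv[1])


def top_k_across_shards(shards, k: int):
    """Fan out to every shard in parallel, then merge the partial results.

    Each shard returns only its own top k, so the coordinator merges at most
    k * len(shards) rows instead of every row in the database.
    """
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: query_shard(s, k), shards))
    merged = [row for part in partials for row in part]
    return heapq.nlargest(k, merged, key=lambda kv: kv[1])


print(top_k_across_shards(SHARDS, 2))  # [('u2', 300), ('u7', 220)]
```

The merge itself is cheap. However, the query still touches every shard and waits for the slowest one, which is why such queries get more expensive as the number of shards grows.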

📋 CHEATSHEET · QUICK REFERENCE

Pick a shard key with high cardinality and even distribution.
Aggregate capacity ≈ N × per-shard capacity, assuming load is spread evenly.
If one shard is hot, the average looks fine but users on that shard suffer.
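A small worked example of the last two cheatsheet lines, using made-up load numbers for four shards with an assumed per-shard capacity of 1,000 requests per second:

```python
from statistics import mean

# Hypothetical numbers: requests per second hitting each shard,
# and the per-shard capacity before it starts dropping requests.
PER_SHARD_CAPACITY = 1_000
load = {"s0": 400, "s1": 450, "s2": 1_600, "s3": 350}

avg = mean(load.values())
aggregate_capacity = PER_SHARD_CAPACITY * len(load)  # N x per-shard capacity

print(f"average load {avg:.0f} rps of {PER_SHARD_CAPACITY} per shard")
print(f"total load {sum(load.values())} rps of {aggregate_capacity} aggregate")

# The average hides the problem, so check every shard individually.
hot = [s for s, rps in load.items() if rps > PER_SHARD_CAPACITY]
print("saturated shards:", hot)  # saturated shards: ['s2']
```

The cluster runs at 70% of aggregate capacity on average. Yet shard `s2` receives 1,600 rps against a 1,000 rps limit, so users whose keys land there are likely to see errors or timeouts.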

READY TO BUILD

One database can't keep up. Add a load balancer in front of multiple servers, then a shard router that deterministically partitions writes across multiple databases by key. More shards = more aggregate throughput.


📖 Glossary

Components

Cache: A sticky-note pad that remembers recent answers so you don't look them up again.

CDN (Content Delivery Network): A network of servers around the world that caches copies of your content close to users.

Circuit Breaker: A switch that trips open when a downstream service fails too often, stopping the bleeding.

Dead-Letter Queue (DLQ): A separate queue where failed or unprocessable messages get parked for inspection.

Durable Queue: A queue that writes messages to disk so they survive crashes and restarts.

Kafka: A durable message log, ordered within each partition, that many consumers can read independently.

Load Balancer: A traffic cop that distributes incoming requests across multiple servers.

Message Broker: Middleware that routes messages between services so they don't talk directly.

Primary: The main database that accepts all write operations and leads replication.

Queue: A waiting line where tasks sit until a worker is free to handle them.

Read Replica: A database copy that only handles reads, taking pressure off the write primary.

Redis: An in-memory data store that is extremely fast for caches, counters, and sessions.

Replica: A copy of a database that can serve reads and take over if the primary fails.

Service Mesh: A layer of sidecar proxies that handles networking between services, such as routing, retries, and TLS.

Shard: One slice of a split database; a node that holds its own piece of the data.

SQS (Simple Queue Service): Amazon's managed queue service; producers send messages and workers pull and process them.

Concepts

Acknowledge (ACK): A reply that says "I got your message", the signal that work was accepted.

At-Least-Once Delivery: Every message arrives at least once, but sometimes more than once, so consumers must handle duplicates.

Availability: The percentage of time your system is up and serving requests correctly.

Backpressure: A slow consumer signalling producers to slow down before the queue explodes.

Bottleneck: The one slow part of your system that makes everything else wait.

Cache Hit: You asked the cache and it had the answer, so the response is fast and cheap.

Cache Miss: The cache didn't have the answer, so you had to ask the slower database.

CAP Theorem: During a network partition, a distributed system must choose between consistency and availability; it can't fully guarantee both.

Capacity: The maximum load a component can handle before it starts dropping requests.

Cascading Failure: One failing service causes its callers to fail, which causes their callers to fail, like dominoes.

Cold Start: The slow startup when a service or cache has no warm state yet.

Connection Pool: A pre-made set of database connections ready to reuse instead of opening new ones.

Consumer: The worker that pulls tasks from a queue and actually processes them.

Decoupling: Designing components so they don't depend on each other's availability to do their jobs.

DNS (Domain Name System): The internet's phone book; it turns 'api.myapp.com' into an IP address.

Durability: Once data is saved, it stays saved, even if the server crashes a second later.

Eventual Consistency: All copies of your data will agree eventually, just not necessarily right now.

Eventual Durability: Your write is acknowledged before it hits disk; it will be saved soon, so a crash in that window can lose it.

Exactly-Once Delivery: Every message arrives exactly one time, with no duplicates and no losses.

Garbage Collection (GC): Automatic cleanup of unused memory, which can pause your app while it runs.

Head-of-Line Blocking: One stuck request at the front of the line holds up everyone waiting behind it.

Health Check: An endpoint that tells load balancers whether a server is ready to take traffic.

Hit Rate: The percentage of requests your cache answers without touching the database.

Horizontal Scaling: Add more machines instead of making one machine bigger.

Hot Key: One cache or shard key that gets far more requests than the others.

Idempotency: The property that makes doing something twice as safe as doing it once.

Idempotent: Doing something once or a hundred times gives you exactly the same result.

In-Flight: Requests that have been sent but are still waiting for a reply.

Latency: How long a single request takes, from sending it to receiving the response.

Load Shedding: Deliberately dropping some requests during overload so the rest can succeed.

Microservice: A small, independently deployable service that owns one business function.

Middleware: Software that sits between your app and the world, handling common plumbing tasks.

Monolith: One big application that does everything, in one codebase with one deploy.

Network Partition: A network split where some nodes can't talk to others for a period of time.

Observability: Being able to look inside your running system and understand what is happening.

p95 Latency: 95% of requests complete faster than this; the slowest 5% take longer.

p99 Latency: Only 1% of requests are slower than this; a measure of your worst typical performance.

Partition: A segment of data or infrastructure separated from the others, by design or by failure.

Poison Pill: A bad message that crashes every consumer that tries to process it.

Producer: The part of the system that generates work and drops it into a queue.

Rate Limiting: Capping how many requests a client can make in a given time window.

Read-Heavy: Your system handles far more reads than writes; think 99% reads, 1% writes.

Retry: Try again if the first attempt fails; transient issues often resolve themselves.

Round Trip: The time for a message to travel to a server and for the reply to come back.

Service Discovery: A registry that lets services find each other's addresses as they scale up and down.

Shard Key: The field that decides which database slice stores each piece of data.

Single Point of Failure: One component whose failure brings the entire system to its knees.

SLA (Service Level Agreement): A written promise about how reliable and fast your service will be.

Stateful: A service that remembers past interactions with a user or session.

Stateless: A service with no memory; every request carries all the info it needs.

Strong Consistency: After a write completes, every subsequent read sees that change, with no exceptions.

TCP (Transmission Control Protocol): The reliable internet protocol that delivers a byte stream in order, retransmitting lost packets and reordering out-of-order ones.

Throughput: How many requests your system processes per second.

Thundering Herd: Thousands of clients retry at the same moment, creating a stampede.

Timeout: Give up waiting for a slow response after a fixed amount of time.

TLS (Transport Layer Security): Encryption for data in transit so nobody can eavesdrop on your network calls.

Token Bucket: Tokens refill at a fixed rate up to a maximum; each request spends one, and when the bucket is empty, requests are rejected.

TTL (Time to Live): An expiry timer on cached data; after this many seconds, throw it out and refetch.

Vertical Scaling: Upgrade one machine to be bigger and faster instead of adding more machines.

Write-Ahead Log (WAL): Write changes to a log before applying them so the system can recover after a crash.

Write-Heavy: Your system writes data constantly; think logging, metrics, or IoT sensor streams.

Patterns

Blue-Green Deployment: Deploy to a second identical environment, then flip all traffic to it at once.

Cache-Aside: Check the cache first; on a miss, fetch from the database and store the result for next time.

Canary Deployment: Send a small slice of traffic (say 5%) to the new version first and watch for problems before going all-in.

Choreography: Services react to events on their own; no conductor, just musicians listening to each other.

Consistent Hashing: A way to map keys to nodes so adding or removing a node moves only a small fraction of the data.

Event Sourcing: Store every change as an event log, then rebuild state by replaying those events.

Exponential Backoff: Double the wait after each failed retry: 1 s, then 2 s, then 4 s, then 8 s.

Failover: Automatically switching to a backup when the main component breaks.

Fan-Out: One event triggers writes or notifications to many different services at once.

Fire-and-Forget: Send a message without waiting for a reply, assuming it will be processed eventually.

Orchestration: A central controller tells each service what to do and in what order.

Pub/Sub (Publish-Subscribe): Publishers broadcast events; subscribers listen only for the events they care about.

Replication: Keep copies of your data on multiple machines so it survives if one machine breaks.

Round-Robin: Take turns: send request 1 to Server A, request 2 to Server B, and repeat.

Sharding: Split a huge database into smaller pieces, each holding a different slice of the data.

Write-Around: Skip the cache on writes; go straight to the database and let reads fill the cache.

Write-Behind: Write to the cache immediately, then sync to the database later in the background.

Write-Through: Write to both the cache and the database in the same operation, so the two stay in agreement.