CONCEPT · LEVEL 10

Replicate & Failover

One database is one outage waiting to happen: replicate to survive.

01 Why replication

A single database is a single point of failure: when it goes down, every read and write fails instantly. Replication keeps copies of the same data on multiple nodes so that if one disappears, the others can keep serving traffic. Think of a replica like a backup singer who knows all the same songs: if the lead vocalist calls in sick, the show still goes on.

In Flow, you can click a database in the inspector and press Replicate. The original becomes the primary (handles all writes), and new copies become replicas (serve reads). They share a group badge showing they belong to the same replication set. Reads spread across all healthy members; writes go only to the primary.
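The routing rule described above can be sketched in a few lines. This is a minimal illustration, not Flow's actual implementation: the `Node` and `ReplicationGroup` classes, the node names, and the round-robin read policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class ReplicationGroup:
    """Hypothetical model of a replication set: one primary plus replicas."""
    primary: Node
    replicas: list[Node] = field(default_factory=list)
    _turn: count = field(default_factory=count, repr=False)

    def route_write(self) -> Node:
        # Writes go only to the primary; if it is down, the write fails.
        if not self.primary.healthy:
            raise RuntimeError("primary is down: write rejected")
        return self.primary

    def route_read(self) -> Node:
        # Reads rotate (round-robin, an assumed policy) across every
        # healthy member of the group, primary included.
        members = [n for n in [self.primary, *self.replicas] if n.healthy]
        if not members:
            raise RuntimeError("no healthy members: read rejected")
        return members[next(self._turn) % len(members)]


group = ReplicationGroup(
    Node("db-primary"), [Node("db-replica-1"), Node("db-replica-2")]
)
first_three_reads = [group.route_read().name for _ in range(3)]

group.primary.healthy = False  # simulate a primary outage
reads_during_outage = {group.route_read().name for _ in range(4)}
```

With the primary marked unhealthy, `route_read` keeps returning the two replicas while `route_write` raises, which mirrors the "reads survive, writes fail" behavior of a replicated group.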

02 What you get (and what you don't)

Aggregate read capacity: The sum of the read capacity of all healthy members, primary included. A primary plus two replicas gives roughly 3× the read throughput of a single node.
Partial availability: If the primary fails, replicas keep serving reads: your app stays partially up rather than completely dead.
Write limitation: Writes still go to the primary. During a primary outage, writes fail until a replica is promoted to primary.
Replication lag: Every write propagates from primary to replicas with a small delay. A replica might briefly return a slightly stale value right after a write.
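Replication lag is easiest to see with a toy model. The sketch below assumes a hypothetical `LaggingReplica` that applies each write a fixed number of ticks after the primary accepted it; the class, the tick-based clock, and the `price` key are illustrative only.

```python
class LaggingReplica:
    """Hypothetical replica that applies each write `lag_ticks` after the primary."""

    def __init__(self, lag_ticks: int) -> None:
        self.lag_ticks = lag_ticks
        self.data: dict[str, str] = {}
        # Queued writes as (apply_at_tick, key, value).
        self.pending: list[tuple[int, str, str]] = []

    def receive(self, now: int, key: str, value: str) -> None:
        # The write is queued and only becomes visible after the lag.
        self.pending.append((now + self.lag_ticks, key, value))

    def advance(self, now: int) -> None:
        # Apply every queued write whose delay has elapsed, in arrival order.
        due = [w for w in self.pending if w[0] <= now]
        self.pending = [w for w in self.pending if w[0] > now]
        for _, key, value in due:
            self.data[key] = value


primary: dict[str, str] = {"price": "10"}
replica = LaggingReplica(lag_ticks=2)
replica.data = dict(primary)  # replica starts in sync

# Tick 0: the primary accepts a write and ships it to the replica.
primary["price"] = "12"
replica.receive(now=0, key="price", value="12")

replica.advance(now=1)
stale_read = replica.data["price"]  # lag not yet elapsed: old value

replica.advance(now=2)
fresh_read = replica.data["price"]  # replica has caught up
```

A read served by the replica at tick 1 still sees the old value; by tick 2 it matches the primary.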

💡 This level: scheduled outage

The primary database fails for a window mid-simulation. Without replicas, success rate craters. With replicas, reads continue flowing to the healthy copies and your SLA holds through the outage.
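As a rough back-of-the-envelope model of this scenario, the sketch below counts how many requests succeed with and without replicas. Every number in it (100 ticks, an outage from tick 40 to 59, 10 requests per tick, 90% reads) is an assumption chosen for illustration, and it ignores per-node capacity limits; the level's actual parameters may differ.

```python
def success_rate(
    replica_count: int,
    ticks: int = 100,
    outage: range = range(40, 60),
    requests_per_tick: int = 10,
    read_ratio: float = 0.9,
) -> float:
    """Fraction of requests served when the primary is down during `outage`."""
    reads = round(requests_per_tick * read_ratio)
    writes = requests_per_tick - reads
    served = 0
    for tick in range(ticks):
        if tick not in outage:
            served += reads + writes  # primary healthy: everything succeeds
        elif replica_count > 0:
            served += reads  # replicas keep serving reads; writes fail
        # With no replicas, nothing is served during the outage.
    return served / (ticks * requests_per_tick)


without_replicas = success_rate(replica_count=0)
with_replicas = success_rate(replica_count=2)
```

Under these assumptions the single-database topology loses every request in the outage window, while the replicated one loses only the writes.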

📋 CHEATSHEET · QUICK REFERENCE

Replicate any database that holds important reads: even one extra copy buys survivability.
Replication ≠ free. Lag means a replica might serve a slightly older value.

READY TO BUILD

Mid-run, the primary database fails for a while. A single-database topology will lose every request during that window. Replicate the database so reads survive the outage.


📖 Glossary

Components

Cache: A sticky-note pad that remembers recent answers so you don't look them up again.

CDN (Content Delivery Network): Copies of your content cached at servers close to users around the world.

Circuit Breaker: A switch that trips open when a downstream service fails too much, stopping the bleeding.

Dead-Letter Queue (DLQ): A separate queue where failed or unprocessable messages get parked for inspection.

Durable Queue: A queue that writes messages to disk so they survive crashes and restarts.

Kafka: A durable, ordered message log that many consumers can read independently.

Load Balancer: A traffic cop that splits incoming requests evenly across multiple servers.

Message Broker: Middleware that routes messages between services so they don't talk directly.

Primary: The main database that accepts all write operations and leads replication.

Queue: A waiting line where tasks sit until a worker is free to handle them.

Read Replica: A database copy that only handles reads, taking pressure off the write primary.

Redis: An in-memory data store that is extremely fast for caches, counters, and sessions.

Replica: A copy of a database that can serve reads and take over if the primary fails.

Service Mesh: A sidecar layer that handles networking between services: routing, retries, and TLS.

Shard: One slice of a split database: a node that holds its own piece of the data.

SQS (Simple Queue Service): Amazon's managed queue service: produce messages, let workers pull and process them.

Concepts

Acknowledge (ACK): A reply that says 'I got your message': the signal that work was accepted.

At-Least-Once Delivery: Every message arrives at least once, but sometimes twice, so handle that.

Availability: What percentage of time your system is up and serving requests correctly.

Backpressure: A slow consumer signalling producers to slow down before the queue explodes.

Bottleneck: The one slow part of your system that makes everything else wait.

Cache Hit: You asked the cache and it had the answer: fast, cheap, and instant.

Cache Miss: The cache didn't have the answer, so you had to ask the slower database.

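CAP Theorem: Of consistency, availability, and partition tolerance, a distributed system can guarantee only two at once; when a network partition hits, you must choose between consistency and availability.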

Capacity: The maximum load a component can handle before it starts dropping requests.

Cascading Failure: One failing service causes its callers to fail, which causes their callers to fail: dominos.

Cold Start: The slow startup when a service or cache has no warm state yet.

Connection Pool: A pre-made set of database connections ready to reuse instead of making new ones.

Consumer: The worker that pulls tasks from a queue and actually processes them.

Decoupling: Components that don't depend on each other's availability to do their jobs.

DNS (Domain Name System): The internet's phone book: turns 'api.myapp.com' into an IP address.

Durability: Once data is saved, it stays saved: even if the server crashes a second later.

Eventual Consistency: All copies of your data will agree eventually: just not necessarily right now.

Eventual Durability: Your write is acknowledged before it hits disk: it will be saved soon, just not yet.

Exactly-Once Delivery: Every message arrives exactly one time: no duplicates, no losses.

Garbage Collection (GC): Automatic cleanup of unused memory, but it can pause your app while running.

Head-of-Line Blocking: One stuck request at the front of the line holds up everyone waiting behind it.

Health Check: An endpoint that tells load balancers whether a server is ready to take traffic.

Hit Rate: What percentage of requests your cache answers without touching the database.

Horizontal Scaling: Add more machines instead of making one machine bigger.

Hot Key: One cache or shard key that gets hammered with requests while all others sit idle.

Idempotency: The property that makes doing something twice just as safe as doing it once.

Idempotent: Doing something once or a hundred times gives you exactly the same result.

In-Flight: Requests that have been sent but are still waiting for a reply.

Latency: How long it takes from clicking a button to seeing the result on screen.

Load Shedding: Deliberately dropping some requests during overload so the rest can succeed.

Microservice: A small, independently deployable service that owns one business function.

Middleware: Software that sits between your app and the world, handling common plumbing tasks.

Monolith: One big application that does everything: one codebase, one deploy.

Network Partition: A network split where some nodes can't talk to others for a period of time.

Observability: Being able to look inside your running system and understand what is happening.

p95 Latency: The response time that 95% of requests finish within: only the slowest 5% take longer.

p99: Only 1% of requests are slower than this: your worst typical performance.

Partition: A segment of data or infrastructure separated from others: by design or failure.

Poison Pill: A bad message that crashes every consumer that tries to process it.

Producer: The part of the system that generates work and drops it into a queue.

Rate Limiting: Putting a cap on how many requests a client can make in a given time window.

Read-Heavy: Your system handles way more reads than writes: think 99% read, 1% write.

Retry: Try again if the first attempt fails: transient issues often resolve themselves.

Round Trip: The time for a message to travel to a server and for the reply to come back.

Service Discovery: A registry that lets services find each other's addresses as they scale up and down.

Shard Key: The field that decides which database slice stores each piece of data.

Single Point of Failure: One component whose failure brings the entire system to its knees.

SLA (Service Level Agreement): A written promise about how reliable and fast your service will be.

Stateful: A service that remembers past interactions with a user or session.

Stateless: A service with no memory: every request carries all the info it needs.

Strong Consistency: After you write data, every reader instantly sees your change: no exceptions.

TCP (Transmission Control Protocol): The reliable internet delivery protocol that guarantees packets arrive in order.

Throughput: How many requests your system can handle per second at full capacity.

Thundering Herd: Thousands of clients all retry at exactly the same moment, creating a stampede.

Timeout: Give up waiting for a slow response after a fixed amount of time.

TLS (Transport Layer Security): Encryption for data in transit so nobody can eavesdrop on your network calls.

Token Bucket: Tokens refill at a fixed rate; each request spends one: run out and get rejected.

TTL (Time to Live): An expiry timer on cached data: after this many seconds, throw it out and refetch.

Vertical Scaling: Upgrade one machine to be bigger and faster instead of adding more machines.

Write-Ahead Log (WAL): Write changes to a log before applying them so crashes can be recovered from.

Write-Heavy: Your system writes data constantly: think logging, metrics, or IoT sensor streams.

Patterns

Blue-Green Deployment: Deploy to a second identical environment, then flip all traffic to it instantly.

Cache-Aside: Check cache first; on miss, fetch from DB and store the result for next time.

Canary Deployment: Send 5% of traffic to the new version first and watch for problems before going all-in.

Choreography: Services react to events on their own: no conductor, just musicians watching each other.

Consistent Hashing: A way to map keys to nodes so adding or removing a node only moves a little data.

Event Sourcing: Store every change as an event log, then rebuild state by replaying those events.

Exponential Backoff: Wait longer after each failed retry: 1 s, then 2 s, then 4 s, then 8 s.

Failover: Automatically switching to a backup when the main component breaks.

Fan-Out: One event triggers writes or notifications to many different services at once.

Fire-and-Forget: Send a message and don't wait: assume it will be processed eventually.

Orchestration: A central controller tells each service what to do and in what order.

Pub/Sub (Publish-Subscribe): Publishers broadcast events; subscribers listen for only the events they care about.

Replication: Make exact copies of your database on multiple machines so nothing is lost if one breaks.

Round-Robin: Take turns: send request 1 to Server A, request 2 to Server B, and repeat.

Sharding: Split a huge database into smaller pieces, each holding a different slice of data.

Write-Around: Skip the cache on writes: go straight to the database, let reads fill the cache.

Write-Behind: Write to cache instantly, then sync to database later in the background.

Write-Through: Write to both cache and database at the same time: both always agree.