~/articles/design-hotel-booking

◆◆◆Advancedasked at Airbnbasked at Bookingasked at Expedia

Design a Hotel / Airbnb Booking System

Q: How does Postgres prevent double-booking in a hotel reservation system?

A partial EXCLUDE USING gist constraint on the reservations table enforces the no-overlap invariant atomically. The constraint indexes unit_id with equality and daterange(check_in, check_out, '[)') with the overlap operator &&, filtered to WHERE (status = 'confirmed'). Any INSERT that would create an overlapping confirmed reservation fails with a constraint violation, regardless of concurrency.

Q: Why use Elasticsearch for search instead of querying Postgres directly?

The search path peaks at roughly 50,000 QPS — far beyond what a single Postgres instance can handle — while the booking write path sees only 500 QPS. Elasticsearch handles geo-distance filters, multi-filter queries, and date-range availability filters at that scale. Strong consistency is not required for search because the Redis hold and Postgres EXCLUDE constraint enforce correctness at booking time regardless of whether a search result is a few seconds stale.

Q: What is the reservation hold pattern, and why is the TTL set to 10 minutes?

When a guest proceeds to checkout, the Booking Service writes a Redis key with SET hold:{unit_id}:{check_in}:{check_out} using NX (set-if-not-exists) and EX 600 (10-minute TTL). The NX flag makes the operation atomic, so only one concurrent caller wins the hold. The 10-minute TTL is calibrated to cover the payment form interaction; if the guest abandons checkout or the saga crashes, the hold evaporates and the room becomes bookable again without any manual cleanup.

Q: What is the booking saga, and what compensating actions run on failure?

The saga sequences three steps: place the Redis hold, charge payment, then insert the reservation row in Postgres. If payment fails, the hold is deleted and the room returns to available. If the Postgres insert fails after a successful charge — possible when the EXCLUDE constraint fires because two partially-overlapping holds raced through — the saga refunds the payment. A mid-saga crash is recovered by TTL expiry: the hold releases within 10 minutes, and any payment taken before the crash is caught by an async refund job.

Q: How does an idempotency key prevent duplicate reservations on retry?

The client generates a UUID for the checkout session and sends it as idempotency_key on every confirm request. The reservations table stores this key with a UNIQUE constraint. On retry, the INSERT uses ON CONFLICT (idempotency_key) DO NOTHING RETURNING *; if no row is returned the previous insert succeeded and the server re-fetches the existing reservation. This guarantees exactly-once semantics even if the client retransmits due to a network timeout.

Search available listings and book date ranges without double-booking. Availability as a range problem, reservation holds, and the search vs transaction split.

21 min read2026-04-29Ironclad Academy

#interview #inventory #consistency #search

// DEPTH

the full breakdown — requirements, capacity, evolution, trade-offs

The problem

Airbnb lists 7.7 million properties across 220 countries. When a traveler searches Paris for July 10–14, a system has to scan millions of listings, cross-reference live availability calendars, return ranked results in under 200ms, and then — if the guest proceeds — guarantee that no one else can book the same apartment for the same nights. Booking.com and Expedia face the same problem at comparable scale.

The inventory model is a date range, not a discrete slot. A hotel room or apartment is available for any contiguous stretch of nights not already reserved. That sounds simple, but it creates a hard concurrency puzzle: two guests can be looking at the same listing, both see it as available, both race to the payment screen. Only one can win — and the system must enforce this without double-charging or double-booking, even under concurrent writes.

Two tensions define the design. First, read/write asymmetry: searches outnumber confirmed bookings roughly 100 to 1 at peak. The search path needs to be fast and horizontally scalable, but strong consistency there is unnecessary — a result that's a few seconds stale is fine. The booking path is the opposite: it needs ACID correctness and zero tolerance for the no-overlap invariant being violated. These two demands require completely separate stacks. Second, the checkout window: after a guest picks a listing, they spend several minutes on the payment form. During that time, the room must be effectively held — but locking it in the database for 10 minutes under high concurrency is impractical. That tension produces the reservation hold pattern and the booking saga.

A hotel booking system is the date-range cousin of the ticketmaster problem. Ticketmaster guards discrete seat inventory — a seat is either sold or not. Here the inventory is a contiguous date range: a room on nights [check-in, check-out). The correctness problem is the same (no double-sell), but the domain model and the concurrent-booking race look different.

Functional requirements

Search: given city/coordinates, check-in date, check-out date, and optional filters (price range, beds, amenities, property type), return ranked available listings with price and thumbnail.
Listing detail + calendar: view availability calendar per listing, per-night prices.
Reserve: create a temporary hold on a listing for a date range during checkout.
Confirm: on successful payment, convert hold into a confirmed reservation.
Cancel: guest or host cancels; dates return to available.

Out of scope (unless an interviewer asks): host management, reviews, messaging, fraud, pricing algorithms.

Non-functional requirements

Availability: 99.99% for search; 99.999% for the booking write path (a dropped booking is lost revenue).
Search latency: p99 < 200ms.
Booking latency: p99 < 500ms.
No double-booking: the core correctness invariant — must hold under concurrent load.
Hold TTL: a held room must be released within a bounded time (10–15 min) if checkout is abandoned.
Idempotent bookings: retrying the confirm endpoint must not create duplicate reservations.

Capacity estimation

Dimension	Estimate	How we got there
Active listings	7.7M	Airbnb 2023 public figure
Listings storage	~15 GB	`7.7M listings × 2 KB metadata` — fits in RAM on one node
Search QPS (avg)	~290/s	`150M users × 5 searches/session/month ÷ 2.6M s/month`
Search QPS (peak)	~50k/s	Platform-level assumption for holiday season at Airbnb scale; the 290/s avg → 50k/s peak implies a ~170× ratio, so treat 50k as an independent input, not derived from the per-user model
Bookings/year	~115M	`448M nights booked/year ÷ 3.9-night avg stay (2023 actual)`
Booking QPS (avg)	~3.6/s	`115M ÷ 31.5M s/year`
Booking QPS (peak)	~500/s	Platform peak estimate
Read:write ratio	100:1	`50k search QPS : 500 booking QPS`
Reservation table	~11.5 GB/year	`115M rows/year × ~100 bytes` (unit_id, check_in, check_out, guest_id, status, idempotency_key, timestamps)
Calendar date-cells	~2.8B	`7.7M listings × 365 nights` — stored as date ranges, not per-night rows
Calendar rows/year	~77M	`~10 reservations/year per active listing × 7.7M listings`
Active holds in Redis	~2,500	`5× peak booking rate = 5 × 500`; at ~200 bytes each → `<1 MB total`

Takeaway: The search tier must absorb 50k QPS at low latency. The booking write tier sees 500 QPS but zero tolerance for errors. These two paths get completely separate stacks.

Building up to the design

V1: One Postgres table

Start with the obvious thing: a single reservations table.

CREATE TABLE reservations (
  id         BIGSERIAL PRIMARY KEY,
  unit_id    BIGINT NOT NULL,
  check_in   DATE   NOT NULL,
  check_out  DATE   NOT NULL,
  guest_id   BIGINT NOT NULL,
  status     TEXT   NOT NULL  -- 'confirmed'
);

On booking, scan for overlaps, then insert if none are found:

SELECT 1 FROM reservations
WHERE unit_id = $1
  AND status = 'confirmed'
  AND check_in < $check_out AND check_out > $check_in;

This works correctly on a single connection. The problem is concurrency. Two requests that arrive simultaneously both run the SELECT, both find zero conflicts, and both insert — now you have a double-booking. The check-then-insert isn't atomic.

V2: Make the overlap check atomic

The fix is to move the enforcement into the database itself, where the transaction system can serialize it. One approach: wrap the check in SELECT FOR UPDATE.

BEGIN;
SELECT 1 FROM reservations
WHERE unit_id = $1
  AND status = 'confirmed'
  AND check_in < $check_out AND check_out > $check_in
FOR UPDATE;
-- if rows returned → conflict → ROLLBACK
INSERT INTO reservations ...;
COMMIT;

There's a subtle problem here though: FOR UPDATE only locks rows that actually exist. If the unit has no existing reservations for that range, the SELECT returns nothing — and locks nothing — so two concurrent transactions can still both pass and double-book. To make SELECT FOR UPDATE actually work, you'd need to lock a parent row (e.g., the unit row in a units table), not the reservation rows themselves. That's workable, but it serializes all bookings for a popular unit. Far cleaner: let Postgres enforce the invariant declaratively with an EXCLUDE USING gist constraint. Install it once, and the database rejects any overlapping insert outright — no application-level coordination needed.

CREATE EXTENSION IF NOT EXISTS btree_gist;

ALTER TABLE reservations ADD CONSTRAINT no_overlap
  EXCLUDE USING gist (
    unit_id WITH =,
    daterange(check_in, check_out, '[)') WITH &&
  )
  WHERE (status = 'confirmed');

Any INSERT that would create an overlap fails with a constraint violation, atomically, regardless of concurrency. This is the correctness foundation the rest of the system rests on.

But now the search problem appears. At 50k QPS with geo + date + filter combinations, one Postgres instance will not keep up.

V3: Separate search from booking

Introduce an Elasticsearch cluster for search. Listings — with denormalized availability windows and pricing — are indexed there. The booking write path stays in Postgres. A CDC pipeline (Debezium → Kafka → Elasticsearch indexer) keeps the index fresh, typically with a few seconds of lag.

Search now runs at 50k QPS against Elasticsearch at sub-100ms. Postgres handles only the 500 QPS booking write path, where it's the right tool: ACID transactions, row-level locking, constraint enforcement.

During checkout, though, a guest spends several minutes filling in payment details. The room still shows as available in search results during that time. Another guest may reach the payment step for the same room.

V4: The reservation hold

Before showing the payment form, write a hold into Redis:

SET hold:{unit_id}:{check_in}:{check_out} {guest_id} NX EX 600   # 10-minute TTL, set only if not exists

The NX flag makes this atomic and conditional: only one concurrent caller wins. Before writing this hold, also check for confirmed reservations in Postgres for the date range. If either check fails, reject the hold request. The search index excludes held units from results (either via a fast Redis check at query time, or by marking the unit temporarily unavailable in the search index).

Two guests racing to checkout the same room hit this hold atomically. One gets OK, the other gets nil — "room no longer available" — before either has touched the payment form. No wasted payment attempts.

The 10-minute TTL is the safety valve: if the guest abandons checkout or the process crashes mid-saga, the hold evaporates and the room becomes bookable again.

V5: The booking saga

Payment adds failure modes that need explicit compensation. The saga runs three steps in sequence:

flowchart LR
    S1[Place Hold<br/>Redis TTL] --> S2[Charge Payment]
    S2 -->|success| S3[Write Reservation<br/>Postgres]
    S3 --> S4[Release Hold<br/>Redis delete]
    S2 -->|failure| C1[Release Hold<br/>compensation]
    S3 -->|conflict| C2[Refund Payment<br/>compensation]
    style S1 fill:#ff6b1a,color:#0a0a0f
    style S3 fill:#15803d,color:#fff
    style C1 fill:#ff2e88,color:#fff
    style C2 fill:#ff2e88,color:#fff

Hold: write the Redis entry with TTL. If this fails, abort — show "room no longer available".
Charge: call Payment Service. If the charge fails, delete the hold and return an error. The room goes back to available.
Confirm: insert the reservation row in Postgres. The EXCLUDE constraint is the backstop — if it fires here (extremely rare: two holds raced through with partially overlapping date ranges where the Redis key was coarser than the actual overlap), refund the payment and surface an error.

Even if the saga crashes mid-flight and no compensation runs, the hold TTL eventually releases the room. The worst case is a 10-minute window where the room appears unavailable to other guests.

V6: Production system

V2 + V3 + V4 + V5, plus idempotency keys on booking requests (retries must not double-charge), a CDC pipeline from Postgres to Elasticsearch for search freshness, availability calendar cache in Redis, multi-region read replicas for listing metadata, and async notifications after confirmation via a message queue.

flowchart LR
    V1[V1: Postgres scan<br/>correct, not concurrent] --> V2[V2: FOR UPDATE / EXCLUDE<br/>atomic no-overlap]
    V2 --> V3[V3: + Elasticsearch<br/>fast search]
    V3 --> V4[V4: + Redis hold<br/>blocks concurrent bookers]
    V4 --> V5[V5: + booking saga<br/>payment + compensation]
    V5 --> V6[V6: + idempotency + CDC<br/>+ notifications]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V4 fill:#ff6b1a,color:#0a0a0f
    style V6 fill:#a855f7,color:#fff

High-level architecture

flowchart TD
    U[Traveler] --> GW[API Gateway / Auth]
    H[Host] --> GW

    GW -->|"GET /search"| SRVC[Search Service]
    GW -->|"GET /listings/:id/calendar"| CALSVC[Calendar Service]
    GW -->|"POST /holds"| BSVC[Booking Service]
    GW -->|"POST /reservations"| BSVC

    SRVC --> ES[(Elasticsearch<br/>geo + date + filters)]
    CALSVC --> RCAL[(Redis<br/>calendar cache)]
    RCAL -.miss.-> AVDB[(Availability DB<br/>Postgres)]

    BSVC --> RHOLD[(Redis<br/>hold store)]
    BSVC --> AVDB
    BSVC --> PAY[Payment Service]
    BSVC --> NOTIFY[Notification Service<br/>email / push]

    AVDB -.CDC / Debezium.-> KAFKA[Kafka]
    KAFKA --> EIDX[ES Indexer]
    EIDX --> ES

    style SRVC fill:#0e7490,color:#fff
    style BSVC fill:#ff6b1a,color:#0a0a0f
    style AVDB fill:#15803d,color:#fff
    style RHOLD fill:#a855f7,color:#fff
    style ES fill:#ffaa00,color:#0a0a0f
    style KAFKA fill:#ff2e88,color:#fff

The core data model

availability table (Postgres)

CREATE TABLE units (
  unit_id     BIGINT       PRIMARY KEY,
  listing_id  BIGINT       NOT NULL,
  -- property metadata omitted for brevity
  max_guests  SMALLINT
);

CREATE INDEX ON units (listing_id);

CREATE TABLE reservations (
  reservation_id     BIGSERIAL    PRIMARY KEY,
  unit_id            BIGINT       NOT NULL REFERENCES units,
  guest_id           BIGINT       NOT NULL,
  check_in           DATE         NOT NULL,
  check_out          DATE         NOT NULL,
  status             TEXT         NOT NULL DEFAULT 'confirmed',
                     -- 'confirmed' | 'cancelled'
  idempotency_key    TEXT         UNIQUE NOT NULL,
  created_at         TIMESTAMPTZ  DEFAULT now(),

  CONSTRAINT no_overlap EXCLUDE USING gist (
    unit_id WITH =,
    daterange(check_in, check_out, '[)') WITH &&
  ) WHERE (status = 'confirmed')
);

CREATE INDEX ON reservations (unit_id, check_in, check_out);

The daterange(check_in, check_out, '[)') creates a half-open interval: a booking for July 10–14 covers nights 10, 11, 12, 13, with checkout morning of the 14th. The && operator checks range overlap. The WHERE (status = 'confirmed') makes this a partial constraint — cancelled reservations are excluded, so the same dates can be rebooked after a cancellation. And idempotency_key with a UNIQUE constraint gives exactly-once semantics: retrying an already-confirmed booking returns the existing row rather than creating a second reservation.

holds (Redis)

Key:    hold:{unit_id}:{check_in}:{check_out}
Value:  {guest_id}:{session_id}
TTL:    600 seconds (10 minutes)

Write via SET key value NX EX seconds — set-if-not-exists with TTL in a single atomic command. Two concurrent checkout sessions racing here: one wins (gets OK), the other gets nil and sees "room no longer available" immediately. No separate GET is needed, and there's no window where one session reads an absent key and another writes it in between.

search index (Elasticsearch)

One document per unit, with a geo_point field for location, an available_ranges array of {from, to} date ranges (the complement of booked ranges), denormalized pricing (minimum nightly price, weekend price), amenities as keyword fields, and ratings and review count. Queries combine a geo_distance filter, a date range filter against available_ranges, and a function score for ranking. Freshness lag from Postgres is typically a few seconds via CDC.

Availability as a range problem

Before going further, the overlap logic is worth making explicit because it trips people up in interviews.

A booking covers the interval [check_in, check_out). Two bookings for the same unit conflict if and only if:

booking_A.check_in < booking_B.check_out
AND
booking_B.check_in < booking_A.check_out

A picture makes this clearer. Consider three scenarios:

flowchart LR
    subgraph "No conflict"
        A1["Booking A: Jul 1–5"]
        B1["Booking B: Jul 5–9"]
    end
    subgraph "Conflict — B starts inside A"
        A2["Booking A: Jul 1–7"]
        B2["Booking B: Jul 4–9"]
    end
    subgraph "Conflict — B contained in A"
        A3["Booking A: Jul 1–10"]
        B3["Booking B: Jul 3–7"]
    end
    style A2 fill:#ff6b1a,color:#0a0a0f
    style B2 fill:#ff2e88,color:#fff
    style A3 fill:#ff6b1a,color:#0a0a0f
    style B3 fill:#ff2e88,color:#fff
    style A1 fill:#15803d,color:#fff
    style B1 fill:#0e7490,color:#fff

The first pair doesn't conflict because checkout day is not a night — July 5 checkout and July 5 check-in is two separate guests sharing a cleaning window, not a double-booking. The half-open interval [) captures this correctly: Jul1..Jul5 and Jul5..Jul9 don't overlap.

The Postgres && operator on daterange types implements this test exactly. Some systems store per-night "blocked" records (one row per night); that makes "is this range available?" a simple count, but inflates storage O(nights) and complicates partial-overlap queries. Date ranges are the right abstraction here.

Sequence: concurrent bookers, last room

The scenario everyone asks about in interviews: two guests search, both see the same listing as available, both start checkout at the same time.

sequenceDiagram
    participant A as Guest A
    participant B as Guest B
    participant Redis as Redis (hold)
    participant PAY as Payment Service
    participant PG as Postgres

    A->>Redis: SET NX hold:unit42:Jul10:Jul14 guestA EX 600
    B->>Redis: SET NX hold:unit42:Jul10:Jul14 guestB EX 600
    Redis-->>A: OK (won the hold)
    Redis-->>B: nil (lost — room already held)
    B->>B: show "room no longer available"
    A->>PAY: charge payment
    PAY-->>A: payment confirmed
    A->>PG: INSERT reservation (EXCLUDE enforces no overlap)
    PG-->>A: 201 Created
    A->>Redis: DEL hold:unit42:Jul10:Jul14

Guest B is rejected at the Redis hold stage — before payment is even attempted. No wasted payment API calls, no compensations needed. The EXCLUDE constraint in Postgres is the backstop for the edge case where two different date ranges partially overlap and the Redis hold key scheme is coarser-grained than the actual conflict.

Calendar and pricing cache

A listing's availability calendar is read-heavy: every listing detail page view triggers a calendar fetch. Calendar data changes only when a booking is confirmed or cancelled — infrequent writes, very frequent reads.

Cache key:   calendar:{listing_id}:{year_month}
Value:       JSON — {booked_ranges: [...], price_overrides: [...]}
TTL:         60 seconds
Write-on-change: Booking Service invalidates on reservation insert/cancel

Sixty seconds of potential staleness for the calendar display is acceptable. The reservation hold and EXCLUDE constraint protect correctness regardless of what the calendar shows — a stale calendar might cause a guest to try booking unavailable dates, but they'll be rejected correctly rather than double-booked.

Search path deep-dive

Search is the 50k QPS path. It cannot touch Postgres on every request.

A search query arrives as: city="Paris", check_in=2026-07-10, check_out=2026-07-14, guests=2, max_price=200.

flowchart LR
    Q["Search query<br/>city + dates + filters"] --> GC[Geocode city<br/>cached by name]
    GC --> ES[Elasticsearch query]
    ES --> GF["geo_distance filter<br/>within 30km"]
    ES --> DF["date availability filter<br/>available_ranges covers request"]
    ES --> PF["price/amenity filters"]
    GF --> RANK[Function score ranking<br/>rating × recency × price]
    DF --> RANK
    PF --> RANK
    RANK --> RES[Paginated results]
    style ES fill:#ffaa00,color:#0a0a0f
    style RANK fill:#ff6b1a,color:#0a0a0f

Geocode the city to a bounding box or geo-point (cache by city name; this doesn't change often).
Elasticsearch query: geo_distance filter (within 30km of center), range filter on min_price, term filters on amenities, and a date availability filter — the listing's available_ranges must contain a range that covers [check_in, check_out).
Rank by a function score combining rating, reviews, price, and recency of activity.
Return paginated results.

The availability filter in Elasticsearch is approximate: the index might be a few seconds behind Postgres. A listing that was just booked can briefly still appear in results. This is intentional — the hold and EXCLUDE mechanism handles the real race. Stale search results are an annoyance, not a correctness bug.

Geo-date index design

Each unit document in Elasticsearch has an available_from/available_to per blocked-range, or more practically a nested available_ranges array. A nested query filter checks that at least one element in the array covers the requested dates.

An alternative: pre-compute a "date bitmap" per listing for the next 365 days (one bit per night) and store as a binary field. A bitwise AND with the requested range tells you if dates are free. Compact and fast for small windows, but awkward for large ranges. Most production implementations use the nested date-range approach.

Failure modes

Failure	What happens	Recovery
Redis hold store down	Cannot place hold	Fallback: attempt direct Postgres reservation (higher contention, but correct)
Hold TTL expires during slow payment	Hold released mid-checkout	Postgres `EXCLUDE` fires on confirm; saga refunds payment; user sees error
Payment succeeds, Postgres insert fails	Charged but not confirmed	Idempotency key: retry triggers `INSERT ... ON CONFLICT DO NOTHING`; if still fails, saga refunds
Booking Service crash mid-saga	Hold remains until TTL	Expired hold releases the room; payment never charged (crash before payment step) or refunded by async job
Search index shows stale availability	User sees "available" but hold/booking exists	Hold check at confirm step catches it; user sees "no longer available"
Postgres primary fails	Writes unavailable until failover	Synchronous standby + automated failover (e.g., Patroni); target RTO < 30s
Double-booking race through Redis	Unlikely; two holds with different date keys partially overlap	Postgres `EXCLUDE` catches it at insert; last writer gets a constraint violation, saga compensates

Storage choices

Data	Store	Reason
Listings metadata	Postgres (primary) + read replicas	Durable, relational; query patterns are simple
Reservations	Postgres with `EXCLUDE` constraint	ACID required for the no-overlap invariant
Active holds	Redis (volatile)	TTL semantics, sub-ms latency; data is ephemeral by design
Availability calendar cache	Redis	High read rate; short TTL acceptable
Search index	Elasticsearch	Geo + multi-filter + full-text at 50k QPS
Pricing history	Postgres / data warehouse	Analytical queries; audit trail
Events (bookings, cancellations)	Kafka → S3	Fan-out to notifications, analytics, fraud detection

Idempotent bookings

The confirm endpoint accepts an idempotency_key from the client — typically a UUID generated when the user first tapped "Book". The server stores this key in the reservations table with a UNIQUE constraint. On retry:

INSERT INTO reservations (unit_id, guest_id, check_in, check_out, idempotency_key, ...)
VALUES ($1, $2, $3, $4, $5, ...)
ON CONFLICT (idempotency_key) DO NOTHING
RETURNING *;

If RETURNING gives back a row, the first attempt succeeded — return success. If no row is returned and ON CONFLICT fired, the previous insert succeeded; re-fetch the existing reservation and return it. The client always generates the same key for the same checkout session, so retrying a timed-out request never creates a second charge.

Overbooking policy

Some hotel systems intentionally allow overbooking — they sell more rooms than they have, betting on cancellations, no-shows, and rate-sensitive guests who accept upgrades. Airlines and hotels both do this.

If asked about it: the same system applies, but the EXCLUDE constraint is replaced with a capacity counter. Instead of enforcing zero overlaps, you enforce active_reservations_in_range <= unit_count + overbook_allowance. The tradeoff: occasional guest walkouts (with compensation), in exchange for higher revenue from fuller occupancy.

For Airbnb-style systems where each unit is a single unique property, overbooking isn't meaningful — enforce strict no-overlap.

API design

GET /api/v1/search
  ?city=Paris&check_in=2026-07-10&check_out=2026-07-14
  &guests=2&max_price=200&page=1

→ 200 OK
{ "results": [ { "unit_id": 42, "price_per_night": 185, ... } ], "cursor": "..." }

GET /api/v1/listings/42/calendar?year=2026&month=07

→ 200 OK
{ "booked_ranges": [{"from":"2026-07-01","to":"2026-07-09"}],
  "price_overrides": [...] }

POST /api/v1/holds
{ "unit_id": 42, "check_in": "2026-07-10", "check_out": "2026-07-14",
  "guest_id": 7, "idempotency_key": "a3f9-..." }

→ 201 Created
{ "hold_id": "hold:42:2026-07-10:2026-07-14", "expires_at": "2026-07-10T10:10:00Z" }
→ 409 Conflict   (dates already held or booked)

POST /api/v1/reservations
{ "hold_id": "...", "payment_method_id": "pm_...", "idempotency_key": "a3f9-..." }

→ 201 Created
{ "reservation_id": 9871, "status": "confirmed", ... }
→ 409 Conflict   (hold expired or dates taken)
→ 402 Payment Required   (card declined — hold released)

Booking lifecycle as a state machine

stateDiagram-v2
    [*] --> Searching
    Searching --> HoldPlaced: POST /holds OK
    HoldPlaced --> PaymentPending: proceed to payment
    PaymentPending --> Confirmed: payment succeeds
    PaymentPending --> Released: payment fails
    HoldPlaced --> Released: TTL expires or user abandons
    Confirmed --> CancelledByGuest: guest cancels
    Confirmed --> CancelledByHost: host cancels
    CancelledByGuest --> [*]
    CancelledByHost --> [*]
    Released --> [*]

Each transition is a durable event published to Kafka. Downstream: notification service sends email/push, search indexer updates availability, analytics pipeline records revenue.

Comparison to Ticketmaster

Dimension	Hotel booking	Ticketmaster
Inventory unit	Date range `[check_in, check_out)`	Discrete seat (seat 12B, Section C)
Overlap check	Range intersection (`&&` on `daterange`)	Exact uniqueness (seat already sold)
Concurrent demand	Moderate (rarely 1000:1 per unit)	Extreme (50k users for 1 seat on sale day)
Hold TTL	10–15 minutes (checkout time)	10–15 minutes (same)
Overbooking	Sometimes intentional (hotel chains)	Essentially never
Search read:write	100:1	1000:1 on sale day, then drops

The key difference: in Ticketmaster, inventory is atomic and discrete — a seat either exists or doesn't. Here, inventory is a continuous timeline — a unit is available for any date range not already booked. The EXCLUDE constraint on date ranges is the equivalent of the unique-seat constraint in a ticketing system.

Things to discuss in an interview

Why not just lock the row in Postgres and check? SELECT FOR UPDATE on the reservation rows doesn't work reliably — it only locks rows that exist, so the first booking on a unit has nothing to lock. You could lock a parent unit row instead, but that serializes all booking attempts for a popular unit. The Redis hold gives fast early rejection without locking the DB at all; the EXCLUDE constraint in Postgres is the correctness backstop at insert time.
What if two separate date ranges partially overlap? Explain the half-open interval model and how daterange && daterange catches all overlap patterns, including when one range is contained within another.
How would you shard the reservations table? Shard by unit_id. All reservations for a unit live on one shard, so the EXCLUDE constraint and SELECT FOR UPDATE work within one shard without cross-shard coordination.
What's the freshness guarantee on search results? The search index can be a few seconds stale. This is intentional and acceptable: the hold + EXCLUDE constraint at booking time enforces correctness regardless.
How do you handle a TTL expiry that races with a payment confirm? The saga checks, at the moment of the Postgres insert, whether the hold is still valid (by checking the reservation's idempotency key). If the hold already expired and someone else booked the dates, the EXCLUDE fires and the saga refunds the payment.
Overbooking: know what it is, when it's used, and how to model it as a capacity counter instead of a strict exclusion constraint.

Things you should now be able to answer

Why is the availability problem a date-range intersection problem, not a point lookup?
How does EXCLUDE USING gist on a daterange prevent double-booking at the database level?
Why is the search index allowed to be slightly stale, and what enforces correctness instead?
What is the booking saga, and what compensating actions exist at each step?
How does the Redis hold prevent wasted payment attempts for the same room?
How does an idempotency key prevent duplicate reservations on retry?
When would you intentionally allow overbooking, and how would you model it?

Frequently asked questions

▸How does Postgres prevent double-booking in a hotel reservation system?

A partial EXCLUDE USING gist constraint on the reservations table enforces the no-overlap invariant atomically. The constraint indexes unit_id with equality and daterange(check_in, check_out, '[)') with the overlap operator &&, filtered to WHERE (status = 'confirmed'). Any INSERT that would create an overlapping confirmed reservation fails with a constraint violation, regardless of concurrency.

▸Why use Elasticsearch for search instead of querying Postgres directly?

The search path peaks at roughly 50,000 QPS — far beyond what a single Postgres instance can handle — while the booking write path sees only 500 QPS. Elasticsearch handles geo-distance filters, multi-filter queries, and date-range availability filters at that scale. Strong consistency is not required for search because the Redis hold and Postgres EXCLUDE constraint enforce correctness at booking time regardless of whether a search result is a few seconds stale.

▸What is the reservation hold pattern, and why is the TTL set to 10 minutes?

When a guest proceeds to checkout, the Booking Service writes a Redis key with SET hold:{unit_id}:{check_in}:{check_out} using NX (set-if-not-exists) and EX 600 (10-minute TTL). The NX flag makes the operation atomic, so only one concurrent caller wins the hold. The 10-minute TTL is calibrated to cover the payment form interaction; if the guest abandons checkout or the saga crashes, the hold evaporates and the room becomes bookable again without any manual cleanup.

▸What is the booking saga, and what compensating actions run on failure?

The saga sequences three steps: place the Redis hold, charge payment, then insert the reservation row in Postgres. If payment fails, the hold is deleted and the room returns to available. If the Postgres insert fails after a successful charge — possible when the EXCLUDE constraint fires because two partially-overlapping holds raced through — the saga refunds the payment. A mid-saga crash is recovered by TTL expiry: the hold releases within 10 minutes, and any payment taken before the crash is caught by an async refund job.

▸How does an idempotency key prevent duplicate reservations on retry?

The client generates a UUID for the checkout session and sends it as idempotency_key on every confirm request. The reservations table stores this key with a UNIQUE constraint. On retry, the INSERT uses ON CONFLICT (idempotency_key) DO NOTHING RETURNING *; if no row is returned the previous insert succeeded and the server re-fetches the existing reservation. This guarantees exactly-once semantics even if the client retransmits due to a network timeout.

← previous

Design an Email Service (Gmail)

Design an Online Code Judge (LeetCode)

// RELATED

Design a Hotel / Airbnb Booking System

The problem

Functional requirements

Non-functional requirements

Capacity estimation

Building up to the design

V1: One Postgres table

V2: Make the overlap check atomic

V3: Separate search from booking

V4: The reservation hold

V5: The booking saga

V6: Production system

High-level architecture

The core data model

availability table (Postgres)

holds (Redis)

search index (Elasticsearch)

Availability as a range problem

Sequence: concurrent bookers, last room

Calendar and pricing cache

Search path deep-dive

Geo-date index design

Failure modes

Storage choices

Idempotent bookings

Overbooking policy

API design

Booking lifecycle as a state machine

Comparison to Ticketmaster

Things to discuss in an interview

Things you should now be able to answer

Further reading

Frequently asked questions

You may also like

Design an LLM Observability Platform

Design an LLM Gateway (AI Gateway & Model Router)

Design an LLM Fine-Tuning Platform