
Intermediate · asked at Google · asked at Microsoft · asked at Apple

Design a Calendar System (Google Calendar)

Store events, share calendars, find free slots, and fire reminders — across time zones and recurring rules. The RRULE expansion and free/busy problem.

23 min read · 2026-05-11 · Ironclad Academy

#interview #scheduling #consistency #notifications


The problem

Google Calendar, Apple Calendar, and Microsoft Outlook handle a problem that looks simple from the outside: store some events, show them in a grid, fire a reminder before each one. Any developer could wire up a basic version in an afternoon using a single database table with a start_time column.

The complexity starts when you add recurring events. A weekly stand-up has one rule (RRULE:FREQ=WEEKLY;BYDAY=MO) but represents hundreds of occurrences. Now a user wants to move just this Monday's occurrence to Tuesday. Another user creates a recurring meeting in New York, and a London attendee needs to see it in their own time zone across a daylight saving time boundary. Each of these cases is a trap that silently produces wrong answers if the implementation cuts corners.

Layer in free/busy queries — "find a time when all 12 people on this team are free for 45 minutes this week" — and reminder delivery at scale, and the problem looks quite different. Google Calendar serves roughly 500 million users, each with hundreds of events. On a typical weekday at 8:50 AM, 1M+ 10-minute-before reminders need to fire within the same 5-minute window. A naive scheduled job collapses under that load.

The two engineering tensions that make this a canonical interview question are: (1) recurrence expansion — how to store one rule and serve any window of occurrences cheaply, while correctly handling DST, exceptions to the series, and per-occurrence overrides; and (2) fan-out at scale — how to turn a 9 AM reminder wave from a thundering herd into steady, exactly-once delivery across distributed workers.

Functional requirements

Event management: create, read, update, delete events (title, description, location, start, end, recurrence rule, attendees, reminders).
Invitations and RSVP: invite users by email; each attendee tracks their own RSVP status (accepted / tentative / declined / no reply).
Calendar views: list events for a user in a time range, rendered in any time zone.
Free/busy: given N users and a time range, return their combined busy intervals.
Find a meeting time: given N users and a duration, return the earliest free slot where all are available.
Reminders: notify via push and email at a configurable lead time (e.g. 10 minutes before start).
Sharing and permissions: owner, editor, free/busy-only, read-only access per calendar.
Multi-device sync: changes propagate to all devices within a few seconds.

Non-functional requirements

Correctness over throughput — a wrong reminder time or a missing DST adjustment is a user-visible bug that erodes trust.
High availability — users check calendars throughout the day; p99 read latency < 200ms.
Eventual consistency is acceptable for most reads; a change written on mobile may take seconds to appear on web.
Exactly-once reminder delivery — a double-reminder is a UX annoyance; a missed reminder is a meeting miss.

Capacity estimation

Dimension	Estimate	How we got there
Users	~500M registered; ~100M active daily	—
Event writes (avg)	~1,000 writes/sec	`5 events/user/month × 500M users = 2.5B/month ÷ (30 × 86,400 s)`
Event writes (peak)	~5,000 writes/sec	Monday morning surge, ~5× avg
Event reads (avg)	~50,000 reads/sec	50:1 read/write ratio
Event reads (peak)	~200,000 reads/sec	~4× avg at peak
Event row size	~2 KB	Title, description, attendees list, RRULE string
Event storage	~500 TB	`500M users × 500 events/user × 2 KB`; 500 events/user is an order-of-magnitude design assumption for a mature user; sharded across O(100+) nodes by user_id
Reminder load (avg)	~2,000 reminders/sec	`~2 events/active-user/day × 100M active users = 200M events/day × 80% have reminders = 160M reminders/day ÷ 86,400 s ≈ 1,850/s`
Reminder load (peak)	~1M+ reminders in a ~5-min window	9:00 AM on a workday (at least ~3,300/sec, roughly 2× the daily average, all due at the top of the hour); must be distributed across many workers
Attendee index size	~2.5 TB	`~10 attendees/event × 500M users × 500 events = 2.5T entries` is a loose upper bound; in practice the index is far sparser, since many events have no attendees beyond the owner and shared ones typically have 1–5, giving a realistic estimate of `~50B entries × ~50 bytes`

The 9 AM reminder storm is the key operational constraint: a naive single-process cron would queue millions of jobs at 8:50 AM and then saturate. See the distributed job scheduler article for the architecture that solves this.

Building up to the design

V1: One table, one time zone, no recurrence

Start with the simplest possible schema:

CREATE TABLE events (
  id         BIGINT PRIMARY KEY,
  user_id    BIGINT NOT NULL,
  title      TEXT,
  start_time TIMESTAMPTZ,
  end_time   TIMESTAMPTZ
);

You can build and demo a personal calendar in an afternoon with this. It works fine until someone asks for a weekly recurring stand-up. Storing 52 copies of the same stand-up per year is wasteful, and if the meeting time ever changes you have to update all 52 rows. So we need a better abstraction.

V2: Store recurrence as a rule string

Add an rrule TEXT column. A weekly Monday stand-up becomes:

RRULE:FREQ=WEEKLY;BYDAY=MO;UNTIL=20271231T000000Z

Now one row represents hundreds of occurrences. When a user opens "this week," the API expands the rule over the requested window and returns only the occurrences that fall in it. This is dramatically more compact — and if you change the meeting day, you update one row.

The next crack appears when a user wants to move this Monday's occurrence to Tuesday, just this one time. We need a way to override a single occurrence without touching the series rule.

V3: Add an exceptions table

CREATE TABLE event_exceptions (
  event_id           BIGINT,
  original_start_utc TIMESTAMPTZ,  -- identifies which occurrence
  override_start_utc TIMESTAMPTZ,  -- null = deleted occurrence
  override_end_utc   TIMESTAMPTZ,
  override_title     TEXT,
  -- ... other override fields
  PRIMARY KEY (event_id, original_start_utc)
);

During expansion, after generating occurrences from the rule, the service checks this table: if an occurrence's original_start_utc appears here, either substitute the overridden values or exclude it entirely (deleted occurrence). The key insight is using original_start_utc — not a sequence number — to identify the occurrence, because the original slot is what ties the exception back to the series unambiguously.
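The lookup can be sketched in a few lines of Python. This is a minimal illustration, assuming the rule has already been expanded into a list of UTC start times and the series' exceptions are loaded into a dict keyed by original_start_utc; Override, Occurrence, and apply_exceptions are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class Override:
    """One event_exceptions row; override_start_utc=None marks a deleted occurrence."""
    override_start_utc: Optional[datetime]
    override_end_utc: Optional[datetime] = None
    override_title: Optional[str] = None


@dataclass
class Occurrence:
    original_start_utc: datetime  # the slot the rule generated; ties back to the series
    start_utc: datetime
    end_utc: datetime
    title: str


def apply_exceptions(starts: List[datetime], duration: timedelta, title: str,
                     overrides: Dict[datetime, Override]) -> List[Occurrence]:
    result = []
    for original in starts:
        ov = overrides.get(original)  # look up by original_start_utc, not a sequence number
        if ov is None:
            result.append(Occurrence(original, original, original + duration, title))
        elif ov.override_start_utc is None:
            continue  # deleted occurrence: drop it
        else:
            start = ov.override_start_utc
            end = ov.override_end_utc or start + duration
            result.append(Occurrence(original, start, end, ov.override_title or title))
    return sorted(result, key=lambda o: o.start_utc)


utc = timezone.utc
starts = [datetime(2026, 6, d, 13, 0, tzinfo=utc) for d in (1, 8, 15)]  # three Mondays
overrides = {
    starts[1]: Override(override_start_utc=None),                   # Jun 8 cancelled
    starts[2]: Override(datetime(2026, 6, 16, 13, 0, tzinfo=utc)),  # Jun 15 moved to Tuesday
}
for occ in apply_exceptions(starts, timedelta(minutes=30), "Stand-up", overrides):
    print(occ.start_utc.date(), occ.title)
```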

Once that's in place, attendees are the next problem. A user invites 10 colleagues; each needs their own RSVP state, and the event should appear on each attendee's calendar view.

V4: Attendee index and RSVP

Add an event_attendees table. The event itself is "owned" by the organizer. Each attendee gets a row linking their user_id to the event_id, with their RSVP status. Calendar views for any user are built by joining their attendee rows against the events table.

At this point the most dangerous remaining problem is time zones. A user in New York creates a 9 AM recurring meeting. In January (EST, UTC−5) the correct UTC representation is 14:00:00 UTC; a London attendee (UTC+0 in January) sees 2:00 PM — correct. After the US clocks spring forward to EDT (UTC−4), "9 AM New York" becomes 13:00:00 UTC. If you stored a fixed offset of −05:00 instead of the IANA zone name, the next occurrence still fires at 14:00 UTC, so New York attendees see the meeting at 10:00 AM instead of 9:00 AM — one hour late.

V5: Correct time zone handling

Store two things per event: start_utc TIMESTAMPTZ for sorting and querying, and tzid TEXT (the IANA time zone name, such as America/New_York) for display and for recurrence expansion. When expanding a recurring event, generate each occurrence's local wall-clock time (for example, 9:00 AM) in the originating zone and only then convert it to UTC. This handles DST automatically, because the IANA zone database records when each transition happens.
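The effect is easy to see with Python's standard-library zoneinfo module (3.9+, which reads the IANA database from the operating system or the tzdata package). In this sketch, to_utc is a hypothetical helper that attaches a zone to a local wall-clock time and converts the result to UTC; FIXED_EST represents the buggy alternative of storing a frozen -05:00 offset.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")          # IANA zone: knows when DST starts and ends
FIXED_EST = timezone(timedelta(hours=-5))  # the bug: an offset frozen at -05:00


def to_utc(day: date, wall_clock: time, tz) -> datetime:
    # Attach the zone to the local wall-clock time first, then convert the instant to UTC
    return datetime.combine(day, wall_clock, tzinfo=tz).astimezone(timezone.utc)


nine_am = time(9, 0)
for day in (date(2026, 1, 5), date(2026, 3, 9)):  # before and after the Mar 8 transition
    print(day, "IANA:", to_utc(day, nine_am, NY).time(),
          "fixed:", to_utc(day, nine_am, FIXED_EST).time())
# Jan 5: both 14:00 UTC. Mar 9: IANA gives 13:00 UTC (9 AM EDT);
# the fixed offset still gives 14:00 UTC, which is 10 AM local.
```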

With recurrence and time zones handled correctly, we hit the final scalability wall: free/busy queries across large teams, and reminders at scale.

V6: Free/busy service and distributed reminders

A dedicated Free/Busy Service handles "show me all busy intervals for these 10 users between Monday and Friday." It reads each user's events, expands any recurrences in the window, and returns merged intervals. Results are cached per user per day in Redis (TTL = a few minutes — events can change).

Reminders are handed off to a distributed job scheduler. On event create/update, the service writes one reminder job per upcoming occurrence (up to some horizon, e.g., 30 days). The scheduler fires each job at occurrence_start - lead_time, fanning out to push/email gateways.

flowchart LR
    V1["V1: single table<br/>one box, no recurrence"] --> V2["V2: + RRULE column<br/>expand on read"]
    V2 --> V3["V3: + exceptions table<br/>per-occurrence overrides"]
    V3 --> V4["V4: + attendees + RSVP<br/>shared calendars"]
    V4 --> V5["V5: + IANA tzid<br/>correct DST handling"]
    V5 --> V6["V6: + free/busy cache<br/>+ distributed reminders"]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V5 fill:#ff6b1a,color:#0a0a0f
    style V6 fill:#a855f7,color:#fff

The data model

-- Core event; sharded by user_id (organizer)
CREATE TABLE events (
  id           BIGINT PRIMARY KEY,
  user_id      BIGINT NOT NULL,              -- organizer
  title        TEXT,
  description  TEXT,
  start_utc    TIMESTAMPTZ NOT NULL,
  end_utc      TIMESTAMPTZ NOT NULL,
  tzid         TEXT NOT NULL,                -- IANA zone, e.g. 'America/New_York'
  rrule        TEXT,                         -- iCalendar RRULE string, NULL = one-off
  status       TEXT DEFAULT 'confirmed',     -- confirmed | tentative | cancelled
  created_at   TIMESTAMPTZ DEFAULT now(),
  updated_at   TIMESTAMPTZ DEFAULT now()
);

-- Each attendee's view of an event (RSVP state lives here)
CREATE TABLE event_attendees (
  event_id     BIGINT NOT NULL,
  user_id      BIGINT NOT NULL,
  rsvp         TEXT DEFAULT 'needsAction',   -- accepted | tentative | declined | needsAction
  is_organizer BOOLEAN DEFAULT FALSE,
  added_at     TIMESTAMPTZ DEFAULT now(),
  PRIMARY KEY (event_id, user_id)
);
CREATE INDEX ea_user ON event_attendees(user_id);  -- "all events for user X"

-- Overrides / deletions for individual occurrences of a recurring series
CREATE TABLE event_exceptions (
  event_id            BIGINT NOT NULL,
  original_start_utc  TIMESTAMPTZ NOT NULL,  -- identifies the occurrence
  is_deleted          BOOLEAN DEFAULT FALSE,
  override_start_utc  TIMESTAMPTZ,
  override_end_utc    TIMESTAMPTZ,
  override_title      TEXT,
  override_tzid       TEXT,
  PRIMARY KEY (event_id, original_start_utc)
);

Why not one table per user?

Sharding by user_id puts all of a user's events on the same shard, which keeps single-user queries local. Shared events are the complication: the event row lives only on the organizer's shard, and each attendee's shard holds just a link to it. There is one authoritative copy of the event, so its data stays consistent, while each attendee's calendar view is still assembled from their own index rows.

Recurring events — RRULE in depth

The iCalendar standard (RFC 5545) defines RRULE as a structured string describing recurrence. Examples:

# Every weekday
RRULE:FREQ=DAILY;BYDAY=MO,TU,WE,TH,FR

# Every other week on Tuesdays, for 10 occurrences
RRULE:FREQ=WEEKLY;INTERVAL=2;BYDAY=TU;COUNT=10

# First Monday of every month
RRULE:FREQ=MONTHLY;BYDAY=1MO

# Annually on March 15
RRULE:FREQ=YEARLY;BYMONTH=3;BYMONTHDAY=15

Expand on read, not on write

There are two strategies for turning a rule into queryable occurrences:

Strategy	How	Pro	Con
Materialize all occurrences	On create, write one row per future occurrence up to some limit	Simple reads; easy indexing	Huge write amplification; rule changes require mass updates
Store rule + expand on read	Store one row; expand over the query window at read time	Compact storage; rule edits are one-row updates	Expansion must be bounded; complex queries need care

Expanding on read is the right call. A query for "this week" expands the rule over 7 days and returns at most 7 occurrences. Even FREQ=DAILY produces only 7 in a week-window. Expansion is CPU-cheap, and the window keeps it bounded. Libraries like python-dateutil (rrule / rrulestr) or ical.js implement RFC 5545 expansion correctly.

The one exception: reminder scheduling. A reminder job must fire at a specific wall-clock time, so you need to pre-materialize upcoming occurrence times up to a short horizon (e.g., 7–30 days) and register them with the job scheduler. The scheduler itself stores these as discrete jobs, not rule strings.

Here is what that expansion flow looks like for a given query window:

flowchart TD
    RQ["API query: events Mon–Fri"] --> FETCH["Fetch event rows<br/>(rrule + start_utc + tzid)"]
    FETCH --> EXP["Expand RRULE<br/>over window in local tz"]
    EXP --> CHECK["Check event_exceptions<br/>for each occurrence"]
    CHECK --> OVERRIDE["Apply overrides<br/>or drop deleted occurrences"]
    OVERRIDE --> SORT["Sort by start_utc"]
    SORT --> RESP["Return occurrence list"]
    style EXP fill:#ff6b1a,color:#0a0a0f
    style CHECK fill:#0e7490,color:#fff
    style OVERRIDE fill:#15803d,color:#fff

DST-correct expansion

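from datetime import datetime, timezone
from zoneinfo import ZoneInfo        # Python 3.9+; reads the IANA Time Zone Database
from dateutil.rrule import rrulestr

tz = ZoneInfo("America/New_York")

# The DTSTART is the series anchor in local time: 9 AM New York, Monday Jan 5, 2026.
# Use zoneinfo (or dateutil.tz), not pytz: pytz's localize() pins one fixed offset
# (EST, -05:00), and rrule copies that tzinfo onto every occurrence, which
# reproduces exactly the DST bug this example is meant to avoid.
dtstart = datetime(2026, 1, 5, 9, 0, 0, tzinfo=tz)

rule = rrulestr("RRULE:FREQ=WEEKLY;BYDAY=MO", dtstart=dtstart)

# Occurrences across the DST transition (second Sunday in March: Mar 8, 2026)
for dt in rule.between(datetime(2026, 3, 1, tzinfo=tz), datetime(2026, 3, 31, tzinfo=tz)):
    print(dt.astimezone(timezone.utc))
    # March 2: 14:00 UTC (9 AM EST, UTC-5)
    # March 9: 13:00 UTC (9 AM EDT, UTC-4)  <- DST moved forward; UTC shifts by 1h
    # March 16, 23, 30: 13:00 UTC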

The difference is subtle but important. If you stored UTC-5 as a fixed offset, the March 9th occurrence would land at 14:00 UTC — 10:00 AM local — one hour late, every time, for everyone in that time zone until someone notices. The IANA zone name America/New_York tells the expansion library when DST transitions happen and keeps local time stable across them.

flowchart LR
    JAN["Jan 5<br/>9:00 AM EST<br/>= 14:00 UTC"]
    MAR2["Mar 2<br/>9:00 AM EST<br/>= 14:00 UTC"]
    SPRING["DST springs forward<br/>Mar 8 at 2:00 AM"]
    MAR9_IANA["Mar 9 — IANA tzid<br/>9:00 AM EDT<br/>= 13:00 UTC ✓"]
    MAR9_OFFSET["Mar 9 — fixed offset -05:00<br/>fires at 14:00 UTC<br/>= 10:00 AM local ✗"]

    JAN --> MAR2
    MAR2 --> SPRING
    SPRING --> MAR9_IANA
    SPRING --> MAR9_OFFSET

    style SPRING fill:#ffaa00,color:#0a0a0f
    style MAR9_IANA fill:#15803d,color:#fff
    style MAR9_OFFSET fill:#ff2e88,color:#fff

Rule: Always store tzid as an IANA name (from the IANA Time Zone Database), never as a numeric offset like +05:30 or -05:00.

Free/busy and "find a meeting time"

The free/busy query

Given user IDs [u1, u2, …, uN] and a time window [start, end]:

1. For each user, fetch all events (including recurring) in the window.
2. Expand recurrences; apply exceptions.
3. Sort intervals by start time.
4. Merge overlapping intervals into a list of busy intervals.
5. Subtract busy from the full window → free intervals.
6. Intersect free intervals across all N users → common free slots.

Step 6 is the interval-overlap problem. With N users and M events each, a sweep-line algorithm runs in O(N·M log M). For typical "find a meeting time" UIs (up to 20 attendees, a few dozen events each), this runs in sub-millisecond time.
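Steps 4 to 6 can be sketched as follows. This assumes steps 1 to 3 have already produced each user's busy intervals as half-open (start, end) pairs, expressed here as minutes since the start of the window for readability; merge, free_slots, intersect, and find_meeting_time are hypothetical names. It intersects users one at a time with a two-pointer sweep rather than a single global sweep line, which yields the same common free slots.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), minutes since window start


def merge(busy: List[Interval]) -> List[Interval]:
    """Step 4: sort by start and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(busy):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def free_slots(busy: List[Interval], window: Interval) -> List[Interval]:
    """Step 5: subtract merged busy intervals from the window."""
    free: List[Interval] = []
    cursor = window[0]
    for start, end in merge(busy):
        if start > cursor:
            free.append((cursor, min(start, window[1])))
        cursor = max(cursor, end)
        if cursor >= window[1]:
            break
    if cursor < window[1]:
        free.append((cursor, window[1]))
    return free


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Step 6: two-pointer sweep over two sorted, non-overlapping free lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return out


def find_meeting_time(busy_by_user: List[List[Interval]], window: Interval,
                      duration: int) -> Optional[Interval]:
    common = [window]
    for busy in busy_by_user:
        common = intersect(common, free_slots(busy, window))
    for start, end in common:
        if end - start >= duration:
            return (start, start + duration)  # earliest slot that fits
    return None


window = (0, 480)  # 9:00 to 17:00, as minutes since 9:00
busy = [
    [(0, 60), (30, 120)],      # alice: busy 9:00-11:00 after merging
    [(150, 210), (300, 360)],  # bob
    [(120, 240)],              # carol
]
print(find_meeting_time(busy, window, 60))  # (240, 300), i.e. 13:00-14:00
```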

sequenceDiagram
    participant C as Client
    participant FB as Free/Busy Service
    participant Cache as Redis Cache
    participant ES as Event Store

    C->>FB: findMeetingTime(users=[u1,u2,u3], window=Mon–Fri, duration=60m)
    FB->>Cache: get free/busy for u1, u2, u3 (day-granularity keys)
    Cache-->>FB: hit for u1, miss for u2 and u3
    FB->>ES: fetch events for u2, u3 in window
    ES-->>FB: raw events + exceptions
    FB->>FB: expand recurrences, merge intervals
    FB->>Cache: write u2, u3 free/busy (TTL=5m)
    FB->>FB: intersect all 3 users' free intervals
    FB-->>C: [{start: Mon 14:00, end: Mon 15:00}, ...]

Caching strategy: cache each user's busy intervals at day granularity with a short TTL (5 minutes). A cache key like freebusy:u1:2026-06-09 holds the merged busy list for that user-day. A calendar edit invalidates the affected day keys for all attendees on that event.
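A sketch of the key scheme and the invalidation fan-out, using a plain dict as a stand-in for Redis. It assumes day keys are derived from the UTC date of each occurrence (keying on each user's local date is an equally valid choice) and that an edit invalidates both the old and the new time of the event; freebusy_key, days_touched, and invalidate_for_edit are hypothetical names.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Dict, Iterable, List, Set, Tuple

Span = Tuple[datetime, datetime]  # (start_utc, end_utc), end exclusive


def freebusy_key(user_id: str, day: date) -> str:
    return f"freebusy:{user_id}:{day.isoformat()}"


def days_touched(start_utc: datetime, end_utc: datetime) -> List[date]:
    """Every UTC day an occurrence overlaps; the end instant itself is excluded."""
    last = (end_utc - timedelta(microseconds=1)).date()
    day, days = start_utc.date(), []
    while day <= last:
        days.append(day)
        day += timedelta(days=1)
    return days


def invalidate_for_edit(cache: Dict[str, object], attendees: Iterable[str],
                        old: Span, new: Span) -> Set[str]:
    """Drop the day keys covering the old and new time for every attendee."""
    keys = {freebusy_key(user, day)
            for user in attendees
            for span in (old, new)
            for day in days_touched(*span)}
    for key in keys:
        cache.pop(key, None)  # with Redis, a DEL on each key
    return keys


utc = timezone.utc
moved_from = (datetime(2026, 6, 9, 14, tzinfo=utc), datetime(2026, 6, 9, 15, tzinfo=utc))
moved_to = (datetime(2026, 6, 10, 14, tzinfo=utc), datetime(2026, 6, 10, 15, tzinfo=utc))
print(sorted(invalidate_for_edit({}, ["u1", "u2"], moved_from, moved_to)))
```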

Permissions on free/busy

Users with only "free/busy" permission see opaque blocks (busy) rather than event titles. The Free/Busy Service must check the permission tier per user and redact event details accordingly. For external attendees (outside the organization), always return only free/busy.
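A minimal sketch of the redaction step, assuming the viewer's permission tier and whether they belong to the same organization have already been resolved. The tier strings and redact_for_viewer are hypothetical; they mirror the roles in the permission model table.

```python
from typing import Any, Dict

# Hypothetical tier names; "freebusy" is the free/busy-only tier
FULL_DETAIL_TIERS = {"owner", "editor", "viewer"}


def redact_for_viewer(event: Dict[str, Any], tier: str, same_org: bool) -> Dict[str, Any]:
    """Return what a given viewer may see of one busy interval."""
    if same_org and tier in FULL_DETAIL_TIERS:
        return {"start": event["start"], "end": event["end"], "title": event["title"]}
    # Free/busy-only viewers and anyone outside the organization get an opaque block
    return {"start": event["start"], "end": event["end"], "status": "busy"}


event = {"start": "09:00", "end": "10:00", "title": "1:1 with Bob"}
print(redact_for_viewer(event, "freebusy", same_org=True))
```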

Reminders and notification fan-out

The reminder storm problem

At 8:50 AM on a weekday, every 9:00 AM event needs a 10-minute-before reminder. Even if only a few percent of active users have a 9:00 AM meeting, that's millions of reminders clustering into a ~5-minute window — consistent with the capacity estimate of 1M+ reminder jobs in a peak burst.

A naive cron job scanning all events at 8:50 AM cannot keep up. The solution is to spread the work out ahead of time, registering jobs at write time instead of scanning for them at fire time:

Pre-register reminder jobs when an event is created or updated. For each upcoming occurrence (within a 30-day horizon), write a job to the distributed scheduler: fire at occurrence_start - lead_time, payload = {event_id, user_id, occurrence_utc}.
The scheduler (see design-distributed-job-scheduler) distributes jobs across time buckets and workers, ensuring horizontal fan-out.
Idempotency key: each job is identified by (event_id, user_id, occurrence_utc, lead_time_seconds); user_id is part of the key because every attendee gets their own reminder. If the scheduler fires twice (at-least-once delivery), the push/email gateway deduplicates on this key.
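The registration and deduplication steps can be sketched as below, assuming occurrence start times have already been expanded to UTC and that the scheduler accepts jobs carrying a fire time, an idempotency key, and a payload. ReminderJob, build_reminder_jobs, DedupingGateway, and the 30-day HORIZON are illustrative names and values. The key includes user_id so that two attendees of the same occurrence never deduplicate against each other.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Sequence, Set, Tuple

HORIZON = timedelta(days=30)  # illustrative: only materialize reminders this far ahead

# (event_id, user_id, occurrence_utc as ISO string, lead_time_seconds)
ReminderKey = Tuple[int, int, str, int]


@dataclass
class ReminderJob:
    fire_at: datetime
    idempotency_key: ReminderKey
    payload: dict


def build_reminder_jobs(event_id: int, user_ids: Sequence[int],
                        occurrences_utc: Iterable[datetime],
                        lead_time: timedelta, now: datetime) -> List[ReminderJob]:
    """One job per attendee per upcoming occurrence inside the horizon."""
    jobs = []
    lead_s = int(lead_time.total_seconds())
    for occ in occurrences_utc:
        fire_at = occ - lead_time
        if fire_at <= now or occ > now + HORIZON:
            continue  # already too late, or left for a later horizon refill
        for uid in user_ids:
            key = (event_id, uid, occ.isoformat(), lead_s)
            payload = {"event_id": event_id, "user_id": uid, "occurrence_utc": occ.isoformat()}
            jobs.append(ReminderJob(fire_at, key, payload))
    return jobs


class DedupingGateway:
    """Stand-in for the push/email gateway; drops keys it has already delivered."""

    def __init__(self) -> None:
        self.sent: Set[ReminderKey] = set()
        self.delivered: List[dict] = []

    def deliver(self, job: ReminderJob) -> bool:
        # A real gateway needs an atomic check-and-set here (e.g. a unique index)
        if job.idempotency_key in self.sent:
            return False  # duplicate fire from an at-least-once scheduler
        self.sent.add(job.idempotency_key)
        self.delivered.append(job.payload)
        return True


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
jobs = build_reminder_jobs(42, [1, 2], [datetime(2026, 6, 8, 13, 0, tzinfo=timezone.utc)],
                           timedelta(minutes=10), now)
gateway = DedupingGateway()
for job in jobs + jobs:  # simulate every job firing twice
    gateway.deliver(job)
print(len(gateway.delivered))  # 2: one reminder per attendee, duplicates dropped
```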

stateDiagram-v2
    [*] --> Pending : event created / updated
    Pending --> Scheduled : reminder job registered with scheduler
    Scheduled --> Fired : scheduler fires at trigger_time
    Fired --> Delivered : push / email gateway ACKs
    Fired --> Retried : gateway timeout → retry with same idempotency key
    Retried --> Delivered
    Delivered --> [*]
    Pending --> Cancelled : event deleted or reminder removed
    Scheduled --> Cancelled : event deleted or occurrence exception added

Handling recurring-event reminder updates: when a user edits an event's recurrence rule (e.g., changes weekly to every other week, FREQ=WEEKLY;INTERVAL=2), the service must cancel all pending reminder jobs for the old series and register jobs for the new expansion. Use a series_version counter on the event; jobs carrying an outdated version silently no-op when they fire, so a job whose cancellation was missed still does nothing.
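A sketch of the fire-time check, assuming each job records the series_version it was registered under and the worker can read the event's current version and deletion exceptions when the job fires. EventRow, should_fire, and bump_series_version are hypothetical names, and the dict stands in for the event store. The same check also covers a job whose event was deleted after registration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional, Set


@dataclass
class EventRow:
    series_version: int
    deleted_occurrences: Set[datetime] = field(default_factory=set)  # from event_exceptions


def should_fire(events: Dict[int, EventRow], event_id: int,
                job_version: int, occurrence_utc: datetime) -> bool:
    event: Optional[EventRow] = events.get(event_id)
    if event is None:
        return False  # event deleted after the job was registered
    if job_version != event.series_version:
        return False  # rule was edited; jobs from the new expansion replace this one
    return occurrence_utc not in event.deleted_occurrences


def bump_series_version(events: Dict[int, EventRow], event_id: int) -> int:
    """Called on an RRULE edit; newly registered jobs carry the returned version."""
    events[event_id].series_version += 1
    return events[event_id].series_version
```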

Permission model

Role	Can read event details	Can see free/busy	Can edit event	Can invite others
Owner / organizer	Yes	Yes	Yes	Yes
Editor	Yes	Yes	Yes	No (typically)
Viewer	Yes	Yes	No	No
Free/busy only	No (opaque)	Yes	No	No

Calendars can be shared at the calendar level (all events) or via individual event invitations. Invitations generate an event_attendees row with rsvp = needsAction.

RSVP flow

sequenceDiagram
    participant Org as Organizer
    participant CS as Calendar Service
    participant Email as Email Service
    participant Att as Attendee

    Org->>CS: createEvent(attendees=[att@example.com, ...])
    CS->>CS: write event row + event_attendees rows (rsvp=needsAction)
    CS->>Email: send invite email to each attendee
    Email-->>Att: invite email with Accept / Decline links
    Att->>CS: RSVP accept (PUT /events/{id}/rsvp {status: "accepted"})
    CS->>CS: update event_attendees.rsvp for att
    CS->>Org: notify organizer of RSVP update

The event is owned by the organizer's shard. Each attendee's RSVP updates their own row in event_attendees, which lives with the event. This is a low-write-rate operation (each user usually RSVPs once per invitation), so no special sharding trick is needed.

Double-RSVP prevention

A user who clicks an email link twice (or clicks Accept on two devices simultaneously) must not create duplicate RSVP rows. The PRIMARY KEY (event_id, user_id) on event_attendees makes the upsert idempotent: INSERT ... ON CONFLICT (event_id, user_id) DO UPDATE SET rsvp = EXCLUDED.rsvp.
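The upsert can be exercised end to end with Python's built-in sqlite3 module, whose INSERT ... ON CONFLICT ... DO UPDATE syntax (SQLite 3.24 and later) matches PostgreSQL's for this statement. The in-memory database and the rsvp helper are illustrative only; production would run the same statement against the organizer's Postgres shard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event_attendees (
        event_id BIGINT NOT NULL,
        user_id  BIGINT NOT NULL,
        rsvp     TEXT DEFAULT 'needsAction',
        PRIMARY KEY (event_id, user_id)
    )
""")

UPSERT = """
    INSERT INTO event_attendees (event_id, user_id, rsvp) VALUES (?, ?, ?)
    ON CONFLICT (event_id, user_id) DO UPDATE SET rsvp = EXCLUDED.rsvp
"""


def rsvp(event_id: int, user_id: int, status: str) -> None:
    with conn:  # commit each RSVP as its own transaction
        conn.execute(UPSERT, (event_id, user_id, status))


rsvp(1, 42, "accepted")  # first click
rsvp(1, 42, "accepted")  # duplicate click from a second device: no new row
rsvp(1, 42, "declined")  # later change of mind: last writer wins
print(conn.execute("SELECT user_id, rsvp FROM event_attendees").fetchall())  # [(42, 'declined')]
```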

Multi-device sync

The sync token pattern

Rather than polling the full calendar on every app open, clients use sync tokens:

Initial sync: GET /events?start=...&end=... returns all events + sync_token = T1.
Subsequent sync: GET /events/changes?sync_token=T1 returns only events changed since T1 + a new sync_token = T2.
Client applies the delta (upsert changed events, delete removed ones) to its local store.

The Change Stream (Kafka) captures every event write. The Sync Token Service maintains a per-user log of event change IDs, stamped with a monotonically increasing token. A client's sync request is a range scan over (user_id, token > T1).
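A single-process sketch of the per-user log, assuming one global counter supplies monotonically increasing tokens (a sharded deployment would need per-shard or per-user sequences) and that a change to a shared event is appended to the log of every user who sees it. SyncLog, append, and changes_since are hypothetical names.

```python
import itertools
from collections import defaultdict
from typing import Dict, List, Tuple


class SyncLog:
    """Append-only per-user change log; tokens only ever increase."""

    def __init__(self) -> None:
        self._next_token = itertools.count(1)
        self._log: Dict[int, List[Tuple[int, str, str]]] = defaultdict(list)

    def append(self, user_ids: List[int], event_id: str, op: str) -> int:
        token = next(self._next_token)
        for uid in user_ids:  # organizer plus every attendee who sees the event
            self._log[uid].append((token, event_id, op))
        return token

    def changes_since(self, user_id: int, sync_token: int) -> Tuple[Dict[str, str], int]:
        """Scan (user_id, token > sync_token); return the delta and the new token."""
        delta: Dict[str, str] = {}
        new_token = sync_token
        for token, event_id, op in self._log[user_id]:
            if token > sync_token:
                delta[event_id] = op  # a later change to the same event supersedes earlier ones
                new_token = token
        return delta, new_token


log = SyncLog()
t1 = log.append([1, 2], "evt_A", "upsert")  # shared event: organizer 1, attendee 2
log.append([1], "evt_B", "upsert")
log.append([1], "evt_A", "delete")
print(log.changes_since(1, t1))  # ({'evt_B': 'upsert', 'evt_A': 'delete'}, 3)
```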

sequenceDiagram
    participant App as Mobile App
    participant API as Calendar API
    participant STS as Sync Token Service
    participant KF as Kafka

    App->>API: GET /events (initial sync)
    API-->>App: events list + sync_token=T1

    Note over App: User is offline, edits event A

    App->>API: PUT /events/A (If-Match: etag)
    API->>KF: publish change(event_A, user_id)
    KF->>STS: append change to per-user log → token T2
    API-->>App: 200 OK, new etag

    App->>API: GET /events/changes?sync_token=T1
    API->>STS: scan (user_id, token > T1)
    STS-->>API: [event_A changed]
    API-->>App: delta response + sync_token=T2

Offline edits and conflict resolution

A user edits an event on their phone while offline. When connectivity returns:

Phone sends the edit with an If-Match: {etag} header (the etag is the updated_at timestamp from the last known server version).
Server checks: if the event's current updated_at matches the etag, apply the edit and advance updated_at.
If updated_at has changed (someone else edited), return 409 Conflict. The client resolves by presenting the user with "keep mine / keep server / merge" options — or, for simple fields, last-write-wins (whoever submitted later wins).
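A minimal sketch of the server-side check, with an in-memory dict standing in for the events table. It assumes the etag is the updated_at value as described above; in Postgres the check and the write would typically be a single conditional UPDATE ... WHERE id = ... AND updated_at = ..., so two devices cannot both pass the check. EventStore, update_if_match, and Conflict are hypothetical names.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Dict


@dataclass(frozen=True)
class Event:
    title: str
    location: str
    updated_at: datetime  # doubles as the etag


class Conflict(Exception):
    """Maps to HTTP 409 Conflict."""


class EventStore:
    """In-memory stand-in for the events table."""

    def __init__(self) -> None:
        self.rows: Dict[str, Event] = {}

    def update_if_match(self, event_id: str, if_match: datetime,
                        changes: Dict[str, str], now: datetime) -> Event:
        current = self.rows[event_id]
        if current.updated_at != if_match:  # someone else wrote since this client last synced
            raise Conflict(f"stale etag; server has {current.updated_at.isoformat()}")
        updated = replace(current, updated_at=now, **changes)  # apply edit, advance the etag
        self.rows[event_id] = updated
        return updated


t0 = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
store = EventStore()
store.rows["A"] = Event("Sync", "Room 1", t0)
store.update_if_match("A", t0, {"title": "Weekly sync"}, t0 + timedelta(minutes=1))  # phone
try:
    store.update_if_match("A", t0, {"location": "Room 2"}, t0 + timedelta(minutes=2))  # laptop
except Conflict as err:
    print("409 Conflict:", err)
```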

For calendar systems, last-write-wins per event field is the usual resolution strategy. Full collaborative merging (the operational-transformation approach behind Google Docs) is unnecessary here: calendar edits are infrequent and rarely touch the same field concurrently.

High-level architecture (full)

flowchart TD
    CLIENT[Client<br/>web / iOS / Android] --> GW[API Gateway<br/>auth + rate limit]

    GW --> ES[Event Service]
    GW --> FB[Free/Busy Service]
    GW --> SY[Sync Service]

    ES --> EVSHARD[(Event Store<br/>sharded Postgres<br/>shard key: user_id)]
    ES --> AIDX[(Attendee Index<br/>event_id × user_id)]
    ES --> EXC[(Exceptions Table)]
    ES --> KAFKA[Kafka<br/>event change stream]

    KAFKA --> NS[Notification Service]
    KAFKA --> SY
    KAFKA --> FBIN[Free/Busy Invalidator]

    NS --> SCHED[Distributed Job Scheduler]
    SCHED --> PUSH[Push Gateway<br/>FCM / APNs]
    SCHED --> EMAL[Email Gateway<br/>SES / SendGrid]

    FB --> FBCACHE[(Free/Busy Cache<br/>Redis — day-grain TTL 5m)]
    FBIN --> FBCACHE
    FBCACHE -.miss.-> EVSHARD

    SY --> SYNCTOK[(Sync Token Log<br/>append-only per user)]

    style ES fill:#ff6b1a,color:#0a0a0f
    style SCHED fill:#15803d,color:#fff
    style EVSHARD fill:#0e7490,color:#fff
    style FBCACHE fill:#a855f7,color:#fff
    style KAFKA fill:#ffaa00,color:#0a0a0f

Storage choices

Data	Store	Rationale
Events (organizer-owned)	Sharded Postgres (shard: `user_id`)	Relational joins; strong consistency for writes; ACID for RSVP upserts
Event attendees / RSVP	Same Postgres shard as event	Authoritative RSVP rows co-located with the event for organizer queries (attendees' shards hold only link rows); low write rate
Event exceptions	Same shard as parent event	Always accessed together with the event
Free/busy cache	Redis (day-grain keys, TTL 5 min)	Sub-millisecond interval reads; short TTL keeps staleness tolerable
Sync token log	Append-only log in Postgres or Cassandra	Sequential scans by (user_id, token); high write rate (one row per change per user)
Reminder jobs	Distributed job scheduler store	Purpose-built for time-indexed job retrieval
Audit / history	S3 + Parquet (Athena / BigQuery)	Cheap; rarely queried; full fidelity

Failure modes

DST bugs

The classic bug: store America/New_York's UTC offset as a fixed -05:00 (EST). In March, after the DST spring-forward, the same recurring event fires at 10 AM local time instead of 9 AM. Fix: always store the IANA zone name, never a numeric offset. Validate on write: reject tzid values that don't exist in the IANA Time Zone Database.

Recurrence expansion cost

A rule like FREQ=SECONDLY or COUNT=100000 with a very wide query window can produce millions of occurrences. Mitigate:

Cap expansion: refuse any rule that would produce > 5,000 occurrences in a single API window (return a 400 with a clear error).
Validate RRULE on write; reject pathological combinations.
Time-limit the expansion: if expansion takes > 500ms, abort and return a partial result.
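The first and third guards can be combined in a wrapper around any ascending occurrence iterator (for example, a python-dateutil rrule). MAX_OCCURRENCES and TIME_BUDGET_S use the figures above; expand_bounded, ExpansionTooLarge, and the every generator are hypothetical names for this sketch.

```python
import time
from datetime import datetime, timedelta
from typing import Iterable, Iterator, List, Tuple

MAX_OCCURRENCES = 5_000
TIME_BUDGET_S = 0.5


class ExpansionTooLarge(Exception):
    """Maps to HTTP 400: the rule yields too many occurrences for this window."""


def expand_bounded(occurrences: Iterable[datetime], window_start: datetime,
                   window_end: datetime) -> Tuple[List[datetime], bool]:
    """Collect occurrences in [window_start, window_end); returns (items, complete)."""
    deadline = time.monotonic() + TIME_BUDGET_S
    items: List[datetime] = []
    for occ in occurrences:  # must be yielded in ascending order
        if occ >= window_end:
            return items, True
        if occ >= window_start:
            items.append(occ)
            if len(items) > MAX_OCCURRENCES:
                raise ExpansionTooLarge(f"more than {MAX_OCCURRENCES} occurrences in window")
        if time.monotonic() > deadline:
            return items, False  # partial result; the caller flags it to the client
    return items, True


def every(start: datetime, step: timedelta) -> Iterator[datetime]:
    """Hypothetical stand-in for an unbounded RRULE iterator."""
    current = start
    while True:
        yield current
        current += step


week_start = datetime(2026, 6, 1)
items, complete = expand_bounded(every(week_start, timedelta(hours=1)),
                                 week_start, week_start + timedelta(days=7))
print(len(items), complete)  # 168 True
```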

Reminder storms

A 9 AM reminder wave on a workday can push 1M+ jobs into a few minutes. The distributed job scheduler partitions jobs by time bucket across worker shards. If a worker falls behind, it drains its bucket as fast as it can; some reminders fire a few minutes late, which is acceptable. An idempotency key prevents double delivery if the worker retries.

Double-RSVP

Handled by ON CONFLICT DO UPDATE — the database serializes concurrent RSVPs to the same (event_id, user_id) pair. The last writer wins.

Sync conflicts

Two devices edit different fields of the same event while offline; whichever syncs second presents a stale etag. For calendar events, field-level last-write-wins is standard: each device's edit is applied field by field, with the latest modification time winning per field. When the edits touch different fields (one device changed the title, the other the location), both changes are preserved.
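A sketch of field-level last-write-wins, assuming the server keeps a last-modified timestamp per field (a single updated_at column is not enough for this) and each device submits only the fields it changed. Trusting the submitted timestamps is itself an assumption, since device clocks can be skewed. merge_fields and the dict shapes are hypothetical.

```python
from datetime import datetime
from typing import Dict, Tuple

# field name -> (value, last-modified time)
FieldState = Dict[str, Tuple[str, datetime]]


def merge_fields(server: FieldState, edit: FieldState) -> FieldState:
    """Apply one device's edit field by field; the newer timestamp wins per field."""
    merged = dict(server)
    for name, (value, edited_at) in edit.items():
        if name not in merged or edited_at > merged[name][1]:
            merged[name] = (value, edited_at)
    return merged


def at(minute: int) -> datetime:
    return datetime(2026, 6, 1, 12, minute)


server = {"title": ("Sync", at(0)), "location": ("Room 1", at(0))}
phone = {"title": ("Weekly sync", at(5))}                               # newer title edit
laptop = {"location": ("Room 2", at(7)), "title": ("Sync v2", at(3))}  # older title edit
state = merge_fields(merge_fields(server, phone), laptop)
print({name: value for name, (value, _) in state.items()})
# {'title': 'Weekly sync', 'location': 'Room 2'}
```

Because each field keeps its own timestamp, applying the two edits in the opposite order produces the same result.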

Reminder for deleted event

A user deletes an event after its reminder jobs are already registered. Fix: on event delete, cancel all pending reminder jobs (set them to a cancelled state in the scheduler store). When the scheduler picks up a job, check whether the event still exists (and the occurrence hasn't been deleted) before firing.

API design

# Create event
POST /api/v1/calendars/{calendar_id}/events
{
  "title": "Weekly sync",
  "start": { "dateTime": "2026-06-15T09:00:00", "timeZone": "America/New_York" },
  "end":   { "dateTime": "2026-06-15T09:30:00", "timeZone": "America/New_York" },
  "recurrence": ["RRULE:FREQ=WEEKLY;BYDAY=MO"],
  "attendees": [{"email": "alice@example.com"}, {"email": "bob@example.com"}],
  "reminders": { "useDefault": false, "overrides": [{"method": "popup", "minutes": 10}] }
}

→ 201 Created { "id": "evt_abc123", "htmlLink": "..." }

# List events in a window (server expands recurrence)
GET /api/v1/calendars/{calendar_id}/events
  ?timeMin=2026-06-15T00:00:00Z
  &timeMax=2026-06-22T00:00:00Z
  &singleEvents=true   ← expand recurrence to individual instances

→ 200 OK { "items": [ { "start": ..., "end": ..., "recurringEventId": "evt_abc123" }, ... ] }

# Update a single occurrence ("this event only")
PUT /api/v1/calendars/{calendar_id}/events/{event_id}/{occurrence_start_utc}
{
  "start": { "dateTime": "2026-06-22T10:00:00", "timeZone": "America/New_York" },
  "end":   { "dateTime": "2026-06-22T10:30:00", "timeZone": "America/New_York" }
}

→ 200 OK  (writes to event_exceptions; does not modify the parent RRULE)

# Free/busy query
POST /api/v1/freeBusy
{
  "timeMin": "2026-06-15T00:00:00Z",
  "timeMax": "2026-06-20T00:00:00Z",
  "items": [{"id": "user_alice"}, {"id": "user_bob"}]
}

→ 200 OK { "calendars": { "user_alice": { "busy": [...] }, "user_bob": { "busy": [...] } } }

The API above deliberately mirrors the Google Calendar API v3 shape — not because interviewers expect you to memorize it, but because aligning with a widely-used convention signals that you understand the domain.

Things to discuss in an interview

RRULE expand-on-read vs. materialize: defend expand-on-read for storing events and serving views; materialize only reminder jobs, and only up to a bounded horizon.
DST correctness: explain why storing a numeric UTC offset is wrong and what tzid as an IANA name gives you.
Exceptions vs. editing the whole series: the (event_id, original_start_utc) PK on exceptions; how "all future" is handled (split the series: end the original before the edit point, create a new series starting there).
Free/busy as an interval problem: describe the sweep-line merge and the cross-user intersection.
Reminder storm: the distributed scheduler partitions the 9 AM wave; idempotency key prevents double delivery.
Sync tokens: delta sync avoids re-fetching the full calendar on every app open.
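For the "all future" split mentioned above, a sketch of ending the old half of the series, assuming an RRULE:-prefixed string without COUNT (a COUNT-bounded series would need its remaining count recomputed) and a split point in UTC. truncate_rule is a hypothetical helper built on string manipulation; a production version would round-trip the rule through an RFC 5545 library instead.

```python
from datetime import datetime, timedelta, timezone


def truncate_rule(rrule: str, split_at_utc: datetime) -> str:
    """End a series just before split_at_utc (for "this and following" edits)."""
    name, _, body = rrule.partition(":")  # "RRULE", "FREQ=...;BYDAY=..."
    parts = body.split(";")
    if any(p.startswith("COUNT=") for p in parts):
        raise ValueError("COUNT-bounded series: recompute the remaining COUNT instead")
    # UNTIL is inclusive, and must be in UTC when DTSTART carries a TZID (RFC 5545)
    until = (split_at_utc - timedelta(seconds=1)).astimezone(timezone.utc)
    kept = [p for p in parts if not p.startswith("UNTIL=")]
    kept.append("UNTIL=" + until.strftime("%Y%m%dT%H%M%SZ"))
    return f"{name}:{';'.join(kept)}"


series = "RRULE:FREQ=WEEKLY;BYDAY=MO;UNTIL=20271231T000000Z"
split = datetime(2026, 6, 15, 13, 0, tzinfo=timezone.utc)  # Mon Jun 15, 9 AM New York (EDT)
print(truncate_rule(series, split))  # RRULE:FREQ=WEEKLY;BYDAY=MO;UNTIL=20260615T125959Z
```

The new series would then be a copy of the event with DTSTART at the split point, the user's edits applied, and the original rule (including its original UNTIL).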

Things you should now be able to answer

Why is storing a UTC offset like -05:00 wrong for recurring events, and what's the correct fix?
What does the event_exceptions table contain, and why is original_start_utc the right primary key component?
How would you handle a user who changes a recurring meeting from weekly to biweekly — what happens to existing reminder jobs?
A team of 50 people needs a "find a meeting time" feature. What's the algorithm and what's the bottleneck?
How do you prevent a double reminder if the scheduler delivers a job twice?
What is the sync token pattern and why does it beat full polling?

Frequently asked questions

▸Why is storing a UTC offset like -05:00 wrong for recurring events, and what is the correct alternative?

A fixed offset breaks across DST transitions. If you store -05:00 for a 9 AM New York meeting, the March occurrence after the spring-forward still fires at 14:00 UTC, which is 10 AM local time — one hour late. The fix is to store the IANA zone name (tzid = 'America/New_York') alongside start_utc; the expansion library consults the IANA Time Zone Database to derive the correct UTC instant for each future occurrence.

▸How does a calendar system handle editing a single occurrence of a recurring event without modifying the whole series?

A separate event_exceptions table keyed by (event_id, original_start_utc) stores the override. During recurrence expansion, the service checks this table for each generated occurrence; if a match exists, it substitutes the overridden values or drops the occurrence entirely. The original_start_utc is the right key because it unambiguously identifies which slot in the series the exception belongs to, regardless of how the rule is later edited.

▸Should recurring event occurrences be materialized on write or expanded on read?

Expand on read. Storing one RRULE string per event and expanding over the queried window at read time is dramatically more compact — a rule like FREQ=DAILY produces at most 7 occurrences for a one-week query window regardless of how far into the future the series extends, and rule changes require updating a single row. The one exception is reminder scheduling: upcoming occurrence times must be pre-materialized as discrete jobs in the distributed scheduler up to a bounded horizon (7-30 days).

▸What is the peak reminder load, and why can a single cron job not handle it?

On a typical weekday at 9 AM, 1M+ reminder jobs can cluster into a roughly 5-minute window. A naive cron scanning all events at 8:50 AM saturates under that volume. The solution is to pre-register one reminder job per upcoming occurrence at event create or update time, then use a distributed job scheduler that partitions jobs across time buckets and workers; an idempotency key of (event_id, user_id, occurrence_utc, lead_time_seconds) prevents double delivery if a worker retries, and including user_id keeps each attendee's reminder distinct.

▸What is the estimated storage requirement for event data at Google Calendar scale, and how is the data sharded?

Roughly 500 TB, derived from 500M users times 500 events per user times 2 KB per event row. The event store is sharded by user_id so that all of an organizer's events land on the same shard, which keeps single-user queries local. Attendees' shards hold only a lightweight link row in event_attendees; the single authoritative copy of the event stays on the organizer's shard.
