Design a Calendar System (Google Calendar)
Store events, share calendars, find free slots, and fire reminders — across time zones and recurring rules. The RRULE expansion and free/busy problem.
The problem
Google Calendar, Apple Calendar, and Microsoft Outlook handle a problem that looks simple from the outside: store some events, show them in a grid, fire a reminder before each one. Any developer could wire up a basic version in an afternoon using a single database table with a start_time column.
The complexity starts when you add recurring events. A weekly stand-up has one rule (RRULE:FREQ=WEEKLY;BYDAY=MO) but represents hundreds of occurrences. Now a user wants to move just next Monday's occurrence to Tuesday. Another user creates the same recurring meeting in New York, and a London attendee needs to see it in their time zone across a daylight saving time boundary. Every one of these cases is a trap that silently produces wrong answers if the implementation cuts corners.
Layer in free/busy queries — "find a time when all 12 people on this team are free for 45 minutes this week" — and reminder delivery at scale, and the problem looks quite different. Google Calendar serves roughly 500 million users, each with hundreds of events. On a typical weekday at 8:50 AM, 1M+ 10-minute-before reminders need to fire within the same 5-minute window. A naive scheduled job collapses under that load.
The two engineering tensions that make this a canonical interview question are: (1) recurrence expansion — how to store one rule and serve any window of occurrences cheaply, while correctly handling DST, exceptions to the series, and per-occurrence overrides; and (2) fan-out at scale — how to turn a 9 AM reminder wave from a thundering herd into steady, exactly-once delivery across distributed workers.
Functional requirements
- Event management: create, read, update, delete events (title, description, location, start, end, recurrence rule, attendees, reminders).
- Invitations and RSVP: invite users by email; each attendee tracks their own RSVP status (accepted / tentative / declined / no reply).
- Calendar views: list events for a user in a time range, rendered in any time zone.
- Free/busy: given N users and a time range, return their combined busy intervals.
- Find a meeting time: given N users and a duration, return the earliest free slot where all are available.
- Reminders: notify via push and email at a configurable lead time (e.g. 10 minutes before start).
- Sharing and permissions: owner, editor, free/busy-only, read-only access per calendar.
- Multi-device sync: changes propagate to all devices within a few seconds.
Non-functional requirements
- Correctness over throughput — a wrong reminder time or a missing DST adjustment is a user-visible bug that erodes trust.
- High availability — users check calendars throughout the day; p99 read latency < 200ms.
- Eventual consistency is acceptable for most reads; a change written on mobile may take seconds to appear on web.
- Exactly-once reminder delivery — a double-reminder is a UX annoyance; a missed reminder is a meeting miss.
Capacity estimation
| Dimension | Estimate | How we got there |
|---|---|---|
| Users | ~500M registered; ~100M active daily | — |
| Event writes (avg) | ~1,000 writes/sec | 5 events/user/month × 500M users = 2.5B/month ÷ (30 × 86,400 s) |
| Event writes (peak) | ~5,000 writes/sec | Monday morning surge, ~5× avg |
| Event reads (avg) | ~50,000 reads/sec | 50:1 read/write ratio |
| Event reads (peak) | ~200,000 reads/sec | ~4× avg at peak |
| Event row size | ~2 KB | Title, description, attendees list, RRULE string |
| Event storage | ~500 TB | 500M users × 500 events/user × 2 KB; 500 events/user is an order-of-magnitude design assumption for a mature user; sharded across O(100+) nodes by user_id |
| Reminder load (avg) | ~9,000 reminders/sec | ~2 events/user/day × 500M users (an upper bound that counts every registered user, not only the ~100M daily actives) = 1B events/day × 80% have reminders = 800M reminders/day ÷ 86,400 s |
| Reminder load (peak) | ~1M+ reminders in a ~5-min window | 9:00 AM on a workday; must be distributed across many workers |
| Attendee index size | ~2.5 TB | ~10 attendees/event × 500M users × 500 events = 2.5T entries in theory; in practice sparse (1–5 attendees/event), realistic estimate ~50B entries × ~50 bytes |
The 9 AM reminder storm is the key operational constraint: a naive single-process cron would queue millions of jobs at 8:50 AM and then saturate. See the distributed job scheduler article for the architecture that solves this.
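The arithmetic behind the table can be checked in a few lines. This is only a back-of-envelope sketch: the variable names are illustrative, and every input is one of the table's own assumptions rather than a measured figure.

```python
# Back-of-envelope check of the capacity table; all inputs are the table's assumptions.
SECONDS_PER_DAY = 86_400

users = 500_000_000
events_per_user_per_month = 5
avg_writes_per_sec = users * events_per_user_per_month / (30 * SECONDS_PER_DAY)

events_per_user = 500      # order-of-magnitude assumption for a mature user
row_bytes = 2_000          # ~2 KB per event row
event_storage_tb = users * events_per_user * row_bytes / 1e12

reminders_per_day = 2 * users * 0.80    # 2 events/user/day, 80% carry a reminder
avg_reminders_per_sec = reminders_per_day / SECONDS_PER_DAY

attendee_index_tb = 50e9 * 50 / 1e12    # ~50B entries x ~50 bytes each

print(f"writes/sec ~{avg_writes_per_sec:,.0f}")
print(f"event storage ~{event_storage_tb:,.0f} TB")
print(f"reminders/sec ~{avg_reminders_per_sec:,.0f}")
print(f"attendee index ~{attendee_index_tb} TB")
```

The results land close to the rounded figures in the table (about 1,000 writes/sec, 500 TB, 9,000 reminders/sec, 2.5 TB).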
Building up to the design
V1: One table, one time zone, no recurrence
Start with the simplest possible schema:
CREATE TABLE events (
id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
title TEXT,
start_time TIMESTAMPTZ,
end_time TIMESTAMPTZ
);
You can build and demo a personal calendar in an afternoon with this. It works fine until someone asks for a weekly recurring stand-up. Storing 52 copies of the same stand-up per year is wasteful, and if the meeting time ever changes you have to update all 52 rows. So we need a better abstraction.
V2: Store recurrence as a rule string
Add an rrule TEXT column. A weekly Monday stand-up becomes:
RRULE:FREQ=WEEKLY;BYDAY=MO;UNTIL=20271231T000000Z
Now one row represents hundreds of occurrences. When a user opens "this week," the API expands the rule over the requested window and returns only the occurrences that fall in it. This is dramatically more compact — and if you change the meeting day, you update one row.
The next crack appears when a user wants to move next Monday's occurrence to Tuesday, just this one time. We need a way to override a single occurrence without touching the series rule.
V3: Add an exceptions table
CREATE TABLE event_exceptions (
event_id BIGINT,
original_start_utc TIMESTAMPTZ, -- identifies which occurrence
override_start_utc TIMESTAMPTZ, -- null = deleted occurrence
override_end_utc TIMESTAMPTZ,
override_title TEXT,
-- ... other override fields
PRIMARY KEY (event_id, original_start_utc)
);
During expansion, after generating occurrences from the rule, the service checks this table: if an occurrence's original_start_utc appears here, either substitute the overridden values or exclude it entirely (deleted occurrence). The key insight is using original_start_utc — not a sequence number — to identify the occurrence, because the original slot is what ties the exception back to the series unambiguously.
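A minimal sketch of that lookup, assuming the rule has already been expanded into a list of UTC start times. `ExceptionRow` and `apply_exceptions` are hypothetical names that mirror the V3 table, where a null override_start_utc marks a deleted occurrence.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ExceptionRow:
    """Hypothetical in-memory mirror of one event_exceptions row."""
    original_start_utc: datetime
    override_start_utc: Optional[datetime]   # None means the occurrence was deleted
    override_end_utc: Optional[datetime] = None
    override_title: Optional[str] = None


def apply_exceptions(occurrences, duration, title, exceptions):
    """Merge rule-generated start times with per-occurrence overrides."""
    by_original = {e.original_start_utc: e for e in exceptions}
    result = []
    for start in occurrences:
        exc = by_original.get(start)
        if exc is None:
            result.append((start, start + duration, title))
        elif exc.override_start_utc is not None:
            new_start = exc.override_start_utc
            new_end = exc.override_end_utc or new_start + duration
            result.append((new_start, new_end, exc.override_title or title))
        # else: deleted occurrence, emit nothing
    return sorted(result)


utc = timezone.utc
mondays = [datetime(2026, 6, day, 13, 0, tzinfo=utc) for day in (1, 8, 15)]
rows = [
    ExceptionRow(mondays[1], datetime(2026, 6, 9, 13, 0, tzinfo=utc)),  # moved to Tuesday
    ExceptionRow(mondays[2], None),                                     # deleted
]
view = apply_exceptions(mondays, timedelta(minutes=30), "Stand-up", rows)
```

One limitation of this sketch: an occurrence moved into the query window from outside it would not be found, so a real query would also need to look up exceptions by their override time.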
Once that's in place, attendees are the next problem. A user invites 10 colleagues; each needs their own RSVP state, and the event should appear on each attendee's calendar view.
V4: Attendee index and RSVP
Add an event_attendees table. The event itself is "owned" by the organizer. Each attendee gets a row linking their user_id to the event_id, with their RSVP status. Calendar views for any user are built by joining their attendee rows against the events table.
At this point the most dangerous remaining problem is time zones. A user in New York creates a 9 AM recurring meeting. In January (EST, UTC−5) the correct UTC representation is 14:00:00 UTC; a London attendee (UTC+0 in January) sees 2:00 PM — correct. After the US clocks spring forward to EDT (UTC−4), "9 AM New York" becomes 13:00:00 UTC. If you stored a fixed offset of −05:00 instead of the IANA zone name, the next occurrence still fires at 14:00 UTC, so New York attendees see the meeting at 10:00 AM instead of 9:00 AM — one hour late.
V5: Correct time zone handling
Store two things per event: start_utc TIMESTAMPTZ for sorting and querying, and tzid TEXT (the IANA time zone name like America/New_York) for display and for recurrence expansion. When expanding a recurring event, re-derive each occurrence from its local wall-clock start time (for example 9:00 AM) in the originating zone, then convert that to UTC. This handles DST automatically because the IANA zone database knows when transitions happen.
With recurrence and time zones handled correctly, we hit the final scalability wall: free/busy queries across large teams, and reminders at scale.
V6: Free/busy service and distributed reminders
A dedicated Free/Busy Service handles "show me all busy intervals for these 10 users between Monday and Friday." It reads each user's events, expands any recurrences in the window, and returns merged intervals. Results are cached per user per day in Redis (TTL = a few minutes — events can change).
Reminders are handed off to a distributed job scheduler. On event create/update, the service writes one reminder job per upcoming occurrence (up to some horizon, e.g., 30 days). The scheduler fires each job at occurrence_start - lead_time, fanning out to push/email gateways.
flowchart LR
V1["V1: single table<br/>one box, no recurrence"] --> V2["V2: + RRULE column<br/>expand on read"]
V2 --> V3["V3: + exceptions table<br/>per-occurrence overrides"]
V3 --> V4["V4: + attendees + RSVP<br/>shared calendars"]
V4 --> V5["V5: + IANA tzid<br/>correct DST handling"]
V5 --> V6["V6: + free/busy cache<br/>+ distributed reminders"]
style V1 fill:#0e7490,color:#fff
style V3 fill:#15803d,color:#fff
style V5 fill:#ff6b1a,color:#0a0a0f
style V6 fill:#a855f7,color:#fff
The data model
-- Core event; sharded by user_id (organizer)
CREATE TABLE events (
id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL, -- organizer
title TEXT,
description TEXT,
start_utc TIMESTAMPTZ NOT NULL,
end_utc TIMESTAMPTZ NOT NULL,
tzid TEXT NOT NULL, -- IANA zone, e.g. 'America/New_York'
rrule TEXT, -- iCalendar RRULE string, NULL = one-off
status TEXT DEFAULT 'confirmed', -- confirmed | tentative | cancelled
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
-- Each attendee's view of an event (RSVP state lives here)
CREATE TABLE event_attendees (
event_id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
rsvp TEXT DEFAULT 'needsAction', -- accepted | tentative | declined | needsAction
is_organizer BOOLEAN DEFAULT FALSE,
added_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (event_id, user_id)
);
CREATE INDEX ea_user ON event_attendees(user_id); -- "all events for user X"
-- Overrides / deletions for individual occurrences of a recurring series
CREATE TABLE event_exceptions (
event_id BIGINT NOT NULL,
original_start_utc TIMESTAMPTZ NOT NULL, -- identifies the occurrence
is_deleted BOOLEAN DEFAULT FALSE,
override_start_utc TIMESTAMPTZ,
override_end_utc TIMESTAMPTZ,
override_title TEXT,
override_tzid TEXT,
PRIMARY KEY (event_id, original_start_utc)
);
Why not one table per user?
Sharding by user_id means each user's events land on the same shard (good for single-user queries). But an attendee index means a shared event's row lives on the organizer's shard; attendees' shards hold only the link. This keeps event data consistent — there is one authoritative copy — while letting each attendee's calendar view be assembled from their index rows.
Recurring events — RRULE in depth
The iCalendar standard (RFC 5545) defines RRULE as a structured string describing recurrence. Examples:
# Every weekday
RRULE:FREQ=DAILY;BYDAY=MO,TU,WE,TH,FR
# Every other week on Tuesdays, for 10 occurrences
RRULE:FREQ=WEEKLY;INTERVAL=2;BYDAY=TU;COUNT=10
# First Monday of every month
RRULE:FREQ=MONTHLY;BYDAY=1MO
# Annually on March 15
RRULE:FREQ=YEARLY;BYMONTH=3;BYMONTHDAY=15
Expand on read, not on write
There are two strategies for turning a rule into queryable occurrences:
| Strategy | How | Pro | Con |
|---|---|---|---|
| Materialize all occurrences | On create, write one row per future occurrence up to some limit | Simple reads; easy indexing | Huge write amplification; rule changes require mass updates |
| Store rule + expand on read | Store one row; expand over the query window at read time | Compact storage; rule edits are one-row updates | Expansion must be bounded; complex queries need care |
Expanding on read is the right call. A query for "this week" expands the rule over 7 days; for any rule that recurs daily or less often, that is at most 7 occurrences. Expansion is CPU-cheap, and the window keeps it bounded (sub-daily frequencies such as FREQ=HOURLY produce more and need an explicit cap). Libraries like python-dateutil (rrule / rrulestr) or ical.js implement RFC 5545 expansion.
The one exception: reminder scheduling. A reminder job must fire at a specific wall-clock time, so you need to pre-materialize upcoming occurrence times up to a short horizon (e.g., 7–30 days) and register them with the job scheduler. The scheduler itself stores these as discrete jobs, not rule strings.
Here is what that expansion flow looks like for a given query window:
flowchart TD
RQ["API query: events Mon–Fri"] --> FETCH["Fetch event rows<br/>(rrule + start_utc + tzid)"]
FETCH --> EXP["Expand RRULE<br/>over window in local tz"]
EXP --> CHECK["Check event_exceptions<br/>for each occurrence"]
CHECK --> OVERRIDE["Apply overrides<br/>or drop deleted occurrences"]
OVERRIDE --> SORT["Sort by start_utc"]
SORT --> RESP["Return occurrence list"]
style EXP fill:#ff6b1a,color:#0a0a0f
style CHECK fill:#0e7490,color:#fff
style OVERRIDE fill:#15803d,color:#fff
DST-correct expansion
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+
from dateutil.rrule import rrulestr
# Use a tzinfo that recomputes its UTC offset for each date. A pytz-localized
# DTSTART pins the EST offset onto every occurrence and reproduces the bug.
tz = ZoneInfo("America/New_York")
# The DTSTART is the series anchor in local time
dtstart = datetime(2026, 1, 5, 9, 0, 0, tzinfo=tz)  # 9 AM New York, Jan 5
rule = rrulestr("RRULE:FREQ=WEEKLY;BYDAY=MO", dtstart=dtstart)
# Occurrences across the DST transition (second Sunday in March)
for dt in rule.between(
    datetime(2026, 3, 1, tzinfo=tz), datetime(2026, 3, 31, tzinfo=tz)
):
    print(dt.astimezone(timezone.utc))
# March 2: 14:00 UTC (9 AM EST, UTC-5)
# March 9: 13:00 UTC (9 AM EDT, UTC-4) ← DST moved forward; UTC shifts by 1h
# March 16, 23, 30: 13:00 UTC as well
The difference is subtle but important. If you stored UTC-5 as a fixed offset, the March 9th occurrence would land at 14:00 UTC — 10:00 AM local — one hour late, every time, for everyone in that time zone until someone notices. The IANA zone name America/New_York tells the expansion library when DST transitions happen and keeps local time stable across them.
flowchart LR
JAN["Jan 5<br/>9:00 AM EST<br/>= 14:00 UTC"]
MAR2["Mar 2<br/>9:00 AM EST<br/>= 14:00 UTC"]
SPRING["DST springs forward<br/>Mar 8 at 2:00 AM"]
MAR9_IANA["Mar 9 — IANA tzid<br/>9:00 AM EDT<br/>= 13:00 UTC ✓"]
MAR9_OFFSET["Mar 9 — fixed offset -05:00<br/>fires at 14:00 UTC<br/>= 10:00 AM local ✗"]
JAN --> MAR2
MAR2 --> SPRING
SPRING --> MAR9_IANA
SPRING --> MAR9_OFFSET
style SPRING fill:#ffaa00,color:#0a0a0f
style MAR9_IANA fill:#15803d,color:#fff
style MAR9_OFFSET fill:#ff2e88,color:#fff
Rule: Always store tzid as an IANA name (from the IANA Time Zone Database), never as a numeric offset like +05:30 or -05:00.
Free/busy and "find a meeting time"
The free/busy query
Given user IDs [u1, u2, …, uN] and a time window [start, end]:
- For each user, fetch all events (including recurring) in the window.
- Expand recurrences; apply exceptions.
- Sort intervals by start time.
- Merge overlapping intervals into a list of busy intervals.
- Subtract busy from the full window → free intervals.
- Intersect free intervals across all N users → common free slots.
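The final step, intersecting free intervals across users, is the interval-overlap problem. With N users and M events each, sorting and sweeping each user's intervals costs O(N·M log M) in total. For typical "find a meeting time" UIs (up to 20 attendees, a few dozen events each), that is roughly a thousand intervals, so the merge itself is cheap; fetching and expanding the events dominates.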
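The merge and intersection can be sketched as follows. It relies on an equivalent shortcut rather than a literal per-user intersection: the slots free for everyone are the window minus the union of everyone's busy intervals. The function names are illustrative, and each user's busy list is assumed to be already expanded with exceptions applied.

```python
from datetime import datetime, timedelta, timezone


def merge(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def free_slots(busy_by_user, window_start, window_end):
    """Common free intervals: the window minus the union of all busy time."""
    all_busy = merge(iv for busy in busy_by_user.values() for iv in busy)
    free, cursor = [], window_start
    for start, end in all_busy:
        if end <= window_start or start >= window_end:
            continue                      # entirely outside the window
        if start > cursor:
            free.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < window_end:
        free.append((cursor, window_end))
    return free


def earliest_slot(busy_by_user, window_start, window_end, duration):
    """First common free slot that is at least `duration` long, or None."""
    for start, end in free_slots(busy_by_user, window_start, window_end):
        if end - start >= duration:
            return (start, start + duration)
    return None


def at(hour, minute=0):
    return datetime(2026, 6, 15, hour, minute, tzinfo=timezone.utc)


busy = {
    "u1": [(at(9), at(10)), (at(13), at(14))],
    "u2": [(at(9, 30), at(11))],
    "u3": [(at(11), at(11, 30))],
}
slot = earliest_slot(busy, at(9), at(17), timedelta(minutes=45))
```

Here the three users are collectively busy from 09:00 to 11:30 and from 13:00 to 14:00, so the earliest 45-minute slot starts at 11:30.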
sequenceDiagram
participant C as Client
participant FB as Free/Busy Service
participant Cache as Redis Cache
participant ES as Event Store
C->>FB: findMeetingTime(users=[u1,u2,u3], window=Mon–Fri, duration=60m)
FB->>Cache: get free/busy for u1, u2, u3 (day-granularity keys)
Cache-->>FB: hit for u1, miss for u2 and u3
FB->>ES: fetch events for u2, u3 in window
ES-->>FB: raw events + exceptions
FB->>FB: expand recurrences, merge intervals
FB->>Cache: write u2, u3 free/busy (TTL=5m)
FB->>FB: intersect all 3 users' free intervals
FB-->>C: [{start: Mon 14:00, end: Mon 15:00}, ...]
Caching strategy: cache each user's busy intervals at day granularity with a short TTL (5 minutes). A cache key like freebusy:u1:2026-06-09 holds the merged busy list for that user-day. A calendar edit invalidates the affected day keys for all attendees on that event.
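A sketch of how the invalidator might derive the keys to delete for one edited event. It assumes day boundaries are taken in UTC and that both timestamps are already in UTC; `freebusy_keys` is a hypothetical helper, and only the key format comes from the text above.

```python
from datetime import datetime, timedelta, timezone


def freebusy_keys(user_ids, start_utc, end_utc):
    """Day-granularity cache keys touched by an event spanning [start_utc, end_utc)."""
    # An event ending exactly at midnight does not touch the following day.
    last_instant = max(start_utc, end_utc - timedelta(microseconds=1))
    first_day = start_utc.date()
    day_count = (last_instant.date() - first_day).days + 1
    return [
        f"freebusy:{uid}:{first_day + timedelta(days=offset)}"
        for uid in user_ids
        for offset in range(day_count)
    ]


utc = timezone.utc
keys = freebusy_keys(
    ["u1", "u2"],
    datetime(2026, 6, 9, 23, 30, tzinfo=utc),
    datetime(2026, 6, 10, 0, 30, tzinfo=utc),
)
```

An event that crosses midnight touches two day keys per attendee, which is why the invalidator works from the event's full interval rather than its start date alone.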
Permissions on free/busy
Users with only "free/busy" permission see opaque blocks (busy) rather than event titles. The Free/Busy Service must check the permission tier per user and redact event details accordingly. For external attendees (outside the organization), always return only free/busy.
Reminders and notification fan-out
The reminder storm problem
At 8:50 AM on a weekday, every 9:00 AM event needs a 10-minute-before reminder. Even if only a few percent of active users have a 9:00 AM meeting, that's millions of reminders clustering into a ~5-minute window — consistent with the capacity estimate of 1M+ reminder jobs in a peak burst.
A naive cron job scanning all events at 8:50 AM cannot handle this. The solution is to spread the work out at write time, not read time:
- Pre-register reminder jobs when an event is created or updated. For each upcoming occurrence (within a 30-day horizon), write a job to the distributed scheduler: fire at occurrence_start - lead_time, payload = {event_id, user_id, occurrence_utc}.
- The scheduler (see design-distributed-job-scheduler) distributes jobs across time buckets and workers, ensuring horizontal fan-out.
- Idempotency key: each job is identified by (event_id, occurrence_utc, lead_time_seconds). If the scheduler fires twice (at-least-once delivery), the push/email gateway deduplicates on this key.
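The job registration and gateway-side deduplication described in this list could look roughly like the following. `build_reminder_jobs` and `DedupingGateway` are hypothetical names, and the in-memory set stands in for whatever durable store a real gateway would use.

```python
from datetime import datetime, timedelta, timezone


def build_reminder_jobs(event_id, user_id, occurrence_starts_utc, lead_time, now, horizon):
    """One job per occurrence whose trigger time falls inside [now, now + horizon]."""
    jobs = []
    for start in occurrence_starts_utc:
        fire_at = start - lead_time
        if now <= fire_at <= now + horizon:
            jobs.append({
                "fire_at": fire_at,
                "payload": {"event_id": event_id, "user_id": user_id, "occurrence_utc": start},
                "idempotency_key": (event_id, start, int(lead_time.total_seconds())),
            })
    return jobs


class DedupingGateway:
    """Hypothetical stand-in for the push/email gateway."""

    def __init__(self):
        self._seen = set()
        self.sent = []

    def deliver(self, job):
        if job["idempotency_key"] in self._seen:
            return False                  # duplicate fire: drop it
        self._seen.add(job["idempotency_key"])
        self.sent.append(job["payload"])
        return True


utc = timezone.utc
now = datetime(2026, 6, 1, tzinfo=utc)
starts = [datetime(2026, 6, 1, 13, 0, tzinfo=utc) + timedelta(weeks=w) for w in range(10)]
jobs = build_reminder_jobs("evt_abc123", 42, starts, timedelta(minutes=10), now, timedelta(days=30))

gateway = DedupingGateway()
first = gateway.deliver(jobs[0])
second = gateway.deliver(jobs[0])    # the scheduler fired the same job again
```

Of the ten weekly occurrences, only the five whose trigger time falls inside the 30-day horizon become jobs; the rest would be registered by a later pass as the horizon advances.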
stateDiagram-v2
[*] --> Pending : event created / updated
Pending --> Scheduled : reminder job registered with scheduler
Scheduled --> Fired : scheduler fires at trigger_time
Fired --> Delivered : push / email gateway ACKs
Fired --> Retried : gateway timeout → retry with same idempotency key
Retried --> Delivered
Delivered --> [*]
Pending --> Cancelled : event deleted or reminder removed
Scheduled --> Cancelled : event deleted or occurrence exception added
Handling recurring-event reminder updates: when a user edits an event's recurrence rule (e.g., changes a weekly rule to every other week by adding INTERVAL=2), the service must cancel all pending reminder jobs for the old series and re-register jobs for the new expansion. Use a series_version counter on the event; jobs with an outdated version silently no-op on fire.
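A small sketch of the version check at fire time, assuming each job carries the series_version it was created under. The dictionaries and function names are illustrative only.

```python
def edit_recurrence(event, new_rrule):
    """Return the event with a new rule and a bumped series_version."""
    return {**event, "rrule": new_rrule, "series_version": event["series_version"] + 1}


def should_fire(job, event):
    """A job fires only if its event exists, is not cancelled, and the versions match."""
    if event is None or event["status"] == "cancelled":
        return False
    return job["series_version"] == event["series_version"]


event = {
    "id": "evt_abc123",
    "rrule": "FREQ=WEEKLY;BYDAY=MO",
    "status": "confirmed",
    "series_version": 1,
}
old_job = {"event_id": event["id"], "series_version": event["series_version"]}
edited = edit_recurrence(event, "FREQ=WEEKLY;INTERVAL=2;BYDAY=MO")
```

The check makes explicit cancellation a cleanup step rather than a correctness requirement: a stale job that was never cancelled still does nothing when it fires.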
Sharing, permissions, and invitations
Permission model
| Role | Can read event details | Can see free/busy | Can edit event | Can invite others |
|---|---|---|---|---|
| Owner / organizer | Yes | Yes | Yes | Yes |
| Editor | Yes | Yes | Yes | No (typically) |
| Viewer | Yes | Yes | No | No |
| Free/busy only | No (opaque) | Yes | No | No |
Calendars can be shared at the calendar level (all events) or via individual event invitations. Invitations generate an event_attendees row with rsvp = needsAction.
RSVP flow
sequenceDiagram
participant Org as Organizer
participant CS as Calendar Service
participant Email as Email Service
participant Att as Attendee
Org->>CS: createEvent(attendees=[att@example.com, ...])
CS->>CS: write event row + event_attendees rows (rsvp=needsAction)
CS->>Email: send invite email to each attendee
Email-->>Att: invite email with Accept / Decline links
Att->>CS: RSVP accept (PUT /events/{id}/rsvp {status: "accepted"})
CS->>CS: update event_attendees.rsvp for att
CS->>Org: notify organizer of RSVP update
The event row lives on the organizer's shard. An RSVP write updates only that attendee's own row in event_attendees, keyed by (event_id, user_id). This is a low-write-rate operation (each user typically RSVPs once per invitation), so no special sharding trick is needed.
Double-RSVP prevention
A user who clicks an email link twice (or clicks Accept on two devices simultaneously) must not create duplicate RSVP rows. The PRIMARY KEY (event_id, user_id) on event_attendees makes the upsert idempotent: INSERT ... ON CONFLICT (event_id, user_id) DO UPDATE SET rsvp = EXCLUDED.rsvp.
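The upsert can be exercised end to end with an in-memory database. This sketch uses SQLite (via Python's sqlite3 module) as a stand-in for Postgres, since both accept this ON CONFLICT form; the `rsvp` helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event_attendees (
        event_id INTEGER NOT NULL,
        user_id  INTEGER NOT NULL,
        rsvp     TEXT DEFAULT 'needsAction',
        PRIMARY KEY (event_id, user_id)
    )
    """
)


def rsvp(event_id, user_id, status):
    """Idempotent RSVP write: repeated calls never create a second row."""
    conn.execute(
        """
        INSERT INTO event_attendees (event_id, user_id, rsvp)
        VALUES (?, ?, ?)
        ON CONFLICT (event_id, user_id) DO UPDATE SET rsvp = excluded.rsvp
        """,
        (event_id, user_id, status),
    )
    conn.commit()


rsvp(1, 42, "accepted")
rsvp(1, 42, "accepted")    # double click on the email link
rsvp(1, 42, "declined")    # later change of mind
rows = conn.execute("SELECT event_id, user_id, rsvp FROM event_attendees").fetchall()
```

After three calls the table still holds a single row for (1, 42), carrying the most recent status.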
Multi-device sync
The sync token pattern
Rather than polling the full calendar on every app open, clients use sync tokens:
- Initial sync: GET /events?start=...&end=... returns all events + sync_token = T1.
- Subsequent sync: GET /events/changes?sync_token=T1 returns only events changed since T1 + a new sync_token = T2.
- Client applies the delta (upsert changed events, delete removed ones) to its local store.
The Change Stream (Kafka) captures every event write. The Sync Token Service maintains a per-user log of event change IDs, stamped with a monotonically increasing token. A client's sync request is a range scan over (user_id, token > T1).
sequenceDiagram
participant App as Mobile App
participant API as Calendar API
participant STS as Sync Token Service
participant KF as Kafka
App->>API: GET /events (initial sync)
API-->>App: events list + sync_token=T1
Note over App: User is offline, edits event A
App->>API: PUT /events/A (If-Match: etag)
API->>KF: publish change(event_A, user_id)
KF->>STS: append change to per-user log → token T2
API-->>App: 200 OK, new etag
App->>API: GET /events/changes?sync_token=T1
API->>STS: scan (user_id, token > T1)
STS-->>API: [event_A changed]
API-->>App: delta response + sync_token=T2
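The per-user change log behind this flow can be modelled in memory. `SyncTokenLog` is a hypothetical class, and the single global counter is a simplifying assumption; the text only requires that tokens increase monotonically within each user's log.

```python
class SyncTokenLog:
    """In-memory sketch of an append-only, per-user change log."""

    def __init__(self):
        self._log = {}            # user_id -> list of (token, event_id, op)
        self._next_token = 1      # assumption: one global counter for simplicity

    def append(self, user_id, event_id, op):
        """Record a change ('upsert' or 'delete') and return its token."""
        token = self._next_token
        self._next_token += 1
        self._log.setdefault(user_id, []).append((token, event_id, op))
        return token

    def changes_since(self, user_id, sync_token):
        """Range scan over (user_id, token > sync_token), collapsed per event."""
        entries = [e for e in self._log.get(user_id, []) if e[0] > sync_token]
        latest = {}
        for _token, event_id, op in entries:
            latest[event_id] = op             # only the last operation per event matters
        new_token = entries[-1][0] if entries else sync_token
        return latest, new_token


log = SyncTokenLog()
t1 = log.append(7, "A", "upsert")
log.append(7, "B", "upsert")
log.append(7, "A", "delete")
delta, t2 = log.changes_since(7, t1)
```

Collapsing to the last operation per event keeps the delta small when an event changes several times between syncs.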
Offline edits and conflict resolution
A user edits an event on their phone while offline. When connectivity returns:
- Phone sends the edit with an If-Match: {etag} header (the etag is the updated_at timestamp from the last known server version).
- Server checks: if the event's current updated_at matches the etag, apply the edit and advance updated_at.
- If updated_at has changed (someone else edited), return 409 Conflict. The client resolves by presenting the user with "keep mine / keep server / merge" options or, for simple fields, last-write-wins (whoever submitted later wins).
For calendar systems, last-write-wins per event field is the standard resolution strategy. Full three-way merge (like Google Docs) is unnecessary here; calendar edits are infrequent and usually non-overlapping.
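The If-Match check amounts to a compare-and-set on updated_at. In this sketch the store is a plain dictionary, timestamps are integers for brevity, and `Conflict` stands in for the HTTP 409 response; all three are assumptions made for illustration.

```python
class Conflict(Exception):
    """Maps to an HTTP 409 response in this sketch."""


def put_event(store, event_id, changes, if_match, now):
    """Apply the edit only if the caller saw the current version; return the new etag."""
    current = store[event_id]
    if current["updated_at"] != if_match:
        raise Conflict(event_id)
    store[event_id] = {**current, **changes, "updated_at": now}
    return now


store = {"A": {"title": "Sync", "updated_at": 1}}
new_etag = put_event(store, "A", {"title": "Weekly sync"}, if_match=1, now=2)

try:
    put_event(store, "A", {"title": "Stale edit"}, if_match=1, now=3)
    stale_write_rejected = False
except Conflict:
    stale_write_rejected = True
```

A real server would perform the comparison and the write atomically, for example as a conditional UPDATE, so that two concurrent requests cannot both pass the check.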
High-level architecture (full)
flowchart TD
CLIENT[Client<br/>web / iOS / Android] --> GW[API Gateway<br/>auth + rate limit]
GW --> ES[Event Service]
GW --> FB[Free/Busy Service]
GW --> SY[Sync Service]
ES --> EVSHARD[(Event Store<br/>sharded Postgres<br/>shard key: user_id)]
ES --> AIDX[(Attendee Index<br/>event_id × user_id)]
ES --> EXC[(Exceptions Table)]
ES --> KAFKA[Kafka<br/>event change stream]
KAFKA --> NS[Notification Service]
KAFKA --> SY
KAFKA --> FBIN[Free/Busy Invalidator]
NS --> SCHED[Distributed Job Scheduler]
SCHED --> PUSH[Push Gateway<br/>FCM / APNs]
SCHED --> EMAL[Email Gateway<br/>SES / SendGrid]
FB --> FBCACHE[(Free/Busy Cache<br/>Redis — day-grain TTL 5m)]
FBIN --> FBCACHE
FBCACHE -.miss.-> EVSHARD
SY --> SYNCTOK[(Sync Token Log<br/>append-only per user)]
style ES fill:#ff6b1a,color:#0a0a0f
style SCHED fill:#15803d,color:#fff
style EVSHARD fill:#0e7490,color:#fff
style FBCACHE fill:#a855f7,color:#fff
style KAFKA fill:#ffaa00,color:#0a0a0f
Storage choices
| Data | Store | Rationale |
|---|---|---|
| Events (organizer-owned) | Sharded Postgres (shard: user_id) | Relational joins; strong consistency for writes; ACID for RSVP upserts |
| Event attendees / RSVP | Same Postgres shard as event | Co-located for organizer queries; low write rate |
| Event exceptions | Same shard as parent event | Always accessed together with the event |
| Free/busy cache | Redis (day-grain keys, TTL 5 min) | Sub-millisecond interval reads; short TTL keeps staleness tolerable |
| Sync token log | Append-only log in Postgres or Cassandra | Sequential scans by (user_id, token); high write rate (one row per change per user) |
| Reminder jobs | Distributed job scheduler store | Purpose-built for time-indexed job retrieval |
| Audit / history | S3 + Parquet (Athena / BigQuery) | Cheap; rarely queried; full fidelity |
Failure modes
DST bugs
The classic bug: store America/New_York UTC offset as -05:00 (EST). In March, after the DST spring-forward, the same recurring event fires at 10 AM instead of 9 AM. Fix: always store the IANA zone name, never a numeric offset. Validate on write: reject tzid values that don't exist in the IANA Time Zone Database.
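Write-time validation of tzid can be as small as attempting to load the zone. This sketch uses Python's standard zoneinfo module, which reads the system tz database (or the tzdata package if installed); `is_valid_tzid` is a hypothetical helper name.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def is_valid_tzid(tzid: str) -> bool:
    """True if tzid names a zone in the tz database available to this process."""
    try:
        ZoneInfo(tzid)
    except (ZoneInfoNotFoundError, ValueError):
        # Unknown key, or a malformed one such as an empty string or absolute path.
        return False
    return True


accepted = is_valid_tzid("America/New_York")
rejected = is_valid_tzid("-05:00")
```

Because the check depends on the tz database installed on the server, keeping that database current is part of the fix.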
Recurrence expansion cost
A sub-daily rule such as FREQ=SECONDLY, or a large COUNT such as COUNT=100000, combined with a very wide query window can produce hundreds of thousands to millions of occurrences. Mitigate:
- Cap expansion: refuse any rule that would produce > 5,000 occurrences in a single API window (return a 400 with a clear error).
- Validate RRULE on write; reject pathological combinations.
- Time-limit the expansion: if expansion takes > 500ms, abort and return a partial result.
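The occurrence cap from the first bullet can be enforced while consuming a lazy expansion. In this sketch `every` is a toy generator standing in for a real RRULE iterator, and `TooManyOccurrences` is a hypothetical error that the API layer would translate into a 400.

```python
from datetime import datetime, timedelta, timezone

MAX_OCCURRENCES = 5_000


class TooManyOccurrences(ValueError):
    """Would map to a 400 response at the API layer."""


def every(start, step):
    """Unbounded, ascending generator of occurrence start times."""
    current = start
    while True:
        yield current
        current += step


def expand_bounded(occurrences, window_start, window_end, cap=MAX_OCCURRENCES):
    """Collect occurrences in [window_start, window_end), refusing to exceed `cap`."""
    result = []
    for occ in occurrences:               # assumed to arrive in ascending order
        if occ >= window_end:
            break
        if occ < window_start:
            continue
        if len(result) == cap:
            raise TooManyOccurrences(f"more than {cap} occurrences in window")
        result.append(occ)
    return result


start = datetime(2026, 6, 15, 9, 0, tzinfo=timezone.utc)
week = expand_bounded(every(start, timedelta(days=1)), start, start + timedelta(days=7))

try:
    expand_bounded(every(start, timedelta(seconds=1)), start, start + timedelta(days=1))
    secondly_rejected = False
except TooManyOccurrences:
    secondly_rejected = True
```

Because the cap is checked during iteration, a pathological rule is rejected after a few thousand steps instead of after the full expansion.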
Reminder storms
A 9 AM reminder wave on a workday can put 1M+ jobs into a ~5-minute window. The distributed job scheduler partitions jobs by time bucket across worker shards. If a worker falls behind, it processes its bucket's jobs as fast as it can; some reminders fire a few minutes late, which is acceptable. An idempotency key prevents double-delivery if the worker retries.
Double-RSVP
Handled by ON CONFLICT DO UPDATE — the database serializes concurrent RSVPs to the same (event_id, user_id) pair. The last writer wins.
Sync conflicts
Two devices edit different fields of the same event while offline. Both edits carry the etag of the same base version, so whichever arrives second fails the If-Match check. For calendar events, field-level last-write-wins is standard: each device's edit is applied field by field with the latest updated_at winning per field. If the edited fields do not overlap (one device changed the title; the other changed the location), both changes are preserved.
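Field-level last-write-wins can be sketched as a fold over timestamped edits. `merge_field_lww` is a hypothetical helper, and the integer timestamps stand in for each edit's updated_at.

```python
def merge_field_lww(base, edits):
    """Apply (timestamp, {field: value}) edits; the latest timestamp wins per field."""
    merged = dict(base)
    for _timestamp, fields in sorted(edits, key=lambda edit: edit[0]):
        merged.update(fields)             # later edits overwrite earlier ones
    return merged


base = {"title": "Sync", "location": "Room 1"}

# Non-overlapping edits from two devices: both survive.
disjoint = merge_field_lww(base, [(10, {"title": "Weekly sync"}), (12, {"location": "Room 2"})])

# Overlapping edits to the same field: the later timestamp wins, whatever the arrival order.
overlapping = merge_field_lww(base, [(12, {"title": "B"}), (10, {"title": "A"})])
```

Sorting by timestamp before applying makes the result independent of the order in which the devices reconnect.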
Reminder for deleted event
A user deletes an event after its reminder jobs are already registered. Fix: on event delete, cancel all pending reminder jobs (set them to a cancelled state in the scheduler store). When the scheduler picks up a job, check whether the event still exists (and the occurrence hasn't been deleted) before firing.
API design
# Create event
POST /api/v1/calendars/{calendar_id}/events
{
"title": "Weekly sync",
"start": { "dateTime": "2026-06-15T09:00:00", "timeZone": "America/New_York" },
"end": { "dateTime": "2026-06-15T09:30:00", "timeZone": "America/New_York" },
"recurrence": ["RRULE:FREQ=WEEKLY;BYDAY=MO"],
"attendees": [{"email": "alice@example.com"}, {"email": "bob@example.com"}],
"reminders": { "useDefault": false, "overrides": [{"method": "popup", "minutes": 10}] }
}
→ 201 Created { "id": "evt_abc123", "htmlLink": "..." }
# List events in a window (server expands recurrence)
GET /api/v1/calendars/{calendar_id}/events
?timeMin=2026-06-15T00:00:00Z
&timeMax=2026-06-22T00:00:00Z
&singleEvents=true ← expand recurrence to individual instances
→ 200 OK { "items": [ { "start": ..., "end": ..., "recurringEventId": "evt_abc123" }, ... ] }
# Update a single occurrence ("this event only")
PUT /api/v1/calendars/{calendar_id}/events/{event_id}/{occurrence_start_utc}
{
"start": { "dateTime": "2026-06-22T10:00:00", "timeZone": "America/New_York" },
"end": { "dateTime": "2026-06-22T10:30:00", "timeZone": "America/New_York" }
}
→ 200 OK (writes to event_exceptions; does not modify the parent RRULE)
# Free/busy query
POST /api/v1/freeBusy
{
"timeMin": "2026-06-15T00:00:00Z",
"timeMax": "2026-06-20T00:00:00Z",
"items": [{"id": "user_alice"}, {"id": "user_bob"}]
}
→ 200 OK { "calendars": { "user_alice": { "busy": [...] }, "user_bob": { "busy": [...] } } }
The API above deliberately mirrors the Google Calendar API v3 shape — not because interviewers expect you to memorize it, but because aligning with a widely-used convention signals that you understand the domain.
Things to discuss in an interview
- RRULE expand-on-read vs. materialize: defend storing the rule and expanding on read for calendar queries; materialize only reminder jobs at a bounded horizon.
- DST correctness: explain why storing a numeric UTC offset is wrong and what tzid as an IANA name gives you.
- Exceptions vs. editing the whole series: the (event_id, original_start_utc) PK on exceptions; how "all future" is handled (split the series: end the original before the edit point, create a new series starting there).
- Free/busy as an interval problem: describe the sweep-line merge and the cross-user intersection.
- Reminder storm: the distributed scheduler partitions the 9 AM wave; idempotency key prevents double delivery.
- Sync tokens: delta sync avoids re-fetching the full calendar on every app open.
Things you should now be able to answer
- Why is storing a UTC offset like -05:00 wrong for recurring events, and what's the correct fix?
- What does the event_exceptions table contain, and why is original_start_utc the right primary key component?
- How would you handle a user who changes a recurring meeting from weekly to biweekly? What happens to existing reminder jobs?
- A team of 50 people needs a "find a meeting time" feature. What's the algorithm and what's the bottleneck?
- How do you prevent a double reminder if the scheduler delivers a job twice?
- What is the sync token pattern and why does it beat full polling?
Further reading
- RFC 5545 — Internet Calendaring and Scheduling Core Object Specification (iCalendar)
- RFC 5546 — iCalendar Transport-Independent Interoperability Protocol (iTIP) — the invitation and RSVP protocol
- IANA Time Zone Database (tzdata) — the canonical source for IANA zone names and DST rules
- Google Calendar API v3 Reference — well-documented real-world implementation of the concepts above
- Design a Distributed Job Scheduler — the reminder fan-out engine
- Design a Notification System — push and email delivery at scale
- Consistent Hashing — how to shard the event store without hotspots
Frequently asked questions
▸Why is storing a UTC offset like -05:00 wrong for recurring events, and what is the correct alternative?
A fixed offset breaks across DST transitions. If you store -05:00 for a 9 AM New York meeting, the March occurrence after the spring-forward still fires at 14:00 UTC, which is 10 AM local time — one hour late. The fix is to store the IANA zone name (tzid = 'America/New_York') alongside start_utc; the expansion library consults the IANA Time Zone Database to derive the correct UTC instant for each future occurrence.
▸How does a calendar system handle editing a single occurrence of a recurring event without modifying the whole series?
A separate event_exceptions table keyed by (event_id, original_start_utc) stores the override. During recurrence expansion, the service checks this table for each generated occurrence; if a match exists, it substitutes the overridden values or drops the occurrence entirely. The original_start_utc is the right key because it unambiguously identifies which slot in the series the exception belongs to, regardless of how the rule is later edited.
▸Should recurring event occurrences be materialized on write or expanded on read?
Expand on read. Storing one RRULE string per event and expanding over the queried window at read time is dramatically more compact — a rule like FREQ=DAILY produces at most 7 occurrences for a one-week query window regardless of how far into the future the series extends, and rule changes require updating a single row. The one exception is reminder scheduling: upcoming occurrence times must be pre-materialized as discrete jobs in the distributed scheduler up to a bounded horizon (7-30 days).
▸What is the peak reminder load, and why can a single cron job not handle it?
On a typical weekday at 9 AM, 1M+ reminder jobs can cluster into a roughly 5-minute window. A naive cron scanning all events at 8:50 AM saturates under that volume. The solution is to pre-register one reminder job per upcoming occurrence at event create or update time, then use a distributed job scheduler that partitions jobs across time buckets and workers; an idempotency key of (event_id, occurrence_utc, lead_time_seconds) prevents double delivery if a worker retries.
▸What is the estimated storage requirement for event data at Google Calendar scale, and how is the data sharded?
Roughly 500 TB, derived from 500M users times 500 events per user times 2 KB per event row. The event store is sharded by user_id so that all of an organizer's events land on the same shard, which keeps single-user queries local. Attendees' shards hold only a lightweight link row in event_attendees; the single authoritative copy of the event stays on the organizer's shard.