◆◆◆ Advanced · Asked at Google, Microsoft, Amazon

Design an Email Service (Gmail)

Send, receive, store, and search email for hundreds of millions of users. SMTP ingestion, sharded mailbox storage, full-text search, and spam filtering.

23 min read · 2026-05-03 · Ironclad Academy
// DEPTH
the full breakdown — requirements, capacity, evolution, trade-offs

The problem

Gmail handles roughly 1.8 billion active users and, by Google's own estimates, processes hundreds of billions of messages every day. Behind the clean compose window is one of the oldest distributed systems still in widespread production use — SMTP was standardized in 1982, and every email you send still touches that protocol at some point in its journey.

At its core, email is a store-and-forward messaging system. A user composes a message; your service queues it, looks up the recipient's mail server (MX) in DNS, opens an SMTP connection, and hands it off. Inbound works in reverse: a remote mail server connects to your SMTP ingestion endpoint, transfers the message, and your system writes it durably to the recipient's mailbox. The user then fetches their mail via a web API or a legacy IMAP client. Gmail, Outlook, Yahoo Mail, and Fastmail all run this same two-sided exchange.

The engineering tension comes from three places at once. First, durability is non-negotiable: the moment your SMTP server sends 250 OK, you have made a binding delivery commitment — any message lost after that point violates the protocol and destroys trust. Second, the write volume is punishing: 500 million active users receiving 25 messages a day averages 145,000 inbound messages per second, and each one must be spam-checked, body-stored in blob storage, and metadata-indexed before you say 250. Third, the data model spans wildly different access patterns — users want sub-200ms mailbox list loads, sub-second full-text search over years of archived mail, and attachment deduplication across mass CC storms — all from the same underlying store.

Understanding how to decompose those sub-problems — inbound pipeline, outbound delivery queue, mailbox storage split, spam filtering, and per-user search — is exactly what the interview tests. The individual components are not exotic; what matters is that you name the right split points and explain why.

Functional requirements

  • Users send and receive email via web and mobile APIs.
  • Inbound server-to-server transfer uses SMTP; users do not speak SMTP directly.
  • Users can organize mail with labels and folders; mark read/unread; thread conversations.
  • Full-text search over a user's mailbox.
  • Attachments up to ~25 MB per message.
  • (Optional) Spam/phishing filtering visible to the user; virus scanning on attachments.

Non-functional requirements

  • Durability: no message loss after ingestion — a dropped email is a catastrophe.
  • Availability: 99.99%+ for the read path; brief ingestion delays are tolerable.
  • Inbound latency: mail delivered to inbox within seconds of SMTP acceptance.
  • Read latency: message list / thread view < 200ms p99.
  • Search latency: full-text results < 1s p95.
  • Deliverability: outbound mail must not be flagged as spam by recipients; SPF/DKIM/DMARC must be configured correctly.

Capacity

| Dimension | Estimate | How we got there |
| --- | --- | --- |
| Active users | 500M (design scale; Gmail's real count is ~1.8B as of 2025) | Design exercise target |
| Avg messages received | 25 msgs / user / day | Mix of personal, transactional, and newsletters; varies widely by user type |
| Inbound rate (avg) | ~145,000 emails/sec | 500M × 25 ÷ 86,400 |
| Inbound rate (peak) | ~290,000 emails/sec | 2× avg; bulk/transactional bursts arrive unevenly |
| Avg message size (raw) | 75 KB | Headers ~5 KB, body ~20 KB, remainder attachments; real-world averages shift with attachment ratio |
| Avg message size (after dedup + compression) | ~30 KB | ~2.5× savings assumed |
| Write throughput (raw) | ~10.9 GB/sec | 145,000 msg/s × 75 KB |
| Storage added per day (raw) | ~940 TB/day | 10.9 GB/s × 86,400 s |
| Storage added per day (net, after dedup + compression) | ~375 TB/day | 940 TB ÷ 2.5 |
| Per-user storage per year (uncompressed) | ~685 MB/year | 25 msg/day × 75 KB × 365 days |
| Per-user storage after 10 years | ~6.9 GB | Within real free-tier quotas |
| Outbound rate | ~15,000–45,000 sends/sec | Typically 10–30% of inbound volume |
| Search QPS (global avg) | ~2,000/sec | 0.35 searches/user/day × 500M ÷ 86,400; each query touches only one user's index, so isolation is free |

Takeaway: Storage throughput dominates: 145,000 inbound messages per second at 75 KB each means ~10.9 GB/sec raw write throughput and ~940 TB of new data every day before dedup and compression.
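The table's arithmetic is easy to re-derive, which is worth being able to do live in an interview. The sketch below recomputes the headline numbers from the three inputs the table assumes (500M users, 25 messages per user per day, 75 KB per message) using decimal units. The variable names are illustrative only, and the unrounded results land slightly below the table's rounded figures.

```python
SECONDS_PER_DAY = 86_400

users = 500_000_000
msgs_per_user_per_day = 25
avg_msg_kb = 75           # raw average message size from the table
savings_factor = 2.5      # assumed dedup + compression savings from the table

inbound_per_sec = users * msgs_per_user_per_day / SECONDS_PER_DAY
write_gb_per_sec = inbound_per_sec * avg_msg_kb / 1_000_000      # KB -> GB
raw_tb_per_day = write_gb_per_sec * SECONDS_PER_DAY / 1_000      # GB -> TB
net_tb_per_day = raw_tb_per_day / savings_factor
per_user_mb_per_year = msgs_per_user_per_day * avg_msg_kb * 365 / 1_000

print(f"inbound:        {inbound_per_sec:,.0f} msgs/sec")
print(f"raw writes:     {write_gb_per_sec:.1f} GB/sec")
print(f"raw storage:    {raw_tb_per_day:,.1f} TB/day")
print(f"net storage:    {net_tb_per_day:,.1f} TB/day")
print(f"per user, year: {per_user_mb_per_year:,.0f} MB")
```

The small gap between the computed raw storage per day and the table's ~940 TB comes from the table multiplying the already rounded 10.9 GB/s.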

A per-user inverted index at 500M users is large, but only a fraction of users search actively. Hot indexes live in memory; cold ones are loaded from disk on demand.

Protocol boundary — what SMTP does and doesn't do

This is consistently misunderstood in interviews. SMTP is the server-to-server transfer protocol: it is used when an external mail server delivers a message to your ingestion servers, and when your egress servers deliver outbound mail to a recipient's MX. Users of the web and mobile apps do not speak SMTP to Gmail's servers in 2025; they use the HTTPS-based web app or mobile API. Legacy clients (Outlook, Thunderbird) use IMAP (read/sync) or POP3 (download-and-delete) to access mailboxes, and SMTP submission (port 587) to send. Keep these in scope for completeness but out of the critical path.

flowchart LR
    EXT[External mail server] -.SMTP port 25.-> SMTPIN[SMTP ingestion]
    SMTPOUT[SMTP egress] -.SMTP port 25.-> EXT2[Recipient MX]
    APP[User browser / app] -.HTTPS REST.-> API[API Gateway]
    LEGACY[IMAP / POP3 clients] -.IMAP port 993.-> IMAP[IMAP server]
    style SMTPIN fill:#0e7490,color:#fff
    style API fill:#ff6b1a,color:#0a0a0f

For the interview, keep protocol discussion at this level: SMTP in/out at the edges, HTTP API for users, IMAP for legacy clients. Do not spend time on SMTP commands unless asked.

Building up to the design

Start with the simplest thing that could possibly work. Each step breaks for a specific, nameable reason — and naming those reasons is how you earn credibility in the room.

V1: One table, no spam filter

Store each message as a row in messages(id, user_id, from, to, subject, body, received_at). SMTP ingestion calls INSERT. The web app calls SELECT * FROM messages WHERE user_id = ? ORDER BY received_at DESC LIMIT 50. One API box, one Postgres.

This works for a handful of users, but at ~145,000 emails/sec (peak ~290,000) a single Postgres instance cannot keep up. Worse, the body column makes rows enormous: every list query drags the full message text across the wire even when you only need the subject line. And every phishing mail lands in the inbox, because there is no spam filter.

V2: Split metadata from body; add blob storage

The fix for enormous rows is architectural: move the large body and attachments out of the database and into a blob store (object storage). The database row keeps only the metadata — from, to, subject, thread ID, labels, flags, timestamps, and a pointer to the blob.

messages row: ~1 KB
message body blob: ~74 KB (object storage)

Now reading the message list means hitting the DB only. Opening a thread fetches the blob. DB I/O drops by roughly 98%. But you are still on one shard — all 500M users' metadata in one Postgres, which means write throughput, memory, and connection limits all fail simultaneously.

V3: Shard by user_id; add delivery queue

Shard the metadata store by user_id. All operations for a user — message list, label update, search — hit a single shard, so there are no cross-shard joins. Use consistent hashing with virtual nodes so adding shards later does not require full reshuffling (see consistent hashing).
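To make the virtual-node idea concrete, here is a minimal sketch of a consistent-hash ring, assuming SHA-256 for ring positions and 64 virtual nodes per shard. The `HashRing` class and the `shard-N` names are hypothetical, not part of any particular library.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, shards, vnodes: int = 64):
        self.vnodes = vnodes
        self._ring = []    # sorted list of (ring position, shard name)
        self._points = []  # ring positions only, kept parallel for bisect
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        # Each shard owns many small arcs of the ring, which evens out load.
        for i in range(self.vnodes):
            self._ring.append((_point(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._points = [p for p, _ in self._ring]

    def shard_for(self, user_id: int) -> str:
        # The owner is the first virtual node clockwise from the key's position.
        idx = bisect.bisect(self._points, _point(str(user_id))) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing([f"shard-{i}" for i in range(4)])
before = {u: ring.shard_for(u) for u in range(10_000)}
ring.add_shard("shard-4")
moved = sum(1 for u in before if ring.shard_for(u) != before[u])
print(f"{moved} of 10,000 users moved after adding one shard")
```

The property to point out is that every user who moves lands on the new shard; the existing shards never trade users among themselves.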

While you are here, add a durable outbound delivery queue. When a user hits "Send," write to the queue immediately for a fast acknowledgment, then let egress workers drain it with retry logic. This decouples the send API response time from the reachability of the remote mail server. What is still broken: spam lands in inboxes, there is no full-text search, and identical attachments are stored millions of times over.

V4: Add spam pipeline and attachment dedup

Route inbound messages through a spam filter between SMTP ingestion and the mailbox write. For bodies and attachments, apply content-hash dedup: before writing a blob, hash its bytes. If that hash already exists in the store, skip the write and record a pointer to the existing blob. A forwarded attachment or a CC storm is then stored exactly once, regardless of how many recipients it reaches.

At this point, users search and get nothing — the metadata DB supports queries by user_id and received_at, but not full-text over subject or body.

V5: Add per-user search index

An indexing worker subscribes to the write stream (or reads from a CDC log) and builds an inverted index per user: a map from each word to the list of message IDs containing it. The index is stored per user (isolation comes for free) and built asynchronously so the write path is not blocked.

V6: Production scale

V3 + V4 + V5, plus: multi-region replication for durability, tiered storage (recent mail on fast SSD-backed blob, older mail in cheap cold storage), quota enforcement per user, a greylisting layer in the spam pipeline, and operational tooling for abuse response.

flowchart LR
    V1[V1: single DB<br/>works for ~100 users] --> V2[V2: blob split<br/>DB rows are small]
    V2 --> V3[V3: shard by user_id<br/>delivery queue]
    V3 --> V4[V4: spam filter<br/>attachment dedup]
    V4 --> V5[V5: per-user search index<br/>async indexing]
    V5 --> V6[V6: multi-region<br/>tiered storage + quotas]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V4 fill:#ff6b1a,color:#0a0a0f
    style V6 fill:#a855f7,color:#fff

High-level architecture

flowchart TD
    INET[Internet senders] -.SMTP port 25.-> MX[MX / SMTP ingestion<br/>load-balanced pool]
    MX --> SPAMPIPE[Spam + Virus Pipeline]
    SPAMPIPE -->|clean| MBW[Mailbox Write Service]
    SPAMPIPE -->|spam| MBW2[Mailbox Write Service<br/>spam-flagged]
    MBW --> META[(Metadata store<br/>sharded by user_id)]
    MBW --> BLOBW[(Blob store<br/>message bodies + attachments)]
    MBW --> IDXQ[Indexing queue]
    IDXQ --> IDXW[Index Worker]
    IDXW --> SIDX[(Per-user search index)]

    USER[User] --> GW[API Gateway + Auth]
    GW --> MBRAPI[Mailbox Read API]
    MBRAPI --> META
    MBRAPI --> BLOBW
    GW --> SRCHAPI[Search API]
    SRCHAPI --> SIDX

    GW --> SENDAPI[Send API]
    SENDAPI --> MBW3[Mailbox Write<br/>Sent folder]
    SENDAPI --> DQ[(Delivery Queue)]
    DQ --> EGRESS[SMTP Egress workers]
    EGRESS -.retry loop.-> DQ
    EGRESS -.SMTP.-> INET2[Recipient MX servers]

    style MX fill:#0e7490,color:#fff
    style SPAMPIPE fill:#ff6b1a,color:#0a0a0f
    style META fill:#15803d,color:#fff
    style BLOBW fill:#a855f7,color:#fff
    style SIDX fill:#ffaa00,color:#0a0a0f
    style DQ fill:#ff2e88,color:#fff

Inbound path — SMTP to inbox

Here is what actually happens when an external mail server delivers a message to you, step by step:

sequenceDiagram
    participant Sender as External MTA
    participant MX as SMTP Ingestion
    participant SP as Spam Pipeline
    participant BS as Blob store
    participant DB as Metadata shard
    participant IQ as Index queue
    participant IW as Index Worker

    Sender->>MX: SMTP EHLO / MAIL FROM / RCPT TO / DATA
    MX->>MX: SPF check at MAIL FROM, then DKIM + DMARC after DATA
    MX->>SP: pass message for scoring
    SP->>SP: IP reputation + content ML + header analysis
    SP-->>MX: verdict — clean / spam / virus
    MX->>BS: write body blob, content-hash key
    MX->>DB: INSERT metadata row — blob_key, spam_score, labels
    Note over MX,DB: 250 sent only after durable write completes
    MX-->>Sender: 250 OK, or 5xx reject for spam/virus
    MX->>IQ: emit indexing event
    IQ-->>IW: async

The 250 OK back to the sender is a durability commitment. Under RFC 5321, the moment you say 250 you have accepted responsibility for delivering the message or reporting the failure. If you lose it after saying 250, you have violated SMTP. So: write the blob, write the metadata row, confirm both are durable, and only then say 250.

SPF is checked during the MAIL FROM / RCPT TO phase, before DATA arrives, because it only needs the client IP and the envelope-from address. DKIM verification happens after DATA is received — the ingestion server checks the DKIM-Signature header against the sending domain's public key in DNS. DMARC is evaluated after both SPF and DKIM verdicts are in. All three checks are read-only and stateless; they add no shared state to your ingestion servers.

One nuance worth calling out: a spam verdict does not have to block the 250. For borderline-spam messages, you can still accept (return 250) and route to the spam folder. Silently dropping an accepted message violates the SMTP contract — only reject outright with 5xx for the most obvious abuse.
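The ordering rule can be sketched as a single function that decides which reply to send after DATA. This is an illustration under stated assumptions: the in-memory stores, the `StorageError` type, and the `reply_after_data` name are hypothetical stand-ins, and treating a virus verdict as the "obvious abuse" case that earns a 5xx is one possible policy, not the only one.

```python
import hashlib
from dataclasses import dataclass, field


class StorageError(Exception):
    """Raised by a store when a write could not be made durable."""


@dataclass
class InMemoryBlobStore:
    objects: dict = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(key, data)
        return key


@dataclass
class InMemoryMetadata:
    rows: list = field(default_factory=list)
    fail: bool = False  # flip to simulate an unavailable shard

    def insert(self, user_id, blob_key, labels, spam_score) -> None:
        if self.fail:
            raise StorageError("shard unavailable")
        self.rows.append((user_id, blob_key, labels, spam_score))


def reply_after_data(raw, user_id, verdict, spam_score, blobs, metadata) -> str:
    """Return the SMTP reply line; 250 is only reachable after both writes."""
    if verdict == "virus":
        return "550 5.7.1 message rejected"
    # Borderline spam is still accepted, but filed under the SPAM label.
    labels = ["SPAM"] if verdict == "spam" else ["INBOX"]
    try:
        blob_key = blobs.put(raw)
        metadata.insert(user_id, blob_key, labels, spam_score)
    except StorageError:
        # Nothing was promised yet, so ask the sender to retry later.
        return "451 4.3.0 temporary failure, please retry"
    return "250 2.0.0 OK"


print(reply_after_data(b"hello", 7, "clean", 0.02,
                       InMemoryBlobStore(), InMemoryMetadata()))
```

A storage failure maps to a 4xx rather than a 250 because the sending MTA still holds the message and will retry, so nothing is lost.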

Mailbox storage — the schema

The two-layer design keeps the database lean: small metadata rows in a sharded DB, large blobs in object storage.

Metadata table (per shard)

CREATE TABLE messages (
  message_id    UUID        NOT NULL,
  user_id       BIGINT      NOT NULL,   -- shard key
  thread_id     UUID        NOT NULL,
  blob_key      VARCHAR(64) NOT NULL,   -- SHA-256 or similar content hash
  from_addr     TEXT        NOT NULL,
  subject       TEXT        NOT NULL,
  received_at   TIMESTAMPTZ NOT NULL,
  size_bytes    INT         NOT NULL,
  labels        TEXT[]      NOT NULL DEFAULT '{}',  -- INBOX, SPAM, SENT, user labels
  is_read       BOOLEAN     NOT NULL DEFAULT FALSE,
  is_starred    BOOLEAN     NOT NULL DEFAULT FALSE,
  snippet       TEXT,                   -- first ~200 chars of body, pre-extracted
  PRIMARY KEY (user_id, message_id)
);

CREATE INDEX ON messages (user_id, received_at DESC);
CREATE INDEX ON messages (user_id, thread_id);
-- A B-tree over an array column cannot serve "has label X" filters; use GIN for containment.
CREATE INDEX ON messages USING GIN (labels);

Every query leads with user_id as the shard key. The shard router sends the query to exactly one node, and the local index handles the rest. No cross-shard queries for normal mailbox operations.

Blob store

Body and attachments are written to an object store (e.g. S3-compatible) under a content-hash key:

blob key: sha256(message_body_bytes)
value:    raw MIME body bytes (optionally compressed)

Before writing, check if blob_key already exists. If it does — say, a mass-CC email going to 10 000 recipients — skip the write and record a pointer to the existing blob. For attachments, the same PDF attached by a thousand different users is one blob.
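A minimal sketch of content-addressed writes, using an in-memory dictionary as a stand-in for the object store. The `BlobStore` class and its `bytes_written` counter are hypothetical and exist only to make the dedup effect visible.

```python
import hashlib


class BlobStore:
    """In-memory stand-in for an object store keyed by content hash."""

    def __init__(self):
        self._objects = {}
        self.bytes_written = 0

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self._objects:      # only the first copy costs storage
            self._objects[key] = data
            self.bytes_written += len(data)
        return key                        # callers store this in blob_key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = BlobStore()
attachment = b"%PDF-1.7 example bytes" * 1_000
keys = {store.put(attachment) for _ in range(10_000)}   # a 10,000-recipient storm
print(len(keys), store.bytes_written == len(attachment))
```

In a real object store the exists-check and the write are not one atomic step, but because the key is derived from the content, two concurrent writers of the same bytes produce the same object, so the race is harmless.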

Retention and tiering: recent blobs on hot (SSD) storage; blobs older than 1 year move to cold (spinning / tape / glacier-class) storage. The metadata row's blob_key is stable — the tier is transparent to the reader.

flowchart LR
    NEW[New message blob] --> HOT[(Hot SSD store<br/>recent mail)]
    HOT -->|"age > 1 year"| COLD[(Cold storage<br/>cheap, high-latency)]
    COLD -->|"user opens old mail"| HOT
    META[(Metadata row<br/>blob_key)] -.stable pointer.-> HOT
    META -.stable pointer.-> COLD
    style HOT fill:#ff6b1a,color:#0a0a0f
    style COLD fill:#0e7490,color:#fff
    style META fill:#15803d,color:#fff

When a user opens a five-year-old message, the read service fetches it from cold storage. On a spinning-disk or infrequent-access tier that costs a few hundred milliseconds more than hot, which is acceptable for infrequent access. Archive tiers (tape, glacier-class) can take minutes to hours to restore, so reserve them for mail that is almost never opened.

Conversation threading

Threading groups related messages into a conversation view. The canonical approach works in three steps:

  1. Check References and In-Reply-To headers. If they reference a known message_id in the system, assign the same thread_id.
  2. If no reference headers are present, try subject normalization (strip "Re:", "Fwd:", "AW:", etc.) combined with a participant-set match within a time window.
  3. If neither matches, create a new thread_id.

flowchart TD
    MSG[Incoming message] --> CHECK1{"References /<br/>In-Reply-To header?"}
    CHECK1 -->|yes| MATCH1[Look up referenced message_id]
    MATCH1 -->|found| ASSIGN[Assign same thread_id]
    CHECK1 -->|no| CHECK2{"Subject + participants<br/>match within window?"}
    MATCH1 -->|not found| CHECK2
    CHECK2 -->|yes| ASSIGN
    CHECK2 -->|no| NEW[Create new thread_id]
    ASSIGN --> STORE[(Store thread_id in metadata row)]
    NEW --> STORE
    style ASSIGN fill:#15803d,color:#fff
    style NEW fill:#ff6b1a,color:#0a0a0f
    style STORE fill:#a855f7,color:#fff

Threading is computed at ingestion and stored in the metadata row. A thread view is then a single index scan:

SELECT * FROM messages
WHERE user_id = ? AND thread_id = ?
ORDER BY received_at ASC;
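The three-step assignment can be sketched as follows. This is a simplified illustration, not a production threader: the `Threader` class, the dictionary message shape, and the 7-day window are assumptions, and lookups key on the Message-ID header string rather than the internal UUID. As in the flowchart, a reference that points at an unknown message falls through to the subject match.

```python
import re
import uuid

# Strips any run of reply/forward prefixes such as "Re:", "Fwd:", "FW:", "AW:".
_REPLY_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?|aw)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    return _REPLY_PREFIX.sub("", subject).strip().lower()


class Threader:
    def __init__(self, window_s: float = 7 * 86_400):
        self.window_s = window_s
        self._by_message_id = {}  # Message-ID header -> thread_id
        self._by_subject = {}     # (subject, participants) -> (thread_id, last_seen)

    def assign(self, msg: dict) -> str:
        thread_id = None
        # Step 1: References / In-Reply-To pointing at a message we know.
        for ref in msg.get("references", []):
            if ref in self._by_message_id:
                thread_id = self._by_message_id[ref]
                break
        # Step 2: normalized subject + same participants within the window.
        key = (normalize_subject(msg["subject"]), frozenset(msg["participants"]))
        if thread_id is None and key in self._by_subject:
            candidate, last_seen = self._by_subject[key]
            if msg["received_at"] - last_seen <= self.window_s:
                thread_id = candidate
        # Step 3: nothing matched, so start a new conversation.
        if thread_id is None:
            thread_id = str(uuid.uuid4())
        self._by_message_id[msg["message_id"]] = thread_id
        self._by_subject[key] = (thread_id, msg["received_at"])
        return thread_id
```

Here `references` is assumed to hold the combined values of the References and In-Reply-To headers, and `received_at` is a Unix timestamp in seconds.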

Outbound path — the delivery queue

When a user hits Send, you want to return a fast acknowledgment. You do not want to hold the HTTP connection open while your egress server negotiates an SMTP session with a remote server that might be down for hours. So the send API writes the message to a durable delivery queue and returns immediately. Egress workers drain the queue.

sequenceDiagram
    participant U as User / Send API
    participant DQ as Delivery Queue
    participant EW as Egress Worker
    participant RMTA as Recipient MX

    U->>DQ: enqueue(recipient_domain, message_id, attempt=0)
    EW->>DQ: dequeue job
    EW->>RMTA: SMTP connect + DATA
    alt success 250
        RMTA-->>EW: 250 OK
        EW->>DQ: ack (delete job)
    else transient 4xx
        RMTA-->>EW: 421 / 450 try later
        EW->>DQ: requeue with delay (exponential backoff)
    else permanent 5xx
        RMTA-->>EW: 550 user unknown
        EW->>DQ: ack + generate NDR bounce to sender
    end

The 4xx vs 5xx distinction matters enormously here. A 4xx is a transient failure — the remote server is temporarily unavailable, overloaded, or greylisting you. You requeue and retry. A 5xx is a permanent failure — the recipient address does not exist, or the server is explicitly rejecting you. You give up and generate a Non-Delivery Report (NDR) back to the sender. Treating a 5xx as transient and retrying wastes queue resources and annoys the remote server.

Retry schedule (SMTP convention, roughly):

| Attempt | Delay |
| --- | --- |
| 1 | immediate |
| 2 | 5 min |
| 3 | 30 min |
| 4 | 2 hours |
| 5 | 8 hours |
| ... | doubling |
| Final | 4–5 days, then bounce NDR |

RFC 5321 section 4.5.4 recommends retrying for at least 4–5 days before giving up. Many production MTAs use exactly 5 days as their maximum queue lifetime.
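The reply-code handling and the retry schedule combine into one small decision function. The sketch below assumes the schedule from the table, doubling after the fifth attempt, and a 5-day maximum queue lifetime; `next_action` and its return values are illustrative names rather than any MTA's real interface.

```python
from typing import Optional, Tuple

# Delay before attempt 1, 2, 3, 4, 5 (seconds), taken from the schedule above.
RETRY_DELAYS_S = [0, 5 * 60, 30 * 60, 2 * 3600, 8 * 3600]
MAX_QUEUE_LIFETIME_S = 5 * 86_400


def next_action(reply_code: int, attempts_made: int,
                age_s: float) -> Tuple[str, Optional[int]]:
    """Decide what to do after a delivery attempt.

    attempts_made counts attempts so far (1 after the first try);
    age_s is the time since the message was first enqueued.
    """
    if 200 <= reply_code < 300:
        return ("ack", None)                 # delivered: delete the job
    if 500 <= reply_code < 600:
        return ("bounce", None)              # permanent: send an NDR
    if 400 <= reply_code < 500:
        if attempts_made < len(RETRY_DELAYS_S):
            delay = RETRY_DELAYS_S[attempts_made]
        else:
            extra = attempts_made - len(RETRY_DELAYS_S) + 1
            delay = RETRY_DELAYS_S[-1] * 2 ** extra   # keep doubling
        if age_s + delay > MAX_QUEUE_LIFETIME_S:
            return ("bounce", None)          # queue lifetime exhausted
        return ("retry", delay)
    raise ValueError(f"unexpected SMTP reply code: {reply_code}")


print(next_action(450, attempts_made=1, age_s=0))
```

Connection timeouts and refused connections never produce a reply code; a worker would typically treat them like a 4xx.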

Partition the queue by recipient domain (or by hash of recipient address) so one slow domain does not block all outbound delivery. A small ISP's MX server going dark should not hold up email going to Gmail or Outlook. Each worker holds a limited number of concurrent SMTP connections per destination to avoid overwhelming small servers.
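One way to sketch both ideas, partitioning by recipient domain and capping concurrent connections per destination, is shown below. The function and class names are hypothetical, and the limiter is a single-process illustration rather than a distributed one.

```python
import hashlib
from collections import Counter


def partition_for(recipient: str, num_partitions: int) -> int:
    """Pick a queue partition from the recipient's domain (case-insensitive)."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class DestinationLimiter:
    """Caps the number of open SMTP connections per destination domain."""

    def __init__(self, max_per_domain: int = 5):
        self.max_per_domain = max_per_domain
        self._open = Counter()

    def try_acquire(self, domain: str) -> bool:
        if self._open[domain] >= self.max_per_domain:
            return False          # caller should requeue or pick other work
        self._open[domain] += 1
        return True

    def release(self, domain: str) -> None:
        if self._open[domain] > 0:
            self._open[domain] -= 1


print(partition_for("alice@example.com", 64) == partition_for("bob@EXAMPLE.com", 64))
```

Hashing domains into a fixed number of partitions means unrelated domains can share a partition, so a slow domain can still delay its partition-mates; very large destinations may warrant dedicated partitions.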

Deliverability — SPF, DKIM, DMARC

These three mechanisms help recipient servers trust that outbound mail from your domain is legitimate. Get them wrong and your outbound mail lands in spam for recipients — or gets rejected outright.

| Mechanism | What it does | Where it lives |
| --- | --- | --- |
| SPF | Declares which IP addresses are authorized to send for your domain | DNS TXT record on sending domain |
| DKIM | Cryptographically signs selected message headers plus a hash of the body with a private key; recipient verifies with the public key published as a DNS TXT record | Header added by egress server |
| DMARC | Policy: what receivers should do when a message passes neither SPF nor DKIM for a domain aligned with the From header (none, quarantine, or reject); also enables aggregate reports | DNS TXT record on sending domain |

In an interview: mention all three, explain SPF is an IP allowlist in DNS, DKIM is a per-message signature, and DMARC ties them together into a policy. This signals operational depth beyond "I know what SMTP is."
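How DMARC "ties them together" is easier to see in code. Under RFC 7489, a message passes DMARC when at least one of SPF or DKIM passes for a domain that aligns with the From header domain. The sketch below is a simplification: `org_domain` naively takes the last two labels, whereas a real implementation needs the Public Suffix List, and all function names are illustrative.

```python
from typing import Optional


def org_domain(domain: str) -> str:
    # ASSUMPTION: naive "last two labels"; wrong for suffixes like co.uk.
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(candidate: str, header_from: str, strict: bool = False) -> bool:
    a, b = candidate.lower().rstrip("."), header_from.lower().rstrip(".")
    return a == b if strict else org_domain(a) == org_domain(b)


def dmarc_passes(header_from: str,
                 spf_pass: bool, mail_from_domain: Optional[str],
                 dkim_pass: bool, dkim_d: Optional[str]) -> bool:
    """True if SPF or DKIM passed for a domain aligned with the From header."""
    spf_ok = bool(spf_pass and mail_from_domain
                  and aligned(mail_from_domain, header_from))
    dkim_ok = bool(dkim_pass and dkim_d and aligned(dkim_d, header_from))
    return spf_ok or dkim_ok


def disposition(passes: bool, policy: str) -> str:
    """Apply the sender's published policy: none, quarantine, or reject."""
    if passes or policy == "none":
        return "deliver"
    return "spam-folder" if policy == "quarantine" else "reject"


# SPF passed, but for an unrelated bounce domain, and there is no DKIM signature.
ok = dmarc_passes("example.com", True, "mailer.other.net", False, None)
print(disposition(ok, "quarantine"))
```

The example shows why SPF alone is not enough: an SPF pass for a third-party bounce domain does not authenticate the From header the user actually sees.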

Spam and virus filtering

Spam filtering is a major subsystem in its own right. The key insight is to order your filters from cheapest to most expensive — reject obvious junk before you ever touch your ML model.

flowchart TD
    MSG[Inbound message] --> IPCHECK[IP / sender reputation<br/>block known spam IPs]
    IPCHECK --> DNSBL[DNSBL check<br/>real-time blackhole lists]
    DNSBL --> GREY[Greylisting<br/>new sender triplet delayed]
    GREY --> SPFDK[SPF / DKIM / DMARC verify]
    SPFDK --> RULES[Rule engine<br/>header heuristics]
    RULES --> ML[ML classifier<br/>content + metadata features]
    ML --> VERDICT{Verdict}
    VERDICT -->|score < threshold| INBOX[Deliver to inbox]
    VERDICT -->|threshold exceeded| SPAMFOLDER[Deliver to spam folder]
    VERDICT -->|virus detected| QUARANTINE[Quarantine / strip attachment]

    style IPCHECK fill:#0e7490,color:#fff
    style ML fill:#ff6b1a,color:#0a0a0f
    style VERDICT fill:#15803d,color:#fff

Greylisting is a cheap technique that most candidates miss. When you see a new sender triplet (client IP, envelope-from, envelope-to) for the first time, respond with a temporary 4xx: not a rejection, just "try again in a moment." RFC 6647 identifies 421 or 450 as appropriate response codes; 451 is also commonly deployed in practice. Legitimate MTAs will retry within minutes. Cheap spam bots, which optimize for volume and usually do not implement retry logic, often never come back. You trade a ~5-minute delay for new correspondents against a meaningful reduction in the spam volume that ever reaches your ML scoring stage.
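A greylist is little more than a map from triplet to first-seen time. The sketch below assumes a 5-minute minimum delay and an in-memory dictionary; a real deployment would need a shared store with expiry, and the `Greylist` class and reply text are illustrative.

```python
from typing import Optional


class Greylist:
    def __init__(self, min_delay_s: float = 300):
        self.min_delay_s = min_delay_s
        self._first_seen = {}  # (client_ip, mail_from, rcpt_to) -> timestamp

    def check(self, client_ip: str, mail_from: str, rcpt_to: str,
              now: float) -> Optional[str]:
        """Return a temporary-failure reply, or None if the mail may proceed."""
        triplet = (client_ip, mail_from, rcpt_to)
        first = self._first_seen.setdefault(triplet, now)
        if now - first < self.min_delay_s:
            return "450 greylisted, please try again later"
        return None


gl = Greylist()
print(gl.check("198.51.100.4", "a@sender.test", "b@rcpt.test", now=0))
print(gl.check("198.51.100.4", "a@sender.test", "b@rcpt.test", now=600))
```

The first call returns the 450 reply and the retry ten minutes later returns None, which is the behavior a well-behaved MTA sees.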

The ML classifier uses features including: URL reputation in the body, sender domain age, message structure, sending rate patterns, user feedback (explicit spam reports), and language model-based content signals. The training signal comes from user "Report spam" and "Not spam" actions — that feedback loop is worth raising explicitly in an interview.
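The cheapest-first ordering amounts to a short-circuiting chain: each stage either returns a verdict or passes the message on. The stages below are toy stand-ins (a hard-coded blocklist, one header rule, a fake score) meant only to show the control flow; none of them reflects a real filter.

```python
from typing import Callable, List, Optional, Tuple

Stage = Callable[[dict], Optional[str]]  # a verdict, or None to continue

BLOCKED_IPS = {"203.0.113.7"}  # hypothetical blocklist entry


def ip_reputation(msg: dict) -> Optional[str]:
    return "reject" if msg["client_ip"] in BLOCKED_IPS else None


def header_rules(msg: dict) -> Optional[str]:
    # Toy heuristic: treat a missing Date header as spam.
    return "spam" if "date" not in msg["headers"] else None


def ml_classifier(msg: dict) -> Optional[str]:
    # Stand-in for a real model that would return a score in [0, 1].
    score = 0.9 if "free money" in msg["body"].lower() else 0.1
    return "spam" if score >= 0.5 else "inbox"


PIPELINE: List[Tuple[str, Stage]] = [
    ("ip", ip_reputation),      # cheapest: set lookup
    ("rules", header_rules),    # cheap: header inspection
    ("ml", ml_classifier),      # most expensive: runs last
]


def classify(msg: dict, pipeline=PIPELINE) -> Tuple[str, str]:
    """Return (stage that decided, verdict); later stages never run."""
    for name, stage in pipeline:
        verdict = stage(msg)
        if verdict is not None:
            return name, verdict
    return "default", "inbox"


print(classify({"client_ip": "203.0.113.7", "headers": {}, "body": ""}))
```

Returning the deciding stage alongside the verdict makes it easy to measure how much traffic each cheap stage removes before the model runs.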

Per-user inverted index

An inverted index maps each word to the set of message IDs containing it:

"invoice"  → [msg_11, msg_47, msg_203, ...]
"receipt"  → [msg_47, msg_88, ...]

Per-user isolation is the elegant property here: each user's index is entirely independent. Search queries never touch another user's data, and you can shard the search index on user_id using the exact same key as the mailbox. Scaling is trivially proportional to users, not to global message volume.
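A toy version of the per-user index shows both the postings structure and the isolation property. It assumes a simple lowercase alphanumeric tokenizer with a tiny stopword list and no stemming; `UserIndex` and the sample message IDs are illustrative.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "for", "of", "the", "to"}


def tokenize(text: str) -> list:
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


class UserIndex:
    """Inverted index for one mailbox: term -> set of message IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, message_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(message_id)

    def search(self, query: str) -> set:
        # AND semantics: a message must contain every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


indexes = defaultdict(UserIndex)   # user_id -> that user's private index
indexes[1].add("msg_11", "Invoice for March")
indexes[1].add("msg_47", "Receipt and invoice attached")
print(sorted(indexes[1].search("invoice receipt")))
print(indexes[2].search("invoice"))
```

User 2's search never touches user 1's postings, which is why the index can be sharded on the same user_id key as the mailbox.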

Indexing pipeline

sequenceDiagram
    participant MW as Mailbox Write
    participant IQ as Index queue
    participant IW as Index Worker
    participant IX as Search index store

    MW->>IQ: emit(user_id, message_id, subject, snippet, body_ref)
    IW->>IQ: consume batch
    IW->>IW: tokenize + normalize + stem
    IW->>IX: merge postings list updates
    Note over IW,IX: async — lag typically seconds

Tokenization: lowercase, remove punctuation, stem (or lemmatize) words, remove stopwords. For email search, also index from: and to: fields as structured filters so queries like from:alice@example.com resolve efficiently.

Index storage: write-optimized store (e.g. an LSM-tree based store, or Lucene-style segments). Per-user segments are small enough to fit in memory for active users; cold users' indexes are paged in on demand.

Consistency: search results may lag writes by a few seconds. This is acceptable — users understand that a message they just received may not yet appear in search. Do not sacrifice write path throughput for synchronous indexing.

Storage choices

| Data type | Store | Why |
| --- | --- | --- |
| Metadata (headers, labels, thread_id, flags) | Sharded relational DB or wide-column store | Structured queries, per-user sharding, transactional label updates |
| Message bodies + MIME parts | Object store (S3-compatible) | Large blobs, cheap at scale, content-addressed dedup |
| Attachments | Object store (same or separate bucket, cold-tiered) | Same rationale; virus-scanned before write |
| Per-user search index | LSM-tree / Lucene segments, sharded by user_id | Write-optimized, compaction-friendly |
| Delivery queue (outbound) | Durable queue (Kafka or purpose-built) | Persistent, replay on failure, partitioned by domain |
| User account / auth data | Replicated relational DB | Low write volume, strong consistency needed |
| Spam model weights | ML model store (versioned artifacts) | Loaded into scoring service; updated offline |

Failure modes

Outbound delivery to a temporarily-down server

The delivery queue handles this with exponential backoff. One subtle constraint: when the queue worker retries, it must reconnect and retry from the start of the SMTP transaction — SMTP is not resumable mid-DATA. If the message carries a multi-MB attachment, this means resending the entire payload on each retry. For large messages, consider streaming directly from object storage on retry rather than buffering on the egress server.

Spam false positives (good mail flagged as spam)

False positives destroy user trust far more than letting a piece of spam through. Keep precision extremely high (near 100%) even at the cost of recall. The user "Not spam" button is the primary feedback signal — make it prominent and act on it quickly. Provide a guaranteed-delivered path for known contacts. In an interview: acknowledge the precision/recall trade-off explicitly rather than claiming the ML model just handles it.

Large attachment storms

A single email with a 25 MB attachment sent to 10 000 recipients would naively write 25 MB × 10 000 = 250 GB to blob storage. Content-hash dedup collapses this: the blob is written once (25 MB), and 10 000 metadata rows each hold a pointer to the same key — 10 000 × ~1 KB = ~10 MB — giving roughly 35 MB total, a 7 000× reduction.

Search index lag

On a write burst (bulk import, mailing list storm), the indexing queue grows and search results fall behind. Mitigations: set consumer auto-scaling triggers on queue lag; prioritize indexing for active users; surface a "results may be incomplete" indicator when index lag exceeds a threshold.

Hot mailbox

A single user_id shard receiving very high write volume — a shared corporate inbox, a high-volume transactional account — becomes a write hotspot. Mitigations: virtual shards (split one logical user across sub-shards with a local fan-out service); rate-limit inbound per recipient; route to a dedicated high-throughput shard.

Quota enforcement

Every user has a storage quota (e.g. 15 GB for free tier). Track used_bytes per user in a fast counter store (Redis or a dedicated quota service). Before accepting a message, check and increment the counter in one atomic step: if used_bytes + message_size > quota, reject the ingestion with a 452 SMTP response ("insufficient storage"), which causes the sending MTA to retry later rather than generating a bounce; otherwise add message_size to used_bytes. Quota counts message bytes from the user's perspective, not deduplicated blob bytes; users should not benefit from or be penalized by what other users happen to share. Soft quota alerts at 75% and 90%, plus a final notice at 100%, let users clean up before the hard limit starts rejecting mail.
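The check and the increment need to happen as one atomic step, or two concurrent deliveries can both pass the check and overshoot the quota. The sketch below uses a process-local lock as a stand-in for whatever atomic primitive the counter store provides; `QuotaTracker` and `quota_rejection` are hypothetical names.

```python
import threading
from typing import Optional


class QuotaTracker:
    def __init__(self, quota_bytes: int):
        self.quota_bytes = quota_bytes
        self._used = {}                 # user_id -> bytes counted so far
        self._lock = threading.Lock()

    def try_reserve(self, user_id: int, size_bytes: int) -> bool:
        """Atomically check the quota and, if it fits, count the bytes."""
        with self._lock:
            used = self._used.get(user_id, 0)
            if used + size_bytes > self.quota_bytes:
                return False
            self._used[user_id] = used + size_bytes
            return True

    def usage_fraction(self, user_id: int) -> float:
        return self._used.get(user_id, 0) / self.quota_bytes


def quota_rejection(tracker: QuotaTracker, user_id: int,
                    size_bytes: int) -> Optional[str]:
    """Return a 452 reply if the message does not fit, else None."""
    if tracker.try_reserve(user_id, size_bytes):
        return None
    return "452 4.2.2 mailbox over quota, try again later"


tracker = QuotaTracker(quota_bytes=15 * 10**9)
print(quota_rejection(tracker, user_id=1, size_bytes=75_000))
```

If the later blob or metadata write fails, the reserved bytes should be released again, which this sketch leaves out.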

Things to discuss in an interview

  • Protocol boundary: SMTP is server-to-server; users hit HTTP APIs. Mention IMAP for legacy clients.
  • The 250 OK commitment: durability guarantee — do not say 250 until you have persisted.
  • Metadata vs blob split: why you separate the two, and how content-hash dedup works.
  • Append-only write pattern: new messages are inserts; label/read-flag changes are narrow updates. No full-row rewrites.
  • Per-user search isolation: why per-user inverted indexes are architecturally elegant.
  • Outbound retry and SMTP bounce handling: 4xx vs 5xx, NDR generation.
  • Spam as a layered pipeline: early cheap filters before expensive ML; user feedback as training signal.
  • Greylisting: a cheap, effective spam reduction technique most candidates don't mention.

Things you should now be able to answer

  • Why does the ingestion server say 250 OK before writing to the database? (It does not — you must write before you say 250.)
  • What is the purpose of DKIM and how does it differ from SPF?
  • How does content-hash deduplication reduce storage for CC storms?
  • Why is the search index built asynchronously rather than synchronously?
  • What is the difference between a 4xx and a 5xx SMTP response, and what does each mean for the outbound delivery queue?
  • How does greylisting reduce spam without requiring ML?
  • Why shard by user_id rather than by message_id?

Further reading

  • RFC 5321 — Simple Mail Transfer Protocol (the canonical SMTP spec)
  • RFC 6376 — DomainKeys Identified Mail (DKIM Signatures)
  • RFC 7208 — Sender Policy Framework (SPF)
  • RFC 7489 — Domain-based Message Authentication, Reporting, and Conformance (DMARC)
  • "Lessons Learned from Scaling Email Infrastructure" — various engineering blogs (Fastmail, Mailchimp, Postmark)
  • Design a distributed message queue — for delivery queue deep-dive
  • Consistent hashing — for understanding the mailbox sharding strategy
// FAQ

Frequently asked questions

Why must an SMTP ingestion server write the message durably before sending 250 OK?

The 250 OK response is a binding delivery commitment under the SMTP protocol. The moment you send it, you have accepted responsibility for delivering the message or reporting the failure. Any message lost after saying 250 violates SMTP, so the ingestion server must write the blob and the metadata row and confirm both are durable before issuing the response.

Why shard the mailbox metadata store by user_id rather than by message_id?

All normal mailbox operations — message list, label update, thread view — are scoped to a single user, so sharding by user_id means every query hits exactly one shard with no cross-shard joins. Sharding by message_id would scatter a single user's messages across many shards, requiring expensive fan-out reads for every inbox load.

What is greylisting and how does it reduce spam without an ML classifier?

Greylisting responds to a new sender triplet (client IP, envelope-from, envelope-to) with a temporary 4xx response, asking the sender to retry in a few minutes. Legitimate MTAs implement retry logic and will succeed on the second attempt. Cheap spam bots, optimizing for volume, typically skip retries entirely, so they never deliver. The technique adds roughly a 5-minute delay for new correspondents in exchange for a meaningful reduction in spam volume that ever reaches the ML scoring stage.

How does content-hash deduplication reduce storage for large attachment storms?

Before writing a blob, the system hashes the body or attachment bytes and checks whether that key already exists in object storage. If it does, the write is skipped and a pointer to the existing blob is recorded in the metadata row instead. A 25 MB attachment sent to 10,000 recipients is written once (25 MB) and the 10,000 metadata rows each hold a ~1 KB pointer (~10 MB total), collapsing 250 GB of naive storage down to roughly 35 MB — about a 7,000x reduction.

What is the difference between a 4xx and a 5xx SMTP response from the recipient server, and what does each mean for the outbound delivery queue?

A 4xx is a transient failure indicating the remote server is temporarily unavailable or greylisting the sender; the egress worker requeues the message with exponential backoff and retries for up to 4-5 days per RFC 5321. A 5xx is a permanent failure meaning the address does not exist or the server is explicitly rejecting the message; the worker acknowledges the job as done and generates a Non-Delivery Report back to the original sender.
