#interview
69 articles
Design an LLM Observability Platform
Build the distributed tracing backbone for non-deterministic, multi-step LLM applications — capturing every prompt, completion, token count, and dollar cost across chains, retrievals, and tool calls so you can debug a failed agent run and account for every cent.
Design an LLM Gateway (AI Gateway & Model Router)
A single proxy control plane in front of OpenAI, Anthropic, Google, and open models — routing ~65 trillion tokens a month with automatic failover, semantic caching, per-team budget enforcement, and streaming SSE passthrough, all under 50 ms of added latency.
Design an LLM Fine-Tuning Platform
Turn a base model and a dataset into a deployed fine-tuned adapter at scale — the end-to-end platform covering dataset ingestion, LoRA/QLoRA/DPO training, fault-tolerant distributed GPU scheduling, eval gating, and multi-LoRA serving for hundreds of concurrent fine-tunes.
Design an LLM Evaluation Platform
Build the system that tells a team whether a prompt or model change made the product better or worse — automatically. Covers offline eval with LLM-as-judge, CI regression gating, online production sampling, human annotation queues, and eval for RAG, agents, and classifiers at the scale of 450 million evaluations per month.
Design a GraphRAG System (Knowledge-Graph-Augmented Retrieval)
When vanilla vector RAG fails on "summarize the entire corpus" and multi-hop questions, you build a knowledge graph first. Covers entity extraction, Leiden community detection, map-reduce global search, and graph traversal for multi-hop questions, drawing on Microsoft GraphRAG and production deployments at Neo4j, LinkedIn, and Writer.
Design a Feature Store
Serve the exact same feature values to model training and online inference — eliminating training-serving skew — across batch, streaming, and on-demand tiers at sub-10ms latency and millions of reads per second. The architecture powering Uber Michelangelo, Airbnb Chronon, and DoorDash Gigascale.
Design an Intelligent Document Processing Pipeline
Turn millions of messy PDFs, scans, and invoices into validated structured JSON at scale — the end-to-end pipeline covering OCR, layout analysis, LLM-based field extraction, confidence-scored routing, human-in-the-loop review, and the cost math that determines build-vs-buy.
Design a Customer-Support AI Assistant
Architect a production customer-support AI that deflects 60–80% of tickets by combining RAG over a help center, real-action tools (refunds, cancellations, account changes), per-session memory, guardrails, and a structured handoff to a human agent — all while keeping hallucination below 2%.
Design a Realtime Voice AI Agent
Build a full-duplex voice agent that answers PSTN and WebRTC calls with sub-800ms voice-to-voice latency — covering the cascade pipeline (STT→LLM→TTS), turn detection, barge-in, telephony transport, and scaling to thousands of simultaneous calls.
Design an AI Guardrails & Safety System
Build the validation layer that wraps every LLM call — detecting prompt injections, redacting PII, catching toxic outputs, and verifying groundedness — while staying inside a 200 ms latency budget for 10 million daily requests.
Design an AI Coding Assistant (Copilot / Cursor)
Architect a system that delivers inline ghost-text completions in under 200ms and drives an autonomous agent that edits dozens of files — the two-product architecture behind GitHub Copilot, Cursor, and Sourcegraph Cody at billions of completions per day.
The ML / GenAI System Design Interview Framework
A 7-step, 45-minute framework for ML and GenAI system design rounds. Covers the data/feedback loop, the candidate-generation + ranking funnel, and the GenAI decision tree from prompt engineering to fine-tuning, with a table of hire vs no-hire signals.
Design a Vector Database / Semantic Search Service
Index 1 billion 768-dimensional vectors and answer top-k similarity queries in under 20 ms — the ANN indexing, sharding, and filtering architecture behind Pinecone, Weaviate, and pgvector.
Design a RAG (Retrieval-Augmented Generation) Pipeline
Ground an LLM in 10 million documents (50 million chunks) with sub-2-second answers and a hallucination rate measurable by automated eval — the end-to-end ingestion, retrieval, reranking, and generation pipeline powering enterprise knowledge assistants.
Design an LLM Inference & Serving System
Serve token generation for a 70B-parameter model at scale, where KV cache memory, not FLOPs, caps concurrency, and continuous batching is what separates high GPU utilization from poor utilization.
Design an AI Agent Platform
Build a platform that runs autonomous LLM agents — each capable of planning, calling tools, and completing multi-step tasks lasting minutes to hours — with durable state, idempotent tool execution, and per-tenant safety guardrails.
Design a Centralized Log Aggregation System (ELK / Splunk)
Collect, store, and search logs from thousands of services. Collection agents, a buffered ingestion pipeline, time-based inverted indices, hot-warm-cold tiers, and cost control.
Design a Large-Scale Data Pipeline (ETL / Batch + Streaming)
Move and transform petabytes from sources into a warehouse/lake for analytics. DAG orchestration, Spark shuffles, lake vs warehouse, and idempotent, replayable jobs.
Design a Shopping Cart & Checkout System
Keep a cart consistent across devices, then check out without overselling or double-charging. The available-cart vs consistent-checkout split, inventory holds, and the order saga.
Design a Social Graph Service (Facebook's TAO)
Serve billions of "who follows whom" reads over a graph of trillions of edges. The objects-and-associations model, a cache in front of sharded SQL, and the hot-vertex problem.
Design an Authorization System (Google Zanzibar / RBAC / ReBAC)
Answer "can user U do action A on resource R?" globally, in milliseconds, consistently. RBAC vs ABAC vs ReBAC, Zanzibar relation tuples, and the new-enemy problem.
Design an A/B Testing & Experimentation Platform
Run thousands of controlled experiments at once. Deterministic bucketing, exposure logging, the metrics pipeline, statistical significance, and the peeking problem.
Design a Real-Time Fraud Detection System
Score transactions for fraud inline in milliseconds. Feature stores, streaming velocity features, rules + ML hybrids, graph fraud rings, and the label-delay problem.
Design Ticketmaster (seat booking / reservations)
Sell limited inventory to a stampede of buyers without double-booking a seat. Reservation holds, a conditional-update concurrency guard, and the read-vs-write consistency split.
Design a Distributed Job Scheduler (cron at scale)
Run millions of scheduled and recurring jobs reliably — at-least-once execution, leader election, sharded time-wheels, and exactly-once side effects via idempotency.
Design a Globally-Distributed SQL Database (Spanner / CockroachDB)
SQL transactions that are ACID across continents. How Spanner shards into Paxos groups, runs 2PC on top, and uses TrueTime to give you external consistency — the CP counterpart to Dynamo.
Design a Digital Wallet (PayPal / Venmo / Paytm)
Hold balances, transfer money between users instantly, and never lose a cent. Double-entry ledgers, idempotent transfers, and strong consistency.
Design an Object Storage Service (S3)
Store arbitrary blobs with HTTP GET/PUT at exabyte scale and 11 nines of durability. Metadata vs data separation, erasure coding, and self-healing.
Design an Ad Click Aggregator (real-time analytics)
Ingest billions of ad events, serve per-minute metrics in near-real-time, and produce exact totals for billing — the canonical streaming + lambda/kappa problem.
Design a Payment System (Stripe-style)
Move money correctly. Double-entry ledgers, idempotency keys, the authorize/capture/settle lifecycle, reconciliation, and why money never gets eventual consistency.
Design a Distributed Message Queue (Kafka)
Build a durable, partitioned, replicated commit log like Kafka — ordering, consumer groups, replication (ISR), and exactly-once.
Design a Distributed Counter (view / like counts)
Count likes and views at millions of increments per second without a single hot row melting. Sharded counters, write batching, and approximate vs exact counts.
Design a Calendar System (Google Calendar)
Store events, share calendars, find free slots, and fire reminders — across time zones and recurring rules. The RRULE expansion and free/busy problem.
Design a Distributed Search Engine (Elasticsearch)
Index billions of documents and answer full-text queries in milliseconds. Inverted indexes, sharding + replication, scatter-gather, and relevance scoring.
Design an Email Service (Gmail)
Send, receive, store, and search email for hundreds of millions of users. SMTP ingestion, sharded mailbox storage, full-text search, and spam filtering.
Design a Hotel / Airbnb Booking System
Search available listings and book date ranges without double-booking. Availability as a range problem, reservation holds, and the search vs transaction split.
Design an Online Code Judge (LeetCode)
Run untrusted user code safely against test cases at scale. Sandboxed execution, a judge worker queue, resource limits, and contest-time spikes.
Design Top-K / Trending (heavy hitters)
Find the top-K most frequent items in a massive stream without counting everything exactly. Count-Min Sketch, heavy-hitter algorithms, and approximate streaming aggregation.
Design a Live Streaming System (Twitch)
Ingest one broadcaster and fan out to millions of viewers with seconds of latency. Transcoding ladders, HLS/DASH segmenting, CDN fan-out, and live chat.
Design a Flash Sale / Seckill System
Sell limited stock to a massive, spiky crowd without overselling. Atomic inventory decrement, request shedding, queues, and graceful degradation.
Design a Real-Time Leaderboard (gaming)
Rank millions of players by score and answer "top N" and "my rank" instantly. Redis sorted sets, sharding by score range, and approximate ranks at scale.
Design a Content Delivery Network (CDN)
Serve content from the edge, close to users, at massive scale. Request routing (anycast & DNS), cache hierarchies, invalidation, and origin shielding.
Design a Distributed Lock / Coordination Service (ZooKeeper / etcd)
Provide mutual exclusion and coordination across machines safely. Consensus-backed locks, leases, fencing tokens, and why a lock without fencing is unsafe.
Design a Recommendation System (Netflix / TikTok)
Pick the best items for each user from millions of candidates in milliseconds. The two-stage candidate-generation + ranking architecture, embeddings, and feature stores.
Design a Video Conferencing System (Zoom)
Carry live audio/video among many participants with low latency. WebRTC, the SFU vs MCU vs mesh trade-off, simulcast, and adaptive bitrate.
Design Slack (team chat at scale)
Channels, threads, presence, and search across huge workspaces. Real-time fan-out over WebSockets, the gateway problem, and read-state per user.
Design a Stock Exchange (matching engine)
Match buy and sell orders deterministically with microsecond latency and perfect fairness. The single-threaded matching engine, the order book, and event sourcing for recovery.
Design a Metrics & Monitoring System (Prometheus / Datadog)
Ingest billions of time-series points, store them cheaply, and answer dashboard + alerting queries fast. TSDB internals, cardinality, downsampling, and pull vs push.
Design TikTok / Reels (short-video platform)
A vertical-swipe video feed that feels infinite and clairvoyant. Two-tower retrieval, real-time ranking with Monolith, sub-200ms playback via aggressive preloading, and the For You Page that knows you in 90 seconds.
Design a Distributed File System (GFS / HDFS)
Store petabytes of data in large files across thousands of commodity machines for high-throughput batch reads. The single-master + chunkservers design, replication, and append-heavy workloads.
Design Google Docs (real-time collaborative editor)
Multiple people editing the same document simultaneously, every keystroke synced, never a corrupt merge. Operational Transformation, CRDTs, presence, and the architecture that's run quietly at Google for 17+ years.
Design Google Maps (routing & navigation)
Find the fastest route across a continent-scale road graph in milliseconds, with live traffic and ETAs. Contraction hierarchies, live traffic weights, and tile serving.
How to Approach a System Design Interview
A repeatable 45-minute framework — requirements, estimation, high-level design, deep dives, bottlenecks — and the moves that separate a hire from a no-hire.
Design a Distributed Key-Value Store (Dynamo)
Build your own DynamoDB / Cassandra. Sharding, replication, quorum reads/writes, vector clocks, conflict resolution.
Design a Distributed Unique ID Generator
How Twitter, Discord, and Instagram generate billions of unique IDs per day with no central coordinator. UUIDs, Snowflake IDs, and ULIDs.
Design Yelp / Nearby Search (proximity service)
Find restaurants/businesses near a location, fast. Geohash, quadtree, hexagonal cells, and the right index for "within 5 km of me".
Design Instagram (photo sharing)
A photo-first social network — uploads, image processing, feed, stories, and global delivery.
Design Google Drive / Dropbox
File sync that works on every device, blob storage, deduplication, conflict resolution, and how to do all of it efficiently.
Design a Notification System (Push, Email, SMS)
A reliable multi-channel notification platform: fanout, templates, dedup, rate limiting, and the realities of APNs/FCM.
Design a Distributed Cache (like Memcached)
A cache that scales across hundreds of nodes — consistent hashing, replication, eviction, and the operational problems you'll meet.
Design Search Autocomplete (Typeahead)
Sub-100ms autocomplete suggestions across billions of queries — tries, top-k caching, and personalized ranking.
Design a Web Crawler (Googlebot)
Crawl billions of URLs at petabyte scale. URL frontier, politeness, deduplication, 4xx/dead-link handling, and the realities of indexing the web.
Design Uber / Lyft (ride hailing)
Match drivers to riders in real time at city scale. Geohashing, dispatch algorithms, surge pricing, and the realtime location pipeline.
Design YouTube / Netflix (video streaming)
How a billion users watch ~1 billion hours of video a day. Upload pipeline, transcoding, adaptive bitrate, CDN, recommendation.
Design WhatsApp / Chat System
Realtime 1:1 and group messaging at billions-of-users scale. WebSocket gateways, message store, presence, end-to-end encryption.
Design a News Feed (Facebook / Instagram)
How Meta builds the home feed — fanout, ranking, candidate generation, and how the same architecture serves billions.
Design a Rate Limiter
Token bucket, leaky bucket, fixed window, sliding window — and the distributed Redis-based limiter you can copy into production.
Design Twitter / X (the home timeline)
500M users, 500M tweets/day, p99 feed loads under 200ms. The fanout-on-write vs fanout-on-read trade-off that defines the system.
Design a URL Shortener (TinyURL / bit.ly)
A classic FAANG warmup. Generate short codes, store them, redirect fast, scale to billions of URLs.