// TOPIC

#interview

69 articles

Design an LLM Observability Platform

Build the distributed tracing backbone for non-deterministic, multi-step LLM applications — capturing every prompt, completion, token count, and dollar cost across chains, retrievals, and tool calls so you can debug a failed agent run and account for every cent.

#interview#ai#llm

30 min

◆◆◆ Advanced · Cloudflare · OpenAI

Design an LLM Gateway (AI Gateway & Model Router)

A single proxy control plane in front of OpenAI, Anthropic, Google, and open models — routing ~65 trillion tokens a month with automatic failover, semantic caching, per-team budget enforcement, and streaming SSE passthrough, all under 50 ms of added latency.

#interview#ai#llm

31 min

◆◆◆ Advanced · OpenAI · Meta

Design an LLM Fine-Tuning Platform

Turn a base model and a dataset into a deployed fine-tuned adapter at scale — the end-to-end platform covering dataset ingestion, LoRA/QLoRA/DPO training, fault-tolerant distributed GPU scheduling, eval gating, and multi-LoRA serving for hundreds of concurrent fine-tunes.

#interview#ai#llm

37 min

◆◆◆ Advanced · OpenAI · LangChain

Design an LLM Evaluation Platform

Build the system that tells a team whether a prompt or model change made the product better or worse — automatically. Covers offline eval with LLM-as-judge, CI regression gating, online production sampling, human annotation queues, and eval for RAG, agents, and classifiers at the scale of 450 million evaluations per month.

#interview#ai#llm

30 min

◆◆◆ Advanced · Microsoft · Neo4j

Design a GraphRAG System (Knowledge-Graph-Augmented Retrieval)

When vanilla vector RAG fails on "summarize the entire corpus" and multi-hop questions, you build a knowledge graph first. Covers entity extraction, Leiden community detection, map-reduce global search, and graph traversal for multi-hop queries, based on Microsoft GraphRAG and production deployments at Neo4j, LinkedIn, and Writer.

#interview#ai#rag

27 min

◆◆◆ Advanced · Uber · Airbnb

Design a Feature Store

Serve the exact same feature values to model training and online inference — eliminating training-serving skew — across batch, streaming, and on-demand tiers at sub-10ms latency and millions of reads per second. The architecture powering Uber Michelangelo, Airbnb Chronon, and DoorDash Gigascale.

#interview#ai#mlops

26 min

◆◆◆ Advanced · Google · AWS

Design an Intelligent Document Processing Pipeline

Turn millions of messy PDFs, scans, and invoices into validated structured JSON at scale — the end-to-end pipeline covering OCR, layout analysis, LLM-based field extraction, confidence-scored routing, human-in-the-loop review, and the cost math that determines build-vs-buy.

#interview#ai#llm

28 min

◆◆◆ Advanced · Intercom · Sierra

Design a Customer-Support AI Assistant

Architect a production customer-support AI that deflects 60–80% of tickets by combining RAG over a help center, real-action tools (refunds, cancellations, account changes), per-session memory, guardrails, and a structured handoff to a human agent — all while keeping hallucination below 2%.

#interview#ai#llm

34 min

◆◆◆ Advanced · OpenAI · Deepgram

Design a Realtime Voice AI Agent

Build a full-duplex voice agent that answers PSTN and WebRTC calls with sub-800ms voice-to-voice latency — covering the cascade pipeline (STT→LLM→TTS), turn detection, barge-in, telephony transport, and scaling to thousands of simultaneous calls.

#interview#ai#llm

33 min

◆◆◆ Advanced · NVIDIA · Meta

Design an AI Guardrails & Safety System

Build the validation layer that wraps every LLM call — detecting prompt injections, redacting PII, catching toxic outputs, and verifying groundedness — while staying inside a 200 ms latency budget for 10 million daily requests.

#interview#ai#llm

29 min

◆◆◆ Advanced · GitHub · Cursor

Design an AI Coding Assistant (Copilot / Cursor)

Architect a system that delivers inline ghost-text completions in under 200ms and drives an autonomous agent that edits dozens of files — the two-product architecture behind GitHub Copilot, Cursor, and Sourcegraph Cody at billions of completions per day.

#interview#ai#llm

31 min

◆◆◆ Advanced · Google · Meta

The ML / GenAI System Design Interview Framework

A 7-step, 45-minute framework for ML and GenAI system design rounds, covering the data/feedback loop, the candidate-generation + ranking funnel, and the GenAI decision tree from prompt engineering to fine-tuning, with a hire vs. no-hire signals table.

#interview#ai#ml

29 min

◆◆◆ Advanced · Pinecone · Google

Design a Vector Database / Semantic Search Service

Index 1 billion 768-dimensional vectors and answer top-k similarity queries in under 20 ms — the ANN indexing, sharding, and filtering architecture behind Pinecone, Weaviate, and pgvector.

#interview#ai#vector-db

23 min

◆◆◆ Advanced · OpenAI · Google

Design a RAG (Retrieval-Augmented Generation) Pipeline

Ground an LLM in 10 million documents (50 million chunks) with sub-2-second answers and a hallucination rate measurable by automated eval — the end-to-end ingestion, retrieval, reranking, and generation pipeline powering enterprise knowledge assistants.

#interview#ai#rag

28 min

◆◆◆ Advanced · OpenAI · Anthropic

Design an LLM Inference & Serving System

Serve token generation for a 70B-parameter model at scale — where KV cache, not FLOPs, caps concurrency and continuous batching is what separates good GPU utilization from terrible utilization.

#interview#ai#llm

25 min

◆◆◆ Advanced · OpenAI · Anthropic

Design an AI Agent Platform

Build a platform that runs autonomous LLM agents — each capable of planning, calling tools, and completing multi-step tasks lasting minutes to hours — with durable state, idempotent tool execution, and per-tenant safety guardrails.

#interview#ai#agents

24 min

◆◆◆ Advanced · Elastic · Splunk

Design a Centralized Log Aggregation System (ELK / Splunk)

Collect, store, and search logs from thousands of services. Collection agents, a buffered ingestion pipeline, time-based inverted indices, hot-warm-cold tiers, and cost control.

#interview#observability#search

25 min

◆◆◆ Advanced · Meta · Airbnb

Design a Large-Scale Data Pipeline (ETL / Batch + Streaming)

Move and transform petabytes from sources into a warehouse/lake for analytics. DAG orchestration, Spark shuffles, lake vs warehouse, and idempotent, replayable jobs.

#interview#big-data#data-engineering

26 min

◆◆◆ Advanced · Amazon · Shopify

Design a Shopping Cart & Checkout System

Keep a cart consistent across devices, then check out without overselling or double-charging. The available-cart vs consistent-checkout split, inventory holds, and the order saga.

#interview#e-commerce#consistency

20 min

◆◆◆ Advanced · Meta · LinkedIn

Design a Social Graph Service (Facebook's TAO)

Serve billions of "who follows whom" reads over a graph of trillions of edges. The objects-and-associations model, a cache in front of sharded SQL, and the hot-vertex problem.

#interview#graphs#caching

22 min

◆◆◆ Advanced · Google · Auth0

Design an Authorization System (Google Zanzibar / RBAC / ReBAC)

Answer "can user U do action A on resource R?" globally, in milliseconds, consistently. RBAC vs ABAC vs ReBAC, Zanzibar relation tuples, and the new-enemy problem.

#interview#security#distributed-systems

23 min

◆◆◆ Advanced · Meta · Google

Design an A/B Testing & Experimentation Platform

Run thousands of controlled experiments at once. Deterministic bucketing, exposure logging, the metrics pipeline, statistical significance, and the peeking problem.

#interview#experimentation#analytics

23 min

◆◆◆ Advanced · Stripe · PayPal

Design a Real-Time Fraud Detection System

Score transactions for fraud inline in milliseconds. Feature stores, streaming velocity features, rules + ML hybrids, graph fraud rings, and the label-delay problem.

#interview#ml#streaming

23 min

◆◆◆ Advanced · Amazon · Booking

Design Ticketmaster (seat booking / reservations)

Sell limited inventory to a stampede of buyers without double-booking a seat. Reservation holds, a conditional-update concurrency guard, and the read-vs-write consistency split.

#interview#concurrency#inventory

21 min

◆◆◆ Advanced · Amazon · Google

Design a Distributed Job Scheduler (cron at scale)

Run millions of scheduled and recurring jobs reliably — at-least-once execution, leader election, sharded time-wheels, and exactly-once side effects via idempotency.

#interview#distributed-systems#scheduling

20 min

◆◆◆ Advanced · Google · Uber

Design a Globally-Distributed SQL Database (Spanner / CockroachDB)

SQL transactions that are ACID across continents. How Spanner shards into Paxos groups, runs 2PC on top, and uses TrueTime to give you external consistency — the CP counterpart to Dynamo.

#interview#distributed-systems#databases

13 min

◆◆◆ Advanced · PayPal · Block

Design a Digital Wallet (PayPal / Venmo / Paytm)

Hold balances, transfer money between users instantly, and never lose a cent. Double-entry ledgers, idempotent transfers, and strong consistency.

#interview#payments#consistency

21 min

◆◆◆ Advanced · Amazon · Google

Design an Object Storage Service (S3)

Store arbitrary blobs with HTTP GET/PUT at exabyte scale and 11 nines of durability. Metadata vs data separation, erasure coding, and self-healing.

#interview#storage#distributed-systems

24 min

◆◆◆ Advanced · Meta · Google

Design an Ad Click Aggregator (real-time analytics)

Ingest billions of ad events, serve per-minute metrics in near-real-time, and produce exact totals for billing — the canonical streaming + lambda/kappa problem.

#interview#streaming#analytics

21 min

◆◆◆ Advanced · Stripe · Amazon

Design a Payment System (Stripe-style)

Move money correctly. Double-entry ledgers, idempotency keys, the authorize/capture/settle lifecycle, reconciliation, and why money never gets eventual consistency.

#interview#payments#consistency

20 min

◆◆◆ Advanced · Amazon · LinkedIn

Design a Distributed Message Queue (Kafka)

Build a durable, partitioned, replicated commit log like Kafka — ordering, consumer groups, replication (ISR), and exactly-once.

#interview#messaging#distributed-systems

21 min

◆◆ Intermediate · Meta · YouTube

Design a Distributed Counter (view / like counts)

Count likes and views at millions of increments per second without a single hot row melting. Sharded counters, write batching, and approximate vs exact counts.

#interview#scale#consistency

19 min

◆◆ Intermediate · Google · Microsoft

Design a Calendar System (Google Calendar)

Store events, share calendars, find free slots, and fire reminders — across time zones and recurring rules. The RRULE expansion and free/busy problem.

#interview#scheduling#consistency

23 min

◆◆◆ Advanced · Elastic · Amazon

Design a Distributed Search Engine (Elasticsearch)

Index billions of documents and answer full-text queries in milliseconds. Inverted indexes, sharding + replication, scatter-gather, and relevance scoring.

#interview#search#indexing

21 min

◆◆◆ Advanced · Google · Microsoft

Design an Email Service (Gmail)

Send, receive, store, and search email for hundreds of millions of users. SMTP ingestion, sharded mailbox storage, full-text search, and spam filtering.

#interview#storage#search

23 min

◆◆◆ Advanced · Airbnb · Booking

Design a Hotel / Airbnb Booking System

Search available listings and book date ranges without double-booking. Availability as a range problem, reservation holds, and the search vs transaction split.

#interview#inventory#consistency

21 min

◆◆ Intermediate · LeetCode · HackerRank

Design an Online Code Judge (LeetCode)

Run untrusted user code safely against test cases at scale. Sandboxed execution, a judge worker queue, resource limits, and contest-time spikes.

#interview#sandboxing#queues

22 min

◆◆◆ Advanced · Twitter · Meta

Design Top-K / Trending (heavy hitters)

Find the top-K most frequent items in a massive stream without counting everything exactly. Count-Min Sketch, heavy-hitter algorithms, and approximate streaming aggregation.

#interview#streaming#probabilistic

22 min

◆◆◆ Advanced · Twitch · YouTube

Design a Live Streaming System (Twitch)

Ingest one broadcaster and fan out to millions of viewers with seconds of latency. Transcoding ladders, HLS/DASH segmenting, CDN fan-out, and live chat.

#interview#media#streaming

22 min

◆◆◆ Advanced · Amazon · Alibaba

Design a Flash Sale / Seckill System

Sell limited stock to a massive, spiky crowd without overselling. Atomic inventory decrement, request shedding, queues, and graceful degradation.

#interview#concurrency#inventory

22 min

◆◆ Intermediate · Riot Games · Activision

Design a Real-Time Leaderboard (gaming)

Rank millions of players by score and answer "top N" and "my rank" instantly. Redis sorted sets, sharding by score range, and approximate ranks at scale.

#interview#ranking#caching

22 min

◆◆◆ Advanced · Cloudflare · Akamai

Design a Content Delivery Network (CDN)

Serve content from the edge, close to users, at massive scale. Request routing (anycast & DNS), cache hierarchies, invalidation, and origin shielding.

#interview#caching#networking

21 min

◆◆◆ Advanced · Google · Amazon

Design a Distributed Lock / Coordination Service (ZooKeeper / etcd)

Provide mutual exclusion and coordination across machines safely. Consensus-backed locks, leases, fencing tokens, and why a lock without fencing is unsafe.

#interview#distributed-systems#consensus

21 min

◆◆◆ Advanced · Netflix · Amazon

Design a Recommendation System (Netflix / TikTok)

Pick the best items for each user from millions of candidates in milliseconds. The two-stage candidate-generation + ranking architecture, embeddings, and feature stores.

#interview#ml#ranking

22 min

◆◆◆ Advanced · Zoom · Google

Design a Video Conferencing System (Zoom)

Carry live audio/video among many participants with low latency. WebRTC, the SFU vs MCU vs mesh trade-off, simulcast, and adaptive bitrate.

#interview#realtime#media

21 min

◆◆◆ Advanced · Slack · Microsoft

Design Slack (team chat at scale)

Channels, threads, presence, and search across huge workspaces. Real-time fan-out over WebSockets, the gateway problem, and read-state per user.

#interview#realtime#messaging

24 min

◆◆◆ Advanced · Bloomberg · Citadel

Design a Stock Exchange (matching engine)

Match buy and sell orders deterministically with microsecond latency and perfect fairness. The single-threaded matching engine, the order book, and event sourcing for recovery.

#interview#low-latency#consistency

22 min

◆◆◆ Advanced · Datadog · Google

Design a Metrics & Monitoring System (Prometheus / Datadog)

Ingest billions of time-series points, store them cheaply, and answer dashboard + alerting queries fast. TSDB internals, cardinality, downsampling, and pull vs push.

#interview#observability#time-series

22 min

◆◆◆ Advanced · ByteDance · Meta

Design TikTok / Reels (short-video platform)

A vertical-swipe video feed that feels infinite and clairvoyant. Two-tower retrieval, real-time ranking with Monolith, sub-200ms playback via aggressive preloading, and the For You Page that knows you in 90 seconds.

#interview#video#recommendation

22 min

◆◆◆ Advanced · Google · Meta

Design a Distributed File System (GFS / HDFS)

Store petabytes of files across thousands of commodity machines for high-throughput batch reads. The single-master + chunkservers design, replication, and append-heavy workloads.

#interview#storage#distributed-systems

23 min

◆◆◆ Advanced · Google · Microsoft

Design Google Docs (real-time collaborative editor)

Multiple people editing the same document simultaneously, every keystroke synced, never a corrupt merge. Operational Transformation, CRDTs, presence, and the architecture that's run quietly at Google for 17+ years.

#interview#realtime#collaboration

21 min

◆◆◆ Advanced · Google · Uber

Design Google Maps (routing & navigation)

Find the fastest route across a continent-scale road graph in milliseconds, with live traffic and ETAs. Contraction hierarchies, live traffic weights, and tile serving.

#interview#geo#graphs

22 min

◆ Beginner · Meta · Google

How to Approach a System Design Interview

A repeatable 45-minute framework — requirements, estimation, high-level design, deep dives, bottlenecks — and the moves that separate a hire from a no-hire.

#interview#process#fundamentals

12 min

◆◆◆ Advanced · Amazon · Meta

Design a Distributed Key-Value Store (Dynamo)

Build your own DynamoDB / Cassandra. Sharding, replication, quorum reads/writes, vector clocks, conflict resolution.

#interview#distributed-systems#storage

19 min

◆◆ Intermediate · Twitter · Discord

Design a Distributed Unique ID Generator

How Twitter/Discord/Instagram generate billions of unique IDs per day with no central coordinator. UUIDs, Snowflake IDs, ULIDs.

#interview#distributed-systems#identifiers

16 min

◆◆ Intermediate · Google · Yelp

Design Yelp / Nearby Search (proximity service)

Find restaurants/businesses near a location, fast. Geohash, quadtree, hexagonal cells, and the right index for "within 5 km of me".

#interview#geo#search

16 min

◆◆◆ Advanced · Meta · Instagram

Design Instagram (photo sharing)

A photo-first social network — uploads, image processing, feed, stories, and global delivery.

#interview#feed#storage

18 min

◆◆◆ Advanced · Google · Dropbox

Design Google Drive / Dropbox

File sync that works on every device: blob storage, deduplication, conflict resolution, and how to do all of it efficiently.

#interview#storage#sync

17 min

◆◆ Intermediate · Meta · Google

Design a Notification System (Push, Email, SMS)

A reliable multi-channel notification platform: fanout, templates, dedup, rate limiting, and the realities of APNs/FCM.

#interview#notifications#messaging

14 min

◆◆◆ Advanced · Meta · Amazon

Design a Distributed Cache (like Memcached)

A cache that scales across hundreds of nodes — consistent hashing, replication, eviction, and the operational problems you'll meet.

#interview#caching#distributed-systems

18 min

◆◆ Intermediate · Google · Amazon

Design Search Autocomplete (Typeahead)

Sub-100ms autocomplete suggestions across billions of queries — tries, top-k caching, and personalized ranking.

#interview#search#trie

15 min

◆◆◆ Advanced · Google · Microsoft

Design a Web Crawler (Googlebot)

Crawl billions of URLs at petabyte scale. URL frontier, politeness, deduplication, 4xx/dead-link handling, and the realities of indexing the web.

#interview#crawling#distributed-systems

17 min

◆◆◆ Advanced · Uber · Lyft

Design Uber / Lyft (ride hailing)

Match drivers to riders in real time at city scale. Geohashing, dispatch algorithms, surge pricing, and the realtime location pipeline.

#interview#geo#realtime

17 min

◆◆◆ Advanced · Google · Netflix

Design YouTube / Netflix (video streaming)

How a billion users watch ~1 billion hours of video a day. Upload pipeline, transcoding, adaptive bitrate, CDN, recommendation.

#interview#video#cdn

17 min

◆◆◆ Advanced · Meta · Discord

Design WhatsApp / Chat System

Realtime 1:1 and group messaging at billions-of-users scale. WebSocket gateways, message store, presence, end-to-end encryption.

#interview#realtime#websocket

16 min

◆◆◆ Advanced · Meta · Instagram

Design a News Feed (Facebook / Instagram)

How Meta builds the home feed — fanout, ranking, candidate generation, and how the same architecture serves billions.

#interview#feed#fanout

16 min

◆◆ Intermediate · Stripe · Cloudflare

Design a Rate Limiter

Token bucket, leaky bucket, fixed window, sliding window — and the distributed Redis-based limiter you can copy into production.

#interview#rate-limiting#distributed-systems

14 min

◆◆◆ Advanced · Meta · Twitter

Design Twitter / X (the home timeline)

500M users, 500M tweets/day, p99 feed loads under 200ms. The fanout-on-write vs fanout-on-read trade-off that defines the system.

#interview#feed#fanout

16 min

◆ Beginner · Google · Amazon

Design a URL Shortener (TinyURL / bit.ly)

A classic FAANG warmup. Generate short codes, store them, redirect fast, scale to billions of URLs.

#interview#hashing#caching

13 min