ONLINE·v1.0.0·free to read

Master system design. From zero to senior.

$ learn

A free field guide to building scalable systems: beginner-friendly fundamentals, in-depth architectural breakdowns, and 55+ FAANG-grade interview problems, all in one place and all with diagrams.

12 modules · Crash Course

Deep Dives

55+ · Interview Problems

30+ hrs · Reading Time

// 01 — START HERE

The crash course

No system design background? Start here. Twelve focused modules cover the prerequisites (networking, databases, caching, queues) before you tackle full systems.

01 · 13 min

What Is System Design?

Start here. What system design actually means, why it matters, the framework that turns a vague problem into a defensible architecture, and how to read every other article in this course.

02 · 14 min

Back-of-the-Envelope Estimation

Capacity math you can do in your head — QPS, storage, bandwidth, memory — the latency numbers every engineer should know, and worked examples for Twitter, URL shorteners, and a chat system.

03 · 17 min

Networking and HTTP

TCP vs UDP, DNS, TLS, HTTP/1.1 vs HTTP/2 vs HTTP/3, anycast routing, and what actually happens between a browser and a server.

04 · 17 min

APIs and Communication Protocols

REST, gRPC, GraphQL, WebSockets, Server-Sent Events, and webhooks — when to use each, how to design them, and the patterns that keep them sane at scale.

// 02 — DEEP DIVES

Featured articles

Real systems — Twitter, Uber, YouTube, WhatsApp — broken down. Every article is tagged by difficulty and ships with sequence and architecture diagrams.

◆◆ Intermediate · Anthropic · OpenAI

Model Context Protocol (MCP) and Tool-Use Infrastructure

How LLMs safely reach the outside world — from raw function calling to MCP, the open standard that collapses N×M bespoke integrations to N+M, with production-grade security, reliability, and a ~88% token reduction via deferred tool loading.

#ai #llm #agents

31 min

◆◆◆ Advanced · LangChain · Langfuse

Design an LLM Observability Platform

Build the distributed tracing backbone for non-deterministic, multi-step LLM applications — capturing every prompt, completion, token count, and dollar cost across chains, retrievals, and tool calls so you can debug a failed agent run and account for every cent.

#interview #ai #llm

30 min

◆◆◆ Advanced · Cloudflare · OpenAI

Design an LLM Gateway (AI Gateway & Model Router)

A single proxy control plane in front of OpenAI, Anthropic, Google, and open models — routing ~65 trillion tokens a month with automatic failover, semantic caching, per-team budget enforcement, and streaming SSE passthrough, all under 50 ms of added latency.

#interview #ai #llm

31 min

◆◆◆ Advanced · OpenAI · Meta

Design an LLM Fine-Tuning Platform

Turn a base model and a dataset into a deployed fine-tuned adapter at scale — the end-to-end platform covering dataset ingestion, LoRA/QLoRA/DPO training, fault-tolerant distributed GPU scheduling, eval gating, and multi-LoRA serving for hundreds of concurrent fine-tunes.

#interview #ai #llm

37 min

◆◆◆ Advanced · OpenAI · LangChain

Design an LLM Evaluation Platform

Build the system that tells a team whether a prompt or model change made the product better or worse — automatically. Covers offline eval with LLM-as-judge, CI regression gating, online production sampling, human annotation queues, and eval for RAG, agents, and classifiers at the scale of 450 million evaluations per month.

#interview #ai #llm

30 min

◆◆◆ Advanced · Microsoft · Neo4j

Design a GraphRAG System (Knowledge-Graph-Augmented Retrieval)

When vanilla vector RAG fails on "summarize the entire corpus" and multi-hop questions, you build a knowledge graph first — covering entity extraction, Leiden community detection, map-reduce global search, and graph traversal for multi-hop, based on Microsoft GraphRAG and production deployments at Neo4j, LinkedIn, and Writer.

#interview #ai #rag

27 min

// 03 — TOPICS

Browse by topic

#interview (69) · #distributed-systems (31) · #consistency (18) · #ai (17) · #caching (15) · #llm (14) · #fundamentals (14) · #search (12) · #storage (12) · #streaming (10) · #databases (10) · #realtime (8)

// 04 — COMPANIES

Tackle FAANG interviews

Ten classic system design problems pulled from interview reports at Meta, Google, Amazon, Netflix, and more. Each comes with requirements, capacity estimation, and an architecture deep dive.

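Google (41) · Amazon (40) · Meta (35) · Microsoft (17) · Uber (13) · Netflix (13) · OpenAI (11) · Stripe (9) · Airbnb (8) · Twitter (7) · Anthropic (6) · LinkedIn (6) · Slack (5) · Cloudflare (4) · Apple (4) · Discord (4) · LangChain (3) · Datadog (3) · Shopify (3) · PayPal (3) · YouTube (3) · Cassandra (3) · Instagram (3) · Block (2) · Arize (2) · Databricks (2) · DoorDash (2) · AWS (2) · NVIDIA (2) · GitHub (2) · Spotify (2) · Notion (2) · Elastic (2) · Booking (2) · CockroachDB (2) · TikTok (2) · Confluent (2) · Nasdaq (2) · Lyft (2) · Postgres (2) · RocksDB (2) · Langfuse (1) · LiteLLM (1) · Portkey (1) · Hugging Face (1) · Together AI (1) · Braintrust (1) · Neo4j (1) · LlamaIndex (1) · Tecton (1) · Ramp (1) · Intercom (1) · Sierra (1) · Zendesk (1) · Decagon (1) · Deepgram (1) · ElevenLabs (1) · LiveKit (1) · Twilio (1) · Lakera (1) · Cursor (1) · Sourcegraph (1) · Pinecone (1) · Glean (1) · Splunk (1) · eBay (1) · Auth0 (1) · Carta (1) · Snapchat (1) · StubHub (1) · YugabyteDB (1) · Paytm (1) · Algolia (1) · Expedia (1) · LeetCode (1) · HackerRank (1) · Codeforces (1) · Twitch (1) · Alibaba (1) · Riot Games (1) · Activision (1) · Akamai (1) · Fastly (1) · Zoom (1) · Cisco (1) · Bloomberg (1) · Citadel (1) · Robinhood (1) · ByteDance (1) · Snap (1) · Figma (1) · LaunchDarkly (1) · Statsig (1) · Flagsmith (1) · Unleash (1) · OpenFeature (1) · Nginx (1) · Pastebin.com (1) · MySQL (1) · Etcd (1) · Consul (1) · Kafka (1) · DynamoDB (1) · Bitcoin (1) · Yelp (1) · Dropbox (1) · Telegram (1)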

// READY?

Build it. Scale it.
Ship it.

Whether you're prepping for a senior interview or just want to understand how the internet's largest systems work — start here.
