ONLINE · v1.0.0 · free to read

Master system design. From zero to senior.

$ learn

A free field guide to building scalable systems. Beginner-friendly fundamentals, architectural deep dives, and 55+ FAANG-grade interview problems: all in one place, all with diagrams.

Crash Course: 12 modules
Deep Dives: 22
Interview Problems: 55+
Reading Time: 30+ hrs
// 01 — START HERE

The crash course

No system design background? Start here. Twelve focused modules that cover the prerequisites — networking, databases, caching, queues — before you tackle full systems.

See the full roadmap →
// 02 — DEEP DIVES

Featured articles

Real systems — Twitter, Uber, YouTube, WhatsApp — broken down. Every article is tagged by difficulty and ships with sequence and architecture diagrams.

All articles →
◆◆ Intermediate · Anthropic · OpenAI
01

Model Context Protocol (MCP) and Tool-Use Infrastructure

How LLMs safely reach the outside world — from raw function calling to MCP, the open standard that collapses N×M bespoke integrations to N+M, with production-grade security, reliability, and a ~88% token reduction via deferred tool loading.

#ai #llm #agents
31 min
◆◆◆ Advanced · LangChain · Langfuse
02

Design an LLM Observability Platform

Build the distributed tracing backbone for non-deterministic, multi-step LLM applications — capturing every prompt, completion, token count, and dollar cost across chains, retrievals, and tool calls so you can debug a failed agent run and account for every cent.

#interview #ai #llm
30 min
◆◆◆ Advanced · Cloudflare · OpenAI
03

Design an LLM Gateway (AI Gateway & Model Router)

A single proxy control plane in front of OpenAI, Anthropic, Google, and open models — routing ~65 trillion tokens a month with automatic failover, semantic caching, per-team budget enforcement, and streaming SSE passthrough, all under 50 ms of added latency.

#interview #ai #llm
31 min
◆◆◆ Advanced · OpenAI · Meta
04

Design an LLM Fine-Tuning Platform

Turn a base model and a dataset into a deployed fine-tuned adapter at scale — the end-to-end platform covering dataset ingestion, LoRA/QLoRA/DPO training, fault-tolerant distributed GPU scheduling, eval gating, and multi-LoRA serving for hundreds of concurrent fine-tunes.

#interview #ai #llm
37 min
◆◆◆ Advanced · OpenAI · LangChain
05

Design an LLM Evaluation Platform

Build the system that tells a team whether a prompt or model change made the product better or worse — automatically. Covers offline eval with LLM-as-judge, CI regression gating, online production sampling, human annotation queues, and eval for RAG, agents, and classifiers at the scale of 450 million evaluations per month.

#interview #ai #llm
30 min
◆◆◆ Advanced · Microsoft · Neo4j
06

Design a GraphRAG System (Knowledge-Graph-Augmented Retrieval)

When vanilla vector RAG fails on "summarize the entire corpus" and multi-hop questions, you build a knowledge graph first. Covers entity extraction, Leiden community detection, map-reduce global search, and graph traversal for multi-hop queries, based on Microsoft GraphRAG and production deployments at Neo4j, LinkedIn, and Writer.

#interview #ai #rag
27 min
// READY?

Build it. Scale it. Ship it.

Whether you're prepping for a senior interview or just want to understand how the internet's largest systems work — start here.