#streaming
10 articles
Design a Large-Scale Data Pipeline (ETL / Batch + Streaming)
Move and transform petabytes from sources into a warehouse/lake for analytics. DAG orchestration, Spark shuffles, lake vs warehouse, and idempotent, replayable jobs.
Design a Real-Time Fraud Detection System
Score transactions for fraud inline in milliseconds. Feature stores, streaming velocity features, rules + ML hybrids, graph fraud rings, and the label-delay problem.
Design an Ad Click Aggregator (real-time analytics)
Ingest billions of ad events, serve per-minute metrics in near-real-time, and produce exact totals for billing — the canonical streaming + lambda/kappa problem.
Design Top-K / Trending (heavy hitters)
Find the top-K most frequent items in a massive stream without counting everything exactly. Count-Min Sketch, heavy-hitter algorithms, and approximate streaming aggregation.
Design a Live Streaming System (Twitch)
Ingest one broadcaster and fan out to millions of viewers with only a few seconds of latency. Transcoding ladders, HLS/DASH segmenting, CDN fan-out, and live chat.
Backpressure & Flow Control
What happens when a fast producer overwhelms a slow consumer? Backpressure, bounded buffers, load shedding, and why unbounded queues are a trap.
Design TikTok / Reels (short-video platform)
A vertical-swipe video feed that feels infinite and clairvoyant. Two-tower retrieval, real-time ranking with Monolith, sub-200ms playback via aggressive preloading, and the For You Page that knows you in 90 seconds.
Design Search Autocomplete (Typeahead)
Sub-100ms autocomplete suggestions across billions of queries: tries, top-K caching, and personalized ranking.
Design Uber / Lyft (ride hailing)
Match drivers to riders in real time at city scale. Geohashing, dispatch algorithms, surge pricing, and the real-time location pipeline.
Design YouTube / Netflix (video streaming)
How a billion users watch ~1 billion hours of video a day. Upload pipeline, transcoding, adaptive bitrate streaming, CDN delivery, and recommendations.