// TOPIC

#big-data

6 articles

Design a Centralized Log Aggregation System (ELK / Splunk)

Collect, store, and search logs from thousands of services. Collection agents, a buffered ingestion pipeline, time-based inverted indices, hot-warm-cold tiers, and cost control.

#interview #observability #search

25 min

◆◆◆ Advanced · Meta, Airbnb

Design a Large-Scale Data Pipeline (ETL / Batch + Streaming)

Move and transform petabytes from sources into a warehouse/lake for analytics. DAG orchestration, Spark shuffles, lake vs warehouse, and idempotent, replayable jobs.

#interview #big-data #data-engineering

26 min

◆◆◆ Advanced · Meta, Google

Design an Ad Click Aggregator (real-time analytics)

Ingest billions of ad events, serve per-minute metrics in near-real-time, and produce exact totals for billing: the canonical streaming + lambda/kappa problem.

#interview #streaming #analytics

21 min

◆◆◆ Advanced · Netflix, Amazon

Design a Recommendation System (Netflix / TikTok)

Pick the best items for each user from millions of candidates in milliseconds. The two-stage candidate-generation + ranking architecture, embeddings, and feature stores.

#interview #ml #ranking

22 min

◆◆◆ Advanced · Datadog, Google

Design a Metrics & Monitoring System (Prometheus / Datadog)

Ingest billions of time-series points, store them cheaply, and answer dashboard + alerting queries fast. TSDB internals, cardinality, downsampling, and pull vs push.

#interview #observability #time-series

22 min

◆◆◆ Advanced · Google, Meta

Design a Distributed File System (GFS / HDFS)

Store petabyte-scale files across thousands of commodity machines for high-throughput batch reads. The single-master + chunkservers design, replication, and append-heavy workloads.

#interview #storage #distributed-systems

23 min