// TOPIC

#big-data

6 articles

◆◆◆ Advanced · Elastic · Splunk
01

Design a Centralized Log Aggregation System (ELK / Splunk)

Collect, store, and search logs from thousands of services. Collection agents, a buffered ingestion pipeline, time-based inverted indices, hot-warm-cold tiers, and cost control.

#interview #observability #search
25 min
◆◆◆ Advanced · Meta · Airbnb
02

Design a Large-Scale Data Pipeline (ETL / Batch + Streaming)

Move and transform petabytes from sources into a warehouse/lake for analytics. DAG orchestration, Spark shuffles, lake vs warehouse, and idempotent, replayable jobs.

#interview #big-data #data-engineering
26 min
◆◆◆ Advanced · Meta · Google
03

Design an Ad Click Aggregator (real-time analytics)

Ingest billions of ad events, serve per-minute metrics in near-real-time, and produce exact totals for billing — the canonical streaming + lambda/kappa problem.

#interview #streaming #analytics
21 min
◆◆◆ Advanced · Netflix · Amazon
04

Design a Recommendation System (Netflix / TikTok)

Pick the best items for each user from millions of candidates in milliseconds. The two-stage candidate-generation + ranking architecture, embeddings, and feature stores.

#interview #ml #ranking
22 min
◆◆◆ Advanced · Datadog · Google
05

Design a Metrics & Monitoring System (Prometheus / Datadog)

Ingest billions of time-series points, store them cheaply, and answer dashboard + alerting queries fast. TSDB internals, cardinality, downsampling, and pull vs push.

#interview #observability #time-series
22 min
◆◆◆ Advanced · Google · Meta
06

Design a Distributed File System (GFS / HDFS)

Store petabytes of files across thousands of commodity machines for high-throughput batch reads. The single-master + chunkservers design, replication, and append-heavy workloads.

#interview #storage #distributed-systems
23 min