~/articles/design-news-feed

◆◆◆Advancedasked at Metaasked at Instagramasked at LinkedIn

Design a News Feed (Facebook / Instagram)

Q: What is the hybrid fanout strategy and when does it switch from push to pull?

Meta fans post IDs into per-user Redis sorted sets at write time for authors with fewer than 10k followers. For anyone above that threshold — celebrities — fanout-on-write is skipped entirely, and the Feed Service fetches their recent posts directly at read time and merges them with the prebuilt candidate set. This caps write amplification per post without blowing up per-request read cost.

Q: Why are candidates cached but the ranked feed never cached?

Ranking signals like recency, recent interactions, and engagement rates change too quickly for the ranked output to stay valid between requests. The candidates — post IDs stored in Redis sorted sets — are stable enough to precompute, but the final ordering must be produced fresh on every feed load to reflect the latest signals.

Q: What is the multi-stage ranking pipeline and what latency does each stage target?

The pipeline has three stages: a light retrieval rank prunes roughly 5,000-10,000 raw candidates down to about 1,000 using cheap heuristics in milliseconds; a heavy ML model (DLRM or transformer) scores those 1,000 survivors and produces about 100 ranked results; a re-rank layer applies diversity and business rules to return the final 50. The full ranking step is budgeted at 50ms, keeping the total hot-path feed response around 105ms before network overhead.

Q: What cache hit rate does TAO achieve, and why does that matter for the social graph?

TAO, Meta's graph store built on MySQL and Memcache, achieves a 96.4% cache hit rate per the 2013 USENIX ATC paper. Reads outnumber writes by more than 99:1, so the underlying database is almost never on the hot path — the aggressive Memcache layer absorbs virtually all graph traversal traffic.

Q: How large is the candidate cache in memory, and what is the practical mitigation for its cost?

Storing raw 8-byte post IDs for 2 billion users at 1,000 candidates each is 16 TB, but Redis sorted-set overhead (skiplist node plus hashtable entry plus robj wrapper) adds roughly 136 bytes per entry, pushing the true in-memory footprint to around 288 TB. The practical fix is to keep only active users warm — roughly 20-30% of total users active in the last 30 days — and lazily compute candidates for the rest, with a compact off-heap store like a RocksDB-backed sorted set for the long tail.

How Meta builds the home feed — fanout, ranking, candidate generation, and how the same architecture serves billions.

16 min read2026-02-08Ironclad Academy

#interview #feed #fanout #ranking #caching

// DEPTH

the full breakdown — requirements, capacity, evolution, trade-offs

The problem

Facebook's home feed serves roughly 2 billion people each day. On each feed load, the system needs to pull together posts from hundreds of friends, dozens of pages, and a handful of celebrities — then sort them by what the user is most likely to engage with, not by the time they were posted. Instagram does the same for photos and Reels; LinkedIn does it for professional updates. The product looks deceptively simple: a scrollable list. The engineering behind it is not.

The first hard problem is fanout. When a regular user publishes a post, that content needs to propagate to hundreds of followers' feeds within seconds. Multiply that by 1 billion posts per day and you have roughly 12,000 writes per second flowing into fan-out queues — with peaks three times higher. For celebrities with 50 million followers, naive fanout-on-write would mean 50 million cache insertions per post, which quickly becomes untenable.

The second hard problem is ranking. A chronological feed is easy — sort by timestamp. A ranked feed requires the system to score each candidate post against a model that considers hundreds of signals: how often the viewer interacts with this author, how quickly similar content has gathered engagement, whether the post contains a video or a link, how old it is. Running a heavyweight ML model on 1,000 candidates per request, at 700,000 feed loads per second, means the inference fleet alone could consume thousands of A100-class GPUs.

These two problems — write amplification from fanout and inference cost from ranking — are what make this question interesting. The architecture exists not to store or retrieve posts (that part is solved), but to answer the question: "What 50 things should I show this specific user, right now, in under 500ms?"

Functional requirements

Post status updates, photos, videos, links.
View home feed: posts from friends/people you follow.
Feed is ranked — most relevant first, not chronological.
React, comment, share.
Notifications on engagement.

Non-functional requirements

2B DAU (Facebook scale).
p99 feed load < 500ms.
Posts durable forever.
Ranking quality must improve over time (ML feedback loops).

Capacity estimation

Dimension	Estimate	How we got there
Users	2B DAU	given
Post writes (avg)	~12k/sec	`2B × 0.5 posts/day = 1B/day ÷ 86,400 s ≈ 12k/s`
Post writes (peak)	~35k/sec	~3× avg burst factor
Feed reads (avg)	~700k/sec	`2B × 30 feeds/day = 60B/day ÷ 86,400 s ≈ 700k/s`
Feed reads (peak)	~2M/sec	~3× avg burst factor
Avg post size	~1 KB	200 chars text + media reference
Text storage	~1 TB/day	`1B posts/day × 1 KB/post`
Media storage	100s of TB/day	video/image payloads at scale

Takeaway: The numbers are in the same league as Twitter. The architecture rhymes.

Building up to the design

The full Facebook feed has many moving parts: candidate generation, light rank, heavy ML rank, diversity rules, engagement counters. Walking forward from a college-project version makes it obvious why each one exists.

V1: SQL JOIN

SELECT p.*
FROM posts p JOIN friends f ON f.b = p.author
WHERE f.a = $me
ORDER BY p.created_at DESC LIMIT 50;

This works for 100 users with no infrastructure. The moment you hit real scale — where each user follows 200+ people and the posts table has billions of rows — this JOIN degrades into a full table scan on every feed load. Latency explodes well before you hit 10k posts/sec write load.

V2: Materialize timelines on write

The fix is to move the work to write time. When a user posts, push the post ID into every follower's pre-built feed:user_X Redis sorted set — fanout-on-write. Now a feed read is just ZRANGE ... REV on a single Redis key, which takes about a millisecond regardless of how many people the user follows.

The price: a user with 10 million followers triggers 10 million Redis writes per post. That's fine for ordinary users; it's catastrophic for celebrities.

V3: Hybrid fanout

Above some follower threshold — Meta historically uses around 10k — you skip the write-time fanout entirely. On read, the feed service merges the user's pushed candidates with recent posts from the celebrities they follow. You cap the write amplification per post; you cap the per-read celebrity fetch. This is the same trick Twitter uses, and it holds up at every scale.

But now you notice that chronological order isn't actually what users want. A friend's wedding photo from six hours ago is more important than a cousin's meme from thirty minutes ago. Fanout gave you speed. It didn't give you quality.

V4: Ranking on top of candidates

The insight: stop thinking of that cached per-user list as "the feed." It's a candidate set — three hundred to a thousand items — from which the ranker will choose the fifty to show. At read time, score each candidate with a model (recency, affinity, predicted engagement) and return the top 50.

flowchart LR
    CACHE[Candidate Cache] --> R[Ranker]
    R --> FS[(Feature Store)]
    R --> M[ML Model]
    R --> FEED[Top 50]
    style R fill:#ff6b1a,color:#0a0a0f
    style FS fill:#0e7490,color:#fff

Feed quality jumps. Engagement metrics improve. Then you do the math: 700k feed fetches per second, each scoring 1000 candidates through a heavy ML model. That's 700M model evaluations per second — thousands of A100-class GPUs just for inference.

V5: Two-stage ranking + feature store

You need a cheaper first filter. A lightweight linear model runs on pre-computed features and prunes the full raw pool (~5k–10k) down to ~1000 in a few milliseconds. Only those survivors go to the heavy model, which scores them and hands ~100 to a re-ranking layer that applies diversity rules and returns 50. A precomputed feature store serves user features and per-post features at scoring time, so the ranker is mostly a join plus a forward pass. Latency comes back under 100ms; the expensive model still runs, just on a much smaller window.

V6: Production-grade everything else

Add engagement counters (Kafka → batched Cassandra), notifications (Kafka consumer per event type), moderation in the post path, multi-region replication for the feed cache, and deduplication/diversity rules in re-rank.

flowchart LR
    V1[V1: SQL JOIN<br/>100 users] --> V2[V2: + fanout on write<br/>fast reads]
    V2 --> V3[V3: + hybrid push/pull<br/>celebs handled]
    V3 --> V4[V4: + ML rank<br/>quality]
    V4 --> V5[V5: + light/heavy rank<br/>latency]
    V5 --> V6[V6: + counters, mod, multi-region]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V4 fill:#ff6b1a,color:#0a0a0f
    style V6 fill:#a855f7,color:#fff

High-level architecture

flowchart TD
    U[User] --> CDN
    CDN --> APIGW[API Gateway]
    APIGW --> POST[Post Service]
    APIGW --> FEED[Feed Service]
    APIGW --> ENG[Engagement Service]
    APIGW --> NOTIF[Notification Service]

    POST --> POSTDB[(Posts DB<br/>sharded)]
    POST --> KAFKA[Kafka events]

    KAFKA --> FANOUT[Fanout Workers]
    FANOUT --> CAND[(Candidate Cache<br/>per user)]

    FEED --> CAND
    FEED --> RANKER[Ranker Service]
    RANKER --> FS[(Feature Store)]
    RANKER --> ML[ML Model Server]

    KAFKA --> NOTIF
    KAFKA --> COUNTERS[Engagement Counters]

    style POST fill:#ff6b1a,color:#0a0a0f
    style FEED fill:#0e7490,color:#fff
    style RANKER fill:#a855f7,color:#fff

Three things happen in parallel:

Ingestion: posts come in, get stored, fanned out to candidate caches.
Candidate generation: each user has a precomputed list of "things to maybe show."
Ranking (at read time): the model decides what to show first.

Candidate generation

The fanout decision happens post-write. After a post lands in the Posts DB and a NewPost event appears on Kafka, the Fanout Workers pick it up and check the author's follower count.

flowchart TD
    K[Kafka: NewPost event] --> FW[Fanout Worker]
    FW --> CHK{follower count}
    CHK -->|"< 10k followers"| PUSH["Push post ID into<br/>each follower's candidate cache<br/>(fanout-on-write)"]
    CHK -->|"> 10k followers"| SKIP[Skip fanout<br/>mark as celebrity]
    PUSH --> CC[(Candidate Cache<br/>Redis sorted sets)]
    SKIP --> CP[(Celebrity Posts<br/>fetched at read time)]
    style FW fill:#ff6b1a,color:#0a0a0f
    style CC fill:#15803d,color:#fff
    style CP fill:#0e7490,color:#fff

At read time, the Feed Service merges both pools: the pre-built candidate list from the sorted set plus recent posts from any celebrities the user follows. Together these become the 300–1000 item candidate set the ranker works with. The candidates aren't the final feed — they're an ordered superset from which the ranker will choose ~50 to show.

sequenceDiagram
    participant User
    participant Feed Svc
    participant Candidate Cache
    participant Celeb Cache
    participant Ranker
    participant Model
    User->>Feed Svc: GET /feed
    Feed Svc->>Candidate Cache: get top-1000 candidates
    Feed Svc->>Celeb Cache: get celeb posts
    Candidate Cache-->>Feed Svc: candidates
    Celeb Cache-->>Feed Svc: celeb posts
    Feed Svc->>Ranker: rank(candidates, user_features)
    Ranker->>Model: predict_engagement
    Model-->>Ranker: scores
    Ranker-->>Feed Svc: ranked top 50
    Feed Svc-->>User: feed

Ranking

This is where Meta spends most of its compute budget.

What signals matter?

Feature group	Examples
User → post	does the user click on this author often? does the post topic match user interest?
User → user	how strong is the friendship (likes exchanged, messages, time on profile)?
Post quality	engagement rate so far, content-policy score, video watch-through
Recency	post is 2 hours old → fresh; 2 days → stale
Diversity	don't show 5 posts from one author in a row

A mature ranking model (DLRM, transformer over user history) takes hundreds of features and emits a probability of "user will engage with this if shown."

Multi-stage ranking

You can't run the heavy model on everything. Before any filtering, the raw pool is actually larger than the candidate cache alone — when you union the per-user sorted set with celebrity pulls, you might start with ~5,000–10,000 candidates. Running the full DLRM on all of them, for every user, at 700k req/sec, is not a budget you can afford.

flowchart LR
    C[~10000 raw candidates] --> RR[Light retrieval rank]
    RR --> CC[~1000 candidates]
    CC --> HR[Heavy ML model]
    HR --> CCC[~100 ranked]
    CCC --> RB[Re-rank + diversity rules]
    RB --> F[~50 final feed]
    style RR fill:#ffaa00,color:#0a0a0f
    style HR fill:#ff6b1a,color:#0a0a0f
    style RB fill:#15803d,color:#fff

The light retrieval rank is cheap on purpose: simple heuristics or a lightweight linear model filter the large pool down to ~1000 using signals like raw recency, social proximity, and content-type preference. Fast enough that the overhead is negligible.

The heavy ranking stage gets the 1000 survivors. A full DLRM or transformer scores them with hundreds of features. This is where the compute budget goes; embedding tables alone can exceed memory on a single host, so model servers are sharded.

Re-ranking is the step the ML model shouldn't have to learn — business rules enforced after scoring. Don't show five posts from the same author back-to-back. Inject ads at the right cadence. Suppress content that failed a policy check since the model last saw it. Mix Reels, Stories, and text posts in the right ratio.

Why does the two-stage split matter so much? At 700k feed fetches/sec with 1000 candidates each, running the heavy model end-to-end means 700M model inferences/sec — roughly thousands of A100-class GPUs just for the inference fleet. The light-rank stage cuts that by 5–10× at a fraction of the cost.

Operational failure modes staff engineers should know:

Feature store staleness: if the feature store is lagging — say a pipeline is delayed — the ranker silently degrades to stale signals. Instrument feature freshness as a first-class metric and alarm on p99 age of user features.

Model/serving skew: training uses batch-computed features; serving uses real-time features. If these drift (different aggregation windows, different defaults for missing keys), model performance degrades invisibly. Track training-serving skew in shadow mode.

Ranker latency spike cascade: the feed request blocks on the ranker. If a model server is slow, p99 feed latency blows up. Mitigate with hedged requests (send to two shards, take the first response) and per-request timeouts with fallback to light-rank-only output.

Storage choices

Data	Store	Why
Posts	Cassandra / TAO	High write, append-mostly
Friend graph	TAO (Meta), MyRocks (MySQL+RocksDB)	TAO is the graph layer; MyRocks is the storage engine underneath — LSM-tree reduces storage by ~62% vs InnoDB for social-graph workloads
Candidate cache	Redis sorted sets	In-memory ranking
Feature store	RocksDB / Bigtable	High-throughput key-value lookup
Engagement counters	Cassandra / Manhattan	Eventually consistent, fast incr
Model embeddings	Specialized stores (FAISS / vector DB)	Approximate nearest neighbor
Search index	Elasticsearch / Meta's Unicorn	Full-text

The friend graph (TAO)

Meta's TAO is a graph database built on MySQL + Memcache, optimized for an "objects + associations" model.

object: user, post, photo
association: friend(user_a, user_b), liked(user, post), commented_on(user, post)

Queries look like:

TAO.assoc_get(user_a, "friend", limit 100)        → my friends
TAO.assoc_get(post, "liked", limit 100)           → who liked this
TAO.obj_get(post_id)                               → post payload

TAO caches aggressively in Memcache, with cache invalidation tied to write paths. The production system achieves a 96.4% cache hit rate (per the 2013 USENIX ATC paper), and reads vastly outnumber writes — more than 99:1 — so the database is almost never on the hot path.

Hot path: viewing the feed

Step by step, with the latency budget:

Auth check (5ms)
Read candidate cache (10ms — Redis sorted set ZRANGE)
Hydrate candidates with post bodies (15ms — Memcache MGET)
Read user features (10ms — feature store)
Rank (50ms — model server, with vector embedding lookups)
Apply re-ranking rules (5ms)
Serialize + return (10ms)

Total: ~105ms. Add network RTT and TLS on top of that.

Hot path: posting

sequenceDiagram
    participant U as User
    participant API
    participant POST as Post Svc
    participant DB as Posts DB
    participant K as Kafka
    participant FAN as Fanout
    participant CC as Candidate Cache
    U->>API: POST /post
    API->>POST: create
    POST->>DB: INSERT
    POST->>K: NewPost event
    POST-->>API: 201
    Note over K,FAN: async
    K->>FAN: deliver
    FAN->>CC: ZADD candidate for each follower (within reason)

For a power user with millions of followers, fanout writes happen in batches and may take seconds. The rest of the system shouldn't block on it — the Kafka event decouples the fast write acknowledgment from the slow propagation.

Trade-offs worth knowing

Push vs. pull model

	Push (fanout-on-write)	Pull (fanout-on-read)
Read latency	Low (precomputed)	High (compute at read)
Write cost	Proportional to followers	Constant
Storage	High (per-user lists)	Low
Fits	Most users	Celebs / inactive followers

Use both. That's what every production system does.

Eventual consistency

Your friend posts and it might take 2–10 seconds before it appears in your feed. That's fine. But if you post, you should see your own post on next refresh — otherwise the product feels broken even if it's technically consistent. The solution is to maintain a "self" cache updated synchronously on post creation; on feed load, the Feed Service merges it with the regular candidate set.

flowchart LR
    ME[Your POST /post] --> POSTDB[(Posts DB)]
    ME --> SELF[(Self cache<br/>sync write)]
    ME --> K2[Kafka → async fanout<br/>for your followers]
    SELF --> FEED[Your own feed load]
    FEED -->|merge| CAND[(Candidate Cache)]
    style SELF fill:#ffaa00,color:#0a0a0f
    style FEED fill:#0e7490,color:#fff

Storage cost

Storing per-user candidate lists for 2B users gets expensive quickly. Raw post IDs alone are 2B users × 1000 candidates × 8B ≈ 16 TB — manageable in isolation. But Redis sorted sets use skiplist encoding once a key exceeds 128 members, and the overhead per entry (skiplist node + hashtable entry + robj wrapper, excluding the member string itself) is roughly 136 bytes. For 8-byte post IDs that's ~144 bytes per entry total, pushing the full in-memory figure to 2B × 1000 × 144 B ≈ 288 TB — far beyond what you'd want warm in Redis.

The practical mitigations: only store candidates for active users (last 30 days, roughly 20–30% of total users), compute lazily for stale users, and consider a more compact off-heap store like a RocksDB-backed sorted set for the long tail.

Ranking infrastructure (the part nobody talks about)

A feed ranker is itself a small distributed system worth understanding:

flowchart LR
    REQ[Feed request] --> SHARDED[Sharded model servers]
    SHARDED --> EMB[Embedding lookup<br/>vector DB]
    SHARDED --> COUNTERS[Counter lookup<br/>'how many likes does this post have?']
    SHARDED --> SCORE[Score]
    style SHARDED fill:#ff6b1a,color:#0a0a0f
    style EMB fill:#a855f7,color:#fff

The model has billions of parameters, so it's sharded across hosts. Embeddings — per-user, per-post — live in a vector store and are batched for lookup. Engagement counters are read with eventual consistency; a post's like count being off by a few hundred for a few seconds is not a correctness problem.

Edge cases

Feed update mid-scroll

The user is reading. Five new posts arrive at the top. Inject them inline? Reorder everything they've already read? The practical answer is a banner: "5 new posts — tap to refresh." Don't reflow what they're already reading.

Re-rank on every load vs. cache the ranked feed?

Cache the candidates, re-rank fresh on every load. Ranking signals — recency, recent interactions, engagement rates — change too quickly to cache the outputs. For pull-to-refresh, the same applies: re-rank, but make sure to include the latest posts.

Ad insertion

Ads are interleaved into the ranked feed by the same ranker, with a separate "expected revenue" signal layered in. Real systems jointly optimize "user value × ad value" — the re-ranking stage is where that tradeoff gets resolved.

Things you should now be able to answer

Why is candidate generation separate from ranking?
What's the difference between push and pull, and why use both?
Why is multi-stage ranking necessary? What does the light rank actually save?
Why is TAO a graph store and not a SQL DB?
What happens between the user clicking "post" and seeing it in their friend's feed?
Why do you cache candidates but re-rank on every load?
What three failure modes can silently degrade ranking quality without throwing errors?
How does the storage cost of per-user candidate lists change when you account for Redis overhead vs. raw ID size?

Frequently asked questions

▸What is the hybrid fanout strategy and when does it switch from push to pull?