Design a Distributed Counter (view / like counts)
Count likes and views at millions of increments per second without a single hot row melting. Sharded counters, write batching, and approximate vs exact counts.
The problem
YouTube reports "2.3M views" on a video within minutes of it going viral. Facebook shows "47k likes" on a post — updated in near-real-time as people react. Twitter shows retweet counts climbing by the second during a breaking news event. Behind all of these is the same deceptively simple operation: take a number, add one, store it, read it back fast.
The naive implementation is a single database row with an integer column: UPDATE items SET view_count = view_count + 1 WHERE id = ?. This works perfectly at low traffic — a moderately popular video, a few hundred concurrent viewers. The moment a video goes viral and 100,000 users hit play in the same second, that one row becomes a hot spot. Every increment tries to acquire the same row-level lock. Writes serialize. Latency climbs into seconds. The database melts.
That hot-row problem is the central engineering challenge, but it's not the only one. Likes and views have fundamentally different semantics. A like must be at-most-once per user — a user can't like the same post twice — and users notice if the count goes backward. A view count can be approximate and is best-effort: losing a few increments in a crash is acceptable; serializing every write is not. Choosing one counter architecture for both types gets you either the wrong correctness guarantees or the wrong throughput. The interesting design work is recognizing that difference and routing each type to the right mechanism.
The full solution layers three techniques: sharding the counter across N rows to eliminate the hot-row bottleneck, buffering writes in Redis and flushing aggregated deltas to the database every second (which cuts DB write volume by 100–1,000× for hot items), and storing likes as idempotent (user, item) edges rather than raw increments. This is excellent interview territory because it forces candidates to reason about write throughput, consistency levels, idempotency, and failure modes all at once.
Functional requirements
- POST /counter/{item_id}/increment: record one view / reaction event for an item.
- GET /counter/{item_id}: returns the current count (or approximate count) for the item.
- Likes/reactions are at-most-once per user: a user cannot like the same item twice.
- Views are best-effort: each play increments the count, no per-user deduplication required for the raw count (though unique-viewer cardinality is a separate metric).
- Count must not decrease due to a normal failure (no negative partial updates).
Non-functional requirements
- Very high write throughput: platform-wide 2M+ increments/sec; single hot item up to 100k+ increments/sec.
- Low read latency: p99 < 50ms for "what is the current count?"
- Eventual consistency: displayed count may lag by seconds — not minutes.
- Monotonicity: once a count reaches N, it must not drop below N (under normal operation).
- High availability: counter reads and writes should survive single-node failures.
Capacity estimation
| Dimension | Estimate | How we got there |
|---|---|---|
| Avg write rate (platform-wide) | ~23k increments/sec | 2B increments/day ÷ 86,400 s/day |
| Peak write rate (platform-wide) | ~2M increments/sec | Viral spikes are ~100× average |
| Single viral item peak | ~100k increments/sec | e.g., a live-streamed event |
| Active items at any moment | ~500k | Out of 500M total videos/posts on the platform |
| Counter reads (peak) | ~20M read QPS | Read-to-write ≈ 10:1; virtually all served from cache |
| Raw counter storage | 4 GB | 500M items × 8 bytes (int64) |
| Sharded counter storage (N=100) | 400 GB | 500M × 100 shards × 8 bytes; manageable in Cassandra |
| DB write rate after batching | ≤500k writes/sec (platform) | Flush every 1s → at most 1 write per active item per second; 4× reduction at minimum (down from 2M raw events/sec), 100–1,000× for hot items where hundreds of events collapse to one flush |
Takeaway: The platform write load is dominated by a relatively small number of viral items. Sharding across 100 rows per item combined with a 1-second Redis flush window keeps Cassandra write pressure at or below 500k writes/sec — well within a multi-node cluster's capacity — while tolerating at most 1 second of counter lag.
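The arithmetic behind the table can be reproduced in a few lines. This is a sketch using only the article's own estimates as inputs; the variable names are illustrative.

```python
# Back-of-envelope check of the capacity table. All inputs are the
# article's estimates, not measurements.
SECONDS_PER_DAY = 86_400
events_per_day = 2_000_000_000      # 2B increments/day
peak_events_per_sec = 2_000_000     # platform-wide peak
total_items = 500_000_000           # 500M videos/posts
active_items = 500_000              # items receiving writes at any moment
shards_per_item = 100               # N
bytes_per_counter = 8               # int64
flush_interval_sec = 1

avg_rate = events_per_day / SECONDS_PER_DAY                 # ~23k/sec
raw_storage_gb = total_items * bytes_per_counter / 1e9      # 4 GB
sharded_storage_gb = raw_storage_gb * shards_per_item       # 400 GB
# At most one flush per active item per flush interval.
max_flushes_per_sec = active_items / flush_interval_sec     # 500k/sec
platform_reduction = peak_events_per_sec / max_flushes_per_sec  # 4x
```

The 4× figure is the platform-wide floor; a single item receiving thousands of events per second still collapses to one flush, which is where the much larger per-item reductions come from.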
Building up to the design
The distributed counter is deceptively simple. Every layer below breaks for a specific, nameable reason. Walking that path in an interview shows you understand the actual failure mode at each scale.
V1: One row, one increment
UPDATE items SET view_count = view_count + 1 WHERE id = ?;
Simple. Works fine at low QPS — a single Postgres row handles thousands of increments/sec. Each UPDATE acquires a row-level lock, increments, and releases.
This works at neighborhood scale. The failure mode is fundamental, though: every increment serializes on that one row lock. Under heavy contention, Postgres throughput on a single hot row typically reaches a few thousand updates/sec in production conditions. The ceiling varies by hardware, WAL fsync settings, and index overhead, but the bottleneck is serialized durable commits and lock contention, not raw CPU or disk bandwidth. Turning off synchronous_commit removes the WAL-flush wait from each commit, which helps at the margins (typically 10–20%, not orders of magnitude), but it does not address the lock-serialization problem: each writer still holds the row lock until its transaction commits. A viral video receiving 100k+ increments/sec will queue everything behind that lock: writes stack up, latency climbs to seconds, and the database melts. This is the hot-row problem.
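To see why the ceiling sits in the low thousands, treat the row lock as a single server in a queue: if each writer holds the lock for some fixed time, throughput cannot exceed the reciprocal of that time. The sketch below uses an assumed 0.2 ms lock-hold time, chosen only to match the "few thousand updates/sec" figure above; the function names are illustrative.

```python
ASSUMED_LOCK_HOLD_SECONDS = 0.0002  # 0.2 ms per commit: an illustrative assumption


def hot_row_ceiling(lock_hold_seconds: float) -> float:
    """Max updates/sec when every writer must hold the same row lock in turn."""
    return 1.0 / lock_hold_seconds


def backlog_after(arrival_rate: float, lock_hold_seconds: float, seconds: float) -> float:
    """Writers still queued after `seconds` of sustained load (0 if the row keeps up)."""
    surplus = arrival_rate - hot_row_ceiling(lock_hold_seconds)
    return max(0.0, surplus * seconds)
```

Under that assumption the row tops out near 5,000 updates/sec, and a 100k increments/sec burst leaves roughly 95,000 writers queued after one second, which is the latency climb described above.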
V2: Sharded (striped) counters
The fix is to stop fighting over one row. Instead of one counter, maintain N sub-counters — one per shard. Each increment picks a random (or hash-assigned) shard and updates only that shard's row. Reads SUM all N shards.
Write:
    shard = random(0, N-1)
    UPDATE counter_shards SET n = n + 1
    WHERE item_id = ? AND shard_id = shard;
Read:
    SELECT SUM(n) FROM counter_shards WHERE item_id = ?;
With N = 100 shards, each shard receives ~1/100th of the write load. A 100k increments/sec item now spreads across 100 rows at ~1k writes/sec each, well within a single row's hot-row threshold. Write throughput scales roughly linearly with N; the hot spot is gone.
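The write and read paths above can be sketched end to end. This is a minimal illustration that uses Python's built-in sqlite3 as a stand-in for the real database, so it shows the shape of the queries rather than the contention behavior; the helper names `increment` and `read_total` are hypothetical.

```python
import random
import sqlite3

N_SHARDS = 100  # matches the N = 100 example above

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE counter_shards (
        item_id  INTEGER NOT NULL,
        shard_id INTEGER NOT NULL,
        n        INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (item_id, shard_id)
    )
    """
)


def increment(item_id: int, delta: int = 1) -> None:
    """Add `delta` to one randomly chosen shard row for the item."""
    shard = random.randrange(N_SHARDS)
    cur = conn.execute(
        "UPDATE counter_shards SET n = n + ? WHERE item_id = ? AND shard_id = ?",
        (delta, item_id, shard),
    )
    if cur.rowcount == 0:
        # Shard rows are created lazily the first time they are hit.
        conn.execute(
            "INSERT INTO counter_shards (item_id, shard_id, n) VALUES (?, ?, ?)",
            (item_id, shard, delta),
        )


def read_total(item_id: int) -> int:
    """Sum all shard rows; an item with no rows reads as 0."""
    row = conn.execute(
        "SELECT COALESCE(SUM(n), 0) FROM counter_shards WHERE item_id = ?",
        (item_id,),
    ).fetchone()
    return row[0]
```

Calling `increment(42)` a thousand times and then `read_total(42)` returns 1000, with the increments spread over up to 100 rows.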
flowchart LR
INC[Increment request] --> RNG["random shard 0..N-1"]
RNG --> S0["shard 0\ncounter row"]
RNG --> S1["shard 1\ncounter row"]
RNG --> SN["shard N-1\ncounter row"]
S0 --> SUM["SUM all shards\n(read path)"]
S1 --> SUM
SN --> SUM
SUM --> CNT["Total count"]
style INC fill:#ff6b1a,color:#0a0a0f
style SUM fill:#15803d,color:#fff
style CNT fill:#a855f7,color:#fff
The read is now a scatter-gather across N rows (or N partitions). For 100 shards that's a small SELECT SUM — still fast. The real problem is that we're still doing one DB write per increment. At 2M events/sec platform-wide, that's 2M DB writes/sec, which saturates even a well-tuned Cassandra cluster.
V3: Write batching with a buffer tier
If each application server buffered increments in its own memory, the counts would stay local to that server. Here we want all servers to share one buffer so that many increments collapse into one DB flush. The solution is to buffer in Redis and flush periodically.
Increment:
    INCR counter:buffer:{item_id}             ← atomic Redis increment, sub-millisecond
Flusher (every 1s):
    delta = GETDEL counter:buffer:{item_id}   ← atomic GET+DEL; requires Redis 6.2+
                                                (on older Redis, use a Lua script for atomicity)
    if delta > 0:
        UPDATE counter_shards SET n = n + delta
        WHERE item_id = ? AND shard_id = random(0, N-1);
        ← pick the shard at random as in V2; hash(item_id) % N would
          send every flush for an item to the same shard row
A video receiving 10,000 views in one second produces one DB write in that flush interval, not 10,000. Platform-wide, 2M events/sec collapses to at most 500k flushes/sec (one per active item per second), and in practice far fewer, because most items have low write rates and often have nothing buffered in a given second.
Redis handles millions of INCR ops/sec comfortably; the DB stays quiet. The cost is durability: if the Redis node crashes before flushing, you lose up to 1 second of increments. For view counts, losing roughly 1s of data is acceptable. For financial or like counts, you need a stronger durability story — which leads to the next layer.
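The buffer-and-flush cycle can be sketched without a Redis server. The class below is an in-memory stand-in, not a Redis client: `incr` plays the role of INCR, the locked swap in `_drain` plays the role of GETDEL, and `durable` plays the role of the database. All of these names are hypothetical.

```python
import threading
from collections import defaultdict


class BufferedCounter:
    """In-memory stand-in for the Redis buffer plus the durable store."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._buffer = defaultdict(int)  # role of counter:buffer:{item_id}
        self.durable = defaultdict(int)  # role of the DB counter
        self.db_writes = 0               # how many DB writes the flusher issued

    def incr(self, item_id: int) -> None:
        """Analogous to INCR: cheap, atomic, never touches the DB."""
        with self._lock:
            self._buffer[item_id] += 1

    def _drain(self) -> dict:
        """Analogous to GETDEL on every key: read and reset atomically."""
        with self._lock:
            drained, self._buffer = self._buffer, defaultdict(int)
        return drained

    def flush(self) -> None:
        """Run by the flusher once per interval: one DB write per buffered item."""
        for item_id, delta in self._drain().items():
            if delta > 0:
                self.durable[item_id] += delta
                self.db_writes += 1
```

Ten thousand `incr(42)` calls followed by one `flush()` leave `durable[42] == 10000` with `db_writes == 1`. Anything still in `_buffer` when the process dies is gone, which is the durability cost described above.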
V4: Exact likes — idempotent edge storage
Likes are categorically different from views. A user should only be able to like once — this is at-most-once per (user, item) pair. The count must be exact: users notice "5 likes" going to "4 likes" or getting a like they didn't give. Raw incrementing, even with the buffer tier, doesn't enforce uniqueness.
The correct model: store the (user, item) edge, not a raw counter.
-- Postgres or DynamoDB
CREATE TABLE likes (
    user_id  BIGINT NOT NULL,
    item_id  BIGINT NOT NULL,
    liked_at TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (user_id, item_id)  -- composite PK enforces uniqueness
);
-- Like count = cardinality of the set:
SELECT COUNT(*) FROM likes WHERE item_id = ?;
INSERT INTO likes ... ON CONFLICT DO NOTHING is idempotent: double-tapping "like" is safe. The count is always the true cardinality of the edge set — it cannot be artificially inflated by retries, network replays, or duplicate events.
Counting by SELECT COUNT(*) at read time is expensive for high-like items. The fix is to maintain a denormalized cached count in a separate table, updated via a trigger or event stream — but the source of truth is still the edge set. Views have a weaker uniqueness requirement: you might want to count unique viewers as a separate metric, not deduplicate every view. For that, exact set storage (one row per user per video) is too expensive at YouTube scale.
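The edge model and its denormalized count can be sketched with sqlite3 standing in for Postgres. Two simplifications are assumptions of this sketch, not the article's design: SQLite's `INSERT OR IGNORE` is used as the analogue of `ON CONFLICT DO NOTHING`, and the cached count is updated in the same transaction instead of through a trigger or event stream. The `like_counts` table and `like` function are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE likes (
        user_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, item_id)
    )
    """
)
conn.execute(
    "CREATE TABLE like_counts (item_id INTEGER PRIMARY KEY, n INTEGER NOT NULL)"
)


def like(user_id: int, item_id: int) -> bool:
    """Record a like. Returns True only if this call created a new edge."""
    with conn:  # one transaction: the edge and the cached count move together
        cur = conn.execute(
            "INSERT OR IGNORE INTO likes (user_id, item_id) VALUES (?, ?)",
            (user_id, item_id),
        )
        if cur.rowcount == 0:
            return False  # duplicate like: nothing changes
        upd = conn.execute(
            "UPDATE like_counts SET n = n + 1 WHERE item_id = ?", (item_id,)
        )
        if upd.rowcount == 0:
            conn.execute(
                "INSERT INTO like_counts (item_id, n) VALUES (?, 1)", (item_id,)
            )
        return True
```

Because the cached count only moves when a new edge is actually inserted, a retried or double-tapped like leaves both the edge set and the cached count unchanged.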
V5: Unique-view cardinality — HyperLogLog
If you want "distinct viewers" without storing every (user, video) pair, use HyperLogLog (HLL). HLL is a probabilistic cardinality estimator. Redis's implementation (which uses 2^14 = 16,384 registers) provides a standard error of ~0.81% and requires at most 12 KB of memory regardless of set size, using a sparse representation at low cardinalities that transitions to the fixed 12 KB dense structure as cardinality grows.
Add user_id to the HLL for item_id:
    PFADD unique_viewers:{item_id} user_id
Count distinct viewers:
    PFCOUNT unique_viewers:{item_id}
Redis natively supports HyperLogLog with PFADD and PFCOUNT. For very large items (billions of users), per-shard HLLs can be combined with PFMERGE.
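To make the estimator less of a black box, here is a simplified HyperLogLog with the same 2^14 register count. It is a teaching sketch, not Redis's implementation: it uses SHA-1 as the hash, a dense byte array only (no sparse encoding), and the basic small-range correction from the original HLL paper.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified dense HyperLogLog; p=14 gives 16,384 registers."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                  # first p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64-p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> None:
        """Union with another HLL of the same size: register-wise max."""
        for i, r in enumerate(other.registers):
            if r > self.registers[i]:
                self.registers[i] = r

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw
```

Adding the same `user_id` twice leaves every register unchanged, so duplicates never move the estimate, and `merge` yields the same registers as if all users had been added to one HLL.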
Related: Bloom filters serve a similar role for membership testing — "has this user viewed this video before?" — and can gate increments before they reach the counter pipeline.
flowchart LR
V1["V1: single UPDATE<br/>hot-row bottleneck"] --> V2["V2: sharded counter<br/>N rows, N× throughput"]
V2 --> V3["V3: + Redis buffer<br/>100–1000× write reduction"]
V3 --> V4["V4: likes as edges<br/>idempotent, exact"]
V4 --> V5["V5: + HyperLogLog<br/>unique viewers, 12 KB"]
style V1 fill:#0e7490,color:#fff
style V2 fill:#15803d,color:#fff
style V3 fill:#ff6b1a,color:#0a0a0f
style V5 fill:#a855f7,color:#fff
High-level architecture
flowchart TD
CLIENT[Client] --> GW[API Gateway / Load Balancer]
GW -->|"view event"| VIEW[View Counter Service]
GW -->|"like / unlike"| LIKE[Like Service]
GW -->|"GET count"| READ[Read Service]
VIEW --> RBUF[(Redis<br/>write buffer<br/>INCR per item)]
RBUF -.flush every 1s.-> FLUSH[Flusher Workers]
FLUSH --> CASS[(Cassandra<br/>sharded counter rows)]
LIKE --> PG[(Postgres / DynamoDB<br/>likes edge table)]
PG -.count.-> LCACHE[(Redis<br/>like count cache)]
READ --> RCACHE[(Redis<br/>read cache)]
RCACHE -.miss.-> CASS
CASS -.aggregate.-> RCACHE
style GW fill:#ff6b1a,color:#0a0a0f
style RBUF fill:#15803d,color:#fff
style CASS fill:#0e7490,color:#fff
style PG fill:#0e7490,color:#fff
style RCACHE fill:#a855f7,color:#fff
style FLUSH fill:#ffaa00,color:#0a0a0f
Write path — views
- Client fires POST /view/{item_id}. The View Counter Service runs INCR counter:buf:{item_id} in Redis: atomic, sub-millisecond, no DB touch.
- A flusher worker runs every ~1 second, picks items with a non-zero buffer, reads and resets the delta atomically with GETDEL (Redis 6.2+; on older versions, use a Lua script that executes GET+DEL atomically), and writes the delta to Cassandra via UPDATE counter_shards SET n = n + ? WHERE item_id = ? AND shard_id = ?.
- The flusher picks the Cassandra row with shard_id = random(0, N-1) (N = 100 for high-write items, configurable), as in V2; a fixed hash(item_id) % N would send every flush for an item to the same shard row. The update uses Cassandra's COUNTER column type, which accepts concurrent increments from multiple coordinators without an application-level read-modify-write. Note: counter columns do not support lightweight transactions (IF conditions / CAS); conditional updates on counters are explicitly unsupported.
Write path — likes
- Client fires POST /like/{item_id} with authenticated user_id.
- Like Service issues INSERT INTO likes (user_id, item_id) ON CONFLICT DO NOTHING. Returns 200 whether or not the like was new (idempotent to the caller).
- A change-data-capture (CDC) stream or a DB trigger increments a cached like count in Redis. The cached count is the fast-read answer; the edge table is the source of truth.
sequenceDiagram
participant U as User
participant LS as Like Service
participant PG as Postgres "likes" table
participant CDC as CDC / trigger
participant RC as Redis like-count cache
U->>LS: POST /like/item_42
LS->>PG: INSERT (user_id, item_42) ON CONFLICT DO NOTHING
PG-->>LS: OK (new or duplicate)
LS-->>U: 200 OK
Note over CDC: async, near-real-time
CDC->>RC: INCR like_count:item_42
RC-->>CDC: new cached total
Read path
- GET /counter/{item_id} hits the Read Service.
- Redis cache lookup. Hit → return cached sum directly. Miss → query Cassandra with SELECT SUM(n) FROM counter_shards WHERE item_id = ?, write the result back to Redis with a short TTL (5–30s), and return it.
- For likes, same pattern but sourced from the like-count Redis key (backed by the edge table).
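The read path is a standard cache-aside lookup with a TTL. The sketch below keeps the cache in a Python dict and takes the database query as a callable, so `CountReadCache`, `load_from_db`, and the injected `clock` are all illustrative assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class CountReadCache:
    """Cache-aside read path with a short TTL (in-memory stand-in for Redis)."""

    def __init__(
        self,
        load_from_db: Callable[[int], int],
        ttl_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[int, Tuple[int, float]] = {}  # item_id -> (count, expires_at)
        self.db_reads = 0

    def get(self, item_id: int) -> int:
        now = self._clock()
        entry = self._entries.get(item_id)
        if entry is not None and entry[1] > now:
            return entry[0]                      # hit: serve the cached sum
        count = self._load(item_id)              # miss: SUM the shards in the DB
        self.db_reads += 1
        self._entries[item_id] = (count, now + self._ttl)
        return count

    def invalidate(self, item_id: int) -> None:
        """Drop the cached value, e.g. after an admin reset."""
        self._entries.pop(item_id, None)
```

With a 10-second TTL, a hot item costs the database at most one aggregate read per 10 seconds per cache node, at the price of showing a count that can lag by up to the TTL.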
Sharded counter deep-dive
The number of shards N is a key tuning parameter:
| N (shards) | Max hot-item writes/sec | Read cost (SUM) | When to use |
|---|---|---|---|
| 1 | ~5k–20k | 1 row | Low-traffic items, default |
| 10 | ~50k–200k | 10 rows | Moderate popularity |
| 100 | ~500k–2M | 100 rows | Viral items |
| 1000 | ~5M–20M | 1000 rows | Extreme edge cases |
In practice, counters are assigned N dynamically based on recent write rate. A lightweight "hot item detector" (see design-top-k-heavy-hitters) identifies items that exceed a threshold and upgrades their shard count. Shard count changes require a brief migration window (split existing rows, update routing).
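One way to turn the table into a rule is to pick the smallest tier whose capacity covers the observed write rate. The per-shard capacity of 5,000 writes/sec below is an assumption taken from the low end of the N = 1 row, and `pick_shard_count` is a hypothetical helper, not part of any library.

```python
SHARD_TIERS = (1, 10, 100, 1000)
ASSUMED_PER_SHARD_CAPACITY = 5_000  # writes/sec; low end of the N = 1 row above


def pick_shard_count(
    writes_per_sec: float,
    per_shard_capacity: float = ASSUMED_PER_SHARD_CAPACITY,
    tiers: tuple = SHARD_TIERS,
) -> int:
    """Return the smallest tier N such that N shards can absorb the write rate."""
    for n in tiers:
        if writes_per_sec <= n * per_shard_capacity:
            return n
    return tiers[-1]  # beyond the largest tier: cap at the maximum
```

Using the conservative end of each range means an item is upgraded before its current shards saturate, which leaves headroom for the migration window.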
For the read SELECT SUM(n) FROM counter_shards WHERE item_id = ?, N = 100 means reading 100 rows. In Cassandra, those rows are in the same partition (item_id is the partition key, shard_id is the clustering key), so the full aggregate is a single-partition scan served by one replica set, which stays fast for the shard counts in the table above.
-- Cassandra schema (CQL)
CREATE TABLE counter_shards (
    item_id  BIGINT,
    shard_id INT,
    n        COUNTER,
    PRIMARY KEY (item_id, shard_id)
);
-- Increment shard 37 for item 12345
UPDATE counter_shards SET n = n + 1
WHERE item_id = 12345 AND shard_id = 37;
-- Read total
SELECT SUM(n) FROM counter_shards WHERE item_id = 12345;
Cassandra's COUNTER column type is inspired by CRDT research: each coordinator node tracks its own contribution independently, and replicas reconcile by summing those per-node contributions, so no acknowledged increment is lost when replicas diverge and later reconcile. The implementation is not last-write-wins; it is a version-tracked, per-node scheme where each node's tally is summed, not overwritten. Be aware of practical caveats: counter columns cannot share a table with non-counter columns (other than the primary key), deleting a counter and then incrementing it again is unreliable, counter updates are not idempotent (a write retried after a timeout can double-count), and the implementation has historically had reliability issues under high load. As with other Cassandra reads at low consistency levels, the displayed count is eventually consistent across replicas.
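The per-node-tally idea is easiest to see in the textbook grow-only counter (G-Counter) CRDT. The sketch below is that textbook structure, offered as an illustration of why summing per-node contributions is safe to merge repeatedly; it is not Cassandra's actual storage format or protocol.

```python
class GCounter:
    """Grow-only counter CRDT: one tally per node, merged by per-node max."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.tallies = {}  # node_id -> total increments accepted by that node

    def increment(self, delta: int = 1) -> None:
        if delta < 0:
            raise ValueError("a G-Counter can only grow")
        self.tallies[self.node_id] = self.tallies.get(self.node_id, 0) + delta

    def merge(self, other: "GCounter") -> None:
        """Take the larger tally per node; safe to repeat and to reorder."""
        for node, tally in other.tallies.items():
            self.tallies[node] = max(self.tallies.get(node, 0), tally)

    def value(self) -> int:
        return sum(self.tallies.values())
```

Because each node only ever raises its own tally and merging takes the maximum per node, two replicas that exchange state in any order converge to the same total, and merging the same state twice changes nothing.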
Exact vs approximate — the trade-off table
The most common interview mistake is conflating all counter types and picking one approach for everything. The right answer is "it depends on the semantics" — and then naming the semantic clearly.
| Count type | Semantics | Acceptable error | Recommended approach |
|---|---|---|---|
| Video view count | "2.3M views" display | Seconds of lag, ~1% imprecision | Write buffer → sharded counter |
| Like count | Social signal, user-visible | Should not go backward; near-exact | Idempotent edge store + cached count |
| Unique viewer count | Analytics / advertisers | ~0.81% (HLL error for Redis implementation) | HyperLogLog |
| Story reaction count | Ephemeral, short-lived | Approximate fine | Redis counter (no DB flush needed for short TTL) |
| Revenue-affecting metric | Billing, ads impressions | Exact required | Transactional store with deduplication |
flowchart TD
Q{"What are you counting?"}
Q -->|"likes / reactions\n(user-visible, at-most-once)"| EDGE["Edge store\n(user_id, item_id) PRIMARY KEY\nON CONFLICT DO NOTHING"]
Q -->|"views / plays\n(best-effort, high volume)"| BUF["Redis buffer → sharded\nCassandra counter"]
Q -->|"unique viewers\n(distinct cardinality)"| HLL["HyperLogLog\nPFADD / PFCOUNT\n~0.81% error, 12 KB"]
Q -->|"billing / ad impressions\n(exact required)"| TXN["Transactional store\nwith deduplication"]
style Q fill:#ff6b1a,color:#0a0a0f
style EDGE fill:#15803d,color:#fff
style BUF fill:#0e7490,color:#fff
style HLL fill:#a855f7,color:#fff
style TXN fill:#ffaa00,color:#0a0a0f
Sequence diagram: view increment + flush
sequenceDiagram
participant C as Client
participant API as View API
participant R as Redis buffer
participant F as Flusher
participant DB as Cassandra
participant RC as Read cache
C->>API: POST /view/item_42
API->>R: INCR counter:buf:item_42
R-->>API: value = 1 (ack)
API-->>C: 200 OK
Note over F: every ~1 second
F->>R: GETDEL counter:buf:item_42
R-->>F: delta = 847
F->>DB: UPDATE counter_shards SET n = n + 847 WHERE item_id=42 AND shard_id=7
DB-->>F: OK
F->>RC: DEL cached_count:item_42 (invalidate)
C->>API: GET /counter/item_42
API->>RC: GET cached_count:item_42
RC-->>API: miss
API->>DB: SELECT SUM(n) WHERE item_id=42
DB-->>API: 2301456
API->>RC: SET cached_count:item_42 = 2301456 EX 10
API-->>C: { "count": 2301456 }
Failure modes
Hot key in Redis
Every increment for a viral item hits the same Redis key (counter:buf:{item_id}). Redis executes commands on a single thread, and a key lives on exactly one node, so all increments to that key serialize on one core. A single key can handle roughly 100k–200k INCR operations/sec on modern hardware before becoming a bottleneck.
Apply the same sharding idea to the buffer layer: maintain N buffer keys per item (counter:buf:{item_id}:{shard}), increment a random one, and SUM them during flush. For items below 100k increments/sec, a single key is fine.
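Key-splitting in the buffer layer looks like this in miniature. A `Counter` stands in for Redis, `buf_key` follows the counter:buf:{item_id}:{shard} naming above, and `incr` and `drain` are hypothetical helpers; against a real Redis, each `pop` would be a GETDEL on one sub-key.

```python
import random
from collections import Counter

buffer = Counter()  # stand-in for Redis: key -> integer value


def buf_key(item_id: int, shard: int) -> str:
    return f"counter:buf:{item_id}:{shard}"


def incr(item_id: int, n_keys: int) -> None:
    """Increment one randomly chosen sub-key so no single key takes all writes."""
    buffer[buf_key(item_id, random.randrange(n_keys))] += 1


def drain(item_id: int, n_keys: int) -> int:
    """Flusher side: read-and-reset every sub-key and return the summed delta."""
    total = 0
    for shard in range(n_keys):
        total += buffer.pop(buf_key(item_id, shard), 0)
    return total
```

The drain is not atomic across sub-keys, but each sub-key is read and reset in one step, so an increment that arrives mid-drain is simply picked up by the next flush instead of being lost or counted twice.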
Buffer node crash — lost increments
If a Redis node crashes before the flusher runs, in-memory increments are lost. The magnitude: at most one flush interval (about 1 second) of writes for every item buffered on that node. There are three ways to handle this, depending on how much you care:
- Accept the loss (view counts): 1s of lost views on a viral item is imperceptible. This is the correct trade-off for view counts.
- Redis AOF persistence (fsync every second): the node recovers buffered values on restart, losing at most about the last second of writes; increases write latency slightly.
- Kafka as a durable event log: publish every view event to Kafka before buffering in Redis. Flusher reads from Kafka with committed offsets — guaranteed at-least-once delivery. More complex but eliminates the loss window.
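The durable-log option trades lost increments for possible replays. The sketch below models that with a Python list standing in for a Kafka partition and an integer standing in for the committed offset; `DurableLogFlusher` and its `crash_before_commit` flag are illustrative, not a Kafka client API.

```python
class DurableLogFlusher:
    """Apply events from a durable log, then commit the offset (at-least-once)."""

    def __init__(self, log: list) -> None:
        self.log = log              # list of item_ids; stand-in for a Kafka partition
        self.committed_offset = 0   # stand-in for the consumer group's committed offset
        self.durable = {}           # stand-in for the DB counter

    def run_once(self, crash_before_commit: bool = False) -> None:
        batch = self.log[self.committed_offset:]
        for item_id in batch:
            self.durable[item_id] = self.durable.get(item_id, 0) + 1
        if crash_before_commit:
            return  # simulated crash: offset not advanced, batch will be re-read
        self.committed_offset += len(batch)
```

If the flusher dies after applying a batch but before committing, the next run re-applies the same events: nothing is lost, but the batch is counted twice. That is acceptable for views and is exactly why likes need the idempotent edge store.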
Double counting
At-least-once delivery (Kafka retries, HTTP retries from clients) can send the same increment event multiple times. For views, double counting is tolerable — a viewer hitting play twice should probably count twice anyway. The count may be slightly inflated; this is industry-accepted. For likes, the idempotent edge store (ON CONFLICT DO NOTHING) makes duplicates harmless by construction. The like count is always the cardinality of unique (user, item) pairs. For unique viewers, HyperLogLog PFADD is also idempotent — adding the same element twice has no effect on the estimate.
Monotonicity — count going backward
A count that decreases is a trust-destroying bug. Two scenarios where this can happen:
- Shard migration gone wrong: if you split a shard and the new routing kicks in before the data is fully replicated, some reads will SUM a subset of shards and return a lower number. Fix: during migration, serve reads from the old schema until migration is complete, then atomically cut over.
- Cache serving stale data after a count reset: if an admin resets a counter for abuse remediation and the cache is still warm, reads return the old (higher) count. Fix: cache invalidation must happen as part of the admin action, not asynchronously.
Read-after-write inconsistency
A user likes a post, then immediately views their profile and sees "0 likes." This violates user expectations even if eventual consistency is acceptable for anonymous viewers.
Fix: after a like, record the user's own mutation in a short-lived read-your-own-writes (RYOW) cache key scoped to that user's session and overlay it on reads, or route that user's next reads to the DB primary (or a synchronous replica) for a short window after the mutation.
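The routing variant can be sketched as a small decision function: remember when each user last mutated something, and send that user's reads to the primary until a window elapses. The 5-second window, the `ReadRouter` name, and the "primary"/"cache" labels are assumptions for illustration.

```python
import time
from typing import Callable, Dict


class ReadRouter:
    """Route a user's reads to the primary for a short window after their own write."""

    def __init__(
        self,
        window_seconds: float = 5.0,  # assumed window; tune to replication/cache lag
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window = window_seconds
        self._clock = clock
        self._last_write: Dict[str, float] = {}  # user_id -> time of last mutation

    def record_write(self, user_id: str) -> None:
        self._last_write[user_id] = self._clock()

    def target_for(self, user_id: str) -> str:
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self._window:
            return "primary"  # this user just wrote: bypass possibly stale cache
        return "cache"
```

Only the user who performed the mutation pays the cost of a primary read; everyone else keeps reading the eventually consistent cached count.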
Storage choices
| Data | Store | Why |
|---|---|---|
| View increment buffer | Redis (INCR) | Sub-ms atomic increments; millions of ops/sec |
| Durable view counters | Cassandra counter columns | Wide-column; COUNTER type handles concurrent increments; natural partition by item_id |
| Like edges | Postgres or DynamoDB | ACID for uniqueness enforcement; composite PK prevents duplicate likes |
| Like count cache | Redis | Fast cached integer; invalidated on change |
| HyperLogLog (unique viewers) | Redis PFADD/PFCOUNT | Native HLL support; 12 KB per item regardless of cardinality |
| Analytics / time-series | ClickHouse or BigQuery (offline) | Historical view/like trends; not on hot path |
Things to discuss in an interview
The single UPDATE counter = counter + 1 fails under load because the row-level lock serializes all concurrent writers — throughput is bounded by lock-release rate, not hardware capacity.
Sharded counters and write batching are complementary, not alternatives. Sharding eliminates the hot row; batching eliminates per-event DB writes. Candidates who pick one and ignore the other miss half the solution.
For likes, the right frame is that incrementing a raw counter is not idempotent, but storing an edge is. The like count becomes a derived property (cardinality of unique pairs), not a mutable field — which is why at-most-once correctness falls out naturally.
On the durability vs throughput dial: Redis buffer with AOF is a middle ground; Kafka is the durable extreme; pure in-memory is the fast-but-lossy extreme. For view counts, in-memory is right. For ad impression billing, Kafka is right.
For unique-viewer counting, the probabilistic trade-off is HyperLogLog's 0.81% error at up to 12 KB (Redis's implementation, 16,384 registers) vs exact counting at O(N) storage. HLLs can also be merged across time windows — daily unique viewers equals the merge of 24 hourly HLLs.
Things you should now be able to answer
- Why does a single UPDATE row serialize writes, and at what QPS does it become a problem?
- How do sharded counters preserve write throughput, and what does the read path look like?
- Why are likes modeled as edges rather than raw increments?
- What does "eventual consistency" mean for a view count — and what is an acceptable staleness window?
- When would you choose HyperLogLog over an exact set for counting, and what is the error guarantee?
- What happens to buffered view counts if the Redis node crashes mid-second?
Further reading
- "Cassandra COUNTER columns" — Apache Cassandra documentation (counter column semantics and tombstone behavior)
- "HyperLogLog in Practice" — Google Research (2013) — the engineering improvements behind practical HLL implementations
- Design Top-K Heavy Hitters — hot item detection is the prerequisite for dynamic shard assignment
- Bloom Filters — probabilistic membership testing; complement to HyperLogLog for deduplication gates
- "An Analysis of Facebook Photo Caching" — SOSP 2013 — real-world write-heavy counter and caching patterns at social scale
Frequently asked questions
What is the hot-row problem in distributed counters?
A single UPDATE counter = counter + 1 WHERE id = ? forces every concurrent increment to acquire the same row-level lock, serializing all writes. Under viral load — for example a single item absorbing 100k+ increments per second — writes queue behind that lock, latency climbs to seconds, and the database melts. Sharded counters and write batching both exist specifically to eliminate this bottleneck.
When should you model likes as edges instead of raw increments?
Use an idempotent edge store — a (user_id, item_id) composite primary key with INSERT ON CONFLICT DO NOTHING — any time at-most-once semantics per user matter and the count must not go backward. Raw incrementing, even with batching, cannot prevent double-counting across retries or network replays, whereas the edge table makes the like count the cardinality of a unique set, which is naturally exact.
What error rate does Redis HyperLogLog give for unique-viewer counting, and how much memory does it use?
Redis's HyperLogLog implementation, which uses 2^14 = 16,384 registers, provides a standard error of about 0.81% and requires at most 12 KB of memory per counter regardless of set size. It uses a sparse representation at low cardinalities and transitions to the fixed 12 KB dense structure as cardinality grows, making it practical even for items with billions of viewers.
How much does write batching reduce database writes for a viral item?
Buffering increments in Redis with INCR and flushing aggregated deltas to Cassandra every second collapses a high-write item to at most one DB write per flush interval. A video receiving 10,000 views in one second sends one DB write instead of 10,000, a 10,000× reduction for that item. Platform-wide, the 2M raw increments per second collapse to at most 500k flushes per second, a 4× reduction overall, with individual hot items seeing 100 to 1,000× or more depending on their write rate.
What is the durability risk of the Redis write buffer, and how do you mitigate it?
If the Redis node crashes before the flusher runs, up to one second of buffered increments are lost. For view counts this loss is industry-acceptable and the simplest choice is to accept it. Stronger options are enabling Redis AOF persistence with fsync every second, or publishing every event to Kafka as a durable log before buffering, which eliminates the loss window at the cost of added complexity.