APIs and Communication Protocols
REST, gRPC, GraphQL, WebSockets, Server-Sent Events, and webhooks — when to use each, how to design them, and the patterns that keep them sane at scale.
The previous module covered the transport layer — how bytes move between machines. This module covers the layer above: how programs talk to each other over those bytes. APIs are the contracts; protocols are the conventions. Get them right and you can change anything underneath without breaking callers. Get them wrong and every change becomes a coordinated multi-team migration.
The four big families
Almost every system you'll design uses one or more of these:
flowchart TD
API[Inter-service communication] --> SYNC[Synchronous<br/>request-response]
API --> ASYNC[Asynchronous<br/>fire-and-forget]
API --> PERSIST[Persistent<br/>full-duplex channel]
SYNC --> REST[REST<br/>HTTP + JSON]
SYNC --> GRPC[gRPC<br/>HTTP/2 + Protobuf]
SYNC --> GQL[GraphQL<br/>HTTP + flexible queries]
ASYNC --> SSE[SSE<br/>server → client only]
ASYNC --> WH[Webhooks<br/>server → server callbacks]
PERSIST --> WS[WebSocket<br/>bi-directional, persistent]
style REST fill:#ff6b1a,color:#0a0a0f
style GRPC fill:#0e7490,color:#fff
style GQL fill:#ff2e88,color:#fff
style WS fill:#15803d,color:#fff
The rough rule for picking one: reach for REST when you're building a public API and want something every client can hit with curl. Lean toward gRPC for internal microservices where you control both ends and care about payload size and strict contracts. Choose GraphQL when multiple clients (web, iOS, Android) each want different slices of deeply nested data. Upgrade to WebSockets when you need the server to push updates in real time and the client also needs to send events back. Use SSE when the server just needs to push a stream one-way. And webhooks are for when someone else's server needs to notify yours — you hand them a URL, they call it.
REST in depth
REST (Representational State Transfer) is less a protocol than a set of conventions on top of HTTP. Done well, it gives you a uniform, cacheable, debug-friendly API. Done poorly, it gives you JSON-over-HTTP without the benefits.
The six REST constraints (in plain English)
- Client–server — separation of concerns; clients evolve independently.
- Stateless — every request carries everything the server needs. No session memory between calls.
- Cacheable — responses are explicitly marked cacheable or not.
- Uniform interface — resources, methods, representations are predictable.
- Layered — proxies, CDNs, gateways can sit between client and origin.
- Code on demand (optional, rarely used).
The big one is stateless. If your "REST" API needs server-side session state to make sense of the next call, you can't horizontally scale it without sticky sessions, and you've thrown away half the value.
Resource-oriented design
A REST API models the world as resources addressed by URLs, manipulated through a small set of HTTP methods.
GET /orders → list orders
POST /orders → create order
GET /orders/123 → read order 123
PUT /orders/123 → replace order 123
PATCH /orders/123 → modify fields of order 123
DELETE /orders/123 → delete order 123
GET /orders/123/items → list items in order 123
POST /orders/123/items → add an item to order 123
Anti-pattern: POST /createOrder, POST /updateOrderStatus, POST /deleteOrder. That's RPC dressed up in HTTP — you've lost the uniform interface.
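To make the uniform interface concrete, here is a minimal sketch of a router that dispatches on the pair (HTTP method, resource URL) instead of on action-named paths. Everything in it is hypothetical: the in-memory ORDERS store, the handler names, and the route table. A real service would use its web framework's router.

```python
import itertools
import re
from typing import Any, Callable, Dict, List, Optional, Tuple

Response = Tuple[int, Any]
Handler = Callable[[Dict[str, str], Optional[dict]], Response]

ORDERS: Dict[str, dict] = {}      # hypothetical in-memory store
_next_id = itertools.count(1)     # naive id generator for the sketch

def list_orders(params: Dict[str, str], body: Optional[dict]) -> Response:
    return 200, list(ORDERS.values())

def create_order(params: Dict[str, str], body: Optional[dict]) -> Response:
    order_id = str(next(_next_id))
    ORDERS[order_id] = {"id": order_id, **(body or {})}
    return 201, ORDERS[order_id]  # a real API would also set Location: /orders/<id>

def get_order(params: Dict[str, str], body: Optional[dict]) -> Response:
    order = ORDERS.get(params["order_id"])
    return (200, order) if order is not None else (404, None)

def delete_order(params: Dict[str, str], body: Optional[dict]) -> Response:
    removed = ORDERS.pop(params["order_id"], None)
    return (204, None) if removed is not None else (404, None)

# One URL pattern per resource; the HTTP method picks the operation.
ROUTES: List[Tuple[re.Pattern, Dict[str, Handler]]] = [
    (re.compile(r"^/orders$"), {"GET": list_orders, "POST": create_order}),
    (re.compile(r"^/orders/(?P<order_id>[^/]+)$"), {"GET": get_order, "DELETE": delete_order}),
]

def dispatch(method: str, path: str, body: Optional[dict] = None) -> Response:
    for pattern, handlers in ROUTES:
        match = pattern.match(path)
        if match:
            handler = handlers.get(method)
            if handler is None:
                return 405, None  # the resource exists but does not support this method
            return handler(match.groupdict(), body)
    return 404, None
```

Adding something like a cancellation resource is one more entry in the route table, not a new verb baked into a URL.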
Verb-action mismatches that bite people
| You want to... | Actually do |
|---|---|
| Cancel an order | POST /orders/123/cancellations (creates a cancellation resource) or PATCH /orders/123 {"status":"cancelled"} |
| Send a password reset email | POST /password-resets (creates a reset request, returns a token) |
| Search | GET /orders?status=open&customer_id=42 (filters as query params) |
| Bulk update | POST /orders/bulk-update — pure REST has no good answer; this is the pragmatic exception |
Status codes that pull their weight
You don't need all 60+. You need twelve:
| Code | Meaning | When |
|---|---|---|
| 200 | OK | Successful read or update |
| 201 | Created | Successful POST that created a resource (return Location: /orders/123) |
| 202 | Accepted | Async work queued; client should poll |
| 204 | No Content | Successful DELETE, or a PUT/PATCH that returns no body |
| 400 | Bad Request | Malformed body, missing fields, validation failed |
| 401 | Unauthorized | No / bad credentials (yes, the name is wrong — should be "Unauthenticated") |
| 403 | Forbidden | Authenticated but not allowed |
| 404 | Not Found | Resource missing |
| 409 | Conflict | Optimistic concurrency conflict, duplicate key |
| 429 | Too Many Requests | Rate limited (include Retry-After header) |
| 500 | Internal Server Error | Unexpected bug — fix it |
| 503 | Service Unavailable | Overloaded or in maintenance — retry later |
Pagination — the four flavors
Listing resources without pagination is a footgun. There are four common ways to paginate, and they don't all hold up at scale:
flowchart TD
P[Pagination strategies] --> OL[Offset/Limit<br/>?page=3&size=20]
P --> CB[Cursor-based<br/>?cursor=eyJpZCI6MTIz]
P --> KB[Keyset / seek<br/>?after_id=123&limit=20]
P --> TB[Time-based<br/>?since=2026-01-01]
style OL fill:#ff2e88,color:#fff
style CB fill:#15803d,color:#fff
style KB fill:#0e7490,color:#fff
Offset/limit (?page=3&size=20) is the beginner's choice. Simple to implement, easy to explain — and a trap at scale. OFFSET 1,000,000 forces the database to scan and discard a million rows on every page request. Worse, new inserts shift page boundaries while you're paginating, so rows get skipped or duplicated.
Cursor-based pagination is the fix. The server returns an opaque next_cursor token with each page; the client passes it back to get the next one. Under the hood, the cursor usually encodes the keyset value (the last id seen). No offset scan, no drift under inserts, no ability to jump to page N — which turns out to be fine for nearly every real use case.
Keyset / seek (WHERE id > last_id ORDER BY id) is essentially cursor-based made explicit — same database behavior, just without the opaque token.
Time-based (?since=...) works naturally for event feeds but only when data is genuinely chronological and you don't need to navigate backwards.
For a public list endpoint with mutating data, cursor-based wins. Always include a next_cursor in the response and let the client pass it back.
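Here is a minimal sketch of cursor-based pagination built on a keyset query, using Python's built-in sqlite3 module. The orders table, the page size, and the cursor format (base64-encoded JSON holding the last id) are assumptions for illustration; the point is that the client never sees an offset, only an opaque token.

```python
import base64
import json
import sqlite3
from typing import List, Optional, Tuple

def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"id": last_id}).encode()).decode()

def decode_cursor(cursor: str) -> int:
    # A production version should reject malformed or tampered cursors with a 400.
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"]

def fetch_orders_page(conn: sqlite3.Connection, limit: int = 20,
                      cursor: Optional[str] = None) -> Tuple[List[int], Optional[str]]:
    after_id = decode_cursor(cursor) if cursor else 0  # assumes ids are positive
    # Keyset seek: the primary-key index jumps straight past after_id, no OFFSET scan.
    rows = conn.execute(
        "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit + 1),  # one extra row tells us whether another page exists
    ).fetchall()
    page = [row[0] for row in rows[:limit]]
    next_cursor = encode_cursor(page[-1]) if len(rows) > limit else None
    return page, next_cursor
```

Because the cursor carries only the last id, a row inserted with a higher id while a client is paging simply shows up on a later page; nothing already returned shifts or repeats.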
Versioning without breaking the world
Three strategies:
- URL versioning: /v1/orders, /v2/orders. Most common, most explicit, easiest to route at the edge.
- Header versioning: Accept: application/vnd.example.v2+json. Cleaner URLs, harder to debug.
- No versioning, evolve forward only: never break, only add. Stripe's approach (with date-based "API versions" as fallback). Highest discipline cost; lowest cognitive overhead for callers.
The forgotten rule: never break v1 while v2 is live. Run both for as long as a real customer is on v1.
Filtering, sorting, sparse fields
GET /orders?status=paid&customer_id=42 ← filtering
GET /orders?sort=-created_at,total ← sorting (- = desc)
GET /orders?fields=id,total,status ← sparse fields (only return these)
GET /orders?include=customer,items ← side-load related resources
These let one endpoint serve many UI needs without proliferating endpoints. Pick a convention early and apply it consistently.
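One way to keep such a convention consistent is to parse the query string once into a structured spec that every list endpoint shares. A minimal sketch using the standard library's urllib.parse; the reserved names sort, fields, and include follow the examples above, while treating every other parameter as an equality filter is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
from urllib.parse import parse_qs

@dataclass
class ListQuery:
    filters: Dict[str, str] = field(default_factory=dict)
    sort: List[Tuple[str, str]] = field(default_factory=list)  # (column, "asc" | "desc")
    fields: List[str] = field(default_factory=list)
    include: List[str] = field(default_factory=list)

def parse_list_query(query: str) -> ListQuery:
    q = ListQuery()
    for key, values in parse_qs(query).items():
        value = values[-1]  # if a parameter repeats, the last one wins
        if key == "sort":
            for part in filter(None, value.split(",")):
                # A leading "-" means descending, as in ?sort=-created_at
                q.sort.append((part[1:], "desc") if part.startswith("-") else (part, "asc"))
        elif key in ("fields", "include"):
            getattr(q, key).extend(filter(None, value.split(",")))
        else:
            q.filters[key] = value
    return q
```

Before these names reach SQL, check sort columns, fields, and filters against an allow-list; otherwise the query string becomes an injection vector.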
Idempotency keys (the production-grade superpower)
Networks fail mid-request. The client retries. Did the first attempt go through? Without idempotency, you've now charged the customer twice.
The fix: the client generates a UUID and sends it in an Idempotency-Key header. The server stores (key, response) for ~24h. If the same key arrives again, the server returns the original response without re-executing.
POST /payments
Idempotency-Key: 7f3c2e80-4b5d-4a8e-9f2c-1234567890ab
Content-Type: application/json
{"amount": 4200, "currency": "usd", "source": "tok_..."}
Stripe popularized this; everybody copied it. It is the single most important pattern for making a write API safe to retry.
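The server side of that contract fits in a few lines. This is a minimal single-process sketch with a hypothetical charge operation. Rejecting a reused key whose request body differs is a common safeguard, but the 422 status used for it here is an assumption. A production version would keep keys in a shared store (for example a table with a unique constraint on the key), expire them after the retention window, and handle two concurrent requests that carry the same key.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]
TTL_SECONDS = 24 * 3600  # the ~24h retention window

class IdempotencyStore:
    """Remembers key -> (stored_at, request fingerprint, response) for TTL_SECONDS."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._entries: Dict[str, Tuple[float, str, Response]] = {}
        self._clock = clock

    def execute(self, key: str, body: dict, operation: Callable[[dict], Response]) -> Response:
        now = self._clock()
        fingerprint = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < TTL_SECONDS:
            _, stored_fingerprint, stored_response = entry
            if stored_fingerprint != fingerprint:
                # Same key, different request: refuse rather than guess.
                return 422, {"error": "Idempotency-Key reused with a different request"}
            return stored_response  # a retry: replay the original response, do not re-execute
        response = operation(body)
        self._entries[key] = (now, fingerprint, response)
        return response
```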
gRPC: the internal-service workhorse
gRPC is Google's RPC framework: HTTP/2 transport, protobuf for serialization, code generation in every major language. You define your API once in a .proto file:
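syntax = "proto3";
service Orders {
  rpc Get(GetOrderRequest) returns (Order);
  rpc List(ListOrdersRequest) returns (stream Order);  // server streaming
  rpc Create(CreateOrderRequest) returns (Order);
}
// Illustrative status values; proto3 enums must start at 0.
enum Status {
  STATUS_UNSPECIFIED = 0;
  STATUS_PENDING = 1;
  STATUS_PAID = 2;
  STATUS_CANCELLED = 3;
}
message Order {
  string id = 1;
  int64 customer_id = 2;
  int32 total_cents = 3;
  Status status = 4;
}
// Hypothetical request shapes, included so the file compiles.
message GetOrderRequest { string id = 1; }
message ListOrdersRequest { int64 customer_id = 1; }
message CreateOrderRequest { int64 customer_id = 1; int32 total_cents = 2; }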
The protoc compiler, with a plugin per language, generates client and server stubs in Go, Java, Python, Rust, and more. Calls look like local function calls.
Why teams choose gRPC for internal services: Protobuf binary is typically 3–10× smaller than equivalent JSON, so payloads are smaller and parsing is faster, while HTTP/2 multiplexing lets many in-flight requests share one connection instead of paying connection setup per request. The schema is the contract: regenerate the stubs and many breaking changes surface as compile errors, not at 2 a.m. when a caller starts returning garbage. Streaming (client, server, or bi-directional) is a first-class feature, not a bolt-on. And deadlines and cancellation are built into the protocol, so a client's timeout propagates through the entire call chain.
The flip side: you can't debug a binary protobuf frame with curl — you need grpcurl and dedicated tooling. Browsers can't speak gRPC natively, so client-facing endpoints need a gRPC-Web proxy in front. And schema evolution requires discipline: you cannot reuse a protobuf field number, ever, or you silently corrupt data on older clients.
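To see where the size difference comes from, here is a hand-rolled encoder for the Order message using the protobuf wire format: each field starts with a varint tag of (field_number << 3) | wire_type, followed by a varint value or a length-prefixed payload. Field names never go on the wire, only numbers, which is also why a reused number silently changes meaning. This is an illustrative sketch that handles only non-negative integers and strings; real code should use the generated classes. The sample values, and the assumption that the status enum assigns 2 to "paid", are made up.

```python
import json

def varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)

def encode_order(order_id: str, customer_id: int, total_cents: int, status: int) -> bytes:
    # Note: real proto3 encoders skip fields holding default values; this sketch always writes them.
    raw_id = order_id.encode("utf-8")
    return (
        tag(1, 2) + varint(len(raw_id)) + raw_id  # string id = 1 (wire type 2: length-delimited)
        + tag(2, 0) + varint(customer_id)         # int64 customer_id = 2 (wire type 0: varint)
        + tag(3, 0) + varint(total_cents)         # int32 total_cents = 3
        + tag(4, 0) + varint(status)              # Status status = 4 (enums are varints)
    )

wire = encode_order("ord_123", 42, 4200, 2)
as_json = json.dumps({"id": "ord_123", "customer_id": 42, "total_cents": 4200, "status": "PAID"})
```

Here the binary form is 16 bytes against 75 for the JSON produced by json.dumps with its default separators, roughly a 4.7× difference on a tiny message.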
gRPC streaming patterns
flowchart LR
subgraph S1[Unary]
C1[Client] -->|1 req| S
S -->|1 resp| C1
end
subgraph S2[Server streaming]
C2[Client] -->|1 req| Sa
Sa -->|stream| C2
end
subgraph S3[Client streaming]
C3[Client] -->|stream| Sb
Sb -->|1 resp| C3
end
subgraph S4[Bi-directional]
C4[Client] <-->|stream| Sc
end
style Sa fill:#0e7490,color:#fff
style Sb fill:#15803d,color:#fff
style Sc fill:#ff6b1a,color:#0a0a0f
Server streaming is great for live tailing (logs, prices). Bi-directional is how chat services and live game servers move data.
GraphQL: the client-controlled query
GraphQL is a query language for APIs. Instead of the server defining endpoints, the client declares the shape of the response:
query {
order(id: "123") {
id
total
customer {
name
email
}
items {
product { name price }
quantity
}
}
}
The server returns exactly those fields, in that shape, in one round-trip. No more "GET /orders/123, GET /orders/123/customer, GET /orders/123/items" cascades.
What GraphQL gets right: Mobile apps love it — one request, exactly the bytes needed, no over-fetching. Multiple frontends (web/iOS/Android) can each ask for what they need without backend changes. The schema is strongly typed and introspectable; generators give you typed clients automatically.
What GraphQL gets wrong: The N+1 problem. A naive resolver for items.product runs a separate DB query per item in the list. The fix is DataLoader, which batches lookups within a single event-loop tick. You will write or import this; you won't get away without it. Caching is also harder: HTTP caching is keyed on path and query string, but GraphQL traffic usually goes to a single POST /graphql endpoint. You need persisted queries or a GraphQL-aware cache. Authorization is per-field rather than per-endpoint, which is more flexible but means more code. And exposing a fully flexible query API to the public is a footgun: limit query depth and complexity or someone will hit you with users { posts { comments { user { posts { ... } } } } }.
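The batching idea behind DataLoader can be sketched with Python's asyncio. Each resolver calls load(key) and gets a future; the first call in a pass of the event loop schedules a single dispatch with call_soon, which runs only after the sibling resolvers already queued in that pass have added their keys. This is a minimal illustration, not the API of any DataLoader library, and fetch_products is a hypothetical stand-in for one SELECT ... WHERE id IN (...) query.

```python
import asyncio
from typing import Any, Awaitable, Callable, Hashable, List, Optional, Tuple

BatchFn = Callable[[List[Hashable]], Awaitable[List[Any]]]

class BatchLoader:
    """Collects load() calls made in one event-loop pass and resolves them with one batch call."""

    def __init__(self, batch_fn: BatchFn) -> None:
        self._batch_fn = batch_fn
        self._queue: List[Tuple[Hashable, asyncio.Future]] = []
        self._scheduled = False
        self._dispatch_task: Optional[asyncio.Task] = None  # hold a reference so it isn't GC'd

    def load(self, key: Hashable) -> "asyncio.Future[Any]":
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._queue.append((key, future))
        if not self._scheduled:
            self._scheduled = True
            # Runs after callbacks already queued, i.e. after sibling resolvers call load().
            loop.call_soon(self._start_dispatch, loop)
        return future

    def _start_dispatch(self, loop: asyncio.AbstractEventLoop) -> None:
        self._dispatch_task = loop.create_task(self._dispatch())

    async def _dispatch(self) -> None:
        queue, self._queue = self._queue, []
        self._scheduled = False
        keys = list(dict.fromkeys(key for key, _ in queue))  # dedupe, keep first-seen order
        try:
            values = await self._batch_fn(keys)
        except Exception as exc:
            for _, future in queue:
                if not future.done():
                    future.set_exception(exc)
            return
        by_key = dict(zip(keys, values))
        for key, future in queue:
            if not future.done():
                future.set_result(by_key[key])

# Usage: three items, two distinct products, one batched "query".
calls = []

async def fetch_products(ids):
    calls.append(list(ids))
    return [{"id": i, "name": f"product-{i}"} for i in ids]

async def resolve_item(loader, item):
    product = await loader.load(item["product_id"])
    return {"quantity": item["quantity"], "product": product["name"]}

async def main():
    loader = BatchLoader(fetch_products)
    items = [{"product_id": 1, "quantity": 2}, {"product_id": 2, "quantity": 1},
             {"product_id": 1, "quantity": 5}]
    return await asyncio.gather(*(resolve_item(loader, it) for it in items))

result = asyncio.run(main())
```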
Use GraphQL when you have multiple clients with diverging needs and one team owning the schema. Avoid it for simple, single-client CRUD — REST is less rope to hang yourself with.
REST vs gRPC vs GraphQL — the side-by-side
| Aspect | REST | gRPC | GraphQL |
|---|---|---|---|
| Transport | HTTP/1.1 or HTTP/2 | HTTP/2 (required) | HTTP/1.1, HTTP/2 |
| Payload | JSON, XML | Protobuf (binary) | JSON |
| Schema | OpenAPI (optional) | .proto (required) | SDL (required) |
| Client codegen | Optional (OpenAPI generators) | Built-in (protoc) | Via tooling (from SDL) |
| Browser-friendly | Yes | No (needs gRPC-Web) | Yes |
| Streaming | SSE, WS bolt-ons | Native | Subscriptions (over WS) |
| Caching | HTTP caching just works | DIY | Hard |
| Debug with curl | Yes | No | Sort of |
| Best for | Public APIs | Internal microservices | Multi-client apps with deep data |
Real-time: WebSockets vs SSE vs long polling
When the server needs to push to the client, you have three options — and the right one is often simpler than you'd expect.
Long polling (old technology, still works)
Client opens a request; server holds it open until there's something to send (or timeout); client immediately reopens it.
sequenceDiagram
participant C as Client
participant S as Server
C->>S: GET /events?since=42
Note over S: holds open<br/>up to 30s
S-->>C: {"event": "new message"}
C->>S: GET /events?since=43
Note over S: holds open
S-->>C: timeout / empty
C->>S: GET /events?since=43
Long polling works through every proxy, firewall, and ancient client; it's plain HTTP with a long timeout. The cost is reconnect latency: each new batch of events requires a fresh round-trip, and many held-open connections can strain server resources.
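On the server, "hold the request open until there's something to send" amounts to waiting on a condition with a timeout. A minimal asyncio sketch with a hypothetical in-memory EventLog; a real deployment would call poll() from an HTTP handler and back the log with storage shared across instances.

```python
import asyncio
from typing import Any, List, Tuple

Event = Tuple[int, Any]

class EventLog:
    """Hypothetical in-memory event log that long-poll requests can wait on."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._changed = asyncio.Condition()

    def _newer_than(self, since: int) -> List[Event]:
        return [event for event in self._events if event[0] > since]

    async def publish(self, payload: Any) -> int:
        async with self._changed:
            event_id = len(self._events) + 1
            self._events.append((event_id, payload))
            self._changed.notify_all()  # wake every request currently held open
            return event_id

    async def poll(self, since: int, timeout: float = 30.0) -> List[Event]:
        """Answer at once if there is news; otherwise hold the request up to `timeout` seconds."""
        async with self._changed:
            try:
                await asyncio.wait_for(
                    self._changed.wait_for(lambda: bool(self._newer_than(since))), timeout
                )
            except asyncio.TimeoutError:
                return []  # empty response; the client re-polls with the same `since`
            return self._newer_than(since)
```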
Server-Sent Events (SSE)
A long-lived HTTP response that streams text/event-stream. One-way server → client.
GET /events HTTP/1.1
Accept: text/event-stream
HTTP/1.1 200 OK
Content-Type: text/event-stream
event: message
data: {"id": 42, "text": "hi"}
event: ping
data: {}
SSE is dead simple: a native browser API (new EventSource('/events')), automatic reconnection (if the server tags events with an id: field, the browser resends the last one in a Last-Event-ID header so the server can resume), and no extra libraries needed. The constraint is directionality: it's server to client only. If the client also needs to send events, add a separate POST endpoint or step up to WebSocket. One practical footnote: over HTTP/1.1, browsers allow only six concurrent connections per origin, and that limit is shared across all open tabs. If you have three tabs each holding two SSE connections to the same origin, you've hit the ceiling and every other request from every tab queues. Chrome and Firefox have marked this "Won't fix"; the answer is HTTP/2, which multiplexes streams over a single connection.
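Producing the stream is mostly string formatting. A minimal sketch of a serializer for one event; the function name and the choice to JSON-encode non-string payloads are assumptions. Including an id is what makes resumption via Last-Event-ID possible.

```python
import json
from typing import Any, Optional

def format_sse(data: Any, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event in text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # the browser sends this back as Last-Event-ID on reconnect
    if event is not None:
        lines.append(f"event: {event}")  # the event type the client listens for
    payload = data if isinstance(data, str) else json.dumps(data)
    # A payload containing newlines must be split across several data: lines.
    for line in payload.splitlines() or [""]:
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event
```

For example, format_sse({"id": 42, "text": "hi"}, event="message", event_id="42") yields the id:, event:, and data: lines followed by the blank line that ends the event.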
WebSockets
A full-duplex TCP connection that starts as HTTP and "upgrades" to WS:
GET /chat HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
After the upgrade, both sides exchange framed messages until either closes. Bi-directional, low per-message overhead, near-zero latency once connected. The cost: the connection is stateful, which pins it to one server and complicates load balancing (see Load Balancers for sticky sessions). Proxies and load balancers may drop idle or long-lived connections, so you typically send periodic ping/pong heartbeats. You also own reconnection and resubscribe logic; the protocol doesn't hand you that.
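The upgrade only succeeds if the server proves it understood the WebSocket request: it appends a fixed GUID defined in RFC 6455 to the client's Sec-WebSocket-Key, hashes the result with SHA-1, and returns the base64 digest as Sec-WebSocket-Accept in the 101 response. A short standard-library sketch; for the sample key in the request above, RFC 6455 gives s3pPLMBiTxaQ9kYGzzhZRbK+xOo= as the expected value.

```python
import base64
import hashlib

# Fixed GUID from RFC 6455, section 1.3.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Value the server must return in Sec-WebSocket-Accept for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```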
Picking between them
flowchart TD
Q1{Need server → client push?}
Q1 -->|No| REST[Plain REST or polling]
Q1 -->|Yes| Q2{Need client → server too?}
Q2 -->|No, just push| SSE[Server-Sent Events]
Q2 -->|Yes, bi-directional| Q3{Latency-critical?}
Q3 -->|Yes| WS[WebSocket]
Q3 -->|No| WS2[WebSocket or long polling]
style SSE fill:#15803d,color:#fff
style WS fill:#0e7490,color:#fff
SSE is criminally underused. For "send me notifications as they happen", which covers most "real-time" features in web apps, it's simpler and safer than WebSocket. Reach for WebSocket only when you genuinely need the client to push data back at high frequency (chat, collaborative editing, live gaming).
Webhooks: someone else's server calls yours
A webhook is a callback URL you register with someone else's service. When an event happens there, they POST to your URL.
sequenceDiagram
participant App as Your App
participant S as Stripe
participant U as User
Note over App,S: setup time
App->>S: register webhook URL
Note over App,S: later
U->>S: completes payment
S->>App: POST /webhooks/stripe<br/>(signed payload)
App-->>S: 200 OK (within 10s — Stripe's limit)
Five rules for receiving webhooks:
- Verify the signature. Every reputable provider signs payloads (usually with an HMAC). Reject anything unsigned or with a signature that doesn't match.
- Respond within their timeout (usually 5–30s). Do the actual work asynchronously — enqueue a job, return 200, process later.
- Be idempotent. Webhooks retry. Use the event ID to dedupe.
- Handle out-of-order events. The provider does not guarantee order on retries.
- Be replay-safe. Persist the raw event before processing, so you can rebuild state from the log.
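Several of these rules can be sketched together. The signing scheme here (hex HMAC-SHA256 over "timestamp.body", plus a timestamp tolerance to limit replays) is loosely modeled on what providers do but is an assumption; each provider documents its own header names and format. The secret, the in-memory dedupe set, and the raw-event log are hypothetical stand-ins for real storage and a job queue.

```python
import hashlib
import hmac
import json
import time
from collections import deque
from typing import Deque, Optional, Set

SECRET = b"whsec_example"   # hypothetical shared secret from the provider
TOLERANCE_SECONDS = 300     # reject signatures older than 5 minutes (replay guard)

seen_event_ids: Set[str] = set()       # production: a table with a unique constraint on event id
raw_event_log: Deque[bytes] = deque()  # stand-in for durable storage plus a job queue

def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Assumed scheme: hex HMAC-SHA256 over b"<timestamp>.<raw body>"."""
    return hmac.new(secret, str(timestamp).encode() + b"." + body, hashlib.sha256).hexdigest()

def handle_webhook(body: bytes, timestamp: int, signature: str, now: Optional[int] = None) -> int:
    """Return the HTTP status to send back to the provider."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return 400  # too old (or clock skew): possible replay
    if not hmac.compare_digest(sign(SECRET, timestamp, body), signature):
        return 400  # unsigned or tampered: reject before parsing anything
    event = json.loads(body)
    if event["id"] in seen_event_ids:
        return 200  # duplicate delivery: already accepted, do nothing
    seen_event_ids.add(event["id"])
    raw_event_log.append(body)  # persist the raw bytes; a worker processes them later
    return 200  # respond fast, well inside the provider's timeout
```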
Sending webhooks (you become the provider) adds: signed payloads, exponential-backoff retries, a delivery dashboard, and a mechanism for customers to disable a flapping endpoint.
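On the sending side, the retry schedule is the part most worth getting right. A minimal sketch of exponential backoff with "full jitter" (each delay drawn uniformly between zero and an exponentially growing cap), so that many failing endpoints don't all retry at the same instant. The base, cap, and attempt count are illustrative assumptions, not any provider's published schedule, and send stands in for a hypothetical HTTP POST that returns a status code.

```python
import random
import time
from typing import Callable, List, Optional, Sequence

def backoff_delays(attempts: int, base: float = 30.0, cap: float = 6 * 3600.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full jitter: retry n waits a random time in [0, min(cap, base * 2**n)] seconds."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

def deliver(send: Callable[[], int], delays: Sequence[float],
            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Attempt once immediately, then once after each delay, until a 2xx comes back."""
    for delay in [0.0, *delays]:
        sleep(delay)
        try:
            if 200 <= send() < 300:
                return True
        except OSError:
            pass  # connection errors and timeouts count as failed attempts
    return False  # exhausted: surface it on the dashboard, maybe disable the endpoint
```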
How a request flows through these layers
It helps to see all of this together. Here's the path a typical API request takes from a mobile client through to the database; most of the protocols from this module appear somewhere in that chain.
flowchart LR
MOB[Mobile client] -->|REST over HTTPS| GW[API Gateway]
GW -->|rate limit check| RL[Rate Limiter]
RL -->|allowed| AUTH[Auth middleware]
AUTH -->|JWT verified| SVC[Service]
SVC -->|gRPC| DS[Downstream service]
SVC -->|WebSocket push| WS[Realtime clients]
SVC --> DB[(Database)]
SVC -->|webhook callback| EXT[External partner]
style GW fill:#ff6b1a,color:#0a0a0f
style RL fill:#ffaa00,color:#0a0a0f
style SVC fill:#0e7490,color:#fff
style DS fill:#a855f7,color:#fff
style WS fill:#15803d,color:#fff
The mobile client speaks REST to the API gateway, which enforces rate limits and verifies auth before the request reaches your service. Internally, that service calls downstream services over gRPC. It pushes updates out to connected browser clients over WebSocket. And if an external partner needs to know about the event, the service fires a signed webhook.
Authentication patterns (a tour)
Every API needs to identify callers. The four patterns you'll see:
| Pattern | How it works | When |
|---|---|---|
| API key | Long random string in header | Internal services, simple SaaS |
| Bearer token (JWT) | Signed token; server verifies signature | User-facing, stateless |
| OAuth 2.0 | Token issued by a third-party identity provider | "Sign in with Google", delegated access |
| mTLS | Each side presents a TLS cert | Service mesh, B2B integrations |
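JWT pitfall: putting too much in the token. A JWT stays valid until its exp time passes and cannot be revoked once issued, so any roles or permissions baked into it stay frozen until then. Either keep them short-lived (15 min) with a refresh token, or maintain a revocation list (and you've lost statelessness).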
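A stripped-down HS256 sketch shows why revocation is hard: verification needs only the shared secret and the clock, so nothing on the server is consulted that could say "this token was revoked". It uses only the standard library and is meant for illustration; in production, use a maintained JWT library, which also handles other algorithms and claims such as nbf, aud, and iss.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def issue(claims: dict, secret: bytes, ttl: int, now: int) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    payload = dict(claims, iat=now, exp=now + ttl)
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

def verify(token: str, secret: bytes, now: int) -> dict:
    """Checks only signature and expiry: no database, no session, no revocation lookup."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected alg")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(b64url_decode(payload_b64))
    if now >= payload["exp"]:
        raise ValueError("expired")
    return payload
```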
Error envelopes and partial failures
A consistent error shape is worth more than any specific spec. RFC 9457 (application/problem+json, which obsoletes RFC 7807) is a good default:
{
"type": "https://api.example.com/errors/insufficient-funds",
"title": "Insufficient funds",
"status": 402,
"detail": "Account 1234 has balance $42.10; requested charge $50.00.",
"instance": "/accounts/1234/charges",
"trace_id": "01HX..."
}
The trace_id is the killer field — it's the breadcrumb your support team uses to find the request in your logs.
For batch endpoints that partially fail, return per-item statuses:
{
"results": [
{"id": "a", "status": 200},
{"id": "b", "status": 409, "error": "duplicate"}
]
}
Don't return 500 for "9 of 10 worked" — that throws away the 9 successes.
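Both shapes are easy to produce consistently with small helpers. A sketch in which the helper names and the duplicate check are assumptions, and trace_id falls back to a random uuid4 (in practice it should come from your tracing system so it matches the logs). about:blank is the default problem type defined by the problem-details RFC.

```python
import uuid
from typing import Iterable, Optional, Set

def problem(status: int, title: str, detail: str, type_: str = "about:blank",
            instance: Optional[str] = None, trace_id: Optional[str] = None) -> dict:
    """Build an application/problem+json body; trace_id is an extension member."""
    body = {"type": type_, "title": title, "status": status, "detail": detail}
    if instance is not None:
        body["instance"] = instance
    body["trace_id"] = trace_id or uuid.uuid4().hex
    return body

def create_batch(items: Iterable[dict], existing_ids: Set[str]) -> dict:
    """Process every item independently and report a status per item."""
    results = []
    for item in items:
        if item["id"] in existing_ids:
            results.append({"id": item["id"], "status": 409, "error": "duplicate"})
        else:
            existing_ids.add(item["id"])  # stand-in for the real insert
            results.append({"id": item["id"], "status": 200})
    return {"results": results}
```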
Rate limiting (a preview)
Every public API limits requests per caller. The two algorithms you'll see:
- Token bucket: bucket holds N tokens, refills at R per second. Each request consumes one. Allows bursts up to N.
- Sliding window: count requests in the last 60 seconds; reject if over limit.
When rate limited, return 429 Too Many Requests with:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1736284800
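A minimal token-bucket sketch that also yields the number a Retry-After header needs. The injectable clock exists so the behaviour can be tested deterministically. This is single-process only; a limiter shared across instances would keep the bucket state in a shared store.

```python
import math
import time
from typing import Callable, Tuple

class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `refill_per_sec`."""

    def __init__(self, capacity: int, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full, so an initial burst of `capacity` is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now

    def try_acquire(self) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds); use the latter for Retry-After on a 429."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0
        deficit = 1 - self.tokens
        return False, math.ceil(deficit / self.refill_per_sec)
```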
Full coverage is in Reliability Patterns and the rate limiter article.
A quick smell test for any API design
Before you ship, walk through this checklist:
- Stateless? No server-side session required to interpret the next request.
- Idempotent writes? Critical write endpoints accept an Idempotency-Key.
- Versioned? Either URL-versioned or with a clear forward-evolution rule.
- Paginated? Every list endpoint, with a default page size and a max.
- Rate limited? Per-caller, with predictable headers.
- Error envelope? Consistent shape with a trace ID.
- Authenticated? And the auth method appropriate for the audience.
- Documented? OpenAPI / proto / SDL — machine-readable, generated client libs.
- Cacheable where possible? GETs return appropriate Cache-Control.
- Backward-compatible deprecation path? When you change something, callers learn how to migrate.
Tick these and you have an API that won't be the source of next quarter's incidents.
Things you should now be able to answer
- A POST request times out — your client retries. How does the server know not to charge twice?
- Why is offset/limit pagination dangerous on a busy table?
- Your mobile app needs three pieces of data on one screen and you currently have three REST endpoints. What are the trade-offs of consolidating?
- A coworker proposes WebSockets for "real-time notifications". Why might SSE be a better fit?
- Why can't you safely "revoke" a JWT?
- gRPC streams beat REST polling for live data — but what's the cost in operational complexity?
Frequently asked questions
What is an idempotency key and why does it matter for write APIs?
An idempotency key is a UUID the client generates and sends in an Idempotency-Key header on write requests. The server stores the key paired with the original response for roughly 24 hours; if the same key arrives again — because the client retried after a timeout — the server returns the original response without re-executing the operation. This is the single most important pattern for making a write API safe to retry without double-charging a customer.
Why is offset/limit pagination dangerous at scale, and what should you use instead?
OFFSET 1,000,000 forces the database to scan and discard a million rows on every page request, and concurrent inserts shift page boundaries so rows get skipped or duplicated. Cursor-based pagination fixes both problems: the server returns an opaque next_cursor token encoding the last ID seen, the client passes it back, and the database uses a keyset seek with no offset scan and no drift under inserts.
When should you choose gRPC over REST for inter-service communication?
Choose gRPC when you control both ends of the connection and care about payload size and strict contracts. Protobuf binary is typically 3 to 10 times smaller than equivalent JSON, HTTP/2 multiplexing lets many in-flight requests share one connection, and breaking schema changes surface at compile time rather than at runtime. The trade-off is that you cannot debug gRPC frames with curl and browsers cannot speak gRPC natively without a gRPC-Web proxy.
What is the key difference between SSE and WebSockets, and when should you prefer SSE?
SSE is a one-way server-to-client stream over a long-lived HTTP response, while WebSocket is a full-duplex TCP connection where both sides send frames freely. For the common case of pushing notifications to a browser, SSE is simpler: it uses a native browser API, reconnects automatically with Last-Event-ID, and requires no extra libraries. Reach for WebSocket only when the client must also send data back at high frequency, such as in chat, collaborative editing, or live gaming.
What are the three versioning strategies for a REST API?
URL versioning places the version in the path such as /v1/orders and is the most explicit and easiest to route at the edge. Header versioning encodes the version in the Accept header, producing cleaner URLs but harder debugging. The third approach is no versioning at all, evolving forward only by never breaking existing behavior and only adding — Stripe's method, which has the lowest cognitive overhead for callers but the highest discipline cost internally.