
Advanced · asked at Google, Auth0, Airbnb, Carta

Design an Authorization System (Google Zanzibar / RBAC / ReBAC)

Answer "can user U do action A on resource R?" globally, in milliseconds, consistently. RBAC vs ABAC vs ReBAC, Zanzibar relation tuples, and the new-enemy problem.

23 min read · 2026-06-08 · Ironclad Academy

#interview #security #distributed-systems #consistency

// DEPTH

the full breakdown — requirements, capacity, evolution, trade-offs

The problem

Authorization is the question every product eventually has to answer at the hot path of every request: "is this user allowed to do this thing to this resource?" Google Drive, GitHub, Notion, Carta, and Airbnb all run dedicated authorization systems because the answer to that question is not as simple as a row in a permissions table. A user can be a member of a group, that group can have access to a folder, and the folder can grant inherited access to ten thousand documents nested inside it — so "can alice view doc:readme?" requires traversing a graph of relationships, not just checking a single row.

The Google Zanzibar paper, published in 2019, is the clearest public account of how to build this at planet scale. The core model it introduced — relation tuples of the form (object, relation, subject) — has since been adopted or independently reinvented by nearly every authorization-as-a-service product: OpenFGA (Auth0/Okta), SpiceDB (Authzed), Ory Keto, and Carta's internal system. Understanding Zanzibar is understanding the state of the art.

The two things that make this hard: latency and consistency after revocation. Authorization sits on the critical path of every API call, so the p95 check time must be under 10 ms globally — which means caching is not optional, it is the system. But aggressive caching creates a window where a freshly-revoked user still passes checks, which is a security defect, not just a bug. Closing that window without destroying latency is the central engineering tension this article works through.

The patterns described here — userset rewrites, the Leopard membership index, zookies as snapshot tokens, push-based cache invalidation — solve real problems that emerge at real scale. Each one exists because the simpler approach failed in a specific, concrete way.

Functional requirements

check(subject, relation, object) → bool — the hot path, called on every request.
write(tuple) / delete(tuple) — grant or revoke access.
expand(object, relation) → tree of all subjects who have that relation.
lookup_resources(subject, relation, type) → all objects subject can access (the "list what I can see" query).
lookup_subjects(object, relation, type) → all subjects who can access an object (the "who has access?" admin query).

Non-functional requirements

Extremely low check latency — p95 < 10 ms globally; p50 < 3 ms. Authorization sits on the critical path of every API request.
High availability — authorization being down is indistinguishable from the entire product being down.
Prompt revocation — after a tuple is deleted, stale cached grants must not persist beyond a few seconds. This is a hard security requirement.
Scalability — trillions of tuples, millions of check QPS.
Expressiveness — must model RBAC roles, attribute conditions, org hierarchies, and nested group memberships without a bespoke schema per policy.

Capacity estimation

Dimension	Estimate	How we got there
Tuple storage (design scale)	128 GB raw	`1B tuples × 128 bytes/tuple`
Tuple storage per node	64 MB / node	`128 GB ÷ 2,000 nodes` — fits in memory
Tuple storage (Zanzibar actual)	~100 TB, >2T tuples	From the 2019 Zanzibar paper
Check QPS (design scale)	20M checks/sec global	Illustrative design-exercise scale
Check QPS (Zanzibar actual)	~4.2M peak check QPS	Total client QPS (all types) >10M
Check QPS per region	4M checks/sec	`20M ÷ 5 geographic regions`
Check QPS per pod	20,000 checks/sec	`4M ÷ 200 auth server pods per region`
Tuple lookups per check	~4 lookups	Avg group expansion depth ≈ 4
Cache hit rate target	≥ 95%	Design target
DB reads/sec (cache misses)	4M DB reads/sec	`20M checks/sec × 5% miss rate × 4 lookups/check`
Tuple fetches/sec from storage	40M tuple fetches/sec	`4M DB reads × ~10 tuples per read`
Write QPS	~10k writes/sec	New document shares, group membership changes
Write propagation deadline	~5 s	Each write must reach all pods within ~5s → fan-out invalidation

The math shows why the cache is load-bearing: without it, 20M check QPS drives 80M tuple DB reads per second (4 lookups each), which is impractical. The cache is not an optimization — it is the system.
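The derived rows of the table can be reproduced with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, and the inputs are the design-scale assumptions from the table, not measured values.

```python
# Inputs copied from the capacity table (design-exercise assumptions).
TUPLES = 1_000_000_000
BYTES_PER_TUPLE = 128
STORAGE_NODES = 2_000
GLOBAL_CHECK_QPS = 20_000_000
REGIONS = 5
PODS_PER_REGION = 200
LOOKUPS_PER_CHECK = 4
CACHE_MISS_PERCENT = 5          # 95% hit-rate target
TUPLES_PER_READ = 10

raw_storage_gb = TUPLES * BYTES_PER_TUPLE / 1e9
per_node_mb = raw_storage_gb * 1e3 / STORAGE_NODES
qps_per_region = GLOBAL_CHECK_QPS // REGIONS
qps_per_pod = qps_per_region // PODS_PER_REGION

# Only cache misses reach storage; each miss costs several lookups.
db_reads_per_sec = GLOBAL_CHECK_QPS * CACHE_MISS_PERCENT // 100 * LOOKUPS_PER_CHECK
tuple_fetches_per_sec = db_reads_per_sec * TUPLES_PER_READ

# With no cache at all, every check pays for all of its lookups.
uncached_reads_per_sec = GLOBAL_CHECK_QPS * LOOKUPS_PER_CHECK

print(raw_storage_gb, per_node_mb, qps_per_pod, db_reads_per_sec, uncached_reads_per_sec)
```

Changing `CACHE_MISS_PERCENT` shows how sensitive storage load is to the hit rate: each extra percentage point of misses adds 800k reads per second at this scale.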

Authorization models — a brief taxonomy

Before the architecture, it's worth understanding the policy model options, because the model you pick dictates everything downstream.

RBAC: Role-based access control

Subjects have roles. Roles have permissions. Permissions are (action, resource-type) pairs.

user:alice  → role:editor
role:editor → can: doc:write, doc:read, comment:create

Simple, auditable, and sufficient for most enterprise software. Where RBAC breaks down is per-resource, per-instance permissions: "Alice is an editor of this specific document but not all documents." Expressing that requires either duplicating roles per resource or bolting on a separate exception table — both of which become unmaintainable at scale.
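A minimal sketch of the RBAC lookup, assuming two in-memory maps (`USER_ROLES` and `ROLE_PERMISSIONS` are illustrative names, not part of any real system). Note that the check takes no resource instance at all, which is exactly the limitation described above.

```python
# Hypothetical in-memory role tables for illustration.
USER_ROLES = {"alice": {"editor"}}
ROLE_PERMISSIONS = {"editor": {"doc:write", "doc:read", "comment:create"}}

def rbac_allows(user: str, permission: str) -> bool:
    """True if any of the user's roles carries the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )

# The answer is the same for every document: there is no way to say
# "editor of doc:readme only" without adding a role per resource.
print(rbac_allows("alice", "doc:write"))
print(rbac_allows("alice", "doc:delete"))
```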

ABAC: Attribute-based access control

Policy rules evaluate arbitrary attributes of the subject, resource, and environment.

allow if subject.department == resource.owning_department
         AND environment.time_of_day BETWEEN 09:00 AND 18:00

Maximally expressive. The cost is auditability — "why was Alice denied?" requires tracing through a mini-interpreter's policy evaluation — and the cognitive overhead of writing and reviewing those rules. Common in government and healthcare compliance use cases where the expressiveness is worth the complexity.
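The rule above could be evaluated roughly as follows. The `Subject` and `Resource` classes and the `abac_allows` function are assumptions made for this sketch; a real ABAC engine would interpret rules from a policy language rather than hard-code them.

```python
from dataclasses import dataclass
from datetime import time

@dataclass(frozen=True)
class Subject:
    department: str

@dataclass(frozen=True)
class Resource:
    owning_department: str

def abac_allows(subject: Subject, resource: Resource, time_of_day: time) -> bool:
    """Hard-coded version of the example rule: same department, business hours."""
    same_department = subject.department == resource.owning_department
    in_business_hours = time(9, 0) <= time_of_day <= time(18, 0)
    return same_department and in_business_hours

print(abac_allows(Subject("finance"), Resource("finance"), time(10, 30)))
print(abac_allows(Subject("finance"), Resource("finance"), time(20, 0)))
```

Explaining a denial means reporting which sub-condition failed, which is why audit tooling for ABAC tends to need an evaluation trace.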

ReBAC: Relationship-based access control

Access is determined by the graph of relationships between subjects and objects. Google Drive is the canonical example: Alice can read a document because Alice is a member of a group that has viewer access to a parent folder that contains the document. The chain is traversable as a graph; the check is a reachability query.

ReBAC subsumes RBAC (a role is just a group relationship). It expresses resource-instance permissions naturally and scales to billions of objects. Chains that go many hops deep require engineering care, but that is a solvable problem — which is what the Leopard index (covered later) addresses.

For a global auth service: build on ReBAC. RBAC becomes a special case. ABAC attribute conditions can be layered as a post-filter after the ReBAC check passes.

flowchart TD
    RBAC["RBAC: flat role assignment<br/>user → role → permission"]
    ABAC["ABAC: rule evaluation<br/>allow if attr(user) ∧ attr(resource)"]
    REBAC["ReBAC: graph traversal<br/>user reachable from resource via relations?"]

    RBAC -->|special case of| REBAC
    ABAC -->|post-filter on top of| REBAC

    style REBAC fill:#ff6b1a,color:#0a0a0f
    style RBAC fill:#0e7490,color:#fff
    style ABAC fill:#a855f7,color:#fff

Zanzibar relation tuples

The fundamental unit of the Zanzibar data model is a relation tuple:

object # relation @ subject

Where:

object is a typed entity: doc:readme, folder:eng, repo:linux.
relation is a string naming the relationship: viewer, editor, owner, member.
subject is either a user (user:alice) or a userset (group:eng#member).

Concrete examples

doc:readme#viewer@user:alice          -- alice is a viewer of readme
doc:readme#editor@user:bob            -- bob is an editor of readme
doc:readme#viewer@group:eng#member   -- every member of group:eng is a viewer
group:eng#member@user:carol           -- carol is a member of group:eng
folder:root#viewer@user:alice         -- alice can view the root folder
doc:readme#parent@folder:root         -- readme is a child of root folder

From just these tuples, the system can answer "can carol view doc:readme?" by expanding doc:readme#viewer → includes group:eng#member → carol is a member of group:eng → allow.
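To make the expansion concrete, here is a small sketch that parses the tuples above and resolves a check by following userset subjects. `RelationTuple`, `parse_tuple`, and `check` are names invented for this example; rewrites, cycles, and caching are deliberately left out.

```python
from typing import NamedTuple

class RelationTuple(NamedTuple):
    object: str     # e.g. "doc:readme"
    relation: str   # e.g. "viewer"
    subject: str    # "user:alice" or a userset such as "group:eng#member"

def parse_tuple(text: str) -> RelationTuple:
    """Parse 'object#relation@subject'; the subject may itself contain '#'."""
    left, subject = text.split("@", 1)
    obj, relation = left.split("#", 1)
    return RelationTuple(obj, relation, subject)

TUPLES = [parse_tuple(t) for t in (
    "doc:readme#viewer@user:alice",
    "doc:readme#editor@user:bob",
    "doc:readme#viewer@group:eng#member",
    "group:eng#member@user:carol",
    "folder:root#viewer@user:alice",
    "doc:readme#parent@folder:root",
)]

def check(user: str, relation: str, obj: str) -> bool:
    """Direct match, or membership in any userset listed as a subject."""
    for t in TUPLES:
        if t.object != obj or t.relation != relation:
            continue
        if t.subject == user:
            return True
        if "#" in t.subject:  # userset: recurse into group#relation
            group, group_rel = t.subject.split("#", 1)
            if check(user, group_rel, group):
                return True
    return False

print(check("user:carol", "viewer", "doc:readme"))
```

Bob is an editor but not a viewer in this sketch, because "editors are also viewers" is a rewrite rule rather than a stored tuple.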

Userset rewrites (inheritance rules)

A namespace configuration defines how relations inherit from each other. For a "doc" namespace:

viewer:
  union of:
    - direct viewers          // doc:readme#viewer@user:alice
    - editors                 // anyone who is an editor is also a viewer
    - parent folder viewers   // inherit from parent via 'parent' relation

This is a userset rewrite: a set algebra (union, intersection, exclusion) over other relations. Every "is Alice a viewer?" check expands this rule recursively until all branches resolve to concrete users, or the expansion short-circuits on a match.

The check algorithm

Here is the question: given a check like check(user:carol, viewer, doc:readme), how does the system actually evaluate it?

It starts by loading the direct tuples for doc:readme#viewer. If carol appears directly, done. If not, it expands each userset reference it finds — checking if carol is in group:eng, then trying the parent folder rule, running independent branches in parallel, and short-circuiting the moment any branch succeeds.

flowchart TD
    REQ["check(user:carol, viewer, doc:readme)"] --> LOOKUP[Lookup tuples for doc:readme#viewer]
    LOOKUP --> FOUND{"Direct match<br/>user:carol?"}
    FOUND -->|yes| ALLOW[return allow]
    FOUND -->|no| EXPAND[Expand usersets]
    EXPAND --> GROUP["doc:readme#viewer includes group:eng#member"]
    GROUP --> MEM[Is carol in group:eng?]
    MEM --> TUPLEQ["Lookup group:eng#member"]
    TUPLEQ --> MATCH{"carol found?"}
    MATCH -->|yes| ALLOW
    MATCH -->|no| PARENT["Try parent folder rule"]
    PARENT --> FOLDER["doc:readme#parent → folder:root"]
    FOLDER --> FOLDERCHK["check(user:carol, viewer, folder:root)"]
    FOLDERCHK -->|"recurse: match"| ALLOW
    FOLDERCHK -->|"recurse: no match"| MATCH2[No match in any branch]
    MATCH2 --> DENY[return deny]
    style ALLOW fill:#15803d,color:#fff
    style DENY fill:#ff2e88,color:#fff
    style REQ fill:#ff6b1a,color:#0a0a0f

A few properties worth noting. First, parallel expansion: independent branches (the direct viewer check and the group expansion) fan out concurrently, and the first one to match wins. Second, cycle detection: a cycle in the tuple graph (A is parent of B, B is parent of A) must be detected to prevent infinite loops; a visited set per check request handles this. Third, depth: in most real permission chains the depth is shallow, 3 to 6 hops, so the recursion terminates quickly. The Leopard index, covered below, handles the pathological case of very large or deeply nested groups.
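The sketch below shows the visited-set and depth-limit ideas in isolation. It is sequential rather than parallel, the parent-folder rule is hard-coded for the `viewer` relation, and `STORE`, `write`, and `MAX_DEPTH` are illustrative names assumed for this example.

```python
from collections import defaultdict

STORE = defaultdict(set)   # (object, relation) -> set of subjects
MAX_DEPTH = 10             # assumed cap on expansion depth

def write(obj, relation, subject):
    STORE[(obj, relation)].add(subject)

def check(user, relation, obj, visited=None, depth=0):
    visited = set() if visited is None else visited
    if depth > MAX_DEPTH or (obj, relation) in visited:
        return False                      # depth cap hit or cycle detected
    visited.add((obj, relation))
    subjects = STORE.get((obj, relation), set())
    if user in subjects:
        return True                       # direct match
    for subject in subjects:
        if "#" in subject:                # userset such as group:eng#member
            group, group_rel = subject.split("#", 1)
            if check(user, group_rel, group, visited, depth + 1):
                return True
    if relation == "viewer":              # hard-coded parent inheritance
        for parent in STORE.get((obj, "parent"), set()):
            if check(user, "viewer", parent, visited, depth + 1):
                return True
    return False

write("doc:readme", "viewer", "group:eng#member")
write("group:eng", "member", "user:carol")
write("doc:readme", "parent", "folder:root")
write("folder:root", "viewer", "user:dave")
write("folder:a", "parent", "folder:b")   # deliberate cycle
write("folder:b", "parent", "folder:a")

print(check("user:carol", "viewer", "doc:readme"))
print(check("user:dave", "viewer", "doc:readme"))
print(check("user:erin", "viewer", "folder:a"))
```

A production implementation would run the branches concurrently and cancel the losers once one returns a match; the visited set would then need to be safe for concurrent use.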

The new-enemy problem

This is the most subtle consistency hazard in authorization systems, and interviewers love it.

Scenario:

Document doc:secret has viewer@group:eng#member.
Alice was a member of group:eng an hour ago. She was just removed. The tuple group:eng#member@user:alice is deleted.
Milliseconds later, Alice's request arrives at an auth server pod that hasn't yet received the invalidation.
That pod's cache still says "alice is a member of group:eng". The check passes. Alice reads a document she should no longer be able to access.

Alice is the "new enemy": a just-removed user who still passes the check because it was evaluated on data older than her removal.

Zanzibar's solution: zookies (snapshot tokens)

When a tuple write is committed, the system returns a zookie — an opaque token encoding a globally meaningful timestamp that identifies the database snapshot that includes this write.

The application stores the zookie alongside the resource (or passes it in the request). When it calls check(), it passes the zookie. The auth service guarantees it will evaluate the check on a snapshot that is at least as fresh as the write that produced the zookie.

sequenceDiagram
    participant App
    participant WriteService
    participant CheckService
    participant DB

    App->>WriteService: delete(group:eng#member@user:alice)
    WriteService->>DB: commit tuple deletion
    DB-->>WriteService: zookie z42 (timestamp t=1000)
    WriteService-->>App: zookie z42

    Note over App: Store z42 with this session/resource

    App->>CheckService: check(alice, viewer, doc:secret) with zookie z42
    CheckService->>DB: read at snapshot >= t=1000
    DB-->>CheckService: alice NOT in group:eng (fresh read)
    CheckService-->>App: deny

Without zookies, the check might have hit a cached pod that was at t=999. With zookies, the service is forced to either use a cached entry populated at or after t=1000, or do a fresh read from storage at that snapshot. This is external consistency at the granularity of a causally-related set of changes. It is not full global linearizability (which would kill latency), but it is enough to close the new-enemy vulnerability.
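A toy model of the guarantee, under loud assumptions: the zookie here is just an integer commit timestamp (real zookies are opaque), the store is an append-only list, and `TupleStore` and `check` are hypothetical names. The point is only the comparison between the caller's zookie and the cache's snapshot.

```python
class TupleStore:
    """Append-only log; `contains` replays it up to a snapshot timestamp."""

    def __init__(self):
        self.now = 0
        self.log = []  # (commit_ts, op, tuple_text)

    def _commit(self, op, tuple_text):
        self.now += 1
        self.log.append((self.now, op, tuple_text))
        return self.now  # stands in for the zookie

    def write(self, tuple_text):
        return self._commit("add", tuple_text)

    def delete(self, tuple_text):
        return self._commit("delete", tuple_text)

    def contains(self, tuple_text, at_ts):
        present = False
        for ts, op, text in self.log:
            if ts > at_ts:
                break
            if text == tuple_text:
                present = (op == "add")
        return present

def check(store, cache_ts, tuple_text, zookie=None):
    """Use the cached snapshot unless the caller's zookie is newer than it."""
    if zookie is not None and zookie > cache_ts:
        read_ts = store.now   # cache too old for this caller: fresh read
    else:
        read_ts = cache_ts
    return store.contains(tuple_text, read_ts)

store = TupleStore()
MEMBERSHIP = "group:eng#member@user:alice"
store.write(MEMBERSHIP)
pod_cache_ts = store.now            # the pod's cache reflects the grant
zookie = store.delete(MEMBERSHIP)   # the revocation commits later

stale_result = check(store, pod_cache_ts, MEMBERSHIP)          # no zookie
fresh_result = check(store, pod_cache_ts, MEMBERSHIP, zookie)  # with zookie
print(stale_result, fresh_result)
```

The stale path still exists for callers that pass no zookie, which is why the application has to carry the token from the write to the subsequent checks.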

Caching strategy and cache invalidation

The check hot path is:

1. Check in-process memory (pod-local cache of recent tuples + decisions)
2. Check a shared distributed cache (Redis or equivalent)
3. Read from tuple storage

Pod-local watch cache

Each auth server pod maintains an in-memory watch cache: a snapshot of tuples for the objects it has recently been queried about, kept up-to-date via a change log subscription.

flowchart LR
    WRITE[Write Service] -->|tuple mutation| CHLOG[(Change Log<br/>e.g. Kafka/Spanner CDC)]
    CHLOG -->|subscribe| POD1[Auth Pod 1<br/>watch cache]
    CHLOG -->|subscribe| POD2[Auth Pod 2<br/>watch cache]
    CHLOG -->|subscribe| POD3[Auth Pod 3<br/>watch cache]
    style CHLOG fill:#ffaa00,color:#0a0a0f
    style POD1 fill:#15803d,color:#fff
    style POD2 fill:#15803d,color:#fff
    style POD3 fill:#15803d,color:#fff

When a tuple is written, the write service appends to the change log. Every subscribed pod receives the mutation within seconds and applies it to its local snapshot. This is push-based invalidation: stale entries are evicted as soon as the change propagates, not when a TTL expires.

The trade-off is that each pod must maintain a subscription to the change log and keep a non-trivial in-memory structure. A pod restart must replay recent history before serving checks, or fall back to reading directly from storage. The Zanzibar paper exposes a Watch API that streams tuple changes in timestamp order, which clients and the Leopard indexer use to keep secondary indexes current; the pod-local watch cache described here applies the same mechanism to the check path.
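A sketch of how a pod might apply change log events to its local snapshot. `Change` and `WatchCache` are illustrative types assumed for this example, the log is assumed to carry a monotonically increasing sequence number, and for brevity the cache keeps every tuple it sees rather than only recently queried objects.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Change:
    seq: int        # position in the change log (assumed monotonic)
    op: str         # "add" or "delete"
    key: tuple      # (object, relation)
    subject: str

@dataclass
class WatchCache:
    applied_seq: int = 0
    tuples: dict = field(default_factory=dict)  # key -> set of subjects

    def apply(self, change: Change) -> None:
        if change.seq <= self.applied_seq:
            return  # duplicate or replayed event: already reflected
        subjects = self.tuples.setdefault(change.key, set())
        if change.op == "add":
            subjects.add(change.subject)
        else:
            subjects.discard(change.subject)
        self.applied_seq = change.seq

    def has(self, key, subject) -> bool:
        return subject in self.tuples.get(key, set())

cache = WatchCache()
grant = Change(1, "add", ("group:eng", "member"), "user:alice")
revoke = Change(2, "delete", ("group:eng", "member"), "user:alice")
for event in (grant, revoke, grant):   # third event is a replay after restart
    cache.apply(event)
print(cache.has(("group:eng", "member"), "user:alice"), cache.applied_seq)
```

Tracking `applied_seq` is what lets a restarted pod replay recent history safely, and it is also the value a check would compare against a caller's snapshot token.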

Consistency vs latency

Strategy	Latency	Staleness risk
Always read from storage	High (~5–20ms)	None — always fresh
TTL-based cache (no invalidation)	Very low (~0.1ms)	Up to TTL (minutes) — unacceptable for revocations
TTL-based cache + eager invalidation on write	Low (~0.5ms)	Bounded by change propagation (~1–5s)
Watch cache (push invalidation, no TTL)	Very low (~0.1ms)	Bounded by subscription lag (~1–5s)
Zookie-enforced fresh read for causally-related checks	Depends on cache state	Zero for the causal chain that matters

The watch cache plus zookie combination is the right answer: serve most checks from the fast pod-local cache, but fall back to a guaranteed-fresh read when the caller has a zookie newer than the cache snapshot.

The Leopard index — scaling group expansion

The naive group expansion — recursively fetch tuples depth-first — has poor performance when a group has millions of members or when membership chains are 10+ levels deep. Zanzibar describes a specialized index called Leopard to handle this.

Leopard pre-computes flattened membership for groups: for every (group, relation) pair, it maintains a materialized set of all directly and transitively included users. Updates to the membership graph trigger incremental recomputation.

group:eng#member = {alice, bob, carol, dan, ...10M users}
(materialized, incrementally maintained)

For a check like "is alice a viewer of doc:readme" where viewer includes group:eng#member, the check reduces to: "is alice in this precomputed set?" — an O(1) lookup instead of a graph traversal over millions of edges.

The cost is write amplification. Adding one user to a group requires updating the materialized set of that group and of every group that transitively includes it. For large, heavily nested hierarchies, this fan-out must be rate-limited and applied asynchronously. During the window between write and Leopard update, the system falls back to recursive expansion.
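The following is a deliberately simplified model of a flattened membership index, not Leopard's actual data structure. `MembershipIndex` and its methods are hypothetical; removals are omitted because they need reference counting (a user can reach a group via several paths). Write amplification is counted here as the number of materialized sets touched by one membership write.

```python
from collections import defaultdict

class MembershipIndex:
    def __init__(self):
        self.parents = defaultdict(set)  # group -> groups that include it
        self.flat = defaultdict(set)     # group -> direct + transitive users

    def _self_and_ancestors(self, group):
        seen, stack = set(), [group]
        while stack:
            g = stack.pop()
            if g not in seen:
                seen.add(g)
                stack.extend(self.parents[g])
        return seen

    def nest(self, child, parent):
        """Record that every member of `child` is also a member of `parent`."""
        self.parents[child].add(parent)
        for g in self._self_and_ancestors(parent):
            self.flat[g] |= self.flat[child]

    def add_user(self, user, group):
        """Add a direct member; returns how many flattened sets were updated."""
        touched = self._self_and_ancestors(group)
        for g in touched:
            self.flat[g].add(user)
        return len(touched)

    def is_member(self, user, group):
        return user in self.flat[group]   # single set lookup, no traversal

index = MembershipIndex()
index.nest("group:backend", "group:eng")
index.nest("group:eng", "group:all-staff")
sets_updated = index.add_user("user:alice", "group:backend")
print(sets_updated, index.is_member("user:alice", "group:all-staff"))
```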

Reverse index queries — listing resources and subjects

The check() API is a point lookup: given (subject, relation, object), is there a path? But applications also need:

"List all documents Alice can view" (lookup_resources)
"List all users who can view doc:readme" (lookup_subjects)

These are fan-out queries, not point lookups. The tuple store schema optimized for check (indexed by object+relation) does not efficiently answer "all objects with a particular subject." You need a reverse index.

Forward index (for check): (object, relation) → [subjects]
Reverse index (for listing): (subject, relation, type) → [objects]

The reverse index is harder to keep consistent because a single tuple mutation — say, adding alice to a group — must update the reverse index for every object that group has access to, which may be millions of objects. In practice: for small sets (Alice has explicit access to under 10,000 objects), the reverse index is cheap and maintained synchronously. For large sets (Alice is in a group with access to millions of objects), listing is served by intersecting the group membership reverse index with an object-type-scoped reverse index — a two-level fan-out that is expensive but bounded. Listing is explicitly a lower-SLA API with a latency target of 100ms to 1s, not 10ms. It does not belong on the hot check path.
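A minimal sketch of maintaining both indexes on write. The dictionaries and function names are assumptions for illustration, and the caller is assumed to supply the usersets the subject belongs to (for example from a membership index), which corresponds to the two-level fan-out described above.

```python
from collections import defaultdict

forward = defaultdict(set)  # (object, relation) -> subjects
reverse = defaultdict(set)  # (subject, relation, object_type) -> objects

def write_tuple(obj, relation, subject):
    """Update both indexes in one step so they cannot drift in this toy model."""
    obj_type = obj.split(":", 1)[0]
    forward[(obj, relation)].add(subject)
    reverse[(subject, relation, obj_type)].add(obj)

def lookup_resources(subject, relation, obj_type, usersets=()):
    """Objects granted directly, plus objects granted to the given usersets."""
    found = set(reverse.get((subject, relation, obj_type), set()))
    for userset in usersets:
        found |= reverse.get((userset, relation, obj_type), set())
    return sorted(found)

write_tuple("doc:readme", "viewer", "user:alice")
write_tuple("doc:design", "viewer", "group:eng#member")
write_tuple("folder:root", "viewer", "user:alice")

print(lookup_resources("user:alice", "viewer", "doc"))
print(lookup_resources("user:alice", "viewer", "doc", usersets=("group:eng#member",)))
```

Keying the reverse index by userset rather than by each expanded user keeps a group-membership change from rewriting millions of rows, at the cost of the extra fan-out at read time.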

Full architecture

flowchart TD
    APP[Application / API Servers] -->|"check(u,a,r) + zookie"| LB[Load Balancer]
    APP -->|"write(tuple)"| WS[Write Service]

    LB --> AUTH1[Auth Pod]
    LB --> AUTH2[Auth Pod]
    LB --> AUTH3[Auth Pod]

    AUTH1 --> WCACHE[(Watch Cache<br/>in-memory per pod)]
    AUTH2 --> WCACHE2[(Watch Cache<br/>in-memory per pod)]
    AUTH3 --> WCACHE3[(Watch Cache<br/>in-memory per pod)]

    WCACHE -.miss.-> STORAGE[(Tuple Store<br/>globally sharded<br/>e.g. Spanner / CockroachDB)]
    WCACHE2 -.miss.-> STORAGE
    WCACHE3 -.miss.-> STORAGE

    WS --> STORAGE
    WS --> CHLOG[Change Log]
    CHLOG --> AUTH1
    CHLOG --> AUTH2
    CHLOG --> AUTH3

    STORAGE --> LEOPARD[(Leopard Index<br/>flat group membership)]
    AUTH1 --> LEOPARD
    AUTH2 --> LEOPARD
    AUTH3 --> LEOPARD

    STORAGE --> REVIDX[(Reverse Index<br/>for listing)]
    APP2[Admin UI] -->|"lookup_resources / lookup_subjects"| REVIDX

    style APP fill:#ff6b1a,color:#0a0a0f
    style STORAGE fill:#0e7490,color:#fff
    style WCACHE fill:#15803d,color:#fff
    style WCACHE2 fill:#15803d,color:#fff
    style WCACHE3 fill:#15803d,color:#fff
    style LEOPARD fill:#a855f7,color:#fff
    style CHLOG fill:#ffaa00,color:#0a0a0f

Building up to the design

It helps to see how a naive implementation breaks step by step, because each failure mode motivates the next piece.

V1: ACL table per resource

CREATE TABLE acls (
  resource_id  TEXT,
  user_id      TEXT,
  permission   TEXT,
  PRIMARY KEY (resource_id, user_id, permission)
);

SELECT 1 FROM acls WHERE resource_id=? AND user_id=? AND permission=?

One query, low latency. Works for thousands of resources. The immediate problem is that there is no group support: "revoke alice from all of engineering's documents" requires deleting millions of rows one at a time. No inheritance, no hierarchy, and the schema explodes as new resource types appear.

V2: Roles (RBAC)

Add users_roles and roles_permissions tables. Now a permission change to a role propagates to all users with that role. This is an improvement, but per-resource, per-instance control is still missing. "Alice is an editor of this specific document but not all documents" requires a row per (user, resource), which drags you right back to V1's problems for large object counts.

V3: Relation tuples (ReBAC)

Replace ACL tables with a unified tuples table:

CREATE TABLE tuples (
  object_type   TEXT NOT NULL,
  object_id     TEXT NOT NULL,
  relation      TEXT NOT NULL,
  subject_type  TEXT NOT NULL,             -- 'user' or 'group'
  subject_id    TEXT NOT NULL,
  subject_rel   TEXT NOT NULL DEFAULT '',  -- '' for a direct subject; the relation name when subject is a userset (group:eng#member)
  PRIMARY KEY (object_type, object_id, relation, subject_type, subject_id, subject_rel)
);

Check is now a recursive lookup. Groups, inheritance, and RBAC roles are all expressed uniformly as tuples. A new resource type is just a new object_type string, with no schema migration. What breaks next: recursive expansion becomes expensive at depth, and group membership queries require table scans. The fix is to cache and index.
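To see why V3 gets expensive, the sketch below runs the recursive check against an in-memory SQLite table and counts the queries. It assumes an empty string in `subject_rel` marks a direct subject so that the column can be part of the primary key; the `check` function and `QUERY_COUNT` counter are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE tuples (
  object_type TEXT NOT NULL, object_id TEXT NOT NULL, relation TEXT NOT NULL,
  subject_type TEXT NOT NULL, subject_id TEXT NOT NULL,
  subject_rel TEXT NOT NULL DEFAULT '',
  PRIMARY KEY (object_type, object_id, relation, subject_type, subject_id, subject_rel)
)""")
conn.executemany("INSERT INTO tuples VALUES (?, ?, ?, ?, ?, ?)", [
    ("doc", "readme", "viewer", "group", "eng", "member"),
    ("group", "eng", "member", "user", "carol", ""),
])

QUERY_COUNT = 0

def check(user_id, relation, object_type, object_id, depth=0):
    """One SQL round trip per hop; depth is capped to stop runaway recursion."""
    global QUERY_COUNT
    if depth > 10:
        return False
    QUERY_COUNT += 1
    rows = conn.execute(
        "SELECT subject_type, subject_id, subject_rel FROM tuples "
        "WHERE object_type = ? AND object_id = ? AND relation = ?",
        (object_type, object_id, relation),
    ).fetchall()
    for subject_type, subject_id, subject_rel in rows:
        if subject_rel == "" and subject_type == "user" and subject_id == user_id:
            return True
        if subject_rel and check(user_id, subject_rel, subject_type, subject_id, depth + 1):
            return True
    return False

carol_allowed = check("carol", "viewer", "doc", "readme")
queries_for_carol = QUERY_COUNT
print(carol_allowed, queries_for_carol)
```

Even this two-tuple example needs two round trips; a chain of four or more hops at storage latencies of several milliseconds each is what pushes the design toward caching.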

V4: Caching and change propagation

Deploy auth server pods with in-memory watch caches. Subscribe to a change log. Accept and store zookies. Now more than 95% of checks hit the cache at ~0.1ms; revocations propagate in 1 to 5 seconds. Large flat groups — 10M+ members — still force the recursive expansion to visit 10M tuples.

V5: Leopard + reverse index

Pre-materialize group membership (Leopard). Build a reverse index for listing. Accept that listing is a lower-SLA API served asynchronously. This is the production design.

flowchart LR
    V1["V1: ACL table<br/>no groups, no hierarchy"] --> V2["V2: + RBAC roles<br/>group permissions, no per-resource"]
    V2 --> V3["V3: Relation tuples<br/>uniform model, slow expansion"]
    V3 --> V4["V4: + watch cache + zookies<br/>fast, consistent revocation"]
    V4 --> V5["V5: + Leopard + reverse index<br/>large groups, listing queries"]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V4 fill:#ff6b1a,color:#0a0a0f
    style V5 fill:#a855f7,color:#fff

Storage choices

Data	Store	Why
Relation tuples	Globally sharded relational (Spanner, CockroachDB, or MySQL + Vitess)	Needs ACID writes, consistent secondary index, multi-region reads
Watch cache	In-process memory per pod	Sub-millisecond lookup; invalidated by change log
Leopard index	Separate store (Redis / in-memory)	Read-heavy, point lookups; write-amplified on group changes
Reverse index	Separate materialized store	Fan-out access pattern distinct from point-check path
Change log	Kafka or Spanner change streams	Durable, ordered, fanout to all subscribed pods
Namespace configs (userset rewrites)	Config store (git-backed, propagated via service mesh)	Rarely changes; must be consistent across pods

Failure modes

Stale permission after revocation

The cache has not yet received the invalidation after a tuple is deleted. The watch cache with push invalidation bounds the stale window to the change log lag, typically 1 to 5 seconds. For security-critical resources, the caller passes a zookie and the check service enforces a fresh read if the cache snapshot is older than the zookie. As a safety backstop, entries that have not been refreshed within a short fixed window (a common design choice is 30–60 seconds) should be evicted regardless — the Zanzibar paper does not specify a hard TTL, but any production deployment needs one.

New-enemy problem

A check evaluated at a snapshot that does not include a recent revocation. The fix is zookies — the application must pass the zookie from the revocation write back to the check call. The auth service contract guarantees freshness for any check that carries a zookie.

Hot object

Millions of checks arrive against a single high-profile resource, such as a shared company-wide policy document. The tuples for that resource are cached in every pod after the first miss, so subsequent checks are local cache hits on every pod. If the policy is immutable (which it usually is), cache invalidation is rare. For write-heavy hot objects, writes go to one shard primary and fan out via the change log, keeping the read path fast regardless.

Deep group nesting

A check requires expanding 20 levels of group membership before resolving. Set a maximum expansion depth limit (10 is reasonable). Deeper chains are rejected at policy authoring time. Leopard pre-materializes common deep chains, so recursive expansion is only needed for changes not yet reflected in Leopard.

Cache invalidation race

Two writes arrive in quick succession; a pod serves a check between the two change log events. Because the change log is ordered (Kafka partitioned by object key), all mutations to the same object arrive in the same order at every subscriber. No reordering is possible within a key.
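The ordering argument rests on one property: the same key always maps to the same partition. The sketch below illustrates that with CRC32 as a stand-in hash (an assumption for this example, not the hash any particular log system uses) and a hypothetical `partition_for` helper.

```python
import zlib

NUM_PARTITIONS = 8  # illustrative value

def partition_for(object_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic key -> partition mapping; same key, same partition."""
    return zlib.crc32(object_key.encode("utf-8")) % num_partitions

events = [
    ("group:eng", "add user:alice"),
    ("doc:readme", "add user:bob"),
    ("group:eng", "delete user:alice"),
]

# Append each event to its partition in arrival order.
partitions = {}
for key, mutation in events:
    partitions.setdefault(partition_for(key), []).append((key, mutation))

# Every subscriber reading group:eng's partition sees add before delete.
group_eng_order = [
    mutation
    for key, mutation in partitions[partition_for("group:eng")]
    if key == "group:eng"
]
print(group_eng_order)
```

Ordering across different keys is not guaranteed by this scheme, which is one reason causally related changes on different objects still need snapshot tokens.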

Auth service pod outage

A pod crashes and loses its watch cache. On startup, the pod reads recent change log history (the last N minutes) to rebuild its cache before accepting traffic. If the rebuild window is too long, the pod falls back to serving checks directly from storage — higher latency, but correct. The load balancer routes around unhealthy pods within seconds.

Namespace configuration (policy schema)

Unlike SQL schemas, relation tuple namespaces are defined in configuration, not DDL. A namespace defines the resource type, its valid relations, and the userset rewrite rules:

# doc namespace
name: doc
relations:
  owner:
    this: {}        # direct owners only

  editor:
    union:
      - this: {}    # direct editors
      - computed_userset:
          relation: owner  # owners are also editors

  viewer:
    union:
      - this: {}    # direct viewers
      - computed_userset:
          relation: editor  # editors are also viewers
      - tuple_to_userset:
          tupleset:
            relation: parent         # follow 'parent' relation
          computed_userset:
            relation: viewer         # and check viewer on the parent

This expresses: "if you can view the parent folder, you can view the child document." The config is version-controlled, peer-reviewed, and rolled out like code. A mis-configured rewrite that accidentally grants broad access is a production security incident.
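A small interpreter makes the rewrite semantics concrete. This is a sketch under assumptions: the config above is transcribed into a Python dict, a `folder` namespace with a direct-only `viewer` relation is added for the example, and `this` matches only direct user subjects (userset subjects such as group:eng#member are ignored for brevity).

```python
NAMESPACES = {
    "doc": {
        "owner": {"this": {}},
        "editor": {"union": [
            {"this": {}},
            {"computed_userset": {"relation": "owner"}},
        ]},
        "viewer": {"union": [
            {"this": {}},
            {"computed_userset": {"relation": "editor"}},
            {"tuple_to_userset": {
                "tupleset": {"relation": "parent"},
                "computed_userset": {"relation": "viewer"},
            }},
        ]},
    },
    "folder": {"viewer": {"this": {}}},  # assumed for this example
}

TUPLES = {
    ("doc:readme", "owner"): {"user:alice"},
    ("doc:readme", "parent"): {"folder:root"},
    ("folder:root", "viewer"): {"user:bob"},
}

def check(user, relation, obj):
    rule = NAMESPACES[obj.split(":", 1)[0]][relation]
    return evaluate(rule, user, relation, obj)

def evaluate(rule, user, relation, obj):
    if "this" in rule:                 # tuples stored directly on the object
        return user in TUPLES.get((obj, relation), set())
    if "union" in rule:                # any child rule may grant access
        return any(evaluate(child, user, relation, obj) for child in rule["union"])
    if "computed_userset" in rule:     # another relation on the same object
        return check(user, rule["computed_userset"]["relation"], obj)
    if "tuple_to_userset" in rule:     # follow a relation, then check there
        ttu = rule["tuple_to_userset"]
        targets = TUPLES.get((obj, ttu["tupleset"]["relation"]), set())
        return any(check(user, ttu["computed_userset"]["relation"], t) for t in targets)
    raise ValueError(f"unknown rewrite rule: {rule}")

print(check("user:alice", "viewer", "doc:readme"))  # owner -> editor -> viewer
print(check("user:bob", "viewer", "doc:readme"))    # viewer of the parent folder
print(check("user:bob", "editor", "doc:readme"))
```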

API design

POST /v1/check
{
  "subject": { "type": "user", "id": "alice" },
  "relation": "viewer",
  "object":   { "type": "doc", "id": "readme" },
  "consistency": { "at_least_as_fresh": "CAESBAgBEAE=" }  // zookie
}

→ 200 OK
{ "allowed": true, "checked_at": "CAESBAgBEAI=" }  // new zookie

POST /v1/write
{
  "writes":  [{ "object": "doc:readme", "relation": "viewer", "subject": "user:alice" }],
  "deletes": [{ "object": "doc:readme", "relation": "viewer", "subject": "group:eng#member" }]
}

→ 200 OK
{ "zookie": "CAESBAgBEAE=" }

POST /v1/lookup_resources
{
  "subject":          { "type": "user", "id": "alice" },
  "relation":         "viewer",
  "resource_type":    "doc"
}

→ 200 OK
{ "resources": ["doc:readme", "doc:design", ...], "continuation_token": "..." }

The check endpoint is synchronous and must return in under 10ms at p95. write waits only for the durable commit; change propagation is async. lookup_resources is paginated with a relaxed latency SLA (100ms to 1s, not 10ms). Callers should never put it in the hot path.

RBAC vs ABAC vs ReBAC — when to use which

Criterion	RBAC	ABAC	ReBAC
Per-instance permissions	Needs workaround	Natively supported	Natively supported
Group nesting / hierarchy	Limited	Via attribute logic	Native
Policy explainability	Easy to audit	Hard (rule evaluation trace)	Graph traversal trace
Policy authoring complexity	Low	High	Medium
Runtime check cost	O(user roles)	O(policy rules evaluated)	O(graph depth × branching)
Best fit	Enterprise SaaS, B2B tenant roles	Healthcare, government compliance	SaaS with shared resources, files, social graphs

A common production pattern: ReBAC as the core model, with ABAC conditions evaluated as a post-filter after the ReBAC check passes. The ReBAC check confirms Alice is a viewer of the document. An ABAC rule confirms the document is not classified above Alice's clearance level. Both must pass.
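The two-stage decision can be sketched as below. All tables and names (`VIEWERS`, `CLASSIFICATION`, `CLEARANCE`, `rebac_check`, `authorize`) are hypothetical, and the ReBAC stage is reduced to a set lookup so the post-filter structure stays visible.

```python
# Hypothetical data for illustration only.
VIEWERS = {"doc:plan": {"user:alice", "user:bob"}}
CLASSIFICATION = {"doc:plan": 2}                 # higher means more sensitive
CLEARANCE = {"user:alice": 3, "user:bob": 1}

def rebac_check(user, relation, obj):
    """Stand-in for the relationship check; a real one walks the tuple graph."""
    return relation == "viewer" and user in VIEWERS.get(obj, set())

def authorize(user, obj):
    """ReBAC first, then the ABAC condition as a post-filter. Both must pass."""
    if not rebac_check(user, "viewer", obj):
        return False
    return CLASSIFICATION.get(obj, 0) <= CLEARANCE.get(user, 0)

print(authorize("user:alice", "doc:plan"))
print(authorize("user:bob", "doc:plan"))
```

Running the relationship check first keeps attribute evaluation off the path for the common case where the subject has no relationship to the object at all.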

Global replication and read locality

Authorization servers must be co-located with the application servers they serve. A cross-region network round-trip from us-east to eu-west adds 80–100ms — that alone blows the p95 target.

flowchart LR
    US[US-EAST<br/>App servers] --> USAUTH[US-EAST<br/>Auth cluster]
    EU[EU-WEST<br/>App servers] --> EUAUTH[EU-WEST<br/>Auth cluster]
    AP[AP-SOUTHEAST<br/>App servers] --> APAUTH[AP-SOUTHEAST<br/>Auth cluster]

    USAUTH --> USREP[(US tuple replica)]
    EUAUTH --> EUREP[(EU tuple replica)]
    APAUTH --> APREP[(AP tuple replica)]

    USREP <-->|"async replication<br/>~50–200ms lag"| EUREP
    EUREP <-->|async| APREP

    style USAUTH fill:#ff6b1a,color:#0a0a0f
    style EUAUTH fill:#ff6b1a,color:#0a0a0f
    style APAUTH fill:#ff6b1a,color:#0a0a0f

The replication lag means a write in US-EAST may take 50–200ms to be visible in EU-WEST. For most authorization use cases this is acceptable — a new share notification typically takes seconds to process anyway. For security-critical revocations, the application should wait for cross-region replication to confirm before treating the revocation as globally effective, or use a synchronous multi-region commit (available in systems like Spanner, but expensive).

Things to discuss in an interview

Why ReBAC over flat ACLs: groups, inheritance, and RBAC are all special cases of the tuple graph. One model to rule them all.
The new-enemy problem: most candidates have not thought about this. Mention it proactively. Explain zookies as causal tokens, not vector clocks.
Caching is the system: without the watch cache, you can't hit the latency SLA. The question is how you keep it consistent — TTL alone is not enough for revocations.
Listing vs checking are fundamentally different: checking is a point lookup; listing is a fan-out. They need different indexes and have different SLAs.
Namespace config is a security boundary: a mis-configured userset rewrite is a privilege escalation bug. Treat it as code, with review and rollout gates.
Where Leopard matters: a group with 10M members cannot be expanded recursively in milliseconds. Pre-materialized membership is the answer, with the trade-off of write amplification.

See also: leader election and consensus for how the globally-sharded tuple store maintains linearizable writes across regions.

Things you should now be able to answer

What is a relation tuple and how does it express RBAC, group membership, and folder inheritance uniformly?
What is the new-enemy problem and why does a TTL-based cache not solve it?
What is a zookie and what guarantee does it provide?
Why is the Leopard index necessary and what trade-off does it introduce?
Why are lookup_resources and check served differently?
What is a userset rewrite and where is it defined?
Why must the authorization service be regionally co-located with the application?

Frequently asked questions

▸What is a relation tuple and how does it represent permissions?

A relation tuple is a triple of the form (object, relation, subject) — for example, doc:readme#viewer@group:eng#member, meaning every member of group:eng is a viewer of doc:readme. The subject can be a direct user or a userset (another group relation), which lets a single tuple model group membership, role inheritance, and folder hierarchy uniformly without any schema change per new resource type.

▸What is the new-enemy problem in authorization systems?

The new-enemy problem occurs when a check is evaluated on a stale cache snapshot that does not include a recent revocation. For example, if Alice is removed from a group and an auth pod has not yet received that invalidation, its cached check still passes and Alice reads a document she no longer has access to. Zanzibar closes this with zookies: the application passes the opaque token returned by the revocation write to the subsequent check call, forcing the auth service to evaluate on a DB snapshot at least as fresh as that write.

▸When should I use ReBAC over RBAC or ABAC?

Use ReBAC when you need per-resource, per-instance permissions or deep group nesting — the canonical cases are SaaS products with shared files, social graphs, or org hierarchies. RBAC is sufficient for flat enterprise role assignments (B2B tenant roles) but breaks down at per-document granularity. ABAC fits healthcare or government compliance where attribute conditions are unavoidable, and in practice it is layered as a post-filter on top of a ReBAC check rather than used alone.

▸What is the Leopard index and what trade-off does it introduce?

Leopard is a pre-materialized index of flattened group membership: for every (group, relation) pair it stores the complete set of transitively included users, so a check reduces to an O(1) set lookup instead of a recursive graph traversal over millions of edges. The cost is write amplification: adding one user to a group requires updating the materialized set of that group and of every group that transitively includes it, so those updates must be applied asynchronously, and the system falls back to recursive expansion during the window before the index is updated.

▸Why does listing resources require a separate reverse index rather than reusing the check path?

The check API is a point lookup indexed by (object, relation) and returns whether a specific subject has access. Listing queries like lookup_resources fan out in the opposite direction — given a subject, find all matching objects — which the forward index cannot answer efficiently. A separate reverse index keyed by (subject, relation, type) serves this access pattern, but it carries a higher write cost and an explicitly relaxed latency SLA of 100ms to 1s, making it unsuitable for the hot check path.
