~/articles/design-google-maps

◆◆◆Advancedasked at Googleasked at Uberasked at Lyftasked at Apple

Design Google Maps (routing & navigation)

Q: What is contraction hierarchies and why does Google Maps use it instead of Dijkstra?

Contraction hierarchies is a preprocessing technique introduced by Geisberger et al. in 2008 that iteratively removes low-importance nodes from the road graph and adds shortcut edges, leaving only the highway-level backbone. A bidirectional query then climbs upward through this hierarchy from both source and destination, settling a few hundred nodes per direction instead of tens of millions. On a 250-million-edge continental graph, plain Dijkstra takes several seconds per query; CH resolves the same query in sub-millisecond to low-millisecond time.

Q: How does live traffic get incorporated without re-running the full CH preprocessing?

Each road segment carries a current speed ratio computed from phone GPS pings, and the routing service multiplies the base edge weight by that ratio at query time: current_weight = base_weight x (base_speed / current_speed). The CH shortcuts themselves are only periodically re-preprocessed for heavily affected regions; for permanent closures, the segment weight is set to effectively infinite via a Kafka event consumed by routing servers, with no full rebuild required.

Q: What is the traffic telemetry scale, and how is it processed?

100 million active navigation sessions each emit a GPS ping every 5 seconds, producing roughly 20 million events per second and about 4 GB per second of raw ingestion. An ingest service map-matches each ping to a road segment and publishes it to Kafka; a streaming aggregator (Flink or Spark Streaming) maintains a rolling window of observed speeds and outputs a per-segment estimate every 30 to 60 seconds to a Redis traffic state store holding roughly 2 GB of state across 250 million segments.

Q: Why is an ML model used for ETA instead of just summing current edge weights?

Summing current edge weights underestimates congestion propagation and misses predictable patterns. The ML model (gradient-boosted trees or a neural network) takes per-segment features including current speed ratio, hour of day, day of week, historical p50/p90 for that time slot, and recent incident flags, then outputs a predicted travel time per segment. The model is retrained daily on historical predicted-vs-actual arrival observations and recalculated every 30 seconds or on deviation during active navigation.

Q: How many map tiles exist at zoom level 14, and how does the CDN handle serving them?

At zoom level 14 there are approximately 268 million tiles total, of which roughly 78 million cover the Earth's land area. Tiles are pre-rendered offline from road and geometry data and stored in object storage; all GET requests go through a CDN with a TTL of days to weeks. The CDN absorbs over 95 percent of tile reads, and a tile miss falls through to the tile service backed by object storage.

Find the fastest route across a continent-scale road graph in milliseconds, with live traffic and ETAs. Contraction hierarchies, live traffic weights, and tile serving.

22 min read2026-03-04Ironclad Academy

#interview #geo #graphs #routing

// DEPTH

the full breakdown — requirements, capacity, evolution, trade-offs

The problem

Google Maps, Apple Maps, and Waze answer a deceptively simple question: what is the fastest way to drive from point A to point B right now? Open the app on a Tuesday morning in Los Angeles and in under a second you get a turn-by-turn route that accounts for a fender-bender on the 405, a lane closure near downtown, and the surge of school-run traffic — across a road network spanning hundreds of millions of edges.

The scale is staggering. Roughly one billion route requests hit Google's infrastructure every day. The underlying road graph for North America and Europe combined holds around 100 million nodes (intersections) and 250 million directed edges (road segments). Running Dijkstra cold on a graph that size for a cross-continent query would take several seconds per request — orders of magnitude over the 100 ms latency target. So the first engineering tension is query speed vs. graph size: the graph is too large for any online shortest-path algorithm, which means all the heavy lifting must happen in a preprocessing phase.

The second tension is freshness vs. stability. The preprocessed graph is built offline on a slow schedule; live traffic data flows in at 20 million events per second from phones actively navigating. Those two realities have to be reconciled so that the precomputed structure stays valid while traffic weights change continuously beneath it. Solve both tensions and you get a system that returns an accurate, traffic-aware route across a continent in under 100 ms.

The entire design is built around one insight: precompute as much as possible so that the online query does as little work as possible. Everything else — contraction hierarchies, the telemetry pipeline, the tile CDN — follows from that.

Functional requirements

GET /route?origin=...&destination=...&mode=driving → turn-by-turn directions + ETA.
GET /eta?origin=...&destination=... → ETA in seconds (lighter, faster).
Real-time re-routing when the user deviates from the suggested path.
Map tile rendering at configurable zoom levels.
(Optional) Geocoding: search by address or place name → lat/lng.
(Optional) Multiple route alternatives ranked by time/distance.
(Optional) Route modes: driving, walking, cycling, transit.

Non-functional requirements

Low latency: p99 route query < 100 ms; tile load < 200 ms.
High availability: 99.99% — navigation failure in a moving car is dangerous.
Graph freshness: road changes (new roads, closures) reflected within hours.
Traffic freshness: segment weights updated within 30–60 seconds.
Scale: tens of thousands of route queries per second globally.

Capacity estimation

Dimension	Estimate	How we got there
Route requests (avg)	~11,600 req/sec	`1 B/day ÷ 86,400 s`
Route requests (peak)	~35,000 req/sec	3× average
Node data	~10 GB	`~100 M nodes × ~100 B each`
Edge data	~12.5 GB	`~250 M edges × ~50 B each`
CH shortcuts	~12–25 GB	Augmentation factor ~2–3× on road networks (1–2× original edge count)
Graph total per region shard	~35–48 GB	Node + edge + shortcuts → fits in RAM on a large server
Traffic telemetry events	~20 M events/sec	`100 M active sessions × 1 ping/5 s`
Traffic raw ingestion	~4 GB/sec	`20 M events/sec × ~200 B each`
Traffic aggregated state	~2 GB	`~250 M segments × 8 B speed`
Map tiles (zoom 0–14)	~358 M tiles	`(4^15 − 1) / 3 ≈ 358 M`
Map tiles (zoom 15–20)	~10 B tiles total (sparser land coverage)	Much lower density outside populated areas
Hot tiles (zoom 10–15, populated)	~tens of millions	Fits in CDN edge cache with a warm-up period

Takeaway: The graph for a continent fits comfortably in the RAM of a large server (64–128 GB RAM is sufficient; 256 GB servers are standard in this tier), which is why the routing service keeps the entire graph in memory.

The road graph data model

The fundamental abstraction is a weighted directed graph:

Node: an intersection or road endpoint, stored with (lat, lng) and a unique integer ID.
Edge: a directed road segment from node u to node v, with a weight representing expected travel time in seconds (not distance — we optimize for time). Edge metadata includes: speed limit, road class (highway / arterial / local), number of lanes, turn restrictions, one-way flag.

G = (V, E, w)
V = {intersection IDs}
E = {(u, v) : there is a directed road segment u → v}
w(u, v) = travel_time_seconds(u → v)  [dynamic, updated with traffic]

Road data is sourced from a combination of official government datasets, commercial providers, and community-edited sources such as OpenStreetMap. The graph is rebuilt (re-preprocessed) on a schedule — typically nightly for major structural changes — and traffic weights are patched continuously without re-preprocessing.

Why plain Dijkstra is too slow

Dijkstra's algorithm on a graph with 250 M edges runs in O(E log V) time. A continental-scale query (SF → NYC, ~4 500 km) would need to relax tens of millions of edges before the destination is settled. Even with a fast priority queue, this takes several seconds — far too slow.

A* with a geographic heuristic helps somewhat. Using Euclidean or great-circle distance as the heuristic steers the search toward the destination, but on real road networks the heuristic is loose (roads don't go in straight lines), so A* still explores a large fraction of the graph on long queries.

The solution is preprocessing.

Contraction hierarchies (CH): the key algorithmic insight

Contraction hierarchies is a preprocessing technique that enables sub-millisecond to low-millisecond shortest-path queries on continental road graphs. It was introduced by Geisberger et al. in 2008 (building on earlier highway hierarchies work by Sanders and Schultes, 2005) and has become the dominant algorithm for road routing in production systems.

Preprocessing phase

The idea is to iteratively remove ("contract") nodes in order of importance, adding shortcut edges to preserve shortest-path distances. Think of it like collapsing a road network: local residential streets get contracted first, then arterials, until only the highway backbone remains.

Concretely:

Assign an importance score to every node. Low-importance nodes are contracted first. Importance heuristics typically combine: edge difference (how many new shortcut edges contracting this node adds), number of settled neighbors, and road class of incident edges.
To contract node u: for each pair of neighbors (v, w) where the shortest path from v to w goes through u, add a shortcut edge (v, w) with weight w(v, u) + w(u, w). The shortcut means we can skip u entirely.
After contraction, assign each node a level (or rank) equal to the order in which it was contracted. Uncontracted nodes have higher rank; the highway-level backbone survives as the highest-rank nodes.
The result: a hierarchical graph where long-distance queries only touch high-rank nodes (the highway skeleton), and short local queries only touch low-rank nodes.

flowchart TD
    subgraph "Before contraction"
        A1[A] -- 3 --> B1[B]
        B1 -- 5 --> C1[C]
    end
    subgraph "After contracting B"
        A2[A] -- "8 (shortcut A→C)" --> C2[C]
    end
    style A1 fill:#0e7490,color:#fff
    style B1 fill:#ff6b1a,color:#0a0a0f
    style C1 fill:#0e7490,color:#fff
    style A2 fill:#0e7490,color:#fff
    style C2 fill:#0e7490,color:#fff

Query phase

Once preprocessed, a query runs bidirectional search: one search expands upward through the hierarchy from the source; another expands upward from the destination. The two searches meet at or near the top of the hierarchy — the highway layer that both searches reach quickly.

forward_search(source):  only relax edges going to higher-rank nodes
backward_search(dest):   only relax edges going to higher-rank nodes (in reverse)
answer = min over all meeting nodes m: d_fwd(m) + d_bwd(m)

The following diagram shows how both searches "climb" toward the high-rank highway layer and meet in the middle, instead of fanning out across the entire graph:

flowchart LR
    SRC[Source node<br/>rank: low] -->|"forward: climb ranks"| H1[Highway A<br/>rank: high]
    DST[Dest node<br/>rank: low] -->|"backward: climb ranks"| H1
    H1 -->|"meeting point → unpack path"| PATH[Full route polyline]
    SRC --> H2[Highway B<br/>rank: high]
    DST --> H2
    H2 --> PATH
    style SRC fill:#0e7490,color:#fff
    style DST fill:#0e7490,color:#fff
    style H1 fill:#ff6b1a,color:#0a0a0f
    style H2 fill:#ff6b1a,color:#0a0a0f
    style PATH fill:#15803d,color:#fff

Because both searches only move "up" the hierarchy, they settle quickly — typically a few hundred nodes per direction even for a cross-continent query, versus tens of millions for plain Dijkstra. Academic benchmarks on European road graphs (~18 M nodes) report query times well under 1 ms (often 100–500 µs on the hardware of the era, down to tens of microseconds on modern hardware); on larger continental graphs (~100 M nodes) queries typically resolve in the low-millisecond range or better in practice.

The precomputed shortcuts encode the implicit intermediate path; when the user wants turn-by-turn directions, the routing engine unpacks the shortcut chain to recover the full sequence of road segments.

Live traffic: dynamic edge weights

The CH preprocessing assumes fixed edge weights. Traffic breaks this assumption. The solution is a careful separation: the static CH shortcuts are computed on base weights (free-flow or historical average travel time) and rarely change. Traffic correction factors are computed per segment and applied on top at query time.

Scale factor approach

For each road segment (u, v), maintain a current speed ratio:

current_weight(u, v) = base_weight(u, v) × (base_speed / current_speed)

When the live speed on segment (u, v) is half the free-flow speed, its weight doubles. This factor is applied when relaxing edges during the bidirectional query.

The CH shortcuts must be invalidated or reweighted when the underlying segment weights change significantly. In practice, shortcuts are recomputed periodically (e.g., every few hours) for the most heavily affected parts of the graph, or a traffic-aware variant of CH (sometimes called time-dependent CH) is used where shortcuts store weight functions over time instead of scalars.

Traffic telemetry pipeline

Every phone running navigation is a sensor. Each device emits a GPS ping every few seconds containing timestamp, lat/lng, speed, heading, and trip ID. That's 100 M active sessions pinging every ~5 seconds — about 20 M events per second flowing into the system.

flowchart LR
    PHONES[Navigation apps<br/>on phones] -->|GPS pings| INGEST[Ingest Service]
    INGEST --> KAFKA[Kafka topic<br/>raw pings]
    KAFKA --> AGG[Streaming Aggregator<br/>Flink / Spark Streaming]
    AGG -->|"speed per segment, per 30s"| TRAF[(Traffic State Store<br/>Redis / in-memory)]
    TRAF --> ROUTER[Routing Service]
    style PHONES fill:#0e7490,color:#fff
    style KAFKA fill:#a855f7,color:#fff
    style AGG fill:#ff6b1a,color:#0a0a0f
    style TRAF fill:#15803d,color:#fff

The ingest service map-matches each raw GPS coordinate to the nearest road segment (using a hidden Markov model or similar), then emits a (segment_id, observed_speed) event to Kafka. The streaming aggregator consumes these events and maintains a rolling window (e.g., last 3–5 minutes) of observed speeds per segment, outputting a per-segment speed estimate every 30–60 seconds to the traffic state store.

The routing service reads these weights from the traffic cache on every query — so every route computed reflects traffic from the last ~30–60 seconds.

Scale check: 20 M telemetry events/sec × ~200 B ≈ 4 GB/s ingested (at the 1 ping/5 s rate; 2 GB/s at 1 ping/10 s). This is well within the capacity of a Kafka cluster with adequate partition count (~200–500 partitions). The aggregated output is ~250 M segments × 8 B = ~2 GB of state, easily kept in a Redis cluster or in-process memory on the routing servers.

ETA prediction

A naive ETA is just the sum of current edge weights along the route. Production systems do better by layering multiple signals into an ML model:

Signal	What it captures
Current edge weights (traffic)	Real-time congestion
Historical speed by hour + day	Predictable patterns (rush hour, weekend)
Incident data (accidents, closures)	Step-function slowdowns
Weather	Rain/snow increase travel times
ML model per segment	Non-linear interactions between the above

A trained model (gradient-boosted trees or a neural network) takes a feature vector per segment — current speed ratio, hour of day, day of week, historical p50/p90 for that slot, recent incident flag — and outputs a predicted travel time. These per-segment predictions are summed to produce the route ETA.

The ML model is retrained offline (e.g., daily) on historical observations of "predicted ETA vs. actual arrival time." During navigation, ETA is recalculated at each re-routing check (every ~30 seconds or whenever the user deviates).

Building up to the design

V1: A* on the live graph, single server

Run A* with a great-circle heuristic. Works for a city. Continental routes take 5–15 seconds. Every query touches RAM-resident edge data; no preprocessing.

The problem is latency, 10–100× too slow for long routes, and a single server is a SPOF.

V2: Partition the graph by region

Divide the continent into cells (e.g., 100 km × 100 km). Precompute shortest paths between cell boundary nodes. A query routes within the source cell, crosses between cells via precomputed boundary paths, then routes within the destination cell. Inter-cell queries are much faster — but path quality degrades near cell boundaries, and partitioning artifacts produce suboptimal routes that are hard to maintain as the graph changes.

V3: Contraction hierarchies, static weights

Run full CH preprocessing on the entire graph. Queries drop to sub-millisecond to low single-digit ms. Preprocess nightly; reload without downtime using a blue/green deploy of the routing servers.

The remaining gap: routes ignore traffic. ETA is inaccurate during rush hour or incidents.

V4: CH + live traffic weights

Add the telemetry pipeline. Patch traffic weights on top of CH shortcuts at query time. ETA becomes accurate; re-routing happens in milliseconds. What's still missing is the full product — tile serving and geocoding.

V5: Production maps system

Add tile serving (pre-rendered, CDN), geocoding service, and the full ML ETA model. This is the design described in the rest of this article.

flowchart LR
    V1["V1: A* on live graph<br/>city-scale, seconds"] --> V2["V2: + region partitioning<br/>faster but lower quality"]
    V2 --> V3["V3: + CH preprocessing<br/>sub-ms to low-ms queries"]
    V3 --> V4["V4: + live traffic<br/>accurate ETA"]
    V4 --> V5["V5: + tiles + geocoding<br/>full maps product"]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V4 fill:#ff6b1a,color:#0a0a0f
    style V5 fill:#a855f7,color:#fff

Full architecture

flowchart TD
    CLIENT[Mobile / Web Client] --> CDN[CDN edge]
    CLIENT --> GW[API Gateway / Load Balancer]

    GW --> GEO_SVC[Geocoding Service]
    GW --> ROUTE_SVC[Routing Service]
    GW --> ETA_SVC[ETA Service]
    GW --> NAV_SVC[Navigation / Re-route Service]

    ROUTE_SVC --> CH_GRAPH[(CH Graph<br/>in-process RAM)]
    ROUTE_SVC --> TRAF_CACHE[(Traffic Cache<br/>Redis)]
    ROUTE_SVC --> ML_ETA[ML ETA Model]

    ETA_SVC --> TRAF_CACHE
    ETA_SVC --> ML_ETA

    NAV_SVC --> ROUTE_SVC

    CDN -->|tile miss| TILE_SVC[Tile Service]
    TILE_SVC --> OBJ_STORE[(Object Store<br/>S3 / GCS)]

    PHONES[GPS pings from phones] --> INGEST_SVC[Ingest Service]
    INGEST_SVC --> KAFKA[Kafka]
    KAFKA --> AGG[Traffic Aggregator<br/>Flink]
    AGG --> TRAF_CACHE

    GEO_SVC --> GEO_IDX[(Geocoding Index<br/>Elasticsearch / custom)]

    style CDN fill:#ff6b1a,color:#0a0a0f
    style ROUTE_SVC fill:#15803d,color:#fff
    style CH_GRAPH fill:#0e7490,color:#fff
    style KAFKA fill:#a855f7,color:#fff
    style TRAF_CACHE fill:#ffaa00,color:#0a0a0f
    style ML_ETA fill:#ff2e88,color:#fff

The routing service in detail

Graph storage

The CH graph is kept entirely in process memory on the routing server. For a continental graph (~35–48 GB after preprocessing), routing servers need ~64–128 GB RAM — standard hardware for server-class machines today.

The routing process holds two edge arrays:

Forward edges (for the forward search from source).
Backward edges (for the backward search from destination).

Both are stored as adjacency arrays sorted by source node ID, so looking up all outgoing edges of node u is a simple range scan.

A route request, step by step

sequenceDiagram
    participant Client
    participant GW as API Gateway
    participant Geo as Geocoding Service
    participant Router as Routing Service
    participant TC as Traffic Cache
    participant ML as ML ETA Model

    Client->>GW: GET /route?origin="123 Main St"&dest="JFK Airport"
    GW->>Geo: resolve origin address → (lat, lng)
    GW->>Geo: resolve dest address → (lat, lng)
    Geo-->>GW: origin_node_id, dest_node_id
    GW->>Router: route(origin_node, dest_node)
    Router->>TC: fetch current speeds for relevant segments
    TC-->>Router: speed ratios
    Router->>Router: bidirectional CH query (sub-ms to ~5 ms)
    Router->>ML: predict ETA for each segment on path
    ML-->>Router: ETA per segment
    Router-->>GW: route polyline + total ETA
    GW-->>Client: JSON response

Notice that the geocoding step resolves addresses to graph node IDs before the routing query even starts — geocoding and routing are cleanly separated. The total latency budget:

Step	Typical ms
Geocoding (cached)	5–20
Traffic cache read	2–5
Bidirectional CH query	0.1–5
ETA model inference	5–20
Serialize + network	5–10
Total p50	~25–60 ms

Re-routing

When a driver deviates from the suggested route, the navigation service detects the deviation (current position no longer matches the expected segment) and triggers a new route query from the current position. Because CH queries are so fast, re-routing is imperceptible to the user.

Re-routing also happens proactively: the navigation service periodically checks whether a different route would now be significantly faster given current traffic, and presents the alternative if so.

Map tile serving

The visible map is composed of tiles — small square image (raster) or geometry (vector) chunks served independently and stitched by the client.

Zoom levels and quadtrees

The world is divided using a quadtree scheme: at zoom level 0, the entire world is one tile (256×256 pixels). Each zoom increment divides each tile into 4 sub-tiles. At zoom level z, there are 4^z tiles total.

Zoom 0:  1 tile   (world overview)
Zoom 10: ~1 M tiles (city level)
Zoom 14: ~268 M tiles (street level)
Zoom 18: ~68 B tiles (building level)

At zoom 18, a 256×256-pixel tile covers roughly 153 m × 153 m on the ground at the equator (0.6 m/pixel resolution) — enough to distinguish individual buildings and lanes. In practice, tiles beyond zoom 16 are only rendered on demand over populated areas.

Tile coordinates are expressed as (zoom, x, y) — the column and row within the grid at that zoom level. This triple uniquely identifies every tile in the world. The quadtree structure means zooming in on a tile simply queries the four children at the next level:

flowchart TD
    Z0["Zoom 0: world (1 tile)"] --> Z1A["Zoom 1: NW"]
    Z0 --> Z1B["Zoom 1: NE"]
    Z0 --> Z1C["Zoom 1: SW"]
    Z0 --> Z1D["Zoom 1: SE"]
    Z1A --> Z2A["Zoom 2: NW·NW"]
    Z1A --> Z2B["Zoom 2: NW·NE"]
    Z1A --> Z2C["Zoom 2: NW·SW"]
    Z1A --> Z2D["Zoom 2: NW·SE"]
    style Z0 fill:#ff6b1a,color:#0a0a0f
    style Z1A fill:#0e7490,color:#fff
    style Z1B fill:#0e7490,color:#fff
    style Z1C fill:#0e7490,color:#fff
    style Z1D fill:#0e7490,color:#fff
    style Z2A fill:#15803d,color:#fff
    style Z2B fill:#15803d,color:#fff
    style Z2C fill:#15803d,color:#fff
    style Z2D fill:#15803d,color:#fff

Raster vs. vector tiles

Property	Raster tiles (PNG/JPEG)	Vector tiles (PBF/MVT)
Content	Pre-rendered pixels	Raw geometry + labels
Tile size	~20–100 KB per tile	~5–50 KB per tile
Client rendering	Trivial (draw image)	Requires a WebGL/Metal renderer
Style changes	Requires re-rendering all tiles	Client re-styles without new tiles
Retina / scaling	Must regenerate at 2× resolution	Client scales geometry losslessly
Bandwidth	Higher	Lower

Modern mapping clients (web and mobile) increasingly use vector tiles because they allow smooth zoom animations, night/day mode, and custom styles without re-downloading tiles. The server stores pre-generated vector tiles in object storage.

Tile generation and caching

Tiles are pre-generated offline from the road and map geometry data:

Map data pipeline: raw OSM or commercial data → geometry normalization → feature extraction (roads, buildings, water, parks).
Tile renderer: for each zoom level, render (or serialize geometry into) each tile covering populated areas. Store to object storage.
CDN: all tile GETs go through the CDN. A tile not in CDN cache is fetched from the tile service, which reads from object storage and responds. The CDN caches the tile with a long TTL (days to weeks — tiles change only when the underlying data changes).

The vast majority of tile reads are CDN hits. Tile cache-miss rate is low because tiles at the popular zoom levels over populated areas are warm in the CDN. Updating a tile (because a new road was built) requires invalidating the CDN cache entry for that tile's (z, x, y) key.

Geocoding: the hidden complexity

Geocoding is a separate service that sits upstream of routing — routing only understands node IDs, not street names.

Forward geocoding: "1600 Pennsylvania Ave NW, Washington DC" → (38.8977, -77.0366).
Reverse geocoding: (37.7749, -122.4194) → "San Francisco, CA".

The geocoding service maintains an inverted index over address tokens and place names. Queries are fuzzy (typos, abbreviations, partial names). In practice, systems like this use a combination of an Elasticsearch or custom inverted index for text search, plus a ranking model that scores candidates by query match quality, population, and proximity to the user.

Geocoding resolves to a node ID in the road graph, which is then passed to the routing service.

Failure modes

Stale traffic data

If the Kafka pipeline or aggregator falls behind, the routing service may use outdated speed estimates. The fix is layered: set a TTL on each segment's traffic entry in the cache (e.g., 5 minutes) and fall back to the historical average for that hour if it expires. Monitor Kafka consumer lag; alert if lag exceeds 2 minutes. The fallback degrades ETA quality but keeps routing functional.

Graph update propagation

When a road closes (accident, construction), the update must reach all routing servers quickly. An operations team or automated incident feed publishes a "segment closure" event to Kafka; routing servers consume this event and mark the edge weight as effectively infinite (blocked) at query time — no full re-preprocessing needed. When the closure is lifted, the override is removed. For permanent changes (new roads), a full CH re-preprocessing is scheduled and the new graph binary is deployed with a blue/green swap.

Very long routes

Routes spanning multiple continents (e.g., through ferry connections or international roads) may touch graph regions stored on separate region shards. A routing coordinator decomposes the query into legs (one per region shard), executes each leg's CH query in parallel, and stitches the results.

Region failover

Routing servers are replicated across availability zones. The CH graph is read-only at query time (all writes are offline preprocessing), so any replica can serve any query. The load balancer health-checks each routing server and routes around failures.

Storage choices

Data	Store	Rationale
CH graph (static)	Mmap'd binary file, in-process RAM	Fast random access; read-only after load
Traffic weights (live)	Redis cluster	Sub-millisecond reads; ~2 GB total state
Traffic raw events	Kafka	Durable stream; replay on aggregator restart
Map tiles	Object storage (S3/GCS) + CDN	Immutable after generation; CDN absorbs reads
Geocoding index	Elasticsearch / custom inverted index	Fuzzy text search; horizontal scaling
Road geometry (raw)	Postgres / custom spatial DB	Source of truth for preprocessing pipeline
ML ETA model weights	Object storage, loaded into routing server RAM	Model updated daily; versioned deploys

Connection to ride-hailing

The ETA service described here is exactly what the Uber/Lyft dispatch system needs to rank driver candidates. The difference is scale and freshness: ride-hailing needs ETAs for thousands of (driver, rider) pairs per second in a single metro area; Google Maps needs route computation globally but can tolerate the route being computed once and cached for the navigation session.

In both cases, the routing backbone is a contraction hierarchy on a road graph with live traffic weights. Ride-hailing systems often run a lighter-weight, less precise routing engine (direct-line ETA estimates with traffic correction factors) for the candidate-ranking phase, reserving the full CH query for the confirmed route.

Things to discuss in an interview

Why CH over plain Dijkstra/A*: preprocessing reduces query work from O(E log V) on a 250 M-edge graph to settling a few hundred nodes per direction. The preprocessing-vs-query trade-off is the crux.
Traffic weight updates without full re-preprocessing: scale factors patched at query time; periodic partial re-preprocessing for heavily affected regions.
ETA quality: naive sum of current edge weights underestimates congestion propagation; ML models trained on historical arrival times outperform static estimates.
Tile caching economics: pre-render offline, serve via CDN; only popular tiles ever leave the CDN. Invalidation on road changes is the hard part.
Re-routing latency: because CH queries are so fast (~ms), re-routing is imperceptible; no need for incremental re-routing algorithms.
Graph sharding: split by geography; a routing coordinator stitches cross-region queries.

Things you should now be able to answer

Why does Dijkstra fail at continental scale, and what does contraction hierarchies do differently?
How are traffic weights incorporated into a preprocessed graph without full re-preprocessing?
What is a map tile, and why are tiles served from CDN rather than generated on demand?
How does the routing service handle road closures in near-real-time?
Why is ETA from an ML model better than simply summing current edge weights?
How does geocoding connect to the routing graph?

Frequently asked questions

▸What is contraction hierarchies and why does Google Maps use it instead of Dijkstra?

Contraction hierarchies is a preprocessing technique introduced by Geisberger et al. in 2008 that iteratively removes low-importance nodes from the road graph and adds shortcut edges, leaving only the highway-level backbone. A bidirectional query then climbs upward through this hierarchy from both source and destination, settling a few hundred nodes per direction instead of tens of millions. On a 250-million-edge continental graph, plain Dijkstra takes several seconds per query; CH resolves the same query in sub-millisecond to low-millisecond time.

▸How does live traffic get incorporated without re-running the full CH preprocessing?

Each road segment carries a current speed ratio computed from phone GPS pings, and the routing service multiplies the base edge weight by that ratio at query time: current_weight = base_weight x (base_speed / current_speed). The CH shortcuts themselves are only periodically re-preprocessed for heavily affected regions; for permanent closures, the segment weight is set to effectively infinite via a Kafka event consumed by routing servers, with no full rebuild required.

▸What is the traffic telemetry scale, and how is it processed?

100 million active navigation sessions each emit a GPS ping every 5 seconds, producing roughly 20 million events per second and about 4 GB per second of raw ingestion. An ingest service map-matches each ping to a road segment and publishes it to Kafka; a streaming aggregator (Flink or Spark Streaming) maintains a rolling window of observed speeds and outputs a per-segment estimate every 30 to 60 seconds to a Redis traffic state store holding roughly 2 GB of state across 250 million segments.

▸Why is an ML model used for ETA instead of just summing current edge weights?

Summing current edge weights underestimates congestion propagation and misses predictable patterns. The ML model (gradient-boosted trees or a neural network) takes per-segment features including current speed ratio, hour of day, day of week, historical p50/p90 for that time slot, and recent incident flags, then outputs a predicted travel time per segment. The model is retrained daily on historical predicted-vs-actual arrival observations and recalculated every 30 seconds or on deviation during active navigation.

▸How many map tiles exist at zoom level 14, and how does the CDN handle serving them?

At zoom level 14 there are approximately 268 million tiles total, of which roughly 78 million cover the Earth's land area. Tiles are pre-rendered offline from road and geometry data and stored in object storage; all GET requests go through a CDN with a TTL of days to weeks. The CDN absorbs over 95 percent of tile reads, and a tile miss falls through to the tile service backed by object storage.

← previous

Change Data Capture (CDC) & the Outbox Pattern

Designing a Feature Flag Service

// RELATED

Frequently asked questions

You may also like

Design an LLM Observability Platform

Design an LLM Gateway (AI Gateway & Model Router)

Design an LLM Fine-Tuning Platform