How Does Uber Compute ETA?

May 16th, 2026

The magic of Uber doesn’t begin when the car arrives. It begins the instant the app tells you “how long” the wait will be. A tiny estimate — “2 minutes away” or “6 minutes away” — flashes onto your screen so casually that most people never think twice about it. Yet producing that single number requires a planet-scale system constantly processing live GPS streams, road traffic, driver movement, rider demand, map intelligence, and prediction models in real time. What looks like a simple countdown is actually the visible tip of one of the most advanced distributed systems ever engineered for everyday consumers.

That number is the Estimated Time of Arrival, and computing it correctly — at global scale, in real time, across millions of concurrent users — is genuinely one of the hardest problems in applied engineering.

This post is a deep walkthrough of how a system like Uber’s ETA engine works. We will go through the GPS infrastructure, the map matching algorithms, the routing engines, the machine learning prediction pipelines, the streaming systems, the geo-spatial indexing, and the tradeoffs that engineers make every day to keep that number accurate and fast.

Introduction

Before we talk about systems, let us talk about why this problem is hard.

ETA is not just “distance divided by speed.” That formula might work on an empty highway at 3am. It does not work in Manhattan at 5pm on a Tuesday when there is a Yankees game and a water main just broke on 8th Avenue.

Real ETA computation has to account for live traffic conditions, driver behavior, road geometry, traffic signals, historical patterns for that specific road segment at that specific hour, weather, road closures, construction zones, the accuracy of GPS signals bouncing off buildings, the time it takes a driver to find parking near a pickup, and the probability that the driver will take a slightly longer route because the GPS told them to avoid a slow road that just cleared up thirty seconds ago.

At Uber’s scale, millions of trips are happening simultaneously across hundreds of cities. Each active driver sends GPS pings every few seconds. Each trip involves continuous ETA recalculations. Every ETA request has to be answered in under 200 milliseconds, and the number has to be fresh: if the ETA on your screen is stale by more than a few seconds, the experience falls apart.

And here is the thing that most people miss: a wrong ETA does not just frustrate users. It creates a cascade of downstream problems. Riders show up at the wrong time. Drivers sit waiting. Pool pickups get miscoordinated. Pricing algorithms based on ETA drift become inaccurate. The entire marketplace depends on ETA being right.

Core Features Behind ETA Calculation

Let us ground ourselves in what the system actually has to compute before we get into the architecture.

Driver location tracking is the most fundamental input. You need to know where every active driver is, right now, with as little latency as possible. This is a continuous stream problem, not a query problem.

Rider pickup estimation is the first ETA users see. How long until the driver reaches the pickup location? This depends on the driver’s current position, the route to the pickup, live traffic, and how long the driver takes to pull over and start moving.

Trip duration estimation is the second ETA users care about. Once the trip starts, how long until they arrive? This needs to account for the entire route ahead, not just current conditions.

Real-time rerouting is what happens when conditions change mid-trip. A road closes. Traffic suddenly backs up. The system has to detect this, compute a new route, and update the ETA without the user even noticing the recalculation.

Traffic prediction fills in the gaps between what the system knows right now and what it expects will happen in the next ten, twenty, thirty minutes as the trip progresses through different parts of the city.

High-Level ETA System Architecture

Here is how the major systems talk to each other:

flowchart TD
  %% Nodes
  A[Driver Mobile App]
  B[GPS Ingestion Service]
  C[Map Matching Service]
  D[Traffic Aggregation Service]
  E[Routing Engine]
  F[ETA Prediction Service]
  G[ML Inference Service]
  H[Rider Mobile App]
  I[Trip Orchestration Service]
  J[Kafka Event Bus]
  K[Redis Cache]
  L[Cassandra DB]
  %% Flows
  A -->|GPS ping every 4s| B
  B -->|Raw location event| J
  J --> C
  J --> D
  C -->|Matched road position| E
  D -->|Traffic snapshot| E
  E -->|Candidate routes| F
  F --> G
  G -->|Adjusted ETA| F
  F -->|ETA result| K
  I -->|Trip state| F
  H -->|ETA request| I
  F -->|Cached ETA| H
  F --> L
  %% Styles
  style A fill:#ffedd5,stroke:#f97316,stroke-width:2px,color:#7c2d12
  style H fill:#ffedd5,stroke:#f97316,stroke-width:2px,color:#7c2d12
  style B fill:#dbeafe,stroke:#2563eb,stroke-width:3px,color:#1e3a8a
  style I fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e3a8a
  style C fill:#cffafe,stroke:#0891b2,stroke-width:2px,color:#164e63
  style D fill:#cffafe,stroke:#0891b2,stroke-width:2px,color:#164e63
  style E fill:#cffafe,stroke:#0891b2,stroke-width:3px,color:#164e63
  style F fill:#fee2e2,stroke:#dc2626,stroke-width:3px,color:#7f1d1d
  style G fill:#fecaca,stroke:#dc2626,stroke-width:2px,color:#7f1d1d
  style J fill:#ede9fe,stroke:#7c3aed,stroke-width:3px,color:#4c1d95
  style K fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f
  style L fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d

The key insight in this architecture is that the ETA pipeline is event-driven, not request-driven. The system does not wait for someone to ask “what is the ETA?” before computing it. It continuously processes GPS events, traffic updates, and route recalculations in the background. When a rider’s app asks for an ETA, the answer is already sitting in cache, ready to be served in single-digit milliseconds.

This distinction matters enormously at scale. If every ETA request triggered a fresh route computation, you would need orders of magnitude more compute just to serve the read traffic.

GPS Tracking Pipeline

The GPS pipeline is the nervous system of the entire ETA operation. Without accurate, low-latency location data, everything else breaks.

Every driver’s phone runs an SDK that collects GPS readings. On a typical ride, the SDK sends a location update every four seconds. That might not sound like much, but assume two million drivers online at once and the arithmetic is 2,000,000 / 4 = 500,000 GPS events per second at peak.

flowchart LR
  A[Driver Phone GPS]
  B[Location SDK]
  C[Mobile Network]
  D[GPS Ingestion Gateway]
  E[Kafka Topic: raw-gps]
  F[GPS Validator]
  G[Outlier Filter]
  H[Kafka Topic: clean-gps]
  A -->|raw coordinates| B
  B -->|batched pings| C
  C --> D
  D --> E
  E --> F
  F --> G
  G --> H

Raw GPS data is messy. Urban environments create what engineers call the “urban canyon” problem: tall buildings reflect and scatter GPS signals, causing the reported position to jump around by tens of meters even when a driver is sitting still. A driver traveling on an elevated highway might briefly appear to be on the surface street directly below. A driver going through a tunnel disappears entirely.

The GPS validator handles basic sanity checks: speed limits (if the reported position implies the car is moving at 300 kilometers per hour, something is wrong), coordinate bounds, timestamp ordering, and signal quality scores.
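
The checks above can be sketched in a few lines of Python. This is a minimal illustration, not Uber's validator: the `Ping` type, the `validate` function, the 200 km/h ceiling, and the 50 m accuracy cutoff are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0
MAX_SPEED_KMH = 200.0  # assumed plausibility ceiling for a car


@dataclass
class Ping:
    driver_id: str
    ts: float          # device timestamp, seconds since epoch
    lat: float
    lon: float
    accuracy_m: float  # reported horizontal accuracy


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def validate(prev, cur, max_accuracy_m=50.0):
    """Return the reasons a ping is rejected; an empty list means accepted."""
    problems = []
    if not (-90 <= cur.lat <= 90 and -180 <= cur.lon <= 180):
        problems.append("coordinates out of bounds")
    if cur.accuracy_m > max_accuracy_m:
        problems.append("poor signal quality")
    if prev is not None:
        dt = cur.ts - prev.ts
        if dt <= 0:
            problems.append("timestamp not increasing")
        else:
            speed_kmh = haversine_m(prev.lat, prev.lon, cur.lat, cur.lon) / dt * 3.6
            if speed_kmh > MAX_SPEED_KMH:
                problems.append("implied speed too high")
    return problems
```

Returning a list of reasons rather than a boolean makes it easy to count rejections per cause, which is useful as a data quality metric.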

The outlier filter uses a Kalman filter or similar smoothing algorithm to maintain a best-guess position estimate and flag readings that deviate too far from the expected path. A Kalman filter essentially says: “given where this driver was and how fast they were going, where should they be now?” If the new reading is wildly different, it is weighted down or discarded.
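
To make the predict-then-compare idea concrete, here is a deliberately simplified one-dimensional sketch. It tracks position along a road in meters, treats the driver's speed as a known input, and rejects any reading whose innovation exceeds a few standard deviations. The class name `ScalarKalman` and every noise parameter are assumptions for illustration; a production filter would track a full 2D state with velocity.

```python
import math


class ScalarKalman:
    """1D Kalman filter over distance along a road, with outlier gating."""

    def __init__(self, x0, p0=25.0, q=4.0, r=100.0, gate_sigmas=3.0):
        self.x = x0              # position estimate (m)
        self.p = p0              # variance of the estimate (m^2)
        self.q = q               # process noise added per second (m^2/s)
        self.r = r               # GPS measurement variance (m^2)
        self.gate = gate_sigmas  # reject readings beyond this many sigmas

    def step(self, z, speed_mps, dt):
        """Feed one GPS reading z; return (estimate, accepted)."""
        # Predict: where should the driver be now, given speed and elapsed time?
        x_pred = self.x + speed_mps * dt
        p_pred = self.p + self.q * dt

        # Innovation: how far is the reading from the prediction?
        innovation = z - x_pred
        s = p_pred + self.r
        if abs(innovation) > self.gate * math.sqrt(s):
            # Wildly different reading: keep the prediction, discard the reading.
            self.x, self.p = x_pred, p_pred
            return self.x, False

        # Update: blend prediction and reading according to their variances.
        k = p_pred / s
        self.x = x_pred + k * innovation
        self.p = (1 - k) * p_pred
        return self.x, True
```

The gain `k` is what "weighting" means here: a noisy GPS (large `r`) pulls the estimate less than a confident one.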

Battery optimization is a real constraint here. Aggressive GPS polling drains the driver’s battery faster, which creates a bad driver experience and can cause drivers to turn off the app. The SDK uses adaptive polling — more frequent updates when the driver is actively navigating, less frequent when they are parked. It also leans on the phone’s motion sensors to detect when the driver has stopped moving and back off the GPS frequency accordingly.

Map Matching System

Raw GPS coordinates tell you where a device thinks it is. Map matching tells you where the driver actually is on the road network.

The difference is significant. If a driver is on a highway overpass, raw GPS might put them somewhere on the surface street below. If they are at an intersection, the GPS might float between three possible roads. Map matching takes the noisy GPS stream and snaps it to the most plausible road in the road graph.

The standard approach is the Hidden Markov Model. The idea is elegant: you have a sequence of GPS observations, and a set of candidate road positions for each observation. The HMM asks: what sequence of road positions is most likely, given both how close each candidate is to the observed GPS point and how plausible the transitions between consecutive candidates are?

The transition probability accounts for things like: is it physically possible to go from road segment A at time T to road segment B at time T+1 given the driver’s speed? Are these segments connected in the road graph? Would this transition require making an illegal turn?

flowchart TD
  A[Raw GPS Sequence]
  B[Candidate Road Segments]
  C[HMM Emission Probability]
  D[HMM Transition Probability]
  E[Viterbi Algorithm]
  F[Matched Road Position]
  A --> B
  B --> C
  B --> D
  C --> E
  D --> E
  E --> F
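
A toy version of the Viterbi step helps show why the sequence matters more than any single point. In this sketch the emission score is a Gaussian log-likelihood of the distance between the GPS point and the candidate segment, and the transition score only distinguishes "same segment", "connected segment", and "not reachable". Real map matchers use richer transition models based on route distance and speed; the function name `viterbi_match` and the parameters are assumptions for illustration.

```python
import math


def viterbi_match(observations, adjacency, sigma_m=10.0, switch_penalty=1.0):
    """Pick the most likely segment sequence for a GPS trace.

    observations: list of dicts {segment_id: distance_from_gps_point_m}
    adjacency:    dict segment_id -> set of segments directly reachable from it
    """
    def emission(distance_m):
        # Log of an unnormalized Gaussian: closer candidates score higher.
        return -0.5 * (distance_m / sigma_m) ** 2

    def transition(a, b):
        if a == b:
            return 0.0
        if b in adjacency.get(a, ()):
            return -switch_penalty
        return -math.inf  # not connected in the road graph

    # best[s] = (log score of the best path ending in s, that path)
    best = {s: (emission(d), [s]) for s, d in observations[0].items()}
    for obs in observations[1:]:
        nxt = {}
        for s, d in obs.items():
            score, path = max(
                ((prev_score + transition(p, s), prev_path)
                 for p, (prev_score, prev_path) in best.items()),
                key=lambda t: t[0],
            )
            nxt[s] = (score + emission(d), path + [s])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


# A highway (H1 -> H2) running above a surface street (S1 -> S2).
adjacency = {"H1": {"H2"}, "S1": {"S2"}}
trace = [
    {"H1": 5.0, "S1": 8.0},
    {"H1": 12.0, "S1": 6.0},   # noisy point that looks closer to the street
    {"H2": 4.0, "S2": 15.0},
]
matched = viterbi_match(trace, adjacency)
```

The second point on its own would snap to the surface street, but because there is no ramp between the two roads in this toy graph, the best overall sequence stays on the highway.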

Tunnels are the hardest case. The driver disappears from GPS for 30-60 seconds and reappears somewhere else. The map matcher needs to look at the entry point, the exit point, and the road graph to figure out the most likely path through the tunnel. In most cases this works well because tunnel paths are fixed. The failure mode is when a driver exits early or the tunnel has multiple exit points.

Dense urban intersections are also tricky. A driver stopped at a light where two wide, multi-lane streets cross can sit within GPS error of several candidate road segments at once. Getting this wrong does not just affect position accuracy. It feeds incorrect traffic speed data back into the system, which then corrupts future ETA calculations for everyone on that road.

Routing Engine Deep Dive

The routing engine is where ETA computation actually starts to take shape. Given a driver’s current matched position and a destination, the engine needs to find the fastest route in real time.

Road networks are modeled as directed weighted graphs. Every intersection is a node. Every road segment between intersections is a directed edge. The weight of each edge is not just its physical length — it is a composite cost function that includes estimated travel time based on current traffic speeds, turn penalties (left turns across traffic cost more time than right turns), road type (highway vs arterial vs residential), and historical speed data for this time of day.

Classic Dijkstra’s algorithm finds the shortest path in a graph with non-negative edge weights. It is correct, but it explores too many nodes to be practical on a road graph covering an entire city. Starting from a source and expanding outward in all directions, it wastes computation on nodes that are clearly not on the path toward the destination.

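A* improves on Dijkstra by using a heuristic to guide the search toward the goal. When edge weights are travel times, a common choice is the straight-line distance to the destination divided by the maximum plausible speed, which never overestimates the remaining time and so keeps the result optimal. This dramatically reduces the number of nodes explored and makes routing fast enough for real-time use in moderately dense road graphs.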
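
Here is a small A* sketch over a graph whose edge weights are travel times in seconds. The heuristic is straight-line distance divided by an assumed top speed, so it never overestimates the remaining time. The graph layout, the `a_star` name, and the 30 m/s ceiling are made up for the example.

```python
import heapq
import math


def a_star(graph, coords, start, goal, max_speed_mps=30.0):
    """Return (travel_time_s, path) for the fastest route, or (inf, []).

    graph:  node -> list of (neighbor, travel_time_s)
    coords: node -> (x_m, y_m) planar coordinates
    """
    def h(node):
        (x1, y1), (x2, y2) = coords[node], coords[goal]
        return math.hypot(x2 - x1, y2 - y1) / max_speed_mps

    open_heap = [(h(start), 0.0, start)]  # (cost so far + heuristic, cost so far, node)
    best_cost = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return cost, path[::-1]
        if cost > best_cost[node]:
            continue  # stale heap entry; a cheaper way to this node was found
        for nbr, travel_time in graph.get(node, []):
            new_cost = cost + travel_time
            if new_cost < best_cost.get(nbr, math.inf):
                best_cost[nbr] = new_cost
                parent[nbr] = node
                heapq.heappush(open_heap, (new_cost + h(nbr), new_cost, nbr))
    return math.inf, []


# Four intersections on a 1 km square; the A -> C leg is quick but C -> D is congested.
coords = {"A": (0, 0), "B": (1000, 0), "C": (0, 1000), "D": (1000, 1000)}
graph = {
    "A": [("B", 60), ("C", 40)],
    "B": [("D", 60)],
    "C": [("D", 120)],
}
route = a_star(graph, coords, "A", "D")
```

Setting the heuristic to zero turns this into plain Dijkstra, which is a handy way to check that both return the same cost.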

For continental-scale routing or fast query performance over very large graphs, Uber and similar systems use Contraction Hierarchies. The idea is to preprocess the graph by ranking nodes by importance, “contracting” the less important ones, and adding shortcut edges that preserve shortest-path costs across the removed nodes. At query time, a bidirectional search follows only edges that lead toward more important nodes, so it touches a small fraction of the graph, and the shortcuts are unpacked afterward to reconstruct the full path.

flowchart LR
  A[Driver Position]
  B[Destination]
  C[Road Graph]
  D[Traffic Weights]
  E[Turn Penalties]
  F[A-Star Search]
  G[Candidate Route]
  H[Route Scorer]
  I[Optimal Route]
  A --> F
  B --> F
  C --> F
  D --> H
  E --> H
  F --> G
  G --> H
  H --> I

The routing engine also handles rerouting during active trips. When the system detects that a driver has deviated from the planned route or that traffic conditions have changed significantly, it triggers a re-route. The challenge here is avoiding flip-flopping: if two routes have nearly identical costs, you do not want the driver being bounced between them every 30 seconds as minor traffic fluctuations tip the balance. Engineers add hysteresis to the rerouting logic — only reroute if the new path is significantly faster than the current one, not just marginally.
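
The hysteresis rule can be as simple as requiring both an absolute and a relative saving before switching. The thresholds below (60 seconds and 5%) and the function name `should_reroute` are illustrative assumptions, not known production values.

```python
def should_reroute(current_eta_s, candidate_eta_s, min_gain_s=60.0, min_gain_ratio=0.05):
    """Switch routes only when the new one is clearly, not marginally, faster."""
    gain = current_eta_s - candidate_eta_s
    return gain >= min_gain_s and gain >= min_gain_ratio * current_eta_s
```

With these numbers a 20-second saving on a 15-minute trip is ignored, while a 2-minute saving triggers a reroute. On an hour-long trip even 100 seconds is below the 5% bar, which keeps long trips from flip-flopping on small fluctuations.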

Real-Time Traffic System

Traffic state is the most important input to ETA after the route itself. Getting it right is a continuous data engineering problem.

The primary source of traffic data is the drivers themselves. Every GPS update from every active driver carries implicit speed information: if a driver on a known road segment was at position A four seconds ago and is now at position B, you can compute their speed on that segment. Aggregate this across all drivers on the same segment and you have a real-time speed estimate.
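
As a rough sketch of that aggregation, assume map matching has already turned consecutive pings into samples of the form (segment, driver, distance covered, elapsed time). The function below keeps the latest speed per driver and reports the median per segment, skipping segments with too few drivers to trust. The name `segment_speeds`, the use of a median, and the `min_drivers` cutoff are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def segment_speeds(samples, min_drivers=3):
    """Estimate current speed (km/h) per road segment.

    samples: iterable of (segment_id, driver_id, distance_m, dt_s)
    """
    per_segment = defaultdict(dict)
    for segment_id, driver_id, distance_m, dt_s in samples:
        if dt_s > 0:
            # Later samples from the same driver overwrite earlier ones.
            per_segment[segment_id][driver_id] = distance_m / dt_s * 3.6
    return {
        seg: median(by_driver.values())
        for seg, by_driver in per_segment.items()
        if len(by_driver) >= min_drivers
    }


samples = [
    ("s1", "d1", 40.0, 4.0),
    ("s1", "d2", 50.0, 4.0),
    ("s1", "d3", 60.0, 4.0),
    ("s2", "d4", 20.0, 4.0),  # only one driver here: too sparse to report
]
speeds = segment_speeds(samples)
```

A median is used instead of a mean so that one driver double-parked on the segment does not drag the estimate toward zero.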

flowchart TD
  A[Driver GPS Stream]
  B[Speed Inference Service]
  C[Road Segment Speed Store]
  D[Historical Speed Patterns]
  E[Traffic Aggregator]
  F[Traffic Snapshot]
  G[Routing Engine]
  H[ETA Service]
  A --> B
  B --> C
  C --> E
  D --> E
  E --> F
  F --> G
  F --> H

This is a beautiful example of turning a byproduct (location data collected for dispatch purposes) into a product feature (real-time traffic). But it has a cold start problem: in areas with few active Uber drivers, the traffic signal is sparse or absent. In those cases, the system falls back to historical patterns and whatever external traffic data sources it can access.

Historical traffic patterns are stored as speed profiles for each road segment, indexed by day of week and hour of day. Monday morning rush hour on the 101 in San Francisco has a very predictable signature. The system builds these profiles from months of accumulated trip data and uses them to fill in gaps in real-time coverage.

Event detection is another layer. The traffic system watches for sudden changes in speed patterns that might indicate an accident or road closure. If a segment that normally flows at 60 km/h suddenly drops to 10 km/h, that is a signal worth propagating to the routing engine immediately, not waiting for the next scheduled update.

Machine Learning ETA Prediction

Here is the uncomfortable truth about rule-based ETA systems: they are wrong in predictable ways, and machine learning can fix most of those systematic errors.

A pure routing-based ETA takes the planned route, applies current and historical traffic speeds, and computes an expected travel time. This ignores dozens of factors that a trained model can learn: the tendency for pickups near certain venues to take longer because of loading zone situations; the pattern where certain highway on-ramps add 3-4 minutes during specific hours that the speed data does not fully capture; the fact that certain drivers consistently arrive 90 seconds faster than the route prediction suggests.

The ETA prediction model sits on top of the routing output and corrects it.

Feature engineering for this model is where most of the work happens. The features include the raw route-based ETA from the routing engine, real-time speeds on key segments along the route, historical speed variance for those segments at this time and day, weather conditions, the number of other Uber trips currently on the same route segments, the driver’s historical ETA accuracy (some drivers are systematically faster or slower than average), pickup zone type, and more.

flowchart TD
  A[Route ETA from Routing Engine]
  B[Real-Time Traffic Features]
  C[Historical Pattern Features]
  D[Driver Behavior Features]
  E[Weather Features]
  F[City Context Features]
  G[Feature Vector Assembly]
  H[ML Model Inference]
  I[Corrected ETA]
  A --> G
  B --> G
  C --> G
  D --> G
  E --> G
  F --> G
  G --> H
  H --> I

The model architecture that tends to work best for this kind of structured prediction problem is gradient boosted trees (XGBoost, LightGBM). These models handle tabular features well, are fast at inference, can be refreshed frequently as new trip data arrives, and produce well-calibrated predictions. Deep learning approaches (LSTMs for capturing temporal sequence patterns, graph neural networks for road topology) are used for specific subproblems but add latency and complexity.

One of the most important things the ML system does is quantify uncertainty. Rather than just predicting “14 minutes,” the system can predict “14 minutes with a 90% prediction interval of 12-18 minutes.” This is used downstream for things like showing ranges in the app and for the ride matching algorithm to account for uncertainty in supply-demand balancing.
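
One simple way to get such a range, shown here only as a sketch, is to look at how past predictions missed. If we keep the ratio of actual to predicted duration for comparable completed trips, the 5th and 95th percentiles of those ratios give an empirical 90% band around a new prediction. The function `eta_interval` and the `past_ratios` input are hypothetical; a production model might instead be trained to predict quantiles directly.

```python
from statistics import quantiles


def eta_interval(predicted_s, past_ratios, coverage=0.90):
    """Return (low_s, high_s) from historical actual/predicted ratios."""
    tail_pct = round((1 - coverage) / 2 * 100)  # 5 for 90% coverage
    if not 1 <= tail_pct <= 49:
        raise ValueError("coverage must leave 1% to 49% in each tail")
    # 99 cut points; cuts[i - 1] is the i-th percentile.
    cuts = quantiles(past_ratios, n=100, method="inclusive")
    return predicted_s * cuts[tail_pct - 1], predicted_s * cuts[100 - tail_pct - 1]


# Synthetic history: trips took between 0.8x and 1.3x the predicted time.
history = [0.8 + 0.005 * i for i in range(101)]
low_s, high_s = eta_interval(840.0, history)
```

Note that the band is asymmetric here, which matches intuition: trips run late more often and by more than they run early.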

Online learning is the practice of updating the model continuously as new labeled examples arrive. Each completed trip generates a ground truth: the predicted ETA at trip start versus the actual duration. These examples can be used to retrain the model or update its parameters in near-real-time. The challenge is avoiding training on biased samples — if drivers who take longer routes are less likely to complete trips, the training data skews toward fast completions.

Geo-Spatial Infrastructure

Finding drivers near a pickup location, or finding which road segments are relevant to a query, requires efficient geo-spatial indexing. Doing this at scale means you cannot afford expensive distance calculations on every record.

GeoHash divides the Earth’s surface into a hierarchical grid of cells. Each cell is identified by a short string, and cells with similar prefixes are geographically close to each other. This turns 2D proximity queries into 1D prefix searches, which traditional databases can handle with index scans.
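
The encoding itself is short enough to write out. It alternately bisects the longitude and latitude ranges, emits one bit per bisection, and packs every five bits into a base-32 character. The function name `geohash_encode` is our own; the algorithm and alphabet are the standard GeoHash ones.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a GeoHash string."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, ch = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            ch = ch << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(_BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, precision=4)
fine = geohash_encode(57.64911, 10.40744, precision=6)
```

Because each extra character only refines the previous cell, `fine` always starts with `coarse`, which is exactly the prefix property that makes index scans work.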

QuadTrees are an alternative spatial index that recursively divides a 2D region into four quadrants. They work well when data is unevenly distributed (like drivers, who cluster in city centers) because the tree adapts its density to match the data density.

For road graph queries, Uber partitions the global road graph into geographic tiles. Each tile contains the full topology of its region plus information about how to join the edges that cross tile boundaries. This lets the routing engine load only the tiles relevant to a given trip, rather than the entire global graph.

The nearby driver search is a classic problem in geo-spatial systems. Given a pickup location, find all drivers within X kilometers. With GeoHash, you convert the pickup location to a GeoHash cell and query all drivers in that cell and its neighbors. The neighbor lookup is the tricky part — with GeoHash, you have to handle cell boundaries carefully to avoid missing drivers just on the other side of a cell edge.

flowchart LR
  A[Pickup Location]
  B[GeoHash Encoder]
  C[Cell and Neighbors]
  D[Driver Location Store]
  E[Candidate Drivers]
  F[Distance Filter]
  G[Available Drivers]
  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
  F --> G
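
The cell-plus-neighbors lookup can be sketched with a plain latitude/longitude grid standing in for GeoHash cells, which keeps the neighbor arithmetic trivial. The `DriverIndex` class, the 0.01 degree cell size, and the in-memory dictionaries are assumptions for illustration; a real deployment would keep this state in a shared store.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size; about 1.1 km north-south


def cell_of(lat, lon):
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class DriverIndex:
    def __init__(self):
        self.cells = defaultdict(dict)  # cell -> {driver_id: (lat, lon)}
        self.cell_by_driver = {}        # driver_id -> current cell

    def upsert(self, driver_id, lat, lon):
        old, new = self.cell_by_driver.get(driver_id), cell_of(lat, lon)
        if old is not None and old != new:
            del self.cells[old][driver_id]  # driver moved to another cell
        self.cells[new][driver_id] = (lat, lon)
        self.cell_by_driver[driver_id] = new

    def nearby(self, lat, lon, radius_km):
        """Drivers within radius_km, nearest first.

        Only the 3x3 block of cells is scanned, so radius_km must not
        exceed the shortest cell edge or drivers can be missed.
        """
        cx, cy = cell_of(lat, lon)
        hits = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for driver_id, (dlat, dlon) in self.cells.get((cx + dx, cy + dy), {}).items():
                    d = haversine_km(lat, lon, dlat, dlon)
                    if d <= radius_km:
                        hits.append((d, driver_id))
        return [driver_id for _, driver_id in sorted(hits)]
```

The exact distance filter at the end matters: the 3x3 block is a square, so it always contains candidates that are outside the circular search radius.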

Streaming Systems and Real-Time Pipelines

Everything we have described so far — GPS processing, traffic aggregation, ETA updates — happens continuously, not in batch. The backbone is a streaming data platform.

Kafka is the message bus that connects these systems. GPS events flow from drivers into Kafka topics. Stream processors consume from these topics, do their work (map matching, speed aggregation, ETA recalculation), and produce into other Kafka topics. The result is a series of processing stages, each consuming and producing events, with the final ETA sitting in a cache ready to serve.

flowchart LR
  A[GPS Events Topic]
  B[Map Matcher Consumer]
  C[Matched Location Topic]
  D[Speed Aggregator]
  E[Traffic Topic]
  F[ETA Recompute Service]
  G[ETA Cache]
  A --> B
  B --> C
  C --> D
  D --> E
  C --> F
  E --> F
  F --> G

Windowed aggregations are how the traffic system computes road segment speeds. A sliding window over the last 5 minutes of GPS events for a given road segment gives you the current average speed. Flink and similar stream processing frameworks handle this natively.

Late-arriving events are a real operational problem. GPS pings sometimes get buffered on the device because of network connectivity and arrive out of order. A naive system that processes events in arrival order will compute incorrect speeds. The proper handling is to use event time (the timestamp from the device) rather than processing time (when the system received it), and to hold a short watermark window to allow late events to be incorporated before finalizing computations.
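
To see event time and watermarks working together, here is a small in-memory sketch for a single road segment. It uses tumbling (non-overlapping) windows rather than sliding ones to keep the code short, and it finalizes a window only once the watermark, defined as the largest event time seen minus an allowed lateness, has passed the window's end. The class `EventTimeWindows` and its parameters are illustrative; frameworks like Flink provide this machinery for you.

```python
from collections import defaultdict


class EventTimeWindows:
    def __init__(self, window_s=60, allowed_lateness_s=10):
        self.window_s = window_s
        self.lateness = allowed_lateness_s
        self.open = defaultdict(list)          # window_start -> speeds seen so far
        self.max_event_time = float("-inf")

    def add(self, event_time, speed_kmh):
        """Add one event; return [(window_start, avg_speed)] finalized by it."""
        start = event_time - event_time % self.window_s
        if start + self.window_s <= self.max_event_time - self.lateness:
            return []  # window already finalized: this event is too late, drop it
        self.open[start].append(speed_kmh)
        self.max_event_time = max(self.max_event_time, event_time)

        watermark = self.max_event_time - self.lateness
        finalized = []
        for s in sorted(self.open):
            if s + self.window_s <= watermark:
                speeds = self.open.pop(s)
                finalized.append((s, sum(speeds) / len(speeds)))
        return finalized


w = EventTimeWindows(window_s=60, allowed_lateness_s=10)
w.add(5, 30.0)
w.add(50, 40.0)
w.add(62, 50.0)            # belongs to the next window
w.add(58, 20.0)            # out of order, but still within allowed lateness
closed = w.add(71, 60.0)   # watermark reaches 61, closing window [0, 60)
dropped = w.add(59, 100.0) # arrives after the window was finalized
```

The event at time 58 arrives after the one at 62 yet still lands in the first window, because windows are assigned by device timestamp rather than arrival order.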

Exactly-once processing matters when events drive state changes. If a GPS event is processed twice (due to a consumer restart), you do not want it counted twice in the traffic average. Kafka and Flink together support exactly-once semantics through transactional commits.

Database Design

The ETA system touches several different data stores, each chosen for specific access patterns.

Cassandra stores driver location history and GPS event archives. It is designed for high write throughput, which matches the GPS ingestion workload. Reads are by driver ID and time range, which fits Cassandra’s partition key model well. Cassandra’s eventual consistency is acceptable here because we do not need perfect accuracy for historical location data.

Redis is the primary cache for current driver positions and recent ETA computations. Sub-millisecond read latency is essential when the app is refreshing the driver’s position on the map every few seconds. Redis Sorted Sets are particularly useful for geo-spatial queries — you can store driver IDs scored by GeoHash and do range queries to find nearby drivers.

A time-series database like InfluxDB or TimescaleDB stores traffic speed snapshots per road segment over time. The access pattern is: give me the average speed on segment X for the last 30 days at 5pm on Tuesdays. Time-series databases have native support for these kinds of downsampled historical queries.

PostgreSQL with PostGIS handles the road graph and map data. Road segments, intersection topology, turn restrictions, and speed limits are relational data with complex relationships. PostGIS adds the geo-spatial query capability needed to find all road segments within a bounding box or compute the length of a geometry.

Data Store	What It Stores	Why This Choice	Key Access Pattern
Cassandra	GPS events, driver history	High write throughput, time-series partitioning	Write-heavy, read by driver + time range
Redis	Current driver positions, ETA cache	Sub-millisecond reads, geo-sorted sets	Point reads, nearby driver lookup
InfluxDB / TimescaleDB	Road segment speed history	Native time-series aggregations	Historical speed profiles by segment + time
PostgreSQL + PostGIS	Road graph, map topology	Relational integrity, spatial queries	Road graph traversal, spatial bounding box queries
Elasticsearch	Address search, place names	Full-text search, geo-distance queries	Destination autocomplete, address resolution

GPS event schema in Cassandra might look like this:

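-- One row per GPS ping, newest first within each driver's partition.
CREATE TABLE gps_events (
  driver_id    UUID,       -- partition key
  event_time   TIMESTAMP,  -- clustering key
  latitude     DOUBLE,
  longitude    DOUBLE,
  speed_kmh    FLOAT,
  heading      FLOAT,      -- assumed: degrees clockwise from north
  accuracy_m   FLOAT,      -- reported horizontal accuracy in meters
  PRIMARY KEY (driver_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);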

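The partition key is driver_id so all events for a driver land on the same partition. The clustering key is event_time descending so the most recent events come back first on any range scan. One caveat: with driver_id alone as the partition key, a driver’s partition grows without bound, so a production table would typically add a time bucket (such as the calendar day) to the partition key.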

Caching and Performance Optimization

The ETA system has to serve sub-200ms responses to millions of concurrent users. Caching is not an optimization — it is a core architectural requirement.

Route caching stores the computed route between two fixed points. If a driver is picking up a rider at a major landmark, there might be thousands of pickups from that location per day. The optimal route from a given origin-destination pair does not change dramatically minute-to-minute. The routing engine can serve cached routes for common pairs and only run the full graph traversal for novel queries.

ETA results are cached at the trip level and invalidated when conditions change meaningfully. The definition of “meaningfully” matters here. A 10-second change in ETA does not warrant a cache invalidation. A 2-minute change does, especially if the trip has just started. Engineers tune these thresholds based on user research about when ETA changes are noticeable and frustrating.

Traffic tile caching stores pre-computed traffic overlays for geographic tiles. Instead of assembling the traffic picture from raw speed data on every routing request, the system periodically bakes traffic speeds into the road graph weights and serves those cached weights. This shifts computation from query time to background processing.

Hotspot mitigation handles the “stadium effect”: when a venue lets out and thousands of riders all request Uber simultaneously from the same location, the nearby driver search and ETA computation for that area gets hammered. One approach is to pre-compute and warm the cache for known high-demand zones before events end, using venue calendars and historical demand patterns.

Scaling Uber ETA Systems

System Component	Primary Bottleneck	Scaling Strategy	Tradeoff
GPS Ingestion	Write throughput at peak	Kafka partition scaling, stateless consumers	More partitions increase consumer coordination overhead
Map Matching	HMM computation per driver	Stateful stream processors sharded by driver ID	Rebalancing shards is expensive when consumers restart
Routing Engine	Graph traversal CPU	Geo-sharded routing nodes, precomputed hierarchies	Cross-shard queries needed for long trips
ML Inference	Model load and feature assembly	Model server clusters, feature store caching	Model freshness vs inference latency tradeoff
Geo Queries	Nearby driver lookup frequency	Redis geo-sorted sets, read replicas	Eventual consistency on driver position reads

The GPS ingestion service is horizontally scaled: add more Kafka partitions and more consumer instances. Because GPS events from a given driver need to be processed in order, events for the same driver are routed to the same partition using driver ID as the partition key. This ensures ordering within a driver’s event stream without requiring global ordering across all drivers.
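
Keyed partitioning boils down to hashing the key and taking it modulo the partition count. The sketch below uses CRC32 from the standard library as a stand-in hash; Kafka's default partitioner uses a murmur2 hash of the key bytes, so the actual partition numbers would differ. The function name `partition_for` is our own.

```python
import zlib


def partition_for(driver_id: str, num_partitions: int) -> int:
    """Map a driver ID to a partition; the same ID always maps to the same one."""
    return zlib.crc32(driver_id.encode("utf-8")) % num_partitions


p_first = partition_for("driver-42", 64)
p_again = partition_for("driver-42", 64)
```

One consequence worth remembering: because the result depends on `num_partitions`, adding partitions to an existing topic changes where some keys land, which can briefly break per-driver ordering unless the change is handled carefully.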

The routing engine is geo-sharded. Each routing node is responsible for a geographic region and holds the road graph for that region in memory. Routing queries within a single region are served locally. Long trips that cross shard boundaries require coordination between routing nodes, which is handled by a routing federation layer.

Multi-region deployment is critical for latency. A rider in Mumbai should not have their ETA computed in a data center in Virginia. Routing and ETA services run in regional clusters close to the users they serve. Global state (like model weights) is replicated to all regions. Regional state (like driver positions) lives in the regional cluster and is not globally replicated.

Reliability and Availability

The ETA system has multiple failure modes that need graceful degradation rather than hard failures.

Location outages happen when a driver’s phone loses its GPS fix or its network connection. The map matcher maintains a dead reckoning estimate: given the last known position, heading, and speed, project forward in time to estimate the current position. This estimate degrades quickly (30-60 seconds before it becomes unreliable) but it is better than showing no position at all.
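
Dead reckoning is a direct application of the standard great-circle destination formula. The sketch below projects a point forward along a heading and gives up once the last fix is older than a cutoff; the function name `dead_reckon` and the 60-second default are assumptions, and the projection ignores the road network, which a real map matcher would not.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def dead_reckon(lat, lon, heading_deg, speed_mps, elapsed_s, max_age_s=60.0):
    """Project the last known position forward; None once the fix is too old."""
    if elapsed_s > max_age_s:
        return None
    d = speed_mps * elapsed_s / EARTH_RADIUS_M  # angular distance travelled
    brg = math.radians(heading_deg)             # 0 = north, 90 = east
    phi1, lmb1 = math.radians(lat), math.radians(lon)
    phi2 = math.asin(
        math.sin(phi1) * math.cos(d) + math.cos(phi1) * math.sin(d) * math.cos(brg)
    )
    lmb2 = lmb1 + math.atan2(
        math.sin(brg) * math.sin(d) * math.cos(phi1),
        math.cos(d) - math.sin(phi1) * math.sin(phi2),
    )
    return math.degrees(phi2), math.degrees(lmb2)


# Heading due north at 10 m/s for 30 s: about 300 m further north.
estimate = dead_reckon(40.0, -74.0, heading_deg=0.0, speed_mps=10.0, elapsed_s=30.0)
expired = dead_reckon(40.0, -74.0, heading_deg=0.0, speed_mps=10.0, elapsed_s=90.0)
```

Returning `None` past the cutoff forces callers to handle the "we no longer know" case explicitly instead of drawing a car that keeps gliding in a straight line.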

Stale traffic data is handled by aging: if a road segment’s speed data has not been refreshed in the last five minutes, the system starts blending it with historical averages, weighting toward historical as the data gets older. After a configurable threshold, the segment is treated as having only historical data.
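
The aging rule is a linear blend between two cutoffs. In this sketch live data is trusted fully for the first five minutes, ignored entirely after fifteen, and mixed with the historical average in between. The fifteen-minute expiry and the function name `blended_speed` are assumptions; the text only fixes the five-minute starting point.

```python
def blended_speed(live_kmh, live_age_s, historical_kmh, fresh_s=300.0, expire_s=900.0):
    """Blend live and historical segment speed based on how old the live value is."""
    if live_kmh is None or live_age_s >= expire_s:
        return historical_kmh            # no usable live data
    if live_age_s <= fresh_s:
        return live_kmh                  # fresh enough to trust as-is
    w_hist = (live_age_s - fresh_s) / (expire_s - fresh_s)  # grows from 0 to 1
    return (1 - w_hist) * live_kmh + w_hist * historical_kmh
```

For example, a live reading of 20 km/h that is ten minutes old, on a segment that historically runs at 50 km/h, comes out as 35 km/h.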

The fallback ETA system is a simplified rule-based model that can run without ML inference or live traffic data. It uses road type, distance, and time of day to produce a rough ETA. This fallback is less accurate but far more reliable — it has almost no external dependencies. When the full pipeline degrades, the fallback keeps the product functional.

Monitoring the ETA system requires multiple layers:

Infrastructure metrics: Kafka consumer lag, routing engine latency percentiles, ML model inference latency, cache hit rates
Data quality metrics: GPS event rate per region (a sudden drop suggests ingestion issues), map matching confidence scores, ETA prediction confidence
Business metrics: prediction accuracy measured against completed trips, the percentage of trips where actual duration exceeded predicted ETA by more than 20%, and driver arrival time accuracy
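
The business metrics in the last item are cheap to compute from completed trips. Here is a minimal sketch, assuming each trip is reduced to a (predicted seconds, actual seconds) pair; the function name `eta_accuracy` and the output keys are our own.

```python
def eta_accuracy(trips, late_threshold=0.20):
    """Summarize ETA quality over completed trips.

    trips: non-empty list of (predicted_s, actual_s)
    """
    abs_errors = [abs(actual - predicted) for predicted, actual in trips]
    late = sum(1 for predicted, actual in trips if actual > predicted * (1 + late_threshold))
    return {
        "mae_s": sum(abs_errors) / len(trips),    # mean absolute error in seconds
        "pct_late": 100.0 * late / len(trips),    # share of trips more than 20% over ETA
    }


report = eta_accuracy([(600, 630), (600, 780), (300, 290), (900, 900)])
```

Tracking the late share separately from the average error is deliberate: a system can have a small mean error and still badly miss a painful minority of trips.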

Engineering Tradeoffs

Real-time traffic versus historical traffic is the most fundamental tradeoff. Real-time data is more accurate for current conditions but noisy and sparse. Historical data is reliable but may be wrong when conditions deviate from norms. The right answer is a weighted blend, and the optimal weights change by scenario: for short trips in well-covered areas, real-time dominates; for long trips in sparse areas, historical patterns carry more weight.

Latency versus accuracy is the tension every ML inference pipeline faces. A more complex model might produce 5% more accurate ETAs, but if it adds 50ms to the response time, the user experience degrades. Uber’s serving infrastructure runs ML models behind latency SLAs — if a model cannot respond within its budget, the system falls back to a simpler model rather than waiting.

Route freshness versus compute cost is a cache management problem. Recomputing routes continuously would give the most up-to-date routing decisions but would require enormous compute capacity. Caching routes for 2-3 minutes is cheap but might miss a road that closed 90 seconds ago. The engineering decision is to cache most routes but subscribe to real-time road closure events and invalidate affected cached routes immediately.

The ML model update cadence involves a tradeoff between recency and stability. A model retrained daily incorporates recent driving patterns but might overfit to anomalies. A model retrained weekly is more stable but slower to adapt to new construction patterns or changed traffic conditions. Production systems typically run a shadow evaluation: the new model runs alongside the old one for a period before being promoted, with accuracy metrics compared before any traffic is switched.

Real-World Technology Stack

Technology	Role in ETA System	Why It Fits
Go	GPS ingestion, routing service	Low latency, high concurrency, small memory footprint per goroutine
Java / Kotlin	Core platform services, stream processing	JVM ecosystem, mature Kafka client libraries, Flink runs on JVM
Python	ML model training, feature engineering	Rich ML ecosystem (XGBoost, TensorFlow, Pandas), fast prototyping
Kafka	Event streaming backbone	High throughput, durable, consumer group scaling, replay capability
Flink	Stateful stream processing	Exactly-once semantics, event-time windowing, managed state
Redis	Driver position cache, ETA cache	Sub-millisecond reads, geo-sorted sets, atomic operations
Cassandra	GPS event store, driver history	Linear write scalability, tunable consistency, time-series friendly
PostgreSQL + PostGIS	Road graph, map data	Spatial indexing, complex relational queries, battle-tested
Kubernetes	Service orchestration	Pod autoscaling for traffic bursts, rolling deployments, health checks
XGBoost / LightGBM	ETA correction model	Fast inference, handles tabular features, well-calibrated predictions

Go is particularly well-suited for the high-concurrency ingestion services because its goroutine model handles tens of thousands of concurrent connections cheaply. A single Go service can ingest GPS events from hundreds of thousands of drivers without dedicating an OS thread to each connection, which is the overhead that bottlenecks a traditional thread-per-connection server.

Choosing Flink over raw Kafka consumers for stream processing is worth the operational overhead because Flink manages state consistency through checkpointing. If a stream processor crashes, Flink restores its state from the last checkpoint and resumes from the input position recorded in that same checkpoint, so every event is reflected in the state exactly once. Without this, a crash would either reprocess events against state that already includes them (causing double-counting in traffic aggregations) or miss events entirely.
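The reason checkpointing prevents double-counting is that the operator's state and its position in the input are saved together. The toy below illustrates that idea with a per-segment event counter over a replayable list. It is not Flink's API, and the class and method names are invented for the example.

```python
import copy


class CheckpointedCounter:
    """Toy stateful operator: counts events per road segment from a replayable log.

    Illustrative only. The point is that counts and offset are snapshotted
    as one unit, so recovery never applies an event twice.
    """

    def __init__(self):
        self.counts = {}
        self.offset = 0           # index of the next log entry to process
        self._snapshot = ({}, 0)  # (counts, offset) as of the last checkpoint

    def process(self, log, upto):
        """Consume log entries from the current offset up to (not including) upto."""
        while self.offset < upto:
            segment = log[self.offset]
            self.counts[segment] = self.counts.get(segment, 0) + 1
            self.offset += 1

    def checkpoint(self):
        """Save state and input position atomically."""
        self._snapshot = (copy.deepcopy(self.counts), self.offset)

    def recover(self):
        """Simulate a crash: discard in-memory progress, restore the snapshot."""
        counts, offset = self._snapshot
        self.counts = copy.deepcopy(counts)
        self.offset = offset
```

After a crash, the operator replays events from the checkpointed offset against the checkpointed counts, so the final totals match what an uninterrupted run would have produced. Restoring the counts without the offset (or the offset without the counts) is exactly what produces double-counting or lost events.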

System Design Interview Perspective

When an interviewer asks you to design Uber’s ETA system, they are testing several things simultaneously: your ability to decompose a complex problem, your knowledge of distributed systems primitives, your understanding of data pipelines, and your sense for what actually matters versus what is premature optimization.

Start with the problem definition. Many candidates jump straight into drawing services and databases. Better candidates start by asking: what does the system need to do? What are the latency requirements? How accurate does the ETA need to be? How many concurrent users? This framing shows you understand that architecture follows requirements.

Then establish the data flows before the system components. Where does the input data come from? (GPS events from drivers.) Where does it need to go? (ETA on the rider’s screen.) What transformations happen in between? (Map matching, routing, ML adjustment, caching.) This mental model of data flow makes the architectural diagram feel inevitable rather than arbitrary.

When you draw the architecture, focus on the critical path first. The critical path for ETA is: GPS event from driver arrives, gets map-matched, goes into routing, gets ML-adjusted, gets cached, gets served to the rider’s app. Everything else is supporting infrastructure. Candidates who get lost designing the admin dashboard or the reporting pipeline before the critical path has been addressed are not demonstrating good engineering judgment.

Interviewers will probe your bottlenecks. “What breaks first when your system gets 10x more drivers?” Be ready to walk through: GPS ingestion throughput, Kafka partition limits, stream processing consumer lag, routing engine CPU, ML inference latency, and cache memory. For each one, know the mitigation: horizontal scaling, sharding strategy, caching, async processing, or model simplification.
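For the first two bottlenecks, a back-of-envelope calculation is usually enough to show where the pressure lands. The helper below is a rough sketch: every input in the usage example (driver count, ping interval, event size, per-partition throughput) is a hypothetical placeholder chosen for round arithmetic, not a measured figure.

```python
import math


def ingestion_estimate(active_drivers, ping_interval_s, bytes_per_event,
                       per_partition_events_per_s):
    """Rough sizing for GPS ingestion. All inputs are assumptions to be replaced
    with measured values."""
    events_per_s = active_drivers / ping_interval_s
    mb_per_s = events_per_s * bytes_per_event / 1_000_000
    partitions = math.ceil(events_per_s / per_partition_events_per_s)
    return events_per_s, mb_per_s, partitions


# Hypothetical baseline: 500k drivers, one ping every 4 s, 200-byte events,
# and an assumed 10k events/s that one partition's consumer can sustain.
baseline = ingestion_estimate(500_000, 4, 200, 10_000)
ten_x = ingestion_estimate(5_000_000, 4, 200, 10_000)
```

With these placeholder inputs the baseline works out to 125,000 events per second across 13 partitions, and the 10x case to 1,250,000 events per second across 125 partitions. The useful part in an interview is the structure of the calculation, which makes it obvious that partition count and consumer parallelism have to scale with driver count.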

Common mistakes to avoid: designing a request-response system where ETA is computed on-demand rather than pre-computed; forgetting the map matching step and assuming raw GPS is usable directly; treating ML as a magic box without discussing features and training data; ignoring failure modes and assuming the happy path throughout; insisting on strong consistency when eventual consistency is perfectly acceptable for ETA data.

Strong candidates discuss the ML model’s role with nuance. They explain why static routing is not enough, what features the model would use, how the model is trained and updated, and what the fallback is when ML inference is unavailable. They also acknowledge the feedback loop: ETA predictions affect driver and rider behavior, which affects the actual trip duration, which feeds back into training data.
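The fallback path is worth being able to sketch on a whiteboard. The example below assumes the ML model outputs a multiplicative correction to the routing engine's ETA, and that a precomputed historical ratio is available as a cheaper substitute; both are assumptions for illustration, as are the function name, the sanity bounds of 0.5 and 3.0, and the `predict_correction` callable.

```python
def eta_with_fallback(routing_eta_s, features, predict_correction, historical_ratio=1.0):
    """Return (eta_seconds, source), where source is "ml" or "fallback".

    predict_correction: hypothetical callable returning a multiplier for the
    routing ETA; it may raise (timeout, service down) or return garbage.
    historical_ratio: assumed precomputed average of actual/routed duration.
    """
    try:
        correction = predict_correction(features)
        # Sanity bounds are illustrative; NaN fails this comparison as well.
        if 0.5 <= correction <= 3.0:
            return routing_eta_s * correction, "ml"
    except Exception:
        # Any inference failure degrades to the non-ML estimate below.
        pass
    return routing_eta_s * historical_ratio, "fallback"
```

Returning the source alongside the number lets the serving layer track how often it is running degraded, which is the metric that tells you whether the fallback is a rare safety net or a daily occurrence.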

The best answers in these interviews feel like the candidate has actually thought about building this system rather than having memorized a template. That means discussing real tradeoffs, acknowledging uncertainty, and being willing to say “here I would run an experiment to determine the right threshold” rather than inventing a precise number with false confidence.

Closing Thoughts

That number on your screen — “3 minutes” — is the output of a genuinely impressive engineering system. GPS streams flowing from hundreds of thousands of devices every few seconds. Map matching algorithms keeping every driver pinned to the correct road. A routing engine traversing a weighted graph of continental scale in milliseconds. Traffic aggregation systems turning driver speed data into real-time congestion maps. ML models layered on top to correct for all the systematic biases that physics-based models cannot capture. Caches keeping the whole thing fast enough to feel instantaneous.

What makes this system interesting to study is that almost every design decision has a clear reason rooted in real engineering constraints. The pre-computation model exists because ETA read traffic dwarfs write traffic. The Kafka event bus exists because the system is fundamentally event-driven. The hybrid traffic model exists because real-time data is noisy and historical data is stale. The geo-sharded routing engine exists because road graph traversal is CPU-intensive and network round trips add latency.

If you are preparing for a system design interview or trying to build something similar, the key takeaway is this: understand the data flows first, understand the access patterns second, and let the component choices follow naturally from those constraints. The technology stack matters far less than the clarity of thought about what problem each part of the system is actually solving.