How Does Zomato Work?

May 20th, 2026

The moment you tap “Place Order” on Zomato, a massive chain of events begins instantly behind the scenes. Within seconds, the system identifies your location, finds the right restaurant, assigns a nearby delivery partner, processes your payment, and starts estimating delivery time — all before you even put your phone away. What feels simple on the surface is actually a real-time logistics system coordinating thousands of moving parts across an entire city.

Food delivery sounds like a simple logistics problem until you actually try to build it. Then it reveals itself as one of the hardest classes of distributed systems work: real-time geo-spatial computation, dynamic resource allocation, live tracking across millions of concurrent sessions, recommendation engines that personalize without being creepy, and payment flows that must never lose money even when half the network is flaking. Zomato operates in hundreds of cities, handles millions of daily orders, and works with tens of thousands of restaurant partners. Let us walk through how a system like this actually works.

What Zomato Actually Does

Most people know Zomato as an app where you order food. But from an engineering perspective, Zomato is really three distinct systems bolted together and made to feel seamless.

The first is a restaurant discovery and search platform. This is the part that helps you find what to eat, browse menus, read reviews, and decide. It behaves like a specialized search engine with local context.

The second is a transactional ordering and logistics system. This handles the actual purchase: taking your order, processing payment, routing it to the restaurant, orchestrating the delivery from pickup to dropoff.

The third is a personalization and recommendations engine. This is the layer that learns your preferences, predicts what you will want to eat on a Tuesday night versus a Sunday brunch, and surfaces restaurant and dish suggestions that feel eerily relevant.

These three systems share infrastructure but have very different performance and correctness requirements. The search system can tolerate slight staleness. The ordering system cannot afford to lose a transaction. The recommendation system can afford to be a little wrong as long as it is interesting. Understanding that these systems have different tradeoffs is the first thing a good engineer thinks about before designing anything.

Core Features at a Glance

Before going deep, it helps to enumerate what Zomato actually provides so we can map features to systems:

Restaurant discovery: search by location, cuisine, rating, price range, dietary filters
Menu browsing: dish listings, photos, prices, customizations, availability
Food ordering: cart management, address selection, order placement
Payment processing: cards, wallets, UPI, cash on delivery, offers
Delivery partner assignment: matching orders to nearby riders
Real-time tracking: live GPS updates for in-progress deliveries
ETA prediction: estimated preparation and delivery times
Order status updates: notifications at each lifecycle stage
Ratings and reviews: post-delivery feedback collection
Recommendations: personalized restaurant and dish suggestions
Restaurant partner tools: menu management, order acceptance, analytics
Delivery partner app: navigation, earnings, order queue

Each of these features, under production traffic, is a non-trivial engineering problem. We will go through the most architecturally interesting ones in detail.

High-Level Architecture

At the highest level, Zomato’s backend is a collection of microservices organized around business domains. The client apps — iOS, Android, and web — communicate through an API Gateway that handles authentication, rate limiting, request routing, and protocol translation. Behind the gateway, services are organized roughly like this:

flowchart TD; A[Mobile App or Web Client]; B[API Gateway]; C[Auth Service]; D[Restaurant Service]; E[Search Service]; F[Order Service]; G[Payment Service]; H[Delivery Service]; I[Tracking Service]; J[Notification Service]; K[Recommendation Service]; L[ETA Service]; M[Analytics Service]; N[Kafka Event Bus]; O[Redis Cache Layer]; P[PostgreSQL Cluster]; Q[Cassandra Cluster]; R[Elasticsearch Cluster]; A --> B; B --> C; B --> D; B --> E; B --> F; B --> G; B --> H; B --> I; B --> K; F --> N; H --> N; I --> N; N --> J; N --> M; D --> O; E --> R; F --> P; I --> Q; H --> L;

The API Gateway is not just a reverse proxy. It does meaningful work: JWT validation, request deduplication, circuit breaking on downstream services, response caching for idempotent reads, and traffic shaping during surge events. Treating the gateway as just a traffic forwarder is a common architectural mistake.

The event bus (think Kafka) is what ties the asynchronous parts of the system together. When an order is placed, the Order Service emits an event. The Notification Service, Analytics Service, Delivery Service, and several others all react to that event independently. This decoupling is what lets each concern scale at its own rate.

Restaurant Discovery System

When you open Zomato and see a list of restaurants near you, something surprisingly sophisticated just happened. The system needed to answer a query that sounds simple — “give me restaurants within 5 kilometers of this GPS coordinate, ranked by relevance” — but at scale, this is a hard problem.

The naive solution is a database query like SELECT * FROM restaurants WHERE distance(lat, lng, user_lat, user_lng) < 5000. This does not scale. Computing Euclidean or Haversine distance for every restaurant in the database on every request is O(n) and collapses under load.

The production solution uses geo-spatial indexing. Zomato-like systems typically use one of two approaches: GeoHash-based bucketing or spatial trees like R-trees. Elasticsearch and OpenSearch both support geo-distance queries natively and use these techniques under the hood.

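GeoHash works by encoding latitude/longitude into a short alphanumeric string. The key insight is that coordinates inside the same cell share a common prefix, so nearby points usually do too. For example, a restaurant at GeoHash ttnf and another at ttnm are close to each other. By indexing restaurants on their GeoHash prefix, you can find nearby restaurants with a prefix scan rather than a full table scan. The cost drops from O(n) to O(log n) plus the size of the result set. The one caveat is cell boundaries: two points a few meters apart can fall in different cells and share little of their prefix, which is why a query covers the user's cell and its eight neighbors rather than a single prefix.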
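To make the prefix idea concrete, here is a minimal sketch of the standard GeoHash encoding in plain Python. The function name `geohash_encode` and the sample coordinates are illustrative only; a production system would more likely rely on the geo support built into its search engine or database.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode a coordinate by repeatedly halving the lng/lat ranges."""
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count, use_lng = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude and latitude, longitude first.
        rng, value = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid   # keep the lower half
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# A shorter hash is always a prefix of a longer hash for the same point,
# which is what makes "index on prefix, scan by prefix" work.
coarse = geohash_encode(28.6139, 77.2090, precision=4)
fine = geohash_encode(28.6139, 77.2090, precision=6)
```

Each extra character narrows the cell, so the prefix length chosen for the index sets the trade-off between the number of cells to scan and the number of candidates returned per cell.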

flowchart TD; A[User GPS Location]; B[Compute GeoHash for User]; C[Query Nearby GeoHash Cells]; D[Elasticsearch Geo Query]; E[Fetch Restaurant Documents]; F[Apply Business Filters]; G[Rank by Relevance Score]; H[Return Restaurant List]; A --> B; B --> C; C --> D; D --> E; E --> F; F --> G; G --> H;

The ranking step is where it gets interesting. Raw distance is only one signal. Zomato’s ranking model for restaurant discovery likely weighs:

Distance from user
Restaurant rating and review volume
Estimated delivery time
Current kitchen load and open/closed status
Cuisine match to user preference history
Promotional boost from restaurant partners
Time of day relevance (breakfast spots rank higher in the morning)
Order completion rate (restaurants that cancel orders frequently get penalized)

This is a learning-to-rank problem, and in production, it is usually solved with a gradient boosted tree model or a small neural ranker that can score candidate restaurants quickly. The candidate retrieval (geo query) is fast; the re-ranking happens over a bounded set of 50 to 200 candidates and is also fast.

One important detail: restaurant metadata like rating, cuisine tags, and menu availability is heavily cached. Redis sits in front of the restaurant service. Most restaurant profile reads never touch the primary database. When restaurant partners update their listings, the write path updates or invalidates the affected cache entries so that readers do not have to wait for a TTL to expire.

Food Ordering Pipeline

The ordering pipeline is where correctness requirements are strictest. Money changes hands, inventory changes state, and multiple parties need to be coordinated — all within a few seconds.

Here is the high-level flow:

flowchart TD; A[User Submits Order]; B[Validate Cart and Prices]; C[Reserve Payment]; D[Create Order Record]; E[Notify Restaurant]; F[Restaurant Accepts or Rejects]; G[Assign Delivery Partner]; H[Track Preparation]; I[Pickup and Deliver]; J[Mark Order Complete]; K[Capture Payment]; L[Release Payment on Failure]; A --> B; B --> C; C --> D; D --> E; E --> F; F -->|Accepted| G; F -->|Rejected| L; G --> H; H --> I; I --> J; J --> K;

The trickiest part here is the distributed transaction problem. Placing an order requires: validating item prices, deducting from the user’s wallet or pre-authorizing a card, writing an order record, and notifying the restaurant. These operations span multiple services and databases. If any one of them fails mid-flow, you need to roll back cleanly.

The common pattern for handling this is the Saga pattern. Rather than a distributed two-phase commit (which is slow and fragile), you design each step to be reversible. If payment succeeds but the restaurant rejects the order, you issue a compensating transaction to refund the payment. Each service owns its own data and publishes events; failure handlers subscribe to those events and trigger compensating actions.
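The compensation idea can be sketched in a few lines. This is a simplified, in-process illustration, not how a real Saga is deployed: the step functions (`reserve_payment`, `create_order`, and so on) are hypothetical stand-ins for calls to separate services, and a real implementation would drive the steps through events and persist its progress.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that already
    completed, in reverse order, and report failure.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []

def reserve_payment(): log.append("payment reserved")
def release_payment(): log.append("payment released")
def create_order(): log.append("order created")
def cancel_order(): log.append("order cancelled")
def notify_restaurant(): raise RuntimeError("restaurant rejected the order")

ok = run_saga([
    (reserve_payment, release_payment),
    (create_order, cancel_order),
    (notify_restaurant, lambda: None),
])
```

Compensations run newest first, so the order record is cancelled before the payment hold is released.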

Idempotency is critical here. Network retries are a fact of life. If the user’s app retries the “place order” request because it timed out, you do not want to create two orders and charge them twice. Every order creation request carries a client-generated idempotency key. The Order Service stores this key and returns the cached result if the same key is seen again.
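A minimal sketch of the idempotency-key check, assuming an in-memory dictionary in place of a durable store. The class and field names are hypothetical; in production the lookup and insert would need to be atomic, for example through a unique constraint on the key.

```python
import uuid


class OrderService:
    """Toy order service that deduplicates retries by idempotency key."""

    def __init__(self):
        self._results_by_key = {}   # idempotency key -> stored response
        self.orders_created = 0

    def place_order(self, idempotency_key: str, cart: dict) -> dict:
        cached = self._results_by_key.get(idempotency_key)
        if cached is not None:
            return cached           # retry: return the original result
        self.orders_created += 1
        result = {"order_id": str(uuid.uuid4()), "status": "PLACED", "cart": cart}
        self._results_by_key[idempotency_key] = result
        return result


service = OrderService()
key = str(uuid.uuid4())             # generated once by the client per checkout
first = service.place_order(key, {"paneer_tikka": 1})
retry = service.place_order(key, {"paneer_tikka": 1})
```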

Delivery Assignment System

Once the restaurant accepts an order, the delivery assignment system takes over. This is a real-time matching problem: you have an order at a specific location, and you need to find the best available delivery partner nearby.

The naive solution — broadcast the order to all nearby riders and see who accepts first — creates a race condition and a poor experience for riders. Production systems use a more deliberate assignment strategy.

flowchart TD; A[Order Accepted by Restaurant]; B[Geo Query for Nearby Riders]; C[Filter Available Riders]; D[Score Riders by ETA and Load]; E[Assign to Best Rider]; F[Rider Accepts]; G[Rider Declines or No Response]; H[Re-assign to Next Rider]; I[Monitor Assignment SLA]; A --> B; B --> C; C --> D; D --> E; E --> F; E --> G; G --> H; H --> D; F --> I;

The scoring function for rider selection considers:

Current distance from the restaurant
Current road conditions and estimated travel time
Rider’s current status (idle, heading to pickup, returning from delivery)
Rider’s acceptance rate history (low-acceptance riders get deprioritized)
Predicted kitchen preparation time (no point sending a rider who arrives in 3 minutes when food takes 20)

This last point is subtle but important. If a rider arrives at the restaurant before the food is ready, they idle at the restaurant, which wastes time and reduces their earnings. The assignment system tries to time the assignment so the rider arrives just as the food is being packaged.
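One way to express that timing preference is a cost function that penalizes both arriving late and arriving early. The sketch below is purely illustrative: the `Rider` fields, the penalty weights, and the acceptance-rate term are assumptions, not values from any real system.

```python
from dataclasses import dataclass


@dataclass
class Rider:
    rider_id: str
    travel_minutes: float    # estimated time to reach the restaurant
    acceptance_rate: float   # between 0.0 and 1.0


def rider_score(rider: Rider, prep_minutes: float,
                idle_penalty: float = 0.5, late_penalty: float = 1.0) -> float:
    """Lower is better. Late arrival costs more than idling at the counter."""
    gap = rider.travel_minutes - prep_minutes
    timing_cost = late_penalty * gap if gap > 0 else idle_penalty * -gap
    return timing_cost + 5.0 * (1.0 - rider.acceptance_rate)


def best_rider(riders, prep_minutes: float) -> Rider:
    return min(riders, key=lambda r: rider_score(r, prep_minutes))


riders = [
    Rider("near", travel_minutes=3, acceptance_rate=0.9),
    Rider("well_timed", travel_minutes=18, acceptance_rate=0.9),
    Rider("far", travel_minutes=30, acceptance_rate=0.95),
]
choice = best_rider(riders, prep_minutes=20)
```

With a 20-minute prep time, the rider who is 18 minutes away scores better than the one who is 3 minutes away, because the closer rider would spend most of that time waiting.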

Rider location data is stored in Redis using its native GEO commands, which keep members in a Sorted Set scored by a 52-bit GeoHash. GEOSEARCH (or the older GEORADIUS, deprecated since Redis 6.2) returns all riders within a radius, typically with sub-millisecond server-side latency. This is one of the most performance-critical reads in the entire system and must respond in under 10 milliseconds end to end.

One real challenge is handling simultaneous order surges. During a dinner rush, hundreds of new orders might hit the assignment system within seconds. If the system processes assignments sequentially, some orders will wait much longer than others. Production systems use a priority queue and process assignments concurrently, with distributed locking to prevent two orders from claiming the same rider simultaneously.
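The claim step is the part that needs mutual exclusion. The sketch below uses an in-process `threading.Lock` as a stand-in for a distributed lock (for example a Redis `SET` with the `NX` option); the `RiderPool` class and the `assign` helper are hypothetical names for illustration.

```python
import threading


class RiderPool:
    """Tracks free riders; claiming a rider is atomic."""

    def __init__(self, rider_ids):
        self._free = set(rider_ids)
        self._lock = threading.Lock()   # stand-in for a distributed lock

    def try_claim(self, rider_id: str) -> bool:
        with self._lock:
            if rider_id in self._free:
                self._free.remove(rider_id)
                return True
            return False


def assign(ranked_rider_ids, pool: RiderPool):
    """Walk the ranked list and take the first rider that is still free."""
    for rider_id in ranked_rider_ids:
        if pool.try_claim(rider_id):
            return rider_id
    return None                          # nobody available, caller retries later


pool = RiderPool(["r1", "r2"])
order_a = assign(["r1", "r2"], pool)     # both orders prefer r1
order_b = assign(["r1", "r2"], pool)
order_c = assign(["r1", "r2"], pool)
```

Because the check and the removal happen under the same lock, two orders that rank the same rider first cannot both win; the loser falls through to its next candidate.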

Real-Time Delivery Tracking

The little moving dot on the tracking screen is the most visible piece of real-time infrastructure in the whole system. Users check it obsessively. It needs to be accurate, low-latency, and available at scale.

Here is how it works end-to-end. The delivery partner’s app sends location updates over a persistent WebSocket connection to the tracking service. The frequency is typically every 3 to 5 seconds when moving, with a backoff to 10 to 15 seconds when stationary. More frequent updates drain battery and saturate the connection; less frequent updates make the tracking appear jittery.

On the server side, each location update goes through a pipeline:

flowchart TD; A[Rider App sends GPS Update]; B[Tracking Service receives via WebSocket]; C[Validate and Denoise Location]; D[Write to Cassandra Time Series]; E[Publish to Kafka Location Topic]; F[ETA Service consumes event]; G[Push updated location to Customer WebSocket]; H[Update Redis with latest position]; A --> B; B --> C; C --> D; C --> E; E --> F; E --> G; C --> H;

Cassandra is the right storage choice here. Location updates are pure append workloads — you never update a historical location, you only write new ones. Cassandra’s LSM-tree storage engine is optimized for exactly this access pattern. Reads for the customer tracking screen only need the latest position, which lives in Redis. Historical location data sits in Cassandra and is used for analytics and ETA model training.

The customer-facing tracking WebSocket is maintained by a separate service from the rider-facing one. This separation matters because the load profiles are completely different. You might have 100,000 concurrent customer tracking sessions but only 20,000 active rider sessions at the same time. Scaling them independently makes operational sense.

One underappreciated challenge is GPS noise. A rider standing still in a dense urban area might report positions that jump around by 20 to 50 meters due to signal multipath. Rendering these raw positions on a map looks terrible — the dot teleports. Production tracking systems apply a Kalman filter or similar smoothing algorithm to denoise the location stream before passing it to clients.
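As a rough illustration of the smoothing step, here is a scalar Kalman filter with a constant-position model, which would be run independently on latitude and longitude. The variance values are made-up tuning parameters, and a real tracker would more likely use a model that also carries velocity.

```python
class ScalarKalman:
    """One-dimensional Kalman filter assuming the true value drifts slowly."""

    def __init__(self, process_var: float, measurement_var: float):
        self.q = process_var        # how much the true position may move per step
        self.r = measurement_var    # how noisy each GPS reading is
        self.x = None               # current estimate
        self.p = 1.0                # variance of the current estimate

    def update(self, z: float) -> float:
        if self.x is None:          # first reading initializes the estimate
            self.x = z
            return self.x
        self.p += self.q                    # predict: uncertainty grows
        k = self.p / (self.p + self.r)      # Kalman gain
        self.x += k * (z - self.x)          # correct toward the measurement
        self.p *= (1.0 - k)
        return self.x


raw = [10.0, 10.4, 9.6, 10.5, 9.5, 10.3, 9.7]   # jittery readings, rider standing still
kf = ScalarKalman(process_var=0.01, measurement_var=1.0)
smoothed = [kf.update(z) for z in raw]
```

A small `process_var` relative to `measurement_var` tells the filter to trust its running estimate more than any single reading, which is what stops the dot from teleporting.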

Recommendation System Deep Dive

Zomato’s recommendation system is responsible for the personalized restaurant and dish suggestions you see on the home screen. This is where the system shifts from reactive (serving what you asked for) to proactive (anticipating what you want).

The fundamental problem is a classic two-tower retrieval problem. You have a large item space (tens of thousands of restaurants) and a user with a history of interactions. You want to find the subset of items that are most relevant to this user right now.

User embeddings are learned from order history, search queries, browsed restaurants, time-of-day patterns, and review behavior. Restaurant embeddings are learned from their menu composition, cuisine tags, historical order patterns, and user interaction signals. At serving time, you compute the dot product between the user embedding and candidate restaurant embeddings, then rank by similarity.
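The serving-time scoring is simple enough to show directly. The three-dimensional vectors below are toy values chosen for illustration; real embeddings would have many more dimensions and come from a trained model.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(user_vec, restaurant_vecs, top_k=3):
    """Score every candidate by dot product and return the best ids first."""
    scored = [(dot(user_vec, vec), rid) for rid, vec in restaurant_vecs.items()]
    scored.sort(reverse=True)
    return [rid for _, rid in scored[:top_k]]


user = [1.0, 0.0, 0.5]
candidates = {
    "tandoor_house": [0.9, 0.1, 0.4],   # score 1.1
    "sushi_bar":     [0.0, 1.0, 0.0],   # score 0.0
    "biryani_point": [0.5, 0.0, 1.0],   # score 1.0
}
top_two = rank_by_similarity(user, candidates, top_k=2)
```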

flowchart TD; A[User opens Home Feed]; B[Fetch User Embedding from Feature Store]; C[Retrieve Candidate Restaurants via ANN Search]; D[Apply Geo and Availability Filters]; E[Re-rank with ML Ranker]; F[Apply Business Rules and Diversity Constraints]; G[Return Personalized Feed]; H[Log Impression for Training]; A --> B; B --> C; C --> D; D --> E; E --> F; F --> G; G --> H;

The candidate retrieval step uses Approximate Nearest Neighbor search, because an exact scan over every embedding on every request becomes too slow as the catalog grows. Libraries like FAISS or ScaNN can search hundreds of millions of vectors in milliseconds with configurable accuracy tradeoffs.

Re-ranking is where business logic mixes with ML. A pure ML ranker might keep surfacing the same beloved restaurant every time, which is great for engagement but bad for discovery. Production systems add diversity constraints — you cannot show more than two restaurants from the same cuisine in the top six slots, for example. They also incorporate recency bias to promote new restaurant additions in the user’s area.
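The cuisine cap described above can be sketched as a single pass over the ranked list. The function name and the sample data are hypothetical; the limits of two per cuisine and six slots come from the example in the text.

```python
from collections import Counter


def apply_diversity(ranked, max_per_cuisine=2, window=6):
    """ranked: list of (restaurant_id, cuisine), best first.

    Fills the first `window` slots with at most `max_per_cuisine` entries
    per cuisine, then appends everything that was skipped in its original order.
    If there is not enough variety to fill the window, skipped items follow anyway.
    """
    top, deferred, counts = [], [], Counter()
    for item in ranked:
        _, cuisine = item
        if len(top) < window and counts[cuisine] < max_per_cuisine:
            top.append(item)
            counts[cuisine] += 1
        else:
            deferred.append(item)
    return top + deferred


ranked = [
    ("r1", "pizza"), ("r2", "pizza"), ("r3", "pizza"), ("r4", "biryani"),
    ("r5", "pizza"), ("r6", "sushi"), ("r7", "biryani"), ("r8", "thai"),
]
feed = apply_diversity(ranked)
```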

Cold start is a real problem. A new user has no order history, so there are no preference signals to embed. New users get a hybrid approach: city-level trending restaurants, cuisine preferences collected during onboarding, and time-of-day defaults (breakfast places in the morning, comfort food late at night).

ETA Prediction System

Of all the things Zomato shows you, the delivery ETA is probably the number you care most about. It is also, quietly, one of the hardest things to get right.

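Total delivery time = max(kitchen preparation time, rider-to-restaurant travel time) + restaurant-to-customer travel time

The first two components overlap, because the rider travels to the restaurant while the kitchen is cooking, so only the slower of the two delays pickup.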

Each of these components is independently uncertain, and the errors compound. Kitchen prep time depends on the restaurant’s current load, the complexity of the ordered items, the cook’s speed, and whether the kitchen has all the ingredients in stock. Rider travel time depends on current traffic, the route taken, weather, and rider behavior. The combined uncertainty is significant.
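Here is one hedged way to combine the components into a single number. It rests on two assumptions that are not from any published Zomato design: the rider travels to the restaurant while the food is being prepared, so those two legs overlap instead of adding up, and the prediction errors of the pickup and drop-off legs are independent, so their variances add. The buffer factor `z` is a hypothetical tuning knob.

```python
import math


def combine_eta(prep, rider_to_restaurant, restaurant_to_customer, z=1.0):
    """Each argument is (mean_minutes, std_minutes). Returns a padded ETA in minutes."""
    # Pickup happens when both the food and the rider are ready.
    pickup_mean = max(prep[0], rider_to_restaurant[0])
    pickup_std = max(prep[1], rider_to_restaurant[1])   # conservative choice
    mean = pickup_mean + restaurant_to_customer[0]
    # Independent errors: standard deviations combine as a root sum of squares.
    std = math.sqrt(pickup_std ** 2 + restaurant_to_customer[1] ** 2)
    return mean + z * std


eta = combine_eta(prep=(20, 3), rider_to_restaurant=(8, 2), restaurant_to_customer=(12, 4))
# mean = 20 + 12 = 32, std = sqrt(3^2 + 4^2) = 5, so eta = 37.0 with z = 1
```

Raising `z` makes the promised time safer but longer, which is the product decision hiding inside the "uncertainty buffer".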

flowchart TD; A[Order Placed]; B[Predict Kitchen Prep Time]; C[Predict Rider to Restaurant Time]; D[Predict Restaurant to Customer Time]; E[Combine Components with Uncertainty Buffer]; F[Show Initial ETA to User]; G[Update ETA as Order Progresses]; H[Rider Picks Up Food]; I[Refine ETA with Live Rider Position]; J[Show Updated ETA]; A --> B; A --> C; A --> D; B --> E; C --> E; D --> E; E --> F; F --> G; H --> I; I --> J;

Kitchen preparation time is predicted using a model trained on historical data per restaurant, per cuisine type, and per hour of day. A restaurant that consistently takes 25 minutes during the dinner rush will have that factored in. Some restaurants have real-time kitchen load signals — order queue length, average prep time in the last 30 minutes — which get fed to the model as features.

Travel time prediction uses a combination of road network data, historical speed data by road segment and time of day, and real-time traffic feeds. This is where a mapping SDK like Google Maps, HERE, or an in-house routing engine comes in. The key insight is that you cannot just use the geometric distance between two points. You need road-aware routing, and that routing needs to be time-aware — driving from point A to point B at 8 AM Friday is completely different from doing it at 2 PM Sunday.

Weather is a meaningful signal that many teams underweight. Rain slows down riders significantly, both because they ride more cautiously and because demand spikes and congestion worsens simultaneously. A well-trained ETA model includes weather features.

The ETA is not static. It gets recalculated every few minutes as the order progresses and as the rider’s actual movement is observed. If the rider is moving faster than expected, the ETA tightens. If they have been stationary for several minutes, the ETA grows. This continuous recalculation is important for maintaining user trust — an ETA that never changes and then suddenly jumps by 20 minutes destroys confidence.

Database and Storage Design

Getting the storage layer right is foundational. A system that uses the wrong database for a workload will hit walls that no amount of horizontal scaling can fix.

Here is a simplified schema for the core entities, expressed in a readable form:

Restaurants Table (PostgreSQL)

CREATE TABLE restaurants (
  id            UUID PRIMARY KEY,
  name          TEXT NOT NULL,
  city_id       INT NOT NULL,
  lat           DOUBLE PRECISION,
  lng           DOUBLE PRECISION,
  geohash       VARCHAR(12),
  cuisine_tags  TEXT[],
  rating        NUMERIC(3,2),
  is_open       BOOLEAN,
  created_at    TIMESTAMPTZ,
  updated_at    TIMESTAMPTZ
);
CREATE INDEX ON restaurants (geohash);
CREATE INDEX ON restaurants (city_id, is_open);

Orders Table (PostgreSQL with partitioning)

CREATE TABLE orders (
  id              UUID NOT NULL,
  user_id         UUID NOT NULL,
  restaurant_id   UUID NOT NULL,
  rider_id        UUID,
  status          TEXT NOT NULL,
  total_amount    NUMERIC(10,2),
  placed_at       TIMESTAMPTZ NOT NULL,
  delivered_at    TIMESTAMPTZ,
  -- A primary key on a partitioned table must include the partition key.
  PRIMARY KEY (id, placed_at)
) PARTITION BY RANGE (placed_at);

Partitioning orders by date is important. Without it, a table with hundreds of millions of rows becomes expensive to query. With monthly partitions, recent order queries (which are the hot path) only scan a small partition.

Location Events (Cassandra)

CREATE TABLE rider_locations (
  rider_id    UUID,
  recorded_at TIMESTAMP,
  lat         DOUBLE,
  lng         DOUBLE,
  speed       FLOAT,
  heading     INT,
  PRIMARY KEY (rider_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC);

This Cassandra schema gives you extremely fast writes (append only) and fast reads for the latest position per rider (first row in partition, thanks to DESC ordering).

Data Type	Storage Choice	Reason	Key Concern
Restaurant profiles	PostgreSQL + Redis	Relational, read-heavy, cacheable	Cache invalidation on updates
Orders	PostgreSQL (partitioned)	ACID compliance, relational joins needed	Partition growth, archival strategy
Rider locations	Cassandra	High-volume append-only time series	Compaction, TTL for old records
Search index	Elasticsearch	Full-text + geo queries	Index refresh latency, shard sizing
Session data	Redis	Millisecond reads, short TTL	Memory pressure, eviction policy
User preferences	Redis + Cassandra	Fast reads, large historical volume	Consistency between layers
Analytics events	Kafka + data warehouse	Streaming ingestion, batch processing	Schema evolution, backpressure

Caching System Deep Dive

Caching is not just about performance. At Zomato’s scale, it is the difference between a system that works and one that falls over during a dinner rush.

The most aggressively cached data is restaurant metadata. A restaurant’s name, cuisine tags, rating, delivery fee, and approximate prep time are read millions of times per day and change infrequently. This data is an ideal candidate for a read-through cache with a TTL of several minutes. The cache hit rate for this data in a well-tuned system should be above 95%.

Menu data is slightly trickier. Prices change, items go out of stock, and new dishes get added. You want to cache menus aggressively but also need to propagate changes quickly. The typical approach is a shorter TTL (30 to 60 seconds) combined with an event-driven invalidation path: when a restaurant updates their menu through the partner app, a cache invalidation event is published to Kafka, and a consumer deletes the affected cache keys.

Search results for popular queries (cuisine type + city + sort by rating) are also cacheable. A city-level cache that stores the top 50 restaurants per cuisine per city, refreshed every few minutes, can absorb a large fraction of home screen traffic without hitting Elasticsearch at all.

Hotspot management is an often-overlooked caching problem. During a major event or extreme weather, everyone in a city might simultaneously search for the same popular restaurant. Even with caching, if that restaurant’s cache key expires and a thundering herd of concurrent requests tries to regenerate it, you blow through your database connection pool. The solution is probabilistic cache refresh (refresh slightly before TTL expires, randomly, to spread the load) and mutex-based cache filling (only one thread regenerates the cache while others wait on the result).
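Both techniques fit in a small in-process sketch. The `SingleFlightCache` class below is a hypothetical illustration using thread locks; in a multi-node deployment the per-key lock would have to live in a shared store, and the TTL and refresh window are arbitrary example values.

```python
import random
import threading
import time


class SingleFlightCache:
    def __init__(self, loader, ttl_seconds=60.0, early_refresh_window=10.0):
        self._loader = loader
        self._ttl = ttl_seconds
        self._window = early_refresh_window
        self._entries = {}                  # key -> (value, expires_at)
        self._locks = {}                    # key -> lock guarding regeneration
        self._guard = threading.Lock()

    def _is_fresh(self, expires_at, now):
        # Treat the entry as expired up to `_window` seconds early, at random,
        # so refreshes do not all land on the same instant.
        return now < expires_at - random.uniform(0.0, self._window)

    def get(self, key):
        entry = self._entries.get(key)
        if entry and self._is_fresh(entry[1], time.monotonic()):
            return entry[0]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                          # one loader call per key at a time
            current = self._entries.get(key)
            if current is not None and current is not entry:
                return current[0]           # someone refilled it while we waited
            value = self._loader(key)
            self._entries[key] = (value, time.monotonic() + self._ttl)
            return value


calls = []

def slow_loader(key):
    time.sleep(0.05)                        # simulate a database round trip
    calls.append(key)
    return f"profile:{key}"

cache = SingleFlightCache(slow_loader, ttl_seconds=60.0, early_refresh_window=1.0)
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get("r42")))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Twenty concurrent readers of a cold key result in a single loader call; the other nineteen wait on the lock and reuse the value the first one stored.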

Event-Driven Architecture

The event bus is what makes all the asynchronous parts of Zomato work without coupling services together. Think of Kafka as a durable, ordered, replayable log of everything that happened.

Every meaningful state transition in an order’s lifecycle produces an event:

order.placed
order.restaurant_confirmed
order.rider_assigned
order.picked_up
order.delivered
order.cancelled

Multiple services consume these events independently. The Notification Service sends push notifications. The Analytics Service updates real-time dashboards. The Fraud Detection Service evaluates the order for anomalies. The Recommendation Service records the interaction for future model training. None of these consumers know about each other, and none of them are in the critical path of the order being placed.

This is the key scalability benefit of event-driven design. If the Notification Service is slow or temporarily unavailable, the order still completes. Events queue up in Kafka (which can hold them for days) and are processed when the service recovers. Contrast this with a synchronous architecture where the order service calls the notification service directly — an outage in the notification service takes down order placement with it.

flowchart TD; A[Order Service]; B[Kafka Topic - order-events]; C[Notification Service]; D[Analytics Service]; E[Fraud Detection Service]; F[Recommendation Service]; G[Delivery Assignment Service]; A --> B; B --> C; B --> D; B --> E; B --> F; B --> G;

Event ordering matters in some cases. If you receive an order.delivered event before order.picked_up, something went wrong. Kafka guarantees ordering within a partition, so if you use the order ID as the message key, all events for a given order land in the same partition and are consumed in sequence. Cross-partition ordering is not guaranteed, which is fine because events from different orders are independent.
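The effect of keying by order ID can be simulated without a broker. In this sketch `zlib.crc32` is a stand-in hash chosen only because it is stable across runs; Kafka's default partitioner applies murmur2 to the key bytes. The helper names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(order_id: str, num_partitions: int) -> int:
    # Same key always maps to the same partition.
    return zlib.crc32(order_id.encode("utf-8")) % num_partitions


def route(events, num_partitions=4):
    """events: iterable of (order_id, event_type) in produce order."""
    partitions = defaultdict(list)
    for order_id, event_type in events:
        partitions[partition_for(order_id, num_partitions)].append((order_id, event_type))
    return partitions


events = [
    ("order-1", "order.placed"),
    ("order-2", "order.placed"),
    ("order-1", "order.picked_up"),
    ("order-2", "order.cancelled"),
    ("order-1", "order.delivered"),
]
partitions = route(events)
order_1_log = [e for e in partitions[partition_for("order-1", 4)] if e[0] == "order-1"]
```

Events from the two orders interleave in the input, but each order's own events stay together in one partition and keep their relative order.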

Scalability Deep Dive

When traffic doubles, which parts of the system break first? This is the most important question for any engineer working on a platform like this.

The restaurant discovery and search path is probably the most read-heavy. A single user session might trigger five to ten search queries before placing an order. Elasticsearch clusters need to be sized generously and should sit behind aggressive caching. The geo queries are fast per-query but expensive in aggregate. Shard routing in Elasticsearch by city or geographic region keeps individual shard sizes manageable and query fan-out bounded.

The delivery assignment system is the most write-heavy real-time workload. During peak dinner hour, you might be processing thousands of assignment decisions per minute, each requiring a geo query against the live rider positions in Redis. Redis handles this well as a single-threaded in-memory store, but it needs to be sharded if rider counts grow large. Sharding by geography (one key per city or zone, so that all riders in a zone live on the same shard) keeps most nearby-rider lookups on a single shard; hashing riders across shards by ID would force every radius query to fan out to all of them. Lookups near a zone boundary still need to query the neighboring zone as well.

System Component	Scaling Bottleneck	Scaling Strategy	Known Limitation
Restaurant Search	Elasticsearch query load	Caching, read replicas, geo-partitioned shards	Cache invalidation lag during updates
Delivery Assignment	Redis geo query throughput	Redis cluster with geo-sharding	Cross-shard radius queries at shard boundaries
Live Tracking	WebSocket connection count	Stateless WS gateway with shared Redis pub/sub	Redis pub/sub fan-out at very high scale
ETA Prediction	ML model inference latency	Model serving with horizontal replicas	Feature freshness vs compute cost tradeoff
Order Processing	Database write throughput	Partitioned tables, connection pooling	Cross-partition analytics queries are slow
Notifications	Push gateway rate limits	Priority queues, provider batching	APNs and FCM have their own reliability SLAs

Multi-city deployments introduce geo-partition considerations. Running a single global cluster for all cities is wasteful and introduces unnecessary latency — a user in Mumbai does not need their request processed in a data center that also serves Delhi. In practice, Zomato likely runs regionally isolated clusters per metro area or cluster of cities, with shared global services for things like authentication and payments that need a single source of truth.

Reliability and Availability

A food delivery platform being down during dinner time is extremely bad. Users get frustrated, restaurants lose revenue, riders lose earnings. The reliability requirements here are more like financial services than like a casual social app.

The most important failure modes to design for are:

Payment outages. If the payment gateway is slow or unavailable, orders cannot be placed. The solution is multi-provider redundancy: if the primary payment provider fails, route to a backup. For pre-authorized card payments, implement retry logic with exponential backoff. For wallet payments, the local service handles more of the logic and is less dependent on external APIs.

Delivery partner unavailability. If no riders are available in an area, orders stack up and prep-time windows are violated. This is both a business problem and a UX problem. The system should detect low rider availability early and either surface a longer ETA estimate upfront or offer a “scheduled delivery” option. It should not accept orders and then fail to fulfill them silently.

Database leader failover. PostgreSQL supports synchronous streaming replication to standbys, and during a primary failure, automated failover tooling promotes a standby to leader, typically in 20 to 60 seconds. Cassandra has no leader to fail over: every replica accepts reads and writes, so losing a node is absorbed by the replication factor and the chosen consistency level. In both cases the application layer needs to retry during the disruption rather than failing immediately. Connection pool libraries and drivers handle this if configured correctly.

Monitoring is not optional. Every service should emit the four golden signals: latency percentiles, error rate, request throughput, and saturation (CPU, memory, queue depth). Alerting on p99 latency degradation often catches problems before they become outages. Distributed tracing (something like Jaeger or Zipkin) is essential for understanding where latency is coming from in a multi-service request chain.

Pricing and Surge Systems

Surge pricing in food delivery is controversial but economically rational. When rider supply is low relative to demand — rainy evenings, major events, holiday periods — raising delivery fees incentivizes more riders to come online and moderates order demand to a level the available supply can handle.

The surge pricing system is essentially a real-time supply-demand balancer per geographic micro-zone. The system computes a demand signal (orders per minute in the zone) and a supply signal (available riders in the zone) and derives a surge multiplier. The multiplier feeds into the delivery fee shown to the user before they place an order.
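A toy version of that calculation is shown below. Everything numeric here is an assumption made for illustration: the per-rider throughput, the cap, and the choice to use a plain demand-to-capacity ratio. A real system would smooth the signals over time and tune the curve through experiments.

```python
def surge_multiplier(orders_per_minute: float, available_riders: int,
                     orders_per_rider_per_minute: float = 0.05,
                     cap: float = 2.0) -> float:
    """Return a delivery-fee multiplier between 1.0 and `cap` for one zone."""
    capacity = available_riders * orders_per_rider_per_minute
    if capacity <= 0:
        return cap                      # no supply at all: charge the maximum
    ratio = orders_per_minute / capacity
    return round(min(cap, max(1.0, ratio)), 2)


calm = surge_multiplier(orders_per_minute=2, available_riders=100)    # demand below capacity
busy = surge_multiplier(orders_per_minute=9, available_riders=100)    # demand 1.8x capacity
storm = surge_multiplier(orders_per_minute=50, available_riders=100)  # clipped at the cap
```

The floor of 1.0 means the fee never drops below the base price, and the cap bounds how far a sudden spike can push it.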

Getting this system wrong in either direction is bad. Surge too aggressively and users leave. Surge too little and riders go offline because earnings are not worth the difficulty of peak-hour delivery, which makes fulfillment worse for everyone. The balance point is found through continuous A/B testing and feedback loops from both user conversion data and rider acceptance rate data.

Security and Fraud Prevention

Food delivery platforms are targets for several categories of fraud. Fake orders placed with stolen payment credentials, fake delivery completions where riders mark orders as delivered without actually delivering them, review manipulation, and coupon abuse are all real problems at scale.

Payment fraud is handled by the payment gateway’s risk scoring combined with Zomato’s own order-level fraud signals: new account placing a high-value order, unusual address, device fingerprint mismatch, velocity signals. High-risk orders can be routed to a manual review queue or declined automatically.

Delivery fraud — where riders mark orders as delivered prematurely — is detected by comparing the rider’s GPS position at the time of the delivery completion event against the delivery address coordinates. If the rider is more than 200 meters away from the address when marking delivered, it triggers an investigation workflow.
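The distance check itself is a Haversine calculation against a threshold. The sketch below uses the 200-meter figure from the text; the function names and coordinates are illustrative, and a production check would also account for GPS accuracy and large apartment complexes.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance between two coordinates, in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_suspicious_completion(rider_pos, address_pos, threshold_m=200.0):
    """Flag a 'delivered' event if the rider is too far from the address."""
    return haversine_m(*rider_pos, *address_pos) > threshold_m


address = (28.6100, 77.2000)
at_door = is_suspicious_completion((28.6101, 77.2000), address)      # about 11 m away
down_the_road = is_suspicious_completion((28.6200, 77.2000), address)  # about 1.1 km away
```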

OTP confirmation on delivery (where the customer receives a code they read to the rider) is a common mitigation for this type of fraud, though it adds friction to the delivery experience.

Engineering Tradeoffs Worth Discussing

Real engineering is about choosing between imperfect options. Here are some of the most interesting tradeoffs in a food delivery system:

Real-time tracking vs battery life. Sending GPS updates every second gives the smoothest tracking but can drain a rider’s phone battery within hours. Every 3 to 5 seconds is the practical sweet spot, and for stationary riders you can back off to every 15 seconds. The tradeoff is acceptable because users gain little from per-second position updates on a map.
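On the rider app, this backoff can be as simple as choosing the next reporting interval from the current speed. The sketch below uses the intervals from the paragraph above; the speed cutoffs and the function name are assumptions for illustration.

```python
STATIONARY_SPEED_MPS = 0.5  # assumed: below this the rider counts as stationary
FAST_SPEED_MPS = 8.0        # assumed: roughly 29 km/h, clearly riding


def next_update_interval_s(speed_mps: float) -> float:
    """Seconds to wait before sending the next GPS update."""
    if speed_mps < STATIONARY_SPEED_MPS:
        return 15.0  # waiting at the restaurant or at a signal
    if speed_mps >= FAST_SPEED_MPS:
        return 3.0   # position changes quickly, report more often
    return 5.0       # slow movement, the upper end of the 3 to 5 second range


for speed in (0.0, 3.0, 10.0):
    print(speed, next_update_interval_s(speed))
```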

ETA accuracy vs compute cost. Running a full ML inference with 50 features every time an ETA is requested is expensive. Running a simpler lookup model is fast but less accurate. A common approach is to use a cheap model for the initial ETA estimate at order placement and reserve the more expensive model for in-transit updates when accuracy matters more.

Delivery batching vs speed. Giving a single rider two orders to deliver simultaneously improves efficiency and reduces cost but increases average delivery time. The customer whose order is picked up first waits longer so that the economics work for the second order. How much batching is acceptable is a business decision, but the system design needs to support configurable batching windows and scoring functions.

Consistency vs availability in order status. During a network partition, should a user be able to see their order status even if the data might be slightly stale? For read operations, yes: showing a 5-second-old status is better than showing an error page. For write operations (cancel an order, change address), you need stronger consistency guarantees because incorrect state transitions are hard to undo.
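One common way to protect writes is to validate every state transition and guard it with a version check (optimistic concurrency), so that a request made against stale data is rejected rather than applied. The following is a minimal in-memory sketch; the status names, the transition table, and the OrderStore class are all hypothetical, and a real system would enforce the same check inside a database transaction.

```python
# Assumed state machine: which statuses an order may move to from each status.
ALLOWED_TRANSITIONS = {
    "PLACED": {"ACCEPTED", "CANCELLED"},
    "ACCEPTED": {"PREPARING", "CANCELLED"},
    "PREPARING": {"PICKED_UP"},
    "PICKED_UP": {"DELIVERED"},
}


class StaleWriteError(Exception):
    """The caller acted on an outdated version of the order."""


class InvalidTransitionError(Exception):
    """The requested status change is not allowed from the current status."""


class OrderStore:
    def __init__(self) -> None:
        self._orders: dict[str, tuple[str, int]] = {}  # id -> (status, version)

    def create(self, order_id: str) -> None:
        self._orders[order_id] = ("PLACED", 1)

    def read(self, order_id: str) -> tuple[str, int]:
        # Reads may be served from a cache or replica and be slightly stale.
        return self._orders[order_id]

    def transition(self, order_id: str, new_status: str, expected_version: int) -> int:
        status, version = self._orders[order_id]
        if version != expected_version:
            raise StaleWriteError(order_id)
        if new_status not in ALLOWED_TRANSITIONS.get(status, set()):
            raise InvalidTransitionError(f"{status} -> {new_status}")
        self._orders[order_id] = (new_status, version + 1)
        return version + 1


store = OrderStore()
store.create("o1")
store.transition("o1", "ACCEPTED", expected_version=1)
print(store.read("o1"))
```

With this shape, a cancel request built from a stale read fails with StaleWriteError, and the client can re-read the order and decide whether cancelling still makes sense.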

Technology Stack

The realistic stack for a Zomato-scale system looks something like this:

Technology	Use Case	Why It Fits
Go	High-throughput services (tracking, assignment)	Low latency, excellent concurrency model, small memory footprint
Java or Kotlin	Order service, payment service	Strong type system, mature ecosystem, good for complex business logic
Node.js	API Gateway, notification service	Event loop fits I/O-bound gateway workloads well
PostgreSQL	Orders, restaurants, users	ACID transactions, excellent partitioning, mature geo extensions
Cassandra	Location events, order history analytics	Optimized for append-heavy time series, horizontally scalable
Redis	Caching, rider geo index, session storage	Sub-millisecond reads, native geo commands, pub/sub for tracking
Elasticsearch	Restaurant search, dish search	Native geo queries, full-text search, flexible ranking
Kafka	Event streaming, async communication	Durable, ordered, replayable, high-throughput
Kubernetes	Container orchestration	Horizontal pod autoscaling, rolling deployments, integrates with service meshes
TensorFlow or PyTorch	ETA models, recommendation models	GPU training support, model serving integration

System Design Interview Perspective

Zomato-style questions show up frequently in system design interviews. Interviewers usually frame them as “Design a food delivery app,” “Design Zomato’s delivery assignment system,” or “Design the real-time tracking system.” Here is how to approach them.

When asked to design the full system, do not try to cover everything at equal depth. Start with a brief requirements clarification — how many daily active users, how many orders per day, what latency targets. Then sketch the high-level architecture. Then dive deep on two or three components: the interviewer cares more about your depth on key areas than surface-level coverage of everything.

The strongest candidates walk through the ordering pipeline end-to-end and discuss distributed transaction handling unprompted. They bring up idempotency before the interviewer asks. They proactively discuss the failure scenarios — what happens when the restaurant rejects the order after payment is captured, what happens when the rider goes offline mid-delivery.

For the delivery assignment question specifically, strong candidates go beyond “find nearby riders and assign” and discuss: how you time the assignment relative to kitchen prep, how you handle simultaneous surge, how you prevent double-assignment, and how you handle the reassignment workflow when a rider declines.

Common weak spots in interviews on this topic: candidates often undersell the complexity of ETA prediction, treat the notification system as a footnote rather than a reliability challenge, and forget to discuss geo-spatial indexing specifics when discussing restaurant discovery. Bringing up GeoHash, Redis GEO commands, or Elasticsearch geo-distance queries signals production-level familiarity.

One framing that works well in interviews: explain what data you are reading and writing at each step, what the latency budget is, and whether the operation needs to be strongly consistent. This forces you to think concretely about storage choices and quickly reveals whether a proposed design will actually work under load.

Closing Thoughts

Food delivery is a domain where the physical and digital worlds are tightly coupled. Every system design decision ultimately affects whether a real person gets their dinner on time, whether a delivery partner earns a fair living, and whether a restaurant can run a predictable operation. That coupling is what makes the engineering interesting.

The principles at work here (geo-spatial indexing, event-driven decoupling, distributed transactions with compensating actions, real-time stream processing, personalization through embeddings) are not Zomato-specific. They appear in ride-sharing, logistics, e-commerce, and financial systems. Understanding them through the lens of something as concrete and relatable as food delivery is a useful way to internalize patterns that transfer broadly.

The next time you watch that little bike icon move across a map toward your door, you will know at least a little of what it took to make that dot move.