Blogs


How Uber Computes ETA?

The magic of Uber doesn’t begin when the car arrives. It begins the instant the app tells you “how long” the wait will be. A tiny estimate — “2 minutes away” or “6 minutes away” — flashes onto your screen so casually that most people never think twice about it. Yet producing that single number requires a planet-scale system constantly processing live GPS streams, road traffic, driver movement, rider demand, map intelligence, and prediction models in real time. What looks like a simple countdown is actually the visible tip of one of the most advanced distributed systems ever engineered for everyday consumers.

That number is the Estimated Time of Arrival, and computing it correctly — at global scale, in real time, across millions of concurrent users — is genuinely one of the hardest problems in applied engineering.

This post is a deep walkthrough of how a system like Uber’s ETA engine works. We will go through the GPS infrastructure, the map matching algorithms, the routing engines, the machine learning prediction pipelines, the streaming systems, the geo-spatial indexing, and the tradeoffs that engineers make every day to keep that number accurate and fast.

Read on →

How ChatGPT Works?

There is a moment, maybe you have felt it yourself, where you type a question into ChatGPT and within seconds you get a response that feels remarkably thoughtful. It does not just return a keyword match. It understands context, it reasons through problems, it can write code and explain concepts and help you draft emails. And it does all of this for millions of people simultaneously, in real time.

If you are an engineer looking at that and thinking “okay, but what is actually happening behind that text box?”, this post is for you.

We are going to go deep. Not just “there’s a transformer model and it predicts tokens” deep. We are going to talk about the full engineering stack: how prompts flow through distributed systems, how GPUs communicate across data centers, how the inference pipeline is optimized for latency, how memory and context are managed, and what tradeoffs the engineering teams at OpenAI are navigating every single day. By the end, you should have a genuine mental model of how a system like ChatGPT is actually built.

Read on →

How YouTube Works?

There is a moment every engineer has when they first truly think about what YouTube does. Not the product, but the machine. Someone in rural Indonesia uploads a phone video of a street cat doing something peculiar. Within minutes, that video is available in crisp 1080p to a user in São Paulo, another in Stockholm, and a third on a slow connection in rural Kenya who gets a smooth 360p stream without a single rebuffering event. The recommendation engine is already deciding who else should see it. The ad system has already matched it to relevant advertisers. The copyright scanner has already checked it against a database of millions of audio and video fingerprints.

That is not magic. That is engineering, done at a scale that very few systems in the world have ever had to achieve.

YouTube serves over 2 billion logged-in users every month. More than 500 hours of video are uploaded to the platform every single minute. The platform delivers over a billion hours of video playback per day. When you build a system at that scale, you cannot afford to think about problems the way you would in a startup. Every architectural decision has second- and third-order consequences. A naive caching strategy does not just waste a few dollars; it can collapse under load during a major event. A poorly designed upload pipeline does not just frustrate one creator; it fails millions simultaneously.

Read on →

How Airbnb Works?

Every time you search for a place to stay in Tokyo, lock in a booking for next weekend in Lisbon, or message a host about parking, you’re touching a system that handles millions of concurrent users, real-time availability across 7 million listings, payment transactions in dozens of currencies, and geo-spatial searches across the entire planet. Let’s pull back the curtain.

Airbnb is one of the most fascinating systems to think about from an engineering standpoint. Not because any single piece is extraordinarily novel on its own, but because the combination of problems they have to solve simultaneously is genuinely hard.

You have geo-spatial search that needs to return results in under 200ms. You have a booking system that must never double-book a property, even when two guests are clicking “Reserve” at the exact same millisecond. You have dynamic pricing that shifts based on local events, season, and demand signals. You have payments flowing across 220+ countries with fraud detection running on every transaction. And you have to do all of this reliably for a platform where a single outage during peak travel season means real money lost for real hosts around the world.

In recent years, Airbnb has had over 7 million active listings in 220+ countries, served hundreds of millions of guest arrivals per year, and seen traffic spikes that correlate with holiday seasons, major events, and even viral social media moments. That’s the scale we’re designing for.

Read on →

How Reddit Works?

If you’ve ever refreshed your Reddit feed at midnight, upvoted a post, or gone down a rabbit hole in a subreddit, you’ve touched a system that serves hundreds of millions of users every day. But have you ever wondered what’s actually happening under the hood? Let’s find out.

Reddit calls itself “the front page of the internet,” and honestly, that’s not far off. At its core, Reddit is a massive, community-driven link aggregator and discussion platform — think of it as a giant bulletin board broken into thousands of topic-specific rooms called subreddits.

As of 2024, Reddit had:

  • ~1.5 billion monthly active users visiting the site
  • ~100,000 active subreddits
  • Over 500 million posts and billions of comments
  • Several million concurrent users at peak times

What makes Reddit an interesting system design problem is the combination of scale, real-time interaction, and community-specific complexity. You have users browsing feeds, posting content, voting, commenting in real-time, getting notifications, and all of this happening across extremely diverse communities with completely different moderation rules.

It’s not as real-time as Twitter (where every tweet needs instant fan-out). It’s not as media-heavy as YouTube. But it’s arguably more complex than both because it combines a social graph, content ranking, tree-structured comments, moderation tools, and real-time interactions all in one platform.

Let’s break it all down.

Read on →