Blogs


How Tinder Works

There is a moment every Tinder engineer has probably thought about: a user swipes right on someone who has already swiped right on them, and within a second, both people get a match notification. That notification feels instant, almost magical. But behind that single interaction, an entire distributed system fires in coordination: a recommendation engine, a geo-spatial query, a mutual-match check, a real-time push notification, and a chat channel being provisioned, all happening faster than the human brain can process what just occurred.
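
The mutual-match check is the hinge of that flow. Here is a minimal sketch of the idea in Python, assuming a plain in-memory set store standing in for whatever distributed store Tinder actually uses; the function and variable names are hypothetical.

```python
# Hypothetical sketch: an in-memory dict of sets stands in for a
# distributed KV store. Production systems add durability, sharding,
# and async notification fan-out on top of this same core check.

likes: dict[str, set[str]] = {}  # user_id -> set of user_ids they liked

def swipe_right(swiper: str, target: str) -> bool:
    """Record a right swipe; return True if it created a match."""
    likes.setdefault(swiper, set()).add(target)
    # It is a match only if the target already swiped right on the swiper.
    if swiper in likes.get(target, set()):
        notify_match(swiper, target)    # push notification to both users
        provision_chat(swiper, target)  # create the chat channel
        return True
    return False

def notify_match(a: str, b: str) -> None:
    print(f"MATCH: notifying {a} and {b}")

def provision_chat(a: str, b: str) -> None:
    print(f"CHAT: channel created for {a} <-> {b}")

swipe_right("alice", "bob")         # no match yet, bob has not swiped
assert swipe_right("bob", "alice")  # mutual swipe -> match fires
```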

Tinder is not a simple CRUD app with a swiping UI on top. It is one of the most sophisticated consumer-grade distributed systems ever built. On any given day, Tinder processes over 1.6 billion swipes globally, serves users in more than 190 countries, and must deliver personalized, geo-aware recommendation feeds to millions of concurrent users, all while keeping latency below perceptible thresholds.


The engineering challenges here are real and genuinely hard. You are dealing with write-heavy workloads from swipe events, read-heavy workloads from feed generation, real-time geo queries at planetary scale, ML-based ranking pipelines that need to be both fast and personalized, and a messaging layer that must guarantee delivery even when mobile connections are flaky. Understanding how Tinder solves these problems teaches you almost everything you need to know about modern distributed systems engineering.

This blog is going to walk through the entire architecture, piece by piece, from how a swipe is processed to how the recommendation engine decides whose profile appears next on your screen. We will cover the tradeoffs, the bottlenecks, and the engineering reasoning behind each decision. By the end, you should feel like you genuinely understand how this system works at production scale.

Core Features of Tinder

Before diving into the architecture, it helps to understand exactly what the system needs to do. Tinder’s feature set is wider than most people realize.

Read on →

How NGINX Works

There is a good chance that every HTTP request you have made today passed through NGINX at some point. It might have been serving a static file, forwarding your request to a backend application, terminating TLS, or quietly balancing your traffic across a dozen servers. NGINX sits at the heart of a staggering amount of internet infrastructure, and yet most engineers interact with it only through config files without fully understanding what is happening underneath.

This blog is a deep dive into how NGINX actually works. Not just what it does, but why it was built the way it was, what engineering problems it solves, how it interacts with the Linux kernel, and what makes it so exceptionally fast even under enormous load. By the end, you should have the kind of intuition that lets you reason about NGINX the way a systems engineer would.


Why NGINX Exists at All

To understand NGINX, you have to understand the world it was born into. In the early 2000s, Apache HTTP Server was the dominant web server. Apache worked on a model where every incoming connection spawned either a new process or a new thread. For a few thousand concurrent connections, this was fine. But as web traffic grew, engineers ran into a hard wall known as the C10K problem: how do you handle 10,000 simultaneous connections on a single server?
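
NGINX's answer was the event-driven model: a small, fixed set of worker processes, each multiplexing thousands of connections over a readiness-notification API like epoll. The sketch below shows that shape using Python's standard selectors module (which uses epoll on Linux). It illustrates the model, not NGINX's actual C implementation, and the port and canned response are invented.

```python
# One process, many connections: react to readiness events instead of
# dedicating a thread or process to each connection.
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server: socket.socket) -> None:
    conn, _ = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn: socket.socket) -> None:
    data = conn.recv(4096)
    if data:  # reply with a fixed 200 and close; a toy, not HTTP keep-alive
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    sel.unregister(conn)
    conn.close()

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 8080))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:  # the event loop: one thread servicing every socket
    for key, _ in sel.select():
        key.data(key.fileobj)  # dispatch to accept() or handle()
```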

Read on →

How Google Search Works

Let me ask you something. You type three words into a text box. Half a second later, you are staring at ten blue links, a knowledge panel, an image carousel, and a featured snippet that almost perfectly answers your question. That page was assembled, ranked, and delivered to you from across the planet faster than you can blink.


Now consider what had to happen behind the scenes. Someone had to crawl hundreds of billions of web pages, extract their content, understand what each page was actually about, store all of that in a way that can be queried at low latency, figure out which of the billions of candidate results is most relevant to your specific query, personalize it a little, check it for spam, and ship it to you over the network before you notice any delay. At peak hours, Google handles tens of thousands of search queries per second, globally.

This is not a solved problem. This is one of the hardest distributed systems ever built and maintained in production. The reason it feels effortless is precisely because so much engineering is hidden beneath it.

The interesting part is not just that it works. The interesting part is why it is designed the way it is. Every caching layer, every index shard, every ranking signal, every crawl scheduler exists because someone ran into a wall at scale and had to find a way through. That is what this article is about.

We will walk through the entire system end to end. Crawling. Parsing. Indexing. Query processing. Ranking. Distributed serving. Caching. Machine learning. We will look at what happens when things go wrong, and we will talk honestly about the tradeoffs that make this architecture look the way it does.
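
To make the indexing and query-processing stages concrete before the deep dive, here is a toy inverted index in Python. It is a sketch of the core data structure only; real indexes are sharded across thousands of machines, compressed, and paired with ranking signals, and the documents below are invented.

```python
# A toy inverted index: map each term to the set of documents
# containing it, then answer AND queries by intersecting posting lists.
from collections import defaultdict

docs = {
    1: "how nginx handles ten thousand connections",
    2: "how google search ranks billions of pages",
    3: "how search engines crawl and index pages",
}

index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)  # term -> posting list of doc ids

def query(*terms: str) -> set[int]:
    """AND query: intersect the posting lists of all query terms."""
    postings = [index[t] for t in terms]
    return set.intersection(*postings) if postings else set()

print(query("search", "pages"))  # {2, 3}
```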

Read on →

How Uber Computes ETA

The magic of Uber doesn't begin when the car arrives. It begins the instant the app tells you how long the wait will be. A tiny estimate, "2 minutes away" or "6 minutes away", flashes onto your screen so casually that most people never think twice about it. Yet producing that single number requires a planet-scale system constantly processing live GPS streams, road traffic, driver movement, rider demand, map intelligence, and prediction models in real time. What looks like a simple countdown is actually the visible tip of one of the most advanced distributed systems ever engineered for everyday consumers.

That number is the Estimated Time of Arrival, and computing it correctly — at global scale, in real time, across millions of concurrent users — is genuinely one of the hardest problems in applied engineering.


This post is a deep walkthrough of how a system like Uber’s ETA engine works. We will go through the GPS infrastructure, the map matching algorithms, the routing engines, the machine learning prediction pipelines, the streaming systems, the geo-spatial indexing, and the tradeoffs that engineers make every day to keep that number accurate and fast.
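
As a baseline for everything that follows, here is the naive approach in Python: straight-line (haversine) distance divided by an assumed average speed. The coordinates and the 30 km/h figure are invented; every layer described in this post (road-network routing, live traffic, ML correction) exists because this baseline fails in predictable ways.

```python
# Deliberately naive ETA: great-circle distance at a fixed city speed.
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two GPS points, in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def naive_eta_minutes(driver: tuple[float, float],
                      rider: tuple[float, float],
                      avg_speed_kmh: float = 30.0) -> float:
    """ETA assuming straight-line travel at an assumed average speed."""
    km = haversine_km(*driver, *rider)
    return km / avg_speed_kmh * 60

# Invented coordinates, roughly one neighborhood apart in San Francisco.
print(round(naive_eta_minutes((37.7749, -122.4194), (37.7849, -122.4094)), 1))
```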

Read on →

How ChatGPT Works

There is a moment, and maybe you have felt it yourself, when you type a question into ChatGPT and within seconds you get a response that feels remarkably thoughtful. It does not just return a keyword match. It understands context, it reasons through problems, it can write code and explain concepts and help you draft emails. And it does all of this for millions of people simultaneously, in real time.


If you are an engineer looking at that and thinking “okay, but what is actually happening behind that text box?”, this post is for you.

We are going to go deep. Not just “there’s a transformer model and it predicts tokens” deep. We are going to talk about the full engineering stack: how prompts flow through distributed systems, how GPUs communicate across data centers, how the inference pipeline is optimized for latency, how memory and context are managed, and what tradeoffs the engineering teams at OpenAI are navigating every single day. By the end, you should have a genuine mental model of how a system like ChatGPT is actually built.
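
Since token-by-token prediction is the engine underneath everything else, here is the shape of that loop as a minimal Python sketch. The model is a stub returning canned probabilities; a real deployment replaces it with a batched transformer forward pass behind a KV cache, but the outer loop (sample a token, append it, repeat until a stop token) has the same structure.

```python
# The autoregressive decoding loop with a stubbed-out "model".
import random

VOCAB = ["Hello", ",", " world", "!", "<eos>"]

def fake_model(tokens: list[str]) -> list[float]:
    """Stand-in for a transformer forward pass: probs over VOCAB."""
    # Walk through a canned reply, then strongly favor end-of-sequence.
    step = len(tokens)
    probs = [0.01] * len(VOCAB)
    probs[min(step, len(VOCAB) - 1)] = 0.96
    return probs

def generate(max_tokens: int = 16) -> str:
    tokens: list[str] = []
    for _ in range(max_tokens):
        probs = fake_model(tokens)  # "forward pass" over the context so far
        next_token = random.choices(VOCAB, weights=probs)[0]  # sample one token
        if next_token == "<eos>":
            break
        tokens.append(next_token)   # extend the context and loop again
    return "".join(tokens)

print(generate())  # usually "Hello, world!"
```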

Read on →