Blogs


How X Timeline Works?

There is a moment, every time you open X, that feels effortless. A feed of tweets appears. Some are from people you follow. Others are from accounts you have never seen before but somehow feel relevant. A viral post catches your eye. A trending topic surfaces at just the right time. It all feels instant.

Behind every timeline refresh is a chain of distributed systems doing extraordinary amounts of work in milliseconds. Machines are computing your interests, traversing social graphs with hundreds of millions of edges, fetching precomputed timelines from tiered caches, ranking thousands of candidate tweets using machine learning models, and delivering the result before your thumb stops scrolling. X serves hundreds of millions of active users, and at peak traffic a large share of them are refreshing at once, each expecting their own personalized, fresh, low-latency feed.

Understanding how this actually works is one of the richest system design problems in modern software engineering. It touches distributed databases, event streaming, ML-based ranking, graph traversal, cache design, and real-time data pipelines all at once.

This post walks through the full architecture from first principles. Whether you are preparing for a system design interview or simply want to understand how large-scale social media infrastructure is built, this is the engineering deep dive you have been looking for.

Core Features of the X Timeline

Before jumping into architecture, it is worth being precise about what the timeline actually is. X has two primary feed surfaces.

The Following tab shows tweets from accounts you explicitly follow, in reverse chronological order. The For You tab, which is the default, is a fully personalized algorithmic feed. It pulls in tweets from accounts you follow, accounts you interact with, accounts that are popular in your network, trending content, and content from accounts you do not follow but that the recommendation engine believes you will engage with.
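To make the For You idea concrete, here is a minimal sketch of merging candidates from several sources into one ranked feed. This is not X's actual algorithm: the source names, the `Candidate` type, the `build_for_you` function, and the numeric scores are all hypothetical, and a real system would get its scores from a learned ranking model rather than hard-coded values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    tweet_id: int
    source: str   # which candidate source produced this tweet (hypothetical labels)
    score: float  # placeholder relevance score; a real ranker would compute this


def build_for_you(sources, limit=10):
    """Merge candidates from every source, keep the best-scoring copy of each
    tweet, and return the top `limit` candidates by score."""
    best = {}
    for source_name, candidates in sources.items():
        for tweet_id, score in candidates:
            current = best.get(tweet_id)
            # The same tweet can arrive from several sources; keep the highest score.
            if current is None or score > current.score:
                best[tweet_id] = Candidate(tweet_id, source_name, score)
    ranked = sorted(best.values(), key=lambda c: c.score, reverse=True)
    return ranked[:limit]


# Illustrative input: (tweet_id, score) pairs per source.
example_sources = {
    "following": [(1, 0.9), (2, 0.4)],
    "out_of_network": [(2, 0.7), (3, 0.5)],
}
example_feed = build_for_you(example_sources, limit=2)
```

Deduplicating before ranking matters because the same tweet often shows up through more than one path, for example from an account you follow and from trending content.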

Read on →

How URL Shortener Works?

There is a particular kind of engineering problem that looks deceptively small from the outside. You paste a long URL into a box, click a button, and get back something like https://bit.ly/3xKp9Ld. The whole interaction takes less than a second. Behind that second, though, is a distributed system that has to do quite a lot of work — generate a globally unique short code, write it durably, cache it for fast retrieval, serve hundreds of thousands of redirects per second, collect analytics events without slowing down the redirect, scan for malicious links, and stay available across multiple data centers.

That is the honest shape of a URL shortener at scale. You could build a toy version in an afternoon with a SQLite database and a single Flask server. A production version used by millions of people is something else entirely.

This post walks through that production version — the architecture, the tradeoffs, the engineering decisions, and the failure modes. Whether you are preparing for a system design interview, building your own shortener, or just curious how Bitly or TinyURL actually work at scale, this should give you a real picture of what is going on inside.

Why URL Shortening Became a Distributed Systems Problem

The original use case was simple enough: URLs on the web can get long and ugly, especially after query parameters and tracking strings pile up. Early shorteners were literally just a database with two columns — a short code and a long URL — and a web server that did a lookup and issued a 301 redirect. That works perfectly fine at small scale.
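A minimal sketch of that early design, using only Python's standard library: a two-column SQLite table, a lookup, and a 301 response. The `ToyShortener` class, the `links` table name, and the choice to derive codes by base62-encoding an in-process counter are assumptions made for illustration, not how Bitly or TinyURL generate codes.

```python
import sqlite3

# 62 URL-safe characters: digits, lowercase, uppercase.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class ToyShortener:
    """Single-process toy: one table with two columns, short code and long URL."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE links (code TEXT PRIMARY KEY, long_url TEXT NOT NULL)"
        )
        self.next_id = 1  # assumption: a local counter stands in for real ID generation

    def shorten(self, long_url: str) -> str:
        code = encode_base62(self.next_id)
        self.next_id += 1
        self.db.execute(
            "INSERT INTO links (code, long_url) VALUES (?, ?)", (code, long_url)
        )
        return code

    def resolve(self, code: str):
        """Return (status, headers) the way a web handler would."""
        row = self.db.execute(
            "SELECT long_url FROM links WHERE code = ?", (code,)
        ).fetchone()
        if row is None:
            return 404, {}
        return 301, {"Location": row[0]}


shortener = ToyShortener()
short_code = shortener.shorten("https://example.com/some/long/path?utm_source=x")
status, headers = shortener.resolve(short_code)
```

A single counter in one process is exactly the part that stops working once there is more than one server, which is where the distributed version of the problem begins.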

Read on →

How Zomato Works?

The moment you tap “Place Order” on Zomato, a massive chain of events begins instantly behind the scenes. Within seconds, the system identifies your location, finds the right restaurant, assigns a nearby delivery partner, processes your payment, and starts estimating delivery time — all before you even put your phone away. What feels simple on the surface is actually a real-time logistics system coordinating thousands of moving parts across an entire city.

Food delivery sounds like a simple logistics problem until you actually try to build it. Then it reveals itself as one of the hardest classes of distributed systems work: real-time geo-spatial computation, dynamic resource allocation, live tracking for millions of concurrent sessions, recommendation engines that personalize without being creepy, and payment flows that must never lose money even when half the network is flaking. Zomato operates in hundreds of cities, handles millions of daily orders, and works with tens of thousands of restaurant partners. Let us walk through how a system like this actually works.

What Zomato Actually Does

Most people know Zomato as an app where you order food. But from an engineering perspective, Zomato is really three distinct systems bolted together and made to feel seamless.

The first is a restaurant discovery and search platform. This is the part that helps you find what to eat, browse menus, read reviews, and decide. It behaves like a specialized search engine with local context.

The second is a transactional ordering and logistics system. This handles the actual purchase: taking your order, processing payment, routing it to the restaurant, orchestrating the delivery from pickup to dropoff.

The third is a personalization and recommendations engine. This is the layer that learns your preferences, predicts what you will want to eat on a Tuesday night versus a Sunday brunch, and surfaces restaurant and dish suggestions that feel eerily relevant.

These three systems share infrastructure but have very different performance and correctness requirements. The search system can tolerate slight staleness. The ordering system cannot afford to lose a transaction. The recommendation system can afford to be a little wrong as long as it is interesting. Recognizing these different tradeoffs is the first step a good engineer takes before designing anything.

Core Features at a Glance

Before going deep, it helps to enumerate what Zomato actually provides so we can map features to systems:

Read on →

How Netflix Works?

Every second, somewhere in the world, a person presses play on Netflix and expects the impossible to feel effortless. In the brief instant before the first frame appears, an enormous distributed system has already selected the optimal video quality, routed the request to the nearest CDN edge server, adapted to the user’s network conditions, and begun streaming millions of bytes across the internet — all fast enough that the viewer never thinks about it. No loading screens. No blurry frames. Just seamless playback delivered on demand, at planetary scale.

What makes that moment deceptively simple is the enormous machinery running beneath it. Netflix serves somewhere north of 200 million subscribers across 190 countries. It accounts for roughly 15% of global internet traffic during peak hours. Its recommendation engine influences what more than 80% of subscribers watch next. And the system has to work whether you are on fiber in Seoul, a spotty cellular connection in rural Brazil, or a smart TV in a hotel in Amsterdam.

Most distributed systems have to handle scale. Netflix has to handle scale while also being real-time, fault-tolerant, and nearly invisible to the user. The experience degrades the moment you notice it.

This post is an attempt to explain the full system. Not just the broad strokes, but the reasons behind each architectural decision, the tradeoffs made at each layer, and what happens when things go wrong. We will move from client to CDN to cloud, from video bytes to neural network embeddings, and from API gateway to analytics pipelines.

Core Features of Netflix

Before touching the architecture, it is worth being precise about what Netflix actually does. The feature list sounds mundane but each item carries serious engineering weight.

Read on →

How Spotify Works?

A fraction of a second before music starts flowing through your headphones, an invisible chain of systems has already sprung into action. Your device must figure out what track to play next, determine whether the audio is stored locally or needs to be fetched, connect to the nearest CDN edge server, stream and decode compressed audio packets in real time, and deliver uninterrupted playback before you even notice the delay. At Spotify’s scale — serving hundreds of millions of listeners across wildly different devices, bandwidth conditions, and geographies — this is not just streaming. It is a massive distributed system constantly balancing speed, reliability, and personalization, while quietly predicting the next song you are most likely to fall in love with.

That is not a simple problem. It is one of the most interesting distributed systems challenges in consumer software, combining real-time media delivery, personalization at scale, search infrastructure, offline sync, and a licensing system that would give most engineers a headache. This article is a deep walk through all of it. Whether you are preparing for a system design interview, curious about how streaming infrastructure really works, or building something similar at a smaller scale, the goal is to leave you with a genuine mental model, not just a list of buzzwords.

Why Music Streaming Is Hard

Before diving into architecture, it is worth spending a moment on why this problem is genuinely difficult, because the instinctive answer — “just serve audio files from a server” — misses most of what makes Spotify interesting.

Read on →