

How Instagram Works?

Instagram serves over two billion active users every month. On any given day, people upload hundreds of millions of photos and videos, watch billions of reels, send hundreds of millions of messages, and scroll through feeds that feel magically personalized to each person. Behind that experience is one of the most complex distributed systems ever built.


This is not a shallow overview. We are going to walk through the internals — the media pipelines, the feed generation engines, the recommendation systems, the reels infrastructure, the CDN architecture, the messaging stack, the caching layers, and the engineering tradeoffs that make all of it work at a scale that is genuinely hard to comprehend.

If you are a backend engineer, a system design interview candidate, or just someone who has ever wondered what actually happens when you tap “Post” on Instagram, this is for you.

What Instagram Really Is

Instagram launched in 2010 as a simple photo-sharing app. The original architecture could probably run on a single decent server. Fast forward to today, and Instagram is a full-scale media platform with feeds, stories, reels, live streaming, direct messaging, an explore page, shopping features, creator tools, and a recommendation engine that rivals anything in the industry.

The engineering challenge is not just size. It is the combination of things that makes Instagram uniquely difficult to build:

Read on →

How Meta Serverless Works?

Somewhere, a user taps a button on Instagram, and inside a hyperscale data center an invisible chain reaction begins across thousands of machines. A piece of code spins up, runs for a few milliseconds, and vanishes almost instantly. No engineer manually provisioned a server for that request. No virtual machine was waiting in advance. The infrastructure simply reacted in real time, allocating compute exactly when it was needed and releasing it when the work was done.

That is serverless computing at scale, and building it correctly is one of the hardest distributed systems problems in the industry.


This post is a deep technical walkthrough of how Meta-style serverless infrastructure works internally. We will cover the execution pipeline, the scheduler, container isolation, cold start optimization, autoscaling, networking, observability, and the engineering decisions that tie everything together. If you are preparing for a system design interview or just want to understand what really happens under the hood of a serverless platform at hyperscale, this is for you.

What Serverless Actually Means at Scale

The word “serverless” is a bit misleading. There are absolutely servers. What serverless really means is that the people writing the business logic do not have to think about those servers. They write a function, deploy it, and the platform handles everything else: provisioning, scaling, networking, isolation, and cleanup.
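To make that division of labor concrete, here is a deliberately tiny, in-process sketch. It is not how Meta's platform is built; the class `ToyServerlessPlatform` and its methods are hypothetical. The developer only supplies a function (through a factory that initializes it), and the platform decides when to initialize it (a cold start), reuses the warm instance for later calls, and discards idle instances to scale to zero.

```python
from typing import Any, Callable, Dict

# Hypothetical types: a handler takes an event dict and returns a result.
Handler = Callable[[Dict[str, Any]], Any]
Factory = Callable[[], Handler]


class ToyServerlessPlatform:
    """Hypothetical in-process stand-in for a serverless platform (illustration only)."""

    def __init__(self) -> None:
        self._code: Dict[str, Factory] = {}   # deployed functions, not yet running
        self._warm: Dict[str, Handler] = {}   # initialized ("warm") instances
        self.cold_starts = 0

    def deploy(self, name: str, factory: Factory) -> None:
        # The developer hands over code; the platform owns everything after this.
        self._code[name] = factory
        self._warm.pop(name, None)  # a new version invalidates any warm instance

    def invoke(self, name: str, event: Dict[str, Any]) -> Any:
        handler = self._warm.get(name)
        if handler is None:
            # Cold start: initialize the function on demand, then keep it warm.
            self.cold_starts += 1
            handler = self._code[name]()
            self._warm[name] = handler
        return handler(event)

    def scale_to_zero(self) -> None:
        # Cleanup: drop idle instances; the next call pays a cold start again.
        self._warm.clear()
```

After a single `deploy("greet", ...)`, two `invoke("greet", ...)` calls trigger exactly one cold start; after `scale_to_zero()`, the next call triggers another.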

AWS Lambda made this concept mainstream. But building serverless infrastructure for a company like Meta, which handles billions of daily active users, trillions of requests, and petabytes of data, requires an entirely different class of engineering. You cannot just run Lambda internally. You need something purpose-built, something that understands Meta’s workload patterns, Meta’s infrastructure constraints, and Meta’s latency requirements.

Read on →

How X Timeline Works?

There is a moment, every time you open X, that feels effortless. A feed of tweets appears. Some are from people you follow. Others are from accounts you have never seen before but somehow feel relevant. A viral post catches your eye. A trending topic surfaces at just the right time. It all feels instant.


Behind every timeline refresh is a chain of distributed systems doing extraordinary amounts of work in milliseconds. Machines are computing your interests, traversing a social graph that connects hundreds of millions of accounts, fetching precomputed timelines from tiered caches, ranking thousands of candidate tweets using machine learning models, and delivering the result before your thumb stops scrolling. X serves hundreds of millions of active users, and each of them expects a personalized, fresh, low-latency feed, even at peak traffic.

Understanding how this actually works is one of the richest system design problems in modern software engineering. It touches distributed databases, event streaming, ML-based ranking, graph traversal, cache design, and real-time data pipelines all at once.

This post walks through the full architecture from first principles. Whether you are preparing for a system design interview or simply want to understand how large-scale social media infrastructure is built, this is the engineering deep dive you have been looking for.

Core Features of the X Timeline

Before jumping into architecture, it is worth being precise about what the timeline actually is. X has two primary feed surfaces.

The Following tab shows tweets from accounts you explicitly follow, sorted by relevance and recency. The For You tab, which is the default, is a fully personalized algorithmic feed. It pulls in tweets from accounts you follow, accounts you interact with, accounts that are popular in your network, trending content, and content from accounts you do not follow but that the recommendation engine believes you will engage with.
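The For You description reduces to two steps: gather candidates from several sources, then rank them. Below is a minimal sketch, assuming each candidate already carries a relevance score that stands in for the ML ranking model; the `Candidate` type and the source names are hypothetical, not X's actual pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    tweet_id: int
    source: str   # hypothetical source label, e.g. "following", "network", "trending"
    score: float  # stand-in for a model-predicted engagement score


def build_for_you(sources: Dict[str, List[Candidate]], limit: int) -> List[int]:
    """Merge candidates from all sources, deduplicate by tweet, rank by score."""
    best: Dict[int, Candidate] = {}
    for candidates in sources.values():
        for c in candidates:
            # A tweet can arrive from several sources; keep its highest-scoring copy.
            current = best.get(c.tweet_id)
            if current is None or c.score > current.score:
                best[c.tweet_id] = c
    ranked = sorted(best.values(), key=lambda c: c.score, reverse=True)
    return [c.tweet_id for c in ranked[:limit]]
```

If the same tweet is sourced both from accounts you follow and from trending content, it appears once in the feed, positioned by its higher score.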

Read on →

How URL Shortener Works?

There is a particular kind of engineering problem that looks deceptively small from the outside. You paste a long URL into a box, click a button, and get back something like https://bit.ly/3xKp9Ld. The whole interaction takes less than a second. Behind that second, though, is a distributed system that has to do quite a lot of work — generate a globally unique short code, write it durably, cache it for fast retrieval, serve hundreds of thousands of redirects per second, collect analytics events without slowing down the redirect, scan for malicious links, and stay available across multiple data centers.


That is the honest shape of a URL shortener at scale. You could build a toy version in an afternoon with a SQLite database and a single Flask server. A production version used by millions of people is something else entirely.

This post walks through that production version — the architecture, the tradeoffs, the engineering decisions, and the failure modes. Whether you are preparing for a system design interview, building your own shortener, or just curious how Bitly or TinyURL actually work at scale, this should give you a real picture of what is going on inside.
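The first job listed above is generating a globally unique short code. One common approach, assumed here rather than taken from any specific shortener, is to take a unique integer ID (from a counter or ID-generation service, which this sketch does not implement) and encode it in base 62. Because the encoding is reversible, distinct IDs always yield distinct codes. The alphabet order is an arbitrary choice.

```python
# Arbitrary alphabet order: digits, lowercase, uppercase (62 characters).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base62 short code."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def decode_base62(code: str) -> int:
    """Invert encode_base62; raises KeyError on characters outside ALPHABET."""
    n = 0
    for ch in code:
        n = n * BASE + INDEX[ch]
    return n
```

Seven base62 characters, the length of the `3xKp9Ld` example, cover 62^7 ≈ 3.5 trillion distinct IDs.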

Why URL Shortening Became a Distributed Systems Problem

The original use case was simple enough: URLs on the web can get long and ugly, especially after query parameters and tracking strings pile up. Early shorteners were literally just a database with two columns — a short code and a long URL — and a web server that did a lookup and issued a 301 redirect. That works perfectly fine at small scale.
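That original design fits in a few lines. Here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `urls` and the function names are illustrative, and returning a `(status, headers)` tuple stands in for a real web framework's response object.

```python
import sqlite3
from typing import Dict, Tuple


def make_db() -> sqlite3.Connection:
    # The two columns described above: a short code and a long URL.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls (short_code TEXT PRIMARY KEY, long_url TEXT NOT NULL)"
    )
    return conn


def shorten(conn: sqlite3.Connection, short_code: str, long_url: str) -> None:
    # PRIMARY KEY makes a duplicate code fail with sqlite3.IntegrityError.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
            (short_code, long_url),
        )


def redirect(conn: sqlite3.Connection, short_code: str) -> Tuple[int, Dict[str, str]]:
    # Look up the code and answer with a 301 pointing at the long URL.
    row = conn.execute(
        "SELECT long_url FROM urls WHERE short_code = ?", (short_code,)
    ).fetchone()
    if row is None:
        return 404, {}
    return 301, {"Location": row[0]}
```

After `shorten(conn, "3xKp9Ld", long_url)`, calling `redirect(conn, "3xKp9Ld")` returns `(301, {"Location": long_url})`. One tradeoff worth knowing: a 301 is a permanent redirect that browsers may cache, so repeat clicks can bypass the shortener entirely; a service that wants to count every click may prefer a 302 (temporary) redirect.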

Read on →

How Zomato Works?

The moment you tap “Place Order” on Zomato, a massive chain of events begins behind the scenes. Within seconds, the system identifies your location, routes the order to the restaurant, assigns a nearby delivery partner, processes your payment, and starts estimating delivery time, all before you even put your phone away. What feels simple on the surface is actually a real-time logistics system coordinating thousands of moving parts across an entire city.


Food delivery sounds like a simple logistics problem until you actually try to build it. Then it reveals itself as one of the hardest classes of distributed systems work: real-time geo-spatial computation, dynamic resource allocation, live tracking across millions of concurrent sessions, recommendation engines that personalize without being creepy, and payment flows that must never lose money even when half the network is flaking. Zomato operates in hundreds of cities, handles millions of orders a day, and works with tens of thousands of restaurant partners. Let us walk through how a system like this actually works.
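Assigning a nearby delivery partner is a concrete example of that geo-spatial work. The sketch below is a minimal, brute-force version, not Zomato's dispatch logic: it scans every partner and picks the closest available one by great-circle (haversine) distance. The `Partner` type and the 5 km `max_km` cutoff are hypothetical. At city scale, a real system would more likely query a spatial index (for example, geohash or grid cells) instead of scanning every partner, and would weigh factors beyond distance.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Partner:
    partner_id: str
    lat: float
    lon: float
    available: bool


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_partner(
    lat: float, lon: float, partners: Iterable[Partner], max_km: float = 5.0
) -> Optional[Partner]:
    """Closest available partner within max_km of (lat, lon), or None."""
    best: Optional[Partner] = None
    best_km = math.inf
    for p in partners:
        if not p.available:
            continue  # busy or offline partners are never assigned
        d = haversine_km(lat, lon, p.lat, p.lon)
        if d <= max_km and d < best_km:
            best, best_km = p, d
    return best
```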

What Zomato Actually Does

Most people know Zomato as an app where you order food. But from an engineering perspective, Zomato is really three distinct systems bolted together and made to feel seamless.

The first is a restaurant discovery and search platform. This is the part that helps you find what to eat, browse menus, read reviews, and decide. It behaves like a specialized search engine with local context.

The second is a transactional ordering and logistics system. This handles the actual purchase: taking your order, processing payment, routing it to the restaurant, and orchestrating the delivery from pickup to dropoff.

The third is a personalization and recommendations engine. This is the layer that learns your preferences, predicts what you will want to eat on a Tuesday night versus a Sunday brunch, and surfaces restaurant and dish suggestions that feel eerily relevant.

These three systems share infrastructure but have very different performance and correctness requirements. The search system can tolerate slight staleness. The ordering system cannot afford to lose a transaction. The recommendation system can afford to be a little wrong as long as it is interesting. Understanding that these systems have different tradeoffs is the first thing a good engineer thinks about before designing anything.

Core Features at a Glance

Before going deep, it helps to enumerate what Zomato actually provides so we can map features to systems:

Read on →