Blogs


AI Adoption vs AI Reality

There is a version of the AI story that goes like this: companies saw ChatGPT, realized the future had arrived, moved fast, deployed AI everywhere, and transformed their businesses. That version makes for good investor decks. The reality is messier, more expensive, and far more interesting.

What actually happened is that almost every company launched AI initiatives simultaneously, many of them with very little understanding of what production AI systems actually cost, how they behave at scale, and what it takes to keep them reliable. The gap between the AI demo and the AI production system turned out to be enormous, and most enterprises are still figuring out how to close it.


This piece is not about whether AI is real or valuable. It is. This is about understanding what building serious AI systems inside a company actually requires, what it costs, what breaks, and what you should be thinking about if you are an engineer, architect, or technology leader trying to navigate this space honestly.

The Rush That Started Everything

When OpenAI released ChatGPT in late 2022, something unusual happened in the enterprise technology world. Usually, new platforms take years to reach boardroom-level urgency. Cloud computing took most of the 2000s to become a board-level concern. Mobile took years after the iPhone before enterprise IT leaders were forced to take it seriously. AI did not have that grace period.

Read on →

How Slack Works

There is a version of Slack that most engineers imagine when they first think about how it works. You type a message, hit Enter, and it shows up on someone else’s screen. Simple enough. But the moment you start pulling at the threads of that interaction, things get complicated fast. What happens when a thousand people are in the same channel? What happens when someone is on mobile with spotty connectivity? What happens when your company has 80,000 employees and legal needs a full audit trail of every message sent over the last three years?


Slack is not just a chat application. It is a distributed, event-driven, real-time collaboration platform operating at a scale where individual engineering decisions ripple out into millions of daily user experiences. It handles hundreds of millions of messages per day, maintains persistent WebSocket connections for millions of concurrent users, and does all of this while offering sub-second message delivery, reliable push notifications, full-text search across years of history, and enterprise-grade security.

Building any one of those systems in isolation is a meaningful engineering problem. Building them together, making them interact reliably under load, and then keeping them running around the clock for paying enterprise customers is a genuinely difficult distributed systems challenge. This post is about how all of that works.

We will move from the high-level shape of the system down into the guts of individual subsystems. Some sections will feel like a system design interview. Others will feel like a postmortem. That is intentional. The goal is not just to understand Slack academically but to develop real engineering instincts about why real-time collaboration platforms are designed the way they are.

Core Features of Slack

Before diving into the architecture, it helps to enumerate what the system actually needs to do. Engineers sometimes skip this step and then wonder why their schema does not support threads or why their fanout logic breaks on large channels.

Read on →

How Apple AirTags Work

There is a small white disc sitting on your key ring right now. It weighs eleven grams. It has no GPS chip, no cellular radio, and no Wi-Fi antenna. Its battery lasts over a year. And yet, if you drop your keys somewhere in downtown Tokyo, there is a very good chance you will get an accurate location update within minutes, without Apple ever knowing where your keys are.


That combination of properties is not magic. It is one of the most carefully engineered consumer-scale distributed systems in the world, and the design decisions behind it touch almost every interesting area of modern infrastructure: Bluetooth Low Energy networking, end-to-end encrypted relay systems, crowd-sourced location discovery, ultra-wideband spatial positioning, privacy-preserving cryptography, and a globally distributed event pipeline spanning hundreds of millions of devices.

This blog walks through the full internal architecture of Apple AirTags, from the radio signals pulsing out of the hardware to the cloud infrastructure that receives, processes, and delivers location events back to owners. By the end, you should understand not just what AirTags do, but why each system is designed the way it is, what the tradeoffs are, and how this infrastructure scales to a network of over a billion Apple devices.

What Apple AirTags Are and Why They Matter

Before AirTags, tracking a lost item typically meant relying on GPS. GPS trackers work reasonably well outdoors but require cellular or Wi-Fi connectivity to upload their position, have batteries measured in days or weeks, and broadcast their presence loudly to anyone listening. They also cost more, require SIM cards in some cases, and fundamentally cannot work indoors where GPS signals are too weak.

AirTags sidestep all of that. Instead of using their own GPS or cellular connection, they hitchhike on the massive installed base of Apple devices around them. An AirTag sitting inside a bag on a train broadcasts a short Bluetooth Low Energy advertisement packet that carries its current rotating public key. Nearby iPhones and iPads that participate in the Find My network pick up that advertisement, use the public key to encrypt their own current location, and silently upload the encrypted blob to Apple’s servers. The AirTag’s owner can then download that blob, decrypt it with the matching private key that only they hold, and recover the location.
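The encrypt-for-the-owner flow can be sketched in a few lines. This is a toy illustration only: it uses plain modular Diffie-Hellman and a hash-based keystream so that it runs on the Python standard library, whereas the real system uses elliptic-curve cryptography and authenticated encryption. The function names (`keypair`, `finder_encrypt`, `owner_decrypt`) and the location format are assumptions made for this sketch, not Apple's API.

```python
import hashlib
import secrets

# Toy Diffie-Hellman group. NOT secure and NOT what AirTags use;
# chosen only so the sketch needs nothing beyond the standard library.
P = 2**255 - 19  # a well-known prime
G = 2


def keypair():
    """Return (private, public) for the toy group."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def _keystream(shared: int, length: int) -> bytes:
    """Stretch the shared secret into `length` bytes with SHA-256 in counter mode."""
    seed = shared.to_bytes(32, "big")
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def finder_encrypt(tag_public: int, location: bytes):
    """Runs on the passing iPhone: encrypt its own location to the tag's public key."""
    eph_private, eph_public = keypair()  # fresh key for every report
    shared = pow(tag_public, eph_private, P)
    stream = _keystream(shared, len(location))
    ciphertext = bytes(a ^ b for a, b in zip(location, stream))
    return eph_public, ciphertext  # opaque blob uploaded to the server


def owner_decrypt(tag_private: int, blob):
    """Runs on the owner's device: only the matching private key recovers the location."""
    eph_public, ciphertext = blob
    shared = pow(eph_public, tag_private, P)
    stream = _keystream(shared, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))
```

The point of the design is visible in the function boundaries: the finder only ever sees a public key, the server only ever stores the blob, and the private key never leaves the owner's side.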

Read on →

How Google Docs Works

There is a moment every developer takes for granted. You open a Google Doc, your colleague is already in it, and you both start typing at the same time. The cursor moves, text appears, changes propagate in near real-time, and nothing breaks. It just works.


What happens underneath that blinking cursor is one of the most sophisticated distributed systems problems in software engineering. You have multiple users modifying shared state simultaneously, across different machines, different networks, different continents. You have to handle network partitions, conflicting edits, stale state, and offline scenarios. You need to guarantee that no matter how chaotic the concurrent editing gets, every user eventually sees the same document in a consistent state.

Google Docs is not just a word processor. It is a distributed system that happens to look like a word processor.

This article is a genuine engineering walkthrough of how Google Docs works internally. We will go through collaborative editing algorithms, real-time synchronization infrastructure, operational transformation, CRDT concepts, version history systems, autosave mechanisms, offline editing, and scalability challenges. By the end, you should have a solid mental model of why the system is designed the way it is, not just what it does.
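To make the convergence requirement concrete, here is a minimal sketch of operational transformation for the simplest case, two concurrent inserts. It is an illustration under stated assumptions rather than Google's implementation: the `Insert` type, the `site` tie-breaker, and the `transform` function are hypothetical names, and a real system also has to handle deletes, formatting, and server-side ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where text is inserted
    text: str
    site: int   # hypothetical client id, used only to break ties


def apply(doc: str, op: Insert) -> str:
    """Apply an insert to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other` has already been applied.

    If `other` landed before `op` (or at the same position with a lower
    site id), `op` must shift right by the length of the inserted text.
    """
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


# Two users edit "Hello world" at the same time.
doc = "Hello world"
a = Insert(5, ",", site=1)    # user 1 adds a comma
b = Insert(11, "!", site=2)   # user 2 adds an exclamation mark

# Each replica applies its own edit first, then the transformed remote edit.
replica_1 = apply(apply(doc, a), transform(b, a))
replica_2 = apply(apply(doc, b), transform(a, b))
```

Both replicas end up with the same text even though they applied the edits in opposite orders, which is exactly the guarantee described above.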

What Makes Collaborative Editing So Hard

Before we look at how Google Docs solves the problem, it is worth understanding why the problem is hard in the first place.

Read on →

How Instagram Works

Instagram serves over two billion active users every month. On any given day, people upload hundreds of millions of photos and videos, watch billions of reels, send hundreds of millions of messages, and scroll through feeds that feel magically personalized to each person. Behind that experience is one of the most complex distributed systems ever built.


This is not a shallow overview. We are going to walk through the internals — the media pipelines, the feed generation engines, the recommendation systems, the reels infrastructure, the CDN architecture, the messaging stack, the caching layers, and the engineering tradeoffs that make all of it work at a scale that is genuinely hard to comprehend.

If you are a backend engineer, a system design interview candidate, or just someone who has ever wondered what actually happens when you tap “Post” on Instagram, this is for you.

What Instagram Really Is

Instagram launched in 2010 as a simple photo-sharing app. The original architecture could probably run on a single decent server. Fast forward to today, and Instagram is a full-scale media platform with feeds, stories, reels, live streaming, direct messaging, an explore page, shopping features, creator tools, and a recommendation engine that rivals anything in the industry.

The engineering challenge is not just size. It is the combination of things that makes Instagram uniquely difficult to build:

Read on →