How Pastebin Works

There is something deceptively simple about Pastebin. You paste some text, click a button, and get a short URL back. You share that URL with someone else, they open it, and they see your text. That is the entire product in one sentence. And yet, building Pastebin at the scale of tens of millions of daily users, billions of stored pastes, and a global audience requires you to make dozens of careful engineering decisions that would fill a whiteboard wall from top to bottom.

The reason Pastebin is a classic system design interview question is not because it is hard to understand — it is because it forces you to think through the full lifecycle of a piece of data in a distributed system: write it, store it, serve it to millions of readers, expire it gracefully, cache it intelligently, and make sure no one abuses it to host malware or leak credentials. Each of those steps has gotchas.


This post is a full engineering deep dive. We will go through the architecture layer by layer, explain every major decision, and make sure you understand not just what Pastebin does but why it is built the way it is.

What Pastebin Actually Is

Before we go deep into the systems, let us be clear about the product. Pastebin is a text-sharing service. Users paste raw text, often source code, configuration files, log dumps, or command output, and get a short URL they can share. The core workflow is:

  • A user submits a block of text
  • The system assigns it a unique short ID and URL
  • Other users visit that URL and read the text
  • The paste may expire after a set time or persist forever

The platform also supports optional features: syntax highlighting for dozens of programming languages, public vs private paste visibility, user accounts, paste search, trending paste lists, API access, and raw text mode. These features add complexity, but the core is surprisingly minimal.

What makes it hard is scale. Pastebin reportedly handles hundreds of millions of page views per month. Even modest competitors need to handle read traffic that can be orders of magnitude higher than write traffic. A paste can be written once and read tens of thousands of times, especially if it goes viral on a developer forum or Reddit. The read-to-write ratio sits somewhere around 10:1 to 100:1 depending on traffic patterns. That asymmetry shapes every architectural decision.

Core Features and Their Engineering Implications

Let us map each feature to its engineering challenge before we look at the full architecture.

Paste creation seems like a simple write operation, but at millions of pastes per day, it becomes a high-throughput write pipeline with questions around ID generation, duplicate detection, and metadata extraction.

Paste retrieval is the hot path. Most reads are anonymous users loading a paste URL. This has to be fast — sub-100ms ideally — regardless of whether the paste was created five seconds ago or two years ago.

Syntax highlighting requires the system to detect what programming language a block of text is written in and then render colored HTML output from it. Doing this on every request is expensive. Caching the rendered HTML is critical.

Expiration sounds simple but distributed cleanup at scale requires coordination. You cannot just set a database TTL and call it a day.

Anonymous access is a double-edged sword. It lowers the barrier to sharing, but it also means you have no accountability. Spam, malware, credential dumps, and illegal content become persistent threats.

Trending pastes require view counting, aggregation, and ranking, all of which must work under heavy read load without slowing down the read path itself.

High-Level Architecture

Let us start with the bird’s-eye view and then drill down into each subsystem.

flowchart TD; A[Web Browser or API Client]; B[CDN Edge Node]; C[API Gateway]; D[Paste Service]; E[Metadata Service]; F[Syntax Highlight Service]; G[Redis Cache Cluster]; H[PostgreSQL Metadata DB]; I[Object Storage S3]; J[Kafka Queue]; K[Background Workers]; L[Elasticsearch]; M[Moderation Service]; A --> B; B --> C; C --> D; D --> G; D --> H; D --> I; D --> J; J --> F; J --> K; J --> M; K --> L; E --> H;

The CDN edge nodes handle the majority of read traffic. For pastes that have already been served, the CDN can return the cached HTML or raw text without touching the origin at all. This is where the biggest scaling wins come from.

The API Gateway handles authentication, rate limiting, and request routing. Every request from a browser or API client passes through here.

The Paste Service is the core write and read handler. It coordinates between the cache, metadata database, and object storage.

Kafka sits in the middle as a durable event bus. When a paste is created, an event is published to Kafka, and background workers pick it up for syntax highlighting, search indexing, and moderation scanning asynchronously. This keeps the write path fast by not doing expensive work inline.

Paste Creation Pipeline

When a user submits a paste, here is what happens step by step.

flowchart TD; A[User Submits Paste]; B[API Gateway Rate Check]; C[Paste Service Receives Request]; D[Validate Input Length and Content]; E[Generate Unique Short ID]; F[Detect Programming Language]; G[Compress Text Content]; H[Write to Object Storage]; I[Write Metadata to PostgreSQL]; J[Publish Event to Kafka]; K[Return Short URL to User]; L[Async Syntax Highlight Worker]; M[Async Moderation Worker]; N[Async Search Index Worker]; A --> B; B --> C; C --> D; D --> E; E --> F; F --> G; G --> H; H --> I; I --> J; J --> K; J --> L; J --> M; J --> N;

Input validation happens first. You check text length, reject obviously malformed requests, and enforce per-user or per-IP rate limits. This should be as early as possible in the pipeline to avoid doing any expensive work for bad requests.

ID generation is a critical subsystem we will cover in depth shortly. The short ID needs to be unique, short, non-sequential (to prevent scraping), and generated fast.

Language detection is done upfront as a lightweight heuristic to populate the syntax type field in metadata. A more thorough highlighting pass happens asynchronously.

Compression matters more than most engineers expect. A typical paste of 5,000 characters of Python code compresses to less than 40% of its original size with zstd. At billions of pastes, that storage saving is substantial.

The write path splits here. The compressed text blob goes to object storage. The metadata, including paste ID, owner, creation time, expiry time, visibility, and language hint, goes to the relational database. Keeping content and metadata separate is a foundational design choice.

The write to Kafka is the last step before returning the URL to the user. The heavy lifting of syntax highlighting and search indexing happens outside the critical path, which keeps paste creation fast even when the highlighting pipeline is under load.

Unique ID Generation System

This is deceptively complex. The ID is what becomes the URL. A paste at pastebin.com/aB3xK has the ID aB3xK. You need this ID to be:

  • Short enough to fit in a tweet or a chat message
  • Unique across all pastes ever created
  • Not guessable or predictable (to prevent scraping private pastes)
  • Generated fast without a central coordination bottleneck

The standard approach is Base62 encoding. Base62 uses characters a-z, A-Z, and 0-9, which gives you 62 possible characters per position. A 6-character Base62 string can represent 62^6, or approximately 56.8 billion, unique values. That is enough headroom for a very long time.
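As a minimal sketch of the encoding described above, the following converts an integer to a fixed-width Base62 string and back. The alphabet ordering and the function names `base62_encode` and `base62_decode` are assumptions chosen for illustration; any fixed ordering of the 62 characters would work equally well.

```python
import string

# Assumed alphabet order: digits, then lowercase, then uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int, width: int = 6) -> str:
    """Encode a non-negative integer as a fixed-width Base62 string."""
    if n < 0 or n >= 62 ** width:
        raise ValueError("value does not fit in the requested width")
    chars = []
    for _ in range(width):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    """Decode a Base62 string back to the integer it represents."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


# Size of the 6-character ID space.
print(62 ** 6)  # 56800235584
```

Fixed-width output keeps every URL the same length, at the cost of leading padding characters for small values.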

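How you generate the raw number matters a lot. Sequential IDs from a database autoincrement are fast and collision-free but predictable: an attacker can iterate through them to scrape all pastes. Randomly generated short IDs eliminate predictability, but in an ID space this small collisions are a real possibility, so every write needs a uniqueness check and a retry on conflict. (Full 128-bit UUIDs make collisions negligible, but they are far too long to serve as a short URL.)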

A smarter approach used in production systems is to pre-generate a pool of short IDs. A background service generates batches of random 6-character Base62 IDs, verifies uniqueness against the database, and stores them in a Redis set. The paste creation service pops an ID from this pool atomically. This gives you randomness, uniqueness guarantees, and fast generation without a blocking database write on the hot path.
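The pool idea can be sketched in memory as follows. This is only an illustration under stated assumptions: the class `IdPool` and its methods are hypothetical, a Python set stands in for the Redis set, and another set stands in for the uniqueness check against the database. With real Redis, the atomic pop would typically be the SPOP command.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters


class IdPool:
    """In-memory stand-in for a pre-generated pool of random short IDs."""

    def __init__(self, id_length: int = 6):
        self.id_length = id_length
        self.issued = set()  # stand-in for IDs already used in the database
        self.pool = set()    # stand-in for the Redis set of ready IDs

    def refill(self, target_size: int) -> None:
        """Background step: top the pool up with verified-unique random IDs."""
        while len(self.pool) < target_size:
            candidate = "".join(
                secrets.choice(ALPHABET) for _ in range(self.id_length)
            )
            if candidate not in self.issued and candidate not in self.pool:
                self.pool.add(candidate)

    def take(self) -> str:
        """Hot-path step: remove and return one ID (KeyError if pool is empty)."""
        paste_id = self.pool.pop()
        self.issued.add(paste_id)
        return paste_id


pool = IdPool()
pool.refill(1000)
print(pool.take())  # a random 6-character ID
```

The uniqueness check happens during refill, off the hot path, so paste creation only pays for a single pop.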

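Another approach is to use a distributed ID generation system similar to Twitter’s Snowflake: a 64-bit ID composed of a timestamp, machine ID, and sequence number. This needs no coordination at generation time, as long as each machine has been assigned a unique machine ID. The tradeoff is that it produces time-ordered numeric IDs that need Base62 encoding to become URL slugs, and a full 64-bit value can take up to 11 Base62 characters.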
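To make the Snowflake-style layout concrete, here is a small sketch that packs the three fields into one integer and Base62-encodes the result. The 41/10/12 bit split follows the commonly described Snowflake layout; the function names and the choice of epoch (left to the caller as `ms_since_epoch`) are assumptions for illustration.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # assumed Base62 ordering

TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12


def compose_id(ms_since_epoch: int, machine_id: int, sequence: int) -> int:
    """Pack timestamp, machine ID, and sequence number into one integer."""
    if not 0 <= ms_since_epoch < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (ms_since_epoch << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def to_base62(n: int) -> str:
    """Variable-length Base62 encoding of a non-negative integer."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


paste_number = compose_id(ms_since_epoch=1_000_000, machine_id=7, sequence=0)
print(to_base62(paste_number))
```

Because the timestamp occupies the high bits, IDs generated later always sort after earlier ones, which is where the moderate predictability of this scheme comes from.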

| Strategy | Uniqueness | Predictability | Scalability | Collision Handling |
| --- | --- | --- | --- | --- |
| DB Auto-increment + Base62 | Guaranteed | High risk | Bottleneck at DB | Not needed |
| Random + Collision Check | High | Low risk | Moderate | Retry on collision |
| Pre-generated ID Pool | Guaranteed | Low risk | Excellent | Not needed |
| Snowflake-style Distributed ID | Guaranteed | Moderate risk | Excellent | Not needed |

Paste Storage Architecture

The storage layer is where most of the interesting engineering decisions live.

Content storage and metadata storage are deliberately separated. Here is why.

The text of a paste can be anywhere from 10 bytes to 500 kilobytes. You have billions of these. Storing them in a relational database as large text columns creates serious problems: it bloats table sizes, makes backups enormous, slows down index scans, and makes it hard to apply storage tiering. Object storage, such as Amazon S3, Google Cloud Storage, or a self-hosted system like MinIO, is purpose-built for storing arbitrary binary blobs cheaply and reliably. It scales to exabytes. It handles replication natively. And it is significantly cheaper per gigabyte than database storage.

Metadata, by contrast, is small and highly structured. Each paste has an ID, owner user ID, creation timestamp, expiry timestamp, visibility flag, language type, view count, content hash for deduplication, and storage path pointing to the blob in object storage. This fits perfectly in a relational database like PostgreSQL, where you get strong consistency, indexing, and complex queries.

flowchart TD; A[Paste Service]; B[Redis Hot Cache]; C[PostgreSQL Metadata Store]; D[Object Storage Blob Store]; E[Cold Archive Storage]; A --> B; A --> C; A --> D; D --> E;

Hot vs cold storage tiering is an important optimization. Most pastes get the vast majority of their traffic in the first 48 hours after creation. After that, traffic drops dramatically. You can tier your storage accordingly: recently created or frequently accessed pastes live in fast, more expensive storage with low latency. Older, infrequently accessed pastes are migrated to cheaper archival storage like S3 Glacier or equivalent. A lifecycle policy automates this migration based on last-access time.

Compression is applied before writing to object storage. The zstd algorithm is preferred in modern systems because it offers a very good compression ratio with fast decompression speed. Source code, configuration files, and log dumps all compress extremely well. A system storing a billion pastes with an average compressed size of 2KB is storing 2 terabytes of blob data — very manageable.
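The compress-before-write step can be sketched as below. Note the substitution: this uses `zlib` from the Python standard library as a stand-in so the example runs without extra dependencies; a real deployment following the text would use a zstd binding instead. The function names are hypothetical, and the repetitive sample text compresses far better than typical real pastes would.

```python
import zlib


def compress_paste(text: str, level: int = 6) -> bytes:
    """Compress paste text before it is written to object storage."""
    return zlib.compress(text.encode("utf-8"), level)


def decompress_paste(blob: bytes) -> str:
    """Reverse of compress_paste, used on the read path."""
    return zlib.decompress(blob).decode("utf-8")


# Artificially repetitive sample; real source code compresses less well.
sample = "def handler(request):\n    return render(request.paste_id)\n" * 100
blob = compress_paste(sample)
ratio = len(blob) / len(sample.encode("utf-8"))
print(f"compressed to {ratio:.1%} of original size")
```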

Deduplication is worth doing if your traffic patterns include a lot of repeated pastes. A SHA-256 hash of the raw content, stored in the metadata database, lets you detect exact duplicates. If the same content is pasted twice, you store one blob and two metadata records pointing to it. This can save significant storage at scale.
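A minimal sketch of hash-based deduplication, assuming an in-memory dictionary in place of object storage. The class `DedupBlobStore` and the reference-counting scheme are illustrative assumptions; the point is that a blob shared by several metadata records must only be deleted once the last reference is released.

```python
import hashlib


class DedupBlobStore:
    """Stores each distinct content exactly once, keyed by its SHA-256 hash."""

    def __init__(self):
        self.blobs = {}      # content_hash -> bytes (stand-in for object storage)
        self.refcounts = {}  # content_hash -> number of pastes pointing at it

    def put(self, content: bytes) -> str:
        """Store content if it is new and return its hash."""
        content_hash = hashlib.sha256(content).hexdigest()
        if content_hash not in self.blobs:
            self.blobs[content_hash] = content
        self.refcounts[content_hash] = self.refcounts.get(content_hash, 0) + 1
        return content_hash

    def release(self, content_hash: str) -> None:
        """Drop one reference; delete the blob when nothing points to it."""
        self.refcounts[content_hash] -= 1
        if self.refcounts[content_hash] == 0:
            del self.refcounts[content_hash]
            del self.blobs[content_hash]


store = DedupBlobStore()
first = store.put(b"print('hello')")
second = store.put(b"print('hello')")
print(first == second, len(store.blobs))  # True 1
```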

Syntax Highlighting Pipeline

Syntax highlighting sounds like a frontend concern, but at scale it becomes a backend infrastructure problem.

The naive approach is: user loads paste, backend detects language, runs a parser, generates colored HTML, sends it to the browser. The problem is that parsing and tokenizing source code is CPU-intensive. A single server can do thousands of highlighting operations per second, but at millions of requests per minute, you need either serious horizontal scaling or a smarter approach.

The smarter approach is to treat the highlighted HTML as a derived, cacheable artifact. When a paste is created, a background worker runs the syntax highlighter and stores the resulting HTML in object storage alongside the raw text blob. When a user later loads that paste, the pre-rendered HTML is served directly. The highlighting computation happens once, not on every request.

flowchart TD; A[Paste Created Event on Kafka]; B[Highlight Worker Consumes Event]; C[Load Raw Text from Object Storage]; D[Detect Language If Not Specified]; E[Run Syntax Parser Pygments or Tree-sitter]; F[Generate HTML with CSS Classes]; G[Store Highlighted HTML in Object Storage]; H[Update Metadata Record with Highlight Path]; I[Invalidate Cache Entry If Exists]; A --> B; B --> C; C --> D; D --> E; E --> F; F --> G; G --> H; H --> I;

Language detection uses a combination of signals: the file extension the user specified, the explicit language selection from a dropdown, and fallback heuristics based on the content itself. Libraries like Linguist (used by GitHub) or Pygments’ built-in lexer guessing can handle this reasonably well for common languages.

Security is a real concern in the highlighting pipeline. Syntax highlighting libraries parse untrusted user input. A maliciously crafted input that exploits a parser bug could cause resource exhaustion or even code execution. Highlighting workers should run in isolated sandboxes with memory limits and CPU quotas. Any input that causes the parser to take more than a few hundred milliseconds should be killed and logged.

Client-side vs server-side highlighting is a legitimate tradeoff. Libraries like highlight.js and Prism.js run in the browser, which offloads the CPU cost from your servers entirely. The tradeoff is that the browser has to download a parsing library (anywhere from tens to a few hundred kilobytes, depending on how many languages are bundled), and initial render requires JavaScript execution. For a read-heavy service with anonymous users who visit once and leave, client-side highlighting can be a good choice because it reduces server cost dramatically.

Caching System Deep Dive

Caching is the most important performance lever in a Pastebin-like system. Without aggressive caching, the raw database and object storage cannot handle the read volume cost-effectively.

CDN caching is the outermost layer. When a paste URL is accessed for the first time, the CDN fetches it from the origin and caches it at the edge node closest to the user. Every subsequent request for that paste from the same geographic region gets served from the CDN cache in a few milliseconds. For a popular paste, CDN offload can be 99% or higher — meaning 99 out of every 100 requests never reach your application servers at all.

The challenge with CDN caching is cache invalidation. If a user creates a paste and then deletes it, or if the system expires a paste, you need to tell the CDN to stop serving it. Most CDN providers offer API-based cache purge, but it is not instantaneous — it can take seconds to propagate across all edge nodes globally.

Redis caching sits between the application servers and the database. Hot pastes, meaning those receiving frequent traffic, are cached in Redis as serialized metadata objects and raw text. A typical Redis entry for a paste looks like this:

KEY: paste:{id}:meta
VALUE: {id, owner, created_at, expires_at, language, view_count, storage_path}
TTL: 3600 seconds

KEY: paste:{id}:raw
VALUE: <compressed raw text>
TTL: 1800 seconds

KEY: paste:{id}:html
VALUE: <pre-rendered highlighted HTML>
TTL: 3600 seconds

The raw text and highlighted HTML are cached separately because different API consumers want different things. Raw mode requests skip the HTML cache.

Hotspot mitigation is a real problem. If a paste goes viral, say because a popular developer posts it on Hacker News, the request rate can spike to thousands of hits per second within minutes. When the cache entry for that paste is missing or has just expired, you get a thundering herd (also called a cache stampede): thousands of requests miss the cache at the same moment, all rush to the database, and overwhelm the origin. The standard solution is request coalescing: only one request goes to the database on a cache miss, and all other concurrent requests for the same key wait for that one result. The singleflight package in Go (golang.org/x/sync/singleflight) implements this pattern cleanly.
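Request coalescing can be sketched with plain threads as follows. This is a simplified illustration, not a port of any particular library: the `SingleFlight` class, its `do` method, and the `_Call` helper are hypothetical names, and the loader function passed in stands for the database or object storage fetch.

```python
import threading


class _Call:
    """Bookkeeping for one in-flight load."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Ensures only one concurrent load per key; other callers wait for it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> _Call currently in flight

    def do(self, key, loader):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.result = loader()  # the single trip to the origin
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]
                call.done.set()         # wake every waiting follower
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

Results are shared only among callers that overlap in time; once the load finishes, the next miss triggers a fresh load, so this complements the cache rather than replacing it.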

| Cache Layer | What It Stores | TTL Strategy | Eviction Policy |
| --- | --- | --- | --- |
| CDN Edge | Full HTTP responses | Cache-Control headers | LRU per edge node |
| Redis Cluster | Metadata, raw text, HTML | Seconds to hours | allkeys-lru |
| Application Memory | Ultra-hot paste metadata | Seconds | Fixed-size LRU map |

Expiration and Cleanup System

Pastes can have expiry times ranging from ten minutes to never. The expiry system needs to:

  • Track when each paste expires
  • Stop serving expired pastes promptly
  • Reclaim storage from expired pastes
  • Handle the cleanup at scale without blocking the read path

The simplest approach is to store an expires_at timestamp in the metadata database and check it on every read. If expires_at is in the past, return a 404. This works and is very reliable, but it means expired pastes still occupy storage until a background job cleans them up.

The background cleanup system runs as a scheduled job. It queries the metadata database for pastes where expires_at < now() and deletes them in batches: remove the metadata record, issue a delete request to object storage, and remove any cache entries. If deduplication is enabled, the blob should be deleted only once no other metadata record points to it. The work is done in small batches to avoid holding long-running locks on the database.

flowchart TD; A[Scheduled Cleanup Job Every 5 Minutes]; B[Query PostgreSQL for Expired Pastes]; C[Batch Process Up to 1000 IDs]; D[Delete Object Storage Blobs]; E[Delete Metadata Records]; F[Purge Redis Cache Entries]; G[Publish Deletion Events to Kafka]; H[CDN Purge API Call]; A --> B; B --> C; C --> D; D --> E; E --> F; F --> G; G --> H;

For very high-volume expiry systems, a dedicated time-series store or a sorted set in Redis works well. You store paste IDs in a Redis sorted set where the score is the expiry Unix timestamp. A worker periodically calls ZRANGEBYSCORE to get all IDs that expired before now and processes them. This is faster than a full database scan for recent expirations.
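The sorted-set approach can be illustrated in memory with a heap ordered by expiry timestamp. This is a sketch under assumptions: `ExpiryIndex` is a hypothetical class, `heapq` stands in for the Redis sorted set, and `pop_expired` plays the role of the range query plus removal that a worker would perform.

```python
import heapq


class ExpiryIndex:
    """Tracks paste IDs ordered by expiry time (Unix timestamp)."""

    def __init__(self):
        self._heap = []  # entries are (expires_at, paste_id)

    def add(self, paste_id: str, expires_at: float) -> None:
        heapq.heappush(self._heap, (expires_at, paste_id))

    def pop_expired(self, now: float, limit: int = 1000) -> list:
        """Remove and return up to `limit` IDs whose expiry is at or before now."""
        expired = []
        while self._heap and self._heap[0][0] <= now and len(expired) < limit:
            expired.append(heapq.heappop(self._heap)[1])
        return expired


index = ExpiryIndex()
index.add("aB3xK", expires_at=100)
index.add("Zq9Lm", expires_at=200)
print(index.pop_expired(now=150))  # ['aB3xK']
```

The `limit` parameter mirrors the batching described above: each worker pass handles a bounded number of IDs, so cleanup never monopolizes the database or object storage.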

The CDN purge is the most operationally annoying part of expiration. Expired pastes might still be cached at CDN edges. The cleanup job needs to call the CDN’s purge API. Most providers rate-limit purge requests, so you need to batch them and handle throttling.

Read Path Optimization

The read path is where you spend the most engineering effort in a system like this, because it is the hot path that most users experience.

flowchart TD; A[User Opens Paste URL]; B[CDN Edge Cache Hit?]; C[Return Cached Response]; D[CDN Forwards to API Gateway]; E[Paste Service Receives Request]; F[Check Redis Cache]; G[Return Cached Paste Data]; H[Check PostgreSQL for Metadata]; I[Paste Expired or Not Found?]; J[Return 404]; K[Fetch Blob from Object Storage]; L[Populate Redis Cache]; M[Return Paste to User]; N[CDN Caches Response]; A --> B; B -->|Yes| C; B -->|No| D; D --> E; E --> F; F -->|Hit| G; F -->|Miss| H; H --> I; I -->|Yes| J; I -->|No| K; K --> L; L --> M; M --> N;

The happy path for a popular paste is: CDN edge hit, response returned in under 10 milliseconds. The user never reaches your servers.

The second happy path is: CDN miss, Redis hit, response returned in 20-50 milliseconds. One fast network round trip to your cache cluster.

The slow path is: CDN miss, Redis miss, metadata DB lookup, object storage fetch. This might take 100-300 milliseconds. You want this to happen as rarely as possible for pastes that are actually being accessed. Your cache sizing and TTL strategy determines how often you fall through to this path.
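The origin-side portion of this read path (everything behind the CDN) can be sketched as a cache-aside lookup. The `PasteReader` class is a hypothetical illustration: plain dictionaries stand in for Redis, the metadata database, and object storage, and returning `None` stands in for a 404 response.

```python
class PasteReader:
    """Cache-aside read: cache first, then metadata, then the blob store."""

    def __init__(self, metadata: dict, blobs: dict):
        self.cache = {}           # stand-in for Redis: id -> (expires_at, content)
        self.metadata = metadata  # id -> {"expires_at": float or None, "storage_path": str}
        self.blobs = blobs        # storage_path -> content
        self.origin_fetches = 0   # counts trips down the slow path

    @staticmethod
    def _is_expired(expires_at, now: float) -> bool:
        return expires_at is not None and expires_at <= now

    def get(self, paste_id: str, now: float):
        cached = self.cache.get(paste_id)
        if cached is not None and not self._is_expired(cached[0], now):
            return cached[1]                       # fast path: cache hit
        meta = self.metadata.get(paste_id)
        if meta is None or self._is_expired(meta["expires_at"], now):
            self.cache.pop(paste_id, None)         # drop any stale entry
            return None                            # caller turns this into a 404
        self.origin_fetches += 1                   # slow path: fetch the blob
        content = self.blobs[meta["storage_path"]]
        self.cache[paste_id] = (meta["expires_at"], content)
        return content
```

Checking the expiry timestamp on cache hits as well as on metadata lookups keeps an expired paste from being served out of the cache until its TTL runs out.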

View counting is a subtle problem. You want to track how many times a paste has been viewed, but incrementing a database counter on every page view is a terrible idea at scale. The standard solution is to use a Redis counter with periodic flush. On each request, you do a Redis INCR on the paste’s view counter. A background job periodically reads these counters and flushes them to the database in batch. You trade real-time accuracy for performance, which is almost always the right call.
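The counter-with-periodic-flush pattern looks roughly like this. It is a sketch only: `ViewCounter` is a hypothetical class, and two in-memory counters stand in for the Redis counters and the `view_count` column in the database.

```python
from collections import Counter


class ViewCounter:
    """Buffers view increments and writes them to the database in batches."""

    def __init__(self):
        self.pending = Counter()   # stand-in for per-paste Redis counters
        self.database = Counter()  # stand-in for the persisted view counts

    def record_view(self, paste_id: str) -> None:
        """Hot path: a cheap in-memory increment, no database write."""
        self.pending[paste_id] += 1

    def flush(self) -> int:
        """Background job: apply buffered counts, return pastes updated."""
        batch, self.pending = self.pending, Counter()
        for paste_id, views in batch.items():
            self.database[paste_id] += views
        return len(batch)


counter = ViewCounter()
for _ in range(3):
    counter.record_view("aB3xK")
counter.flush()
print(counter.database["aB3xK"])  # 3
```

Three page views became one database update here; at production volume the same batching turns thousands of increments into a single write per paste per flush interval.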

Search and Trending Systems

Full-text search over billions of pastes is a hard problem. You cannot do this in PostgreSQL with standard B-tree indexes. You need a dedicated search engine.

Elasticsearch or OpenSearch is the typical choice. When a paste is created and passes moderation, a background worker indexes it in the search engine. The document includes the paste ID, content text, language, creation time, and visibility flag. Private pastes are excluded from the index.

The indexing pipeline is asynchronous, which means there is a lag between paste creation and search availability. For most use cases this is acceptable — a few seconds of indexing lag is fine for a search feature.

Trending pastes are calculated using view counts aggregated over a time window. A simple approach is to maintain a Redis sorted set per time bucket (for example, one per hour) and increment the paste's score with ZINCRBY on each page view, so that the score reflects views within that window and old buckets can simply expire. A background job reads the top N entries from the current bucket to populate the trending page. The sorted set is updated incrementally, and the trending page is cached so it does not need to be recalculated on every request.
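One way to keep the trending scores tied to a time window is to bucket views by hour and rank over the most recent buckets. The sketch below assumes that approach; `TrendingTracker` is a hypothetical class, and a `Counter` per hour stands in for a per-hour sorted set. Bucketing gives an approximation of "the last hour" rather than an exact sliding window.

```python
from collections import Counter, defaultdict


class TrendingTracker:
    """Ranks pastes by views within the most recent hourly buckets."""

    def __init__(self):
        self.buckets = defaultdict(Counter)  # hour index -> paste_id -> views

    def record_view(self, paste_id: str, timestamp: float) -> None:
        self.buckets[int(timestamp) // 3600][paste_id] += 1

    def top(self, now: float, n: int = 10, window_hours: int = 1) -> list:
        """Return the n most viewed pastes across the last window_hours buckets."""
        current = int(now) // 3600
        totals = Counter()
        for hour in range(current - window_hours + 1, current + 1):
            totals.update(self.buckets.get(hour, {}))
        return totals.most_common(n)


tracker = TrendingTracker()
for _ in range(3):
    tracker.record_view("aB3xK", timestamp=18_010)
tracker.record_view("Zq9Lm", timestamp=18_020)
print(tracker.top(now=18_100, n=2))  # [('aB3xK', 3), ('Zq9Lm', 1)]
```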

Security and Abuse Prevention

Pastebin-style platforms are attractive to bad actors. The anonymous, no-friction paste creation model is exactly what spammers, malware distributors, and credential dumpers want. If you build Pastebin, you will spend a significant amount of engineering time fighting abuse.

Rate limiting is the first line of defense. Per-IP and per-user-account rate limits on paste creation prevent automated bulk creation. Anonymous IPs get stricter limits than authenticated accounts. Rate limit state is stored in Redis using a sliding window counter.
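A sliding window counter can be sketched as follows: it keeps a count for the current fixed window and the previous one, and weights the previous count by how much of it still overlaps the sliding window. The `SlidingWindowCounter` class is a hypothetical illustration, with a dictionary standing in for the Redis state, and the limits shown are arbitrary.

```python
class SlidingWindowCounter:
    """Approximate sliding-window rate limiter using two fixed-window counts."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.state = {}  # key -> (window index, current count, previous count)

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        cur_window, cur_count, prev_count = self.state.get(key, (window, 0, 0))
        if window != cur_window:
            # Roll over; the old count only matters if it is the adjacent window.
            prev_count = cur_count if window == cur_window + 1 else 0
            cur_window, cur_count = window, 0
        elapsed = (now % self.window_seconds) / self.window_seconds
        estimated = prev_count * (1.0 - elapsed) + cur_count
        allowed = estimated < self.limit
        if allowed:
            cur_count += 1
        self.state[key] = (cur_window, cur_count, prev_count)
        return allowed


limiter = SlidingWindowCounter(limit=3, window_seconds=60)
print([limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)])
# [True, True, True, False]
```

Compared with a plain fixed window, the weighting prevents a client from sending a full quota at the end of one window and another full quota at the start of the next.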

Content scanning runs asynchronously after paste creation. A moderation worker pulls the paste content and runs it through several checks:

  • Known malware signatures using YARA rules
  • URL blacklists to detect phishing links
  • Leaked credential patterns using regex matching
  • Text classifiers trained to detect spam content

If a paste triggers these checks, it is flagged for human review or automatically deleted depending on confidence level.
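The regex-based part of these checks might look like the sketch below. The three patterns are illustrative examples only, not a complete or production-grade rule set, and `scan_paste` is a hypothetical function name. YARA rules, URL blocklists, and classifiers would be separate components and are not shown.

```python
import re

# Illustrative patterns only; real rule sets are far larger and tuned over time.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "password_assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}


def scan_paste(text: str) -> list:
    """Return the sorted names of every pattern that matches the paste text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


findings = scan_paste("db_host = example.internal\npassword: hunter2\n")
print(findings)  # ['password_assignment']
```

A moderation worker could map the returned names to actions, for example flagging for human review on a single weak match and auto-deleting only on high-confidence matches.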

CAPTCHA gating protects anonymous paste creation during high-traffic periods. If an IP is creating pastes at a rate that looks automated, force a CAPTCHA challenge before accepting further pastes.

Private paste visibility requires that the access control system actually works. Private pastes should never appear in search results, trending lists, or any public index. The system needs to enforce this at every layer: the search indexer skips private pastes, the CDN caches should never be shared across users for private content, and metadata queries must always filter by visibility.

| Abuse Type | Detection Method | Response | Engineering Cost |
| --- | --- | --- | --- |
| Spam paste creation | Rate limiting, IP reputation | Block, CAPTCHA | Low |
| Malware hosting | YARA rules, signature scanning | Auto-delete, report | Medium |
| Credential dumps | Pattern matching, entropy analysis | Flag for review | Medium |
| Phishing links | URL blacklist, ML classifier | Block URLs, delete paste | High |
| Bulk scraping | Request pattern detection | Rate limit, block | Low |

Database Design

The metadata database needs to support fast reads by paste ID, efficient expiry queries, and user-specific paste listings. Here is a representative schema.

CREATE TABLE pastes (
  id            VARCHAR(10)   PRIMARY KEY,
  owner_id      UUID,
  title         VARCHAR(255),
  language      VARCHAR(64),
  visibility    SMALLINT      DEFAULT 0,  -- 0=public, 1=unlisted, 2=private
  storage_path  VARCHAR(512)  NOT NULL,
  content_hash  CHAR(64),
  byte_size     INT,
  view_count    BIGINT        DEFAULT 0,
  created_at    TIMESTAMPTZ   NOT NULL DEFAULT now(),
  expires_at    TIMESTAMPTZ,
  deleted_at    TIMESTAMPTZ
);

CREATE INDEX idx_pastes_owner ON pastes(owner_id, created_at DESC);
CREATE INDEX idx_pastes_expires ON pastes(expires_at) WHERE expires_at IS NOT NULL;
CREATE INDEX idx_pastes_hash ON pastes(content_hash);

CREATE TABLE users (
  id            UUID          PRIMARY KEY DEFAULT gen_random_uuid(),
  username      VARCHAR(64)   UNIQUE NOT NULL,
  email         VARCHAR(255)  UNIQUE,
  password_hash VARCHAR(128),
  api_key_hash  CHAR(64)      UNIQUE,  -- hex SHA-256 digest; the raw key is never stored
  created_at    TIMESTAMPTZ   NOT NULL DEFAULT now(),
  is_banned     BOOLEAN       DEFAULT false
);

CREATE TABLE paste_views (
  paste_id      VARCHAR(10)   NOT NULL,
  viewed_at     DATE          NOT NULL,
  view_count    INT           NOT NULL DEFAULT 1,
  PRIMARY KEY (paste_id, viewed_at)
);

CREATE TABLE reports (
  id            SERIAL        PRIMARY KEY,
  paste_id      VARCHAR(10)   NOT NULL,
  reporter_id   UUID,
  reason        VARCHAR(64),
  created_at    TIMESTAMPTZ   NOT NULL DEFAULT now(),
  resolved_at   TIMESTAMPTZ
);

The expires_at index is partial, covering only rows where the column is not null. This keeps the index small and fast for the cleanup job.

The paste_views table uses daily aggregation rather than one row per view. Inserting one row per view at millions of views per day would be catastrophically slow. Instead, a background worker flushes Redis view counters into this table as daily aggregates.

Sharding becomes necessary when the pastes table grows beyond what a single PostgreSQL instance can handle efficiently. The natural shard key is the paste ID. Because IDs are random Base62 strings, they distribute evenly across shards without hotspots. A consistent hashing layer routes reads and writes to the correct shard.
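A consistent hashing layer of the kind mentioned above can be sketched with a sorted ring of virtual nodes. The `HashRing` class, the shard names, and the choice of 100 virtual nodes per shard are assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Maps paste IDs to shards using consistent hashing with virtual nodes."""

    def __init__(self, shards, vnodes: int = 100):
        # Each shard is placed on the ring many times to even out the load.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, paste_id: str) -> str:
        """Walk clockwise from the ID's hash to the next virtual node."""
        idx = bisect.bisect_right(self._points, self._hash(paste_id))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["shard-0", "shard-1", "shard-2"])
print(ring.shard_for("aB3xK"))
```

The benefit over a simple `hash(id) % shard_count` scheme is that adding a shard relocates only the keys that now fall to the new shard, rather than reshuffling nearly everything.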

CDN and Global Delivery

The CDN is one of the most impactful infrastructure decisions for a service like Pastebin. Without it, a popular paste creates a traffic spike that hits your origin servers directly. With it, that spike is absorbed at the edge.

The cache key for a paste response is the paste ID combined with the requested format: HTML view, raw text, or highlighted HTML. Different formats need separate cache entries because a user hitting the raw format should not get the rendered HTML.

Cache-Control headers need to be set thoughtfully. For public pastes, a Cache-Control: public, max-age=300 header tells the CDN and browsers to cache the response for 5 minutes. For private pastes, Cache-Control: private, no-store prevents CDN caching entirely — you must never let private paste content sit in a shared CDN cache where another user might get it.
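The header choice can be expressed as a small function. The visibility constants mirror the values used in the schema (0 public, 1 unlisted, 2 private); treating unlisted pastes the same as private ones is an assumption made here to stay on the conservative side, since the text only specifies the public and private cases. The function name is hypothetical.

```python
PUBLIC, UNLISTED, PRIVATE = 0, 1, 2


def cache_control_for(visibility: int, max_age_seconds: int = 300) -> str:
    """Pick a Cache-Control header value based on paste visibility."""
    if visibility == PUBLIC:
        return f"public, max-age={max_age_seconds}"
    # Assumption: anything that is not public must never sit in a shared cache.
    return "private, no-store"


print(cache_control_for(PUBLIC))   # public, max-age=300
print(cache_control_for(PRIVATE))  # private, no-store
```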

Edge regions matter for latency. A Pastebin user in Tokyo getting their response from a CDN edge node in Singapore will have a much better experience than one getting it from a server in Virginia. Providers like Cloudflare, Fastly, and AWS CloudFront operate large networks of edge locations spread across the globe.

API Infrastructure

The public API is a significant traffic source for Pastebin. Developer tools, IDE plugins, CI/CD scripts, and custom automation all create and read pastes programmatically.

API requests carry an API key in the header, which the API gateway validates before forwarding the request. Keys are hashed before storage in the database — you never store the raw API key, only its SHA-256 hash.
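Key generation and verification along these lines can be sketched with the standard library. The function names are hypothetical; the sketch assumes keys are long random tokens, which is what makes a plain SHA-256 hash (rather than a slow password hash) a reasonable choice.

```python
import hashlib
import hmac
import secrets


def hash_api_key(raw_key: str) -> str:
    """Return the hex SHA-256 digest that is stored in the database."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_api_key() -> tuple:
    """Create a new key: show the raw value to the user once, store the hash."""
    raw_key = secrets.token_urlsafe(32)
    return raw_key, hash_api_key(raw_key)


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Compare in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)


raw, stored = generate_api_key()
print(verify_api_key(raw, stored))          # True
print(verify_api_key("wrong-key", stored))  # False
```

Because the digest is deterministic, the gateway can look a key up by its hash directly, which is why the hash column can carry a unique index.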

Rate limiting for API traffic is per API key with configurable quotas. Free tier keys might allow 100 paste creations per day. Paid tiers get higher quotas. Redis is used to track usage counters per key with daily reset windows.

API versioning is handled via URL path prefix: /v1/paste, /v2/paste. This lets the team evolve the API without breaking existing integrations.

Scaling Pastebin

Let us think through scaling from 10 requests per second to 10,000 requests per second.

At 10 RPS, a single application server with a single PostgreSQL instance and local file storage handles everything. Paste creation and retrieval both work fine.

At 100 RPS, you add Redis for caching hot pastes and move content storage to a managed object store. Read traffic drops significantly because popular pastes are served from Redis.

At 1,000 RPS, you need multiple application server instances behind a load balancer. The paste creation pipeline becomes async with Kafka workers. The CDN absorbs most of the read traffic.

At 10,000 RPS, database read replicas are necessary for metadata queries. Redis becomes a cluster. Object storage is handled by a globally distributed provider with regional replication. The syntax highlighting and moderation pipelines scale horizontally as independent worker pools.

| Bottleneck | Symptom | Solution |
| --- | --- | --- |
| PostgreSQL write throughput | High paste creation latency | Write batching, async writes, partitioning |
| Redis single node | High cache latency, OOM failures | Redis Cluster with sharding |
| Object storage GET latency | Slow read path on cache misses | Regional replication, CDN offload |
| Kafka consumer lag | Delayed syntax highlighting | More partitions, more consumers |
| Search indexing | Search returning stale results | More Elasticsearch nodes, priority queues |

Reliability and Availability

A paste service going down is not life-critical, but it is still a bad experience. Engineering for high availability means thinking through every failure scenario.

Database failures are handled by standby replicas with automatic failover. PostgreSQL with Patroni or AWS RDS Multi-AZ provides automatic failover, typically completing somewhere between tens of seconds and a couple of minutes depending on configuration. During failover, the application continues reading from replicas. Writes fail until the new primary is elected.

Redis cluster failures are partially degraded, not full outages. If a Redis node fails, the cluster reroutes requests to other nodes. Cache hit rates drop temporarily, increasing load on the database, but the system continues serving paste reads and writes.

Object storage outages are rare with managed providers but can happen. A regional failure might make some paste content temporarily unavailable. Multi-region replication provides redundancy. The application can return a 503 with a friendly message for pastes it cannot retrieve rather than hanging indefinitely.

Monitoring needs to cover all critical paths: paste creation success rate, paste retrieval latency at p50/p95/p99, cache hit ratio, error rate by endpoint, queue consumer lag, storage write failure rate, and CDN offload ratio. Dashboards and alerting on these metrics let the on-call engineer know within minutes when something is wrong.

Engineering Tradeoffs

These are the decisions where reasonable engineers disagree, and where the right answer depends on your specific constraints.

SQL vs NoSQL for metadata: PostgreSQL gives you strong consistency, rich querying, and ACID transactions. Cassandra gives you write throughput and horizontal scalability with weaker consistency guarantees. For a read-heavy system where most reads are by primary key, PostgreSQL with read replicas wins on simplicity and operability. Cassandra makes sense if write volume is extreme and you have the operational expertise.

Server-side vs client-side syntax highlighting: Server-side pre-rendering means every user gets highlighted content without JavaScript. Client-side means zero server CPU for highlighting and a smaller HTML payload, but the highlighted view depends on JavaScript being downloaded and executed, and it adds complexity for search engine indexing. The right choice depends on your user base and infrastructure budget.

Aggressive caching vs strong consistency: Caching paste metadata means that if a paste is deleted, some users may still see it for up to the cache TTL. This is usually acceptable. If your product requires instant deletion (for compliance or content moderation), you need cache purge APIs and accept the operational complexity that comes with them.

Compression vs CPU: zstd compression saves significant storage and bandwidth but costs CPU on every read and write. At very high volumes, the CPU cost matters. The tradeoff generally favors compression because storage and bandwidth are more expensive than CPU in modern cloud environments.

Real-World Technology Stack

Here is a realistic technology stack for a production Pastebin:

API servers in Go: Go’s goroutine model handles high concurrency with low memory overhead. Fast startup time makes it great for containerized deployments. The standard library and ecosystem for building HTTP services are excellent.

PostgreSQL for metadata: Battle-tested, well-understood operationally, excellent tooling. pgBouncer for connection pooling, Patroni for HA.

Redis Cluster for caching: The industry standard for application-level caching. Supports the data structures you need: strings for raw text, hashes for metadata, sorted sets for trending.

Apache Kafka for async processing: Durable, high-throughput, replayable. Kafka’s consumer group model makes it easy to scale highlighting workers independently of the write path.

S3 or compatible object storage for content blobs: Near-infinite scale, built-in replication, cheap storage costs, lifecycle policies for tiering.

Elasticsearch for search: Mature full-text search engine with good Python and Go clients, support for complex scoring, and horizontal scalability.

Kubernetes for container orchestration: Stateless application servers run as Kubernetes deployments with horizontal pod autoscaling. Kafka and Elasticsearch run on separate node pools.

Cloudflare or Fastly for CDN: Both offer strong developer APIs, fast purge propagation, and global edge networks.

System Design Interview Perspective

Pastebin is one of the most commonly asked system design questions because it is well-scoped, has clear requirements, and touches a wide range of distributed systems concepts.

What interviewers want to see: They want you to drive the conversation. Start with requirements clarification — how many pastes per day, what is the expected read-to-write ratio, what features are in scope. Then sketch the high-level architecture. Then dive deep into the areas the interviewer cares about: storage, caching, ID generation, or expiration.

Common mistakes: Jumping straight into low-level details without establishing the big picture. Over-engineering with too many components before justifying the need. Ignoring the read-heavy nature of the workload and designing a write-optimized system. Forgetting about abuse prevention. Treating ID generation as trivial.

Strong answers demonstrate clear reasoning about why each design choice is made. When you say “I would use Redis to cache hot pastes,” follow it with “because the read-to-write ratio is high, paste content is immutable, and cache hit rates will be very high for popular pastes, which eliminates most database reads.” That is what separates a strong candidate from an average one.

Areas to go deep on if asked to dive deeper: the ID generation system and collision prevention, the caching strategy and cache invalidation on expiry, the expiry cleanup pipeline and storage reclamation, the content moderation pipeline, and the database schema with indexing strategy.

Scaling discussion: Be ready to say how the system changes as it scales. What breaks first at 10x current load? Usually it is the write path to the database. What breaks at 100x? Usually the single-region architecture. Walk through these scaling conversations confidently, and your interviewer will know you understand distributed systems at a real depth.

Building Pastebin well is about understanding that simple products have non-trivial infrastructure requirements when millions of people use them every day. The database does not just store text. The cache is not just an optimization. The expiry system is not just a scheduled job. Each of these is a distributed systems problem in its own right, with failure modes, performance tradeoffs, and scaling concerns that deserve careful thought. That is the lesson Pastebin teaches, and it is why it remains one of the best system design learning exercises out there.
