How NGINX Works

There is a good chance that every HTTP request you have made today passed through NGINX at some point. It might have been serving a static file, forwarding your request to a backend application, terminating TLS, or quietly balancing your traffic across a dozen servers. NGINX sits at the heart of a staggering amount of internet infrastructure, and yet most engineers interact with it only through config files without fully understanding what is happening underneath.

This blog is a deep dive into how NGINX actually works. Not just what it does, but why it was built the way it was, what engineering problems it solves, how it interacts with the Linux kernel, and what makes it so exceptionally fast even under enormous load. By the end, you should have the kind of intuition that lets you reason about NGINX the way a systems engineer would.


Why NGINX Exists at All

To understand NGINX, you have to understand the world it was born into. In the early 2000s, Apache HTTP Server was the dominant web server. Apache worked on a model where every incoming connection spawned either a new process or a new thread. For a few thousand requests, this was fine. But as web traffic started growing, engineers ran into a hard wall known as the C10K problem — how do you handle 10,000 simultaneous connections on a single server?

The thread-per-connection model has a fundamental problem. Each thread or process consumes memory; the stack reservation alone is typically somewhere between 1MB and 8MB per thread depending on the configured stack size. Scheduling 10,000 threads means the operating system is constantly doing context switches, which burn CPU cycles on bookkeeping rather than actual work. The kernel’s scheduler is not designed to efficiently juggle tens of thousands of threads that are mostly waiting.

Igor Sysoev was a Russian engineer who experienced this problem firsthand while working on the infrastructure for Rambler, one of Russia’s largest internet portals. In 2002 he started writing NGINX specifically to solve the C10K problem. The core insight was simple but radical: instead of giving each connection its own thread, use a small number of worker processes and handle all connections inside an event loop. This is the architecture that makes NGINX fundamentally different from Apache.

The practical implication is dramatic. An NGINX worker process can handle tens of thousands of simultaneous connections using only a few megabytes of memory. You could run 4 worker processes on a modern server and comfortably handle hundreds of thousands of concurrent connections. That same workload would bring Apache to its knees.

Core Features of NGINX

Before going deep on architecture, it helps to understand the full surface area of what NGINX actually does in production.

NGINX is first and foremost a web server. It can serve static HTML, CSS, JavaScript, images, and video files directly from disk with extraordinary efficiency. It uses kernel-level optimizations like sendfile() to move data from disk to network without copying it through userspace at all.

As a reverse proxy, NGINX sits in front of your backend application servers and forwards requests to them. This is one of its most common production uses. Your Node.js or Python or Java application never talks directly to the internet. NGINX handles the connection, buffers the request, forwards it upstream, and sends the response back to the client.

As a load balancer, NGINX distributes incoming requests across a pool of backend servers. It supports multiple algorithms and tracks backend failures as it proxies traffic. If a backend keeps failing, NGINX temporarily stops sending traffic to it.

NGINX also serves as an API gateway, providing rate limiting, request filtering, authentication headers, and routing logic. It handles TLS termination, taking the expensive work of SSL handshakes and encryption off your application servers. It manages WebSocket connections, HTTP streaming, response caching, and Gzip compression.

The important thing to notice is that NGINX does all of these things within the same event-driven architecture. Whether it is serving a static file, proxying to a backend, or terminating TLS, the same non-blocking I/O model handles everything.

High-Level Architecture

Let’s look at the overall structure of how NGINX processes work and communicate.

flowchart TD; A[Client Request]; B[Master Process]; C[Worker Process 1]; D[Worker Process 2]; E[Worker Process 3]; F[Cache Manager]; G[Cache Loader]; H[Upstream Backend 1]; I[Upstream Backend 2]; J[Upstream Backend 3]; K[File System]; A -->|TCP Connection| C; A -->|TCP Connection| D; A -->|TCP Connection| E; B -->|Fork and manage| C; B -->|Fork and manage| D; B -->|Fork and manage| E; C -->|Proxy| H; D -->|Proxy| I; E -->|Proxy| J; C -->|Static Files| K; B -->|Manage| F; B -->|Manage| G;

The master process is the parent of everything. It reads and validates the configuration file, binds to ports, and then forks worker processes. After that, the master process mostly stays out of the way. It listens for signals from the operating system administrator and manages the lifecycle of workers.

Worker processes are where all the real work happens. Each worker runs a tight event loop that handles thousands of connections simultaneously. Workers do not communicate with each other during normal operation. There are no shared mutexes on the hot path. Each worker is an independent unit processing its own set of connections.

The cache manager and cache loader are separate helper processes. The cache loader walks the disk cache on startup and populates the in-memory cache metadata. The cache manager handles eviction and expiry of cached responses during normal operation. Separating these out keeps the worker processes focused purely on serving requests.

Event-Driven Architecture Deep Dive

This is the section where NGINX starts to look like a fundamentally different kind of system. Most developers are used to thinking about concurrent programs in terms of threads. You have work to do, so you give it a thread. More work means more threads. NGINX rejects this model entirely.

flowchart TD; A[Event Loop]; B[epoll wait]; C[Event Queue]; D[Read Event Handler]; E[Write Event Handler]; F[Timer Event Handler]; G[Connection Accept]; H[Request Parse]; I[Upstream Connect]; J[Response Send]; A --> B; B -->|Events Ready| C; C --> D; C --> E; C --> F; D --> H; G --> H; H --> I; I --> J; E --> J; F -->|Timeout Check| A; J --> A;

The fundamental primitive NGINX relies on is called epoll on Linux (kqueue on BSD and macOS). Epoll is a kernel mechanism for monitoring large numbers of file descriptors efficiently. A file descriptor is just an integer that refers to an open resource — a socket, a file, a pipe. When you have 50,000 connections open, each connection is a file descriptor.

With the old select() and poll() syscalls, your program would hand the kernel a list of all file descriptors to watch, and the kernel would scan through them all to see which ones were ready. This is O(n) work for every single poll, meaning it got slower as you had more connections. Epoll fixes this by maintaining the watch list inside the kernel and only returning the file descriptors that are actually ready. If you have 50,000 connections but only 200 have data available, epoll returns only those 200. The cost is proportional to activity, not to total connection count.

Here is the mental model for a single NGINX worker. It calls epoll_wait() and goes to sleep. The kernel watches all the sockets. When data arrives on any socket, or when a socket becomes writable, the kernel wakes up the worker and hands it a list of ready events. The worker processes each ready event by calling a non-blocking read or write. If a read would block because no data is available yet, the worker does not wait. It registers interest in that file descriptor with epoll and moves on to handle the next event.

This is the key insight: the worker process never blocks. It is always either doing useful work or sleeping in epoll_wait waiting for the kernel to tell it something is ready. There is no wasted time spinning on a connection that has not received data yet.
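The loop described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not NGINX's C implementation: the `selectors` module picks epoll on Linux and kqueue on BSD/macOS, and the `run_echo_loop` function and the socket pairs standing in for client connections are assumptions made up for the demo.

```python
import selectors
import socket

def run_echo_loop(sel, max_idle_polls=1):
    """Handle ready events until the selector reports nothing max_idle_polls times."""
    idle = 0
    while idle < max_idle_polls:
        events = sel.select(timeout=0.1)      # plays the role of epoll_wait()
        if not events:
            idle += 1
            continue
        idle = 0
        for key, _mask in events:
            conn = key.fileobj
            try:
                data = conn.recv(4096)        # non-blocking read
            except BlockingIOError:
                continue                      # nothing to read after all; move on
            if data:
                conn.send(data.upper())       # tiny write; a real server queues partial writes
            else:
                sel.unregister(conn)          # peer closed the connection
                conn.close()

# Demo: one loop, two "connections", no threads.
sel = selectors.DefaultSelector()
client_a, server_a = socket.socketpair()
client_b, server_b = socket.socketpair()
for s in (server_a, server_b):
    s.setblocking(False)
    sel.register(s, selectors.EVENT_READ)

client_a.send(b"hello")
client_b.send(b"world")
run_echo_loop(sel)
reply_a = client_a.recv(4096)
reply_b = client_b.recv(4096)
```

Both connections are served by the same single-threaded loop; the only place it ever sleeps is inside `select()`.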

Compare this to the thread-per-connection model. With threads, when a connection has no data, the thread blocks on a read() syscall. The OS scheduler has to context switch it out and bring in another thread. That context switch costs somewhere between 1 and 10 microseconds. At thousands of connections, context switching overhead starts dominating your CPU usage. You end up spending more time managing threads than doing actual work.

NGINX avoids all of that. The worker’s event loop is essentially a tightly controlled state machine. Every connection is represented as a lightweight struct with a current state. When an event fires, the worker looks up the connection struct, calls the appropriate handler function, updates the state, and moves on. The entire operation stays in the same thread and the same CPU core. CPU cache locality is excellent.

Master and Worker Process Model

The master process is responsible for things that happen infrequently but matter a great deal: starting up, reloading configuration, upgrading the binary, and shutting down gracefully.

flowchart TD; A[Admin sends SIGHUP]; B[Master Process receives signal]; C[Master reads new config]; D[Config valid?]; E[Master forks new workers]; F[New workers start accepting]; G[Old workers drain connections]; H[Old workers exit]; I[Log error]; A --> B; B --> C; C --> D; D -->|Yes| E; D -->|No| I; E --> F; F --> G; G --> H;

When you send NGINX a SIGHUP signal, it performs a zero-downtime reload. The master process reads the new configuration file and validates it. If valid, it forks new worker processes that use the new configuration. The old workers are told to stop accepting new connections. They finish processing whatever connections are already open, then exit. During this entire process, no client experiences a dropped connection. From the client’s perspective, the reload is invisible.

This is one of the engineering properties that makes NGINX production-friendly. You can push a configuration change to a live server with zero downtime, no service interruption, and no dropped connections.

Worker processes can be pinned to CPU cores using worker_cpu_affinity, which is common in performance-sensitive deployments. Pinning matters for cache efficiency. When a process always runs on the same core, its data stays hot in the L1 and L2 caches. Migrating a worker between cores would leave that cached data behind and force it to be reloaded. For a latency-sensitive system like a web proxy, these cache misses add up.

The number of worker processes is typically set to the number of CPU cores (worker_processes auto does this for you). One worker per core is the right starting point because each worker is a single-threaded event loop that never blocks, so one worker can keep one core busy whenever there is work to do. Adding more workers than cores usually just creates contention without improving throughput.

Request Processing Lifecycle

Let’s trace a single HTTP request through NGINX from the moment the TCP connection arrives to the moment the response is sent back.

flowchart TD; A[Client TCP SYN]; B[Kernel accept queue]; C[NGINX epoll event]; D[Worker accepts connection]; E[TLS handshake if HTTPS]; F[HTTP request parsing]; G[Location matching]; H[Upstream selection]; I[Backend connection pool check]; J[New TCP to backend or reuse]; K[Forward request headers]; L[Backend processes request]; M[Read backend response]; N[Buffer response]; O[Send response to client]; P[Keep-alive or close]; A --> B; B --> C; C --> D; D --> E; E --> F; F --> G; G --> H; H --> I; I --> J; J --> K; K --> L; L --> M; M --> N; N --> O; O --> P;

When a client initiates a TCP connection, the kernel completes the three-way handshake and places the connection in the kernel’s accept queue. The socket bound by the NGINX master process becomes readable. Epoll fires an event. The worker calls accept() to pull the connection out of the queue and gets a file descriptor for the new socket.

If this is an HTTPS connection, the TLS handshake happens next. NGINX handles this using OpenSSL or a compatible library. The handshake involves certificate validation, cipher negotiation, and key exchange. This is computationally expensive, which is why NGINX supports TLS session resumption — clients that have recently connected can present a session ticket and skip most of the handshake.

Once the connection is established, NGINX reads the HTTP request. HTTP/1.1 is a text protocol, so NGINX parses the headers byte by byte, looking for the method, URI, and headers. This parsing is fast and allocation-light. NGINX avoids copying data whenever possible.
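To make the parsing step concrete, here is a minimal sketch of splitting a request head into method, URI, version, and headers. It is deliberately simpler than what NGINX does: NGINX uses a hand-written byte-level state machine in C, whereas this hypothetical `parse_request_head` helper just splits on line endings. The part it does preserve is returning "not enough data yet" so the event loop can wait for the next read event.

```python
def parse_request_head(raw: bytes):
    """Parse an HTTP/1.1 request head; return None if the head is still incomplete."""
    end = raw.find(b"\r\n\r\n")
    if end == -1:
        return None                       # need more bytes; wait for the next read event
    lines = raw[:end].split(b"\r\n")
    method, uri, version = lines[0].decode("ascii").split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers[name.decode("ascii").strip().lower()] = value.decode("latin-1").strip()
    return method, uri, version, headers

partial = parse_request_head(b"GET /index.html HTTP/1.1\r\nHost: exa")
complete = parse_request_head(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
```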

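With the request parsed, NGINX evaluates the location blocks in the configuration to determine how to handle the request. Location matching combines prefix matching and regex matching. An exact match (=) wins immediately. Otherwise NGINX remembers the longest matching prefix, then tries the regex locations in the order they appear in the configuration; the first regex that matches wins, and the longest prefix is used only if no regex matches. Marking a prefix location with ^~ skips the regex step when that prefix is the longest match. NGINX compiles regex patterns at startup so matching is fast.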
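The selection order for location blocks is easier to see as code. The sketch below models the documented precedence: exact match first, then the longest prefix (with `^~` suppressing regexes), then regexes in configuration order, then the remembered prefix. The `match_location` function and the tuple representation of locations are assumptions for illustration; named locations and nested locations are left out.

```python
import re

def match_location(uri, locations):
    """locations: list of (modifier, pattern, name) in config order.
    modifier is "=", "^~", "" (plain prefix), "~" (regex) or "~*" (case-insensitive regex)."""
    # 1. An exact match wins immediately.
    for mod, pat, name in locations:
        if mod == "=" and uri == pat:
            return name
    # 2. Remember the longest matching prefix.
    best = None
    for mod, pat, name in locations:
        if mod in ("", "^~") and uri.startswith(pat):
            if best is None or len(pat) > len(best[1]):
                best = (mod, pat, name)
    # 3. If that prefix is marked ^~, regexes are not checked.
    if best and best[0] == "^~":
        return best[2]
    # 4. Regexes are tried in config order; the first match wins.
    for mod, pat, name in locations:
        if mod == "~" and re.search(pat, uri):
            return name
        if mod == "~*" and re.search(pat, uri, re.IGNORECASE):
            return name
    # 5. Otherwise fall back to the longest prefix, if any.
    return best[2] if best else None

locations = [
    ("=", "/", "exact_root"),
    ("", "/", "root"),
    ("", "/documents/", "docs"),
    ("^~", "/images/", "images"),
    ("~*", r"\.(gif|jpg|jpeg)$", "image_regex"),
]
```

With this table, `/documents/1.jpg` goes to the regex location even though `/documents/` is a longer prefix than `/`, while `/images/1.gif` stays with the `^~` prefix.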

Once the location is determined, NGINX knows what to do: serve a static file, proxy to an upstream, return a redirect, or apply some other directive. If proxying, it selects a backend using the configured load balancing algorithm, checks whether there is a keepalive connection to that backend already in the pool, and either reuses it or opens a new TCP connection.

The request is forwarded to the backend along with any headers set through proxy_set_header; X-Real-IP, X-Forwarded-For, and Host are the usual ones. NGINX then reads the backend response. If proxy_buffering is on (the default), NGINX reads the response from the backend as fast as the backend can send it, holding it in memory buffers and spilling to temporary files on disk if needed, while sending it to the client at whatever pace the client can accept. This decouples the backend from slow clients. If buffering is off, NGINX passes the response to the client as it arrives, which reduces latency but means the backend stays connected until the client finishes reading.

Finally, NGINX sends the response back to the client. If keep-alive is enabled, the connection stays open and the state machine resets to wait for the next request. If not, the connection is closed and the file descriptor is returned to the kernel.

Reverse Proxy Internals

The reverse proxy is where NGINX adds enormous value in real production systems. Without a reverse proxy, your application server is directly exposed to the internet. It has to deal with slow clients, malformed requests, connection floods, and TLS negotiation all by itself.

flowchart TD; A[Slow Client]; B[Fast Client]; C[NGINX Reverse Proxy]; D[Backend App Server]; E[Backend App Server]; A -->|Slow upload| C; B -->|Fast request| C; C -->|Buffered full request| D; C -->|Buffered full request| E; D -->|Response| C; E -->|Response| C; C -->|Slow delivery| A; C -->|Fast delivery| B;

One of the most important things a reverse proxy does is protect backends from slow clients. Imagine a client on a mobile connection uploading a 10MB file at 100KB/s. Without a proxy, your application server has to hold a thread or connection open for 100 seconds waiting for the upload to complete. With NGINX, the proxy buffers the entire upload first, then sends it to the backend in a burst over the fast internal network. The backend sees a fast request and finishes quickly. It never has to deal with the slow client.

NGINX maintains connection pools to backends. Opening a new TCP connection takes time: the three-way handshake, a potential TLS handshake, and kernel allocations. With connection pooling using the keepalive directive in an upstream block, each worker keeps a set of idle connections to the backends. For HTTP backends the documentation also calls for proxy_http_version 1.1 and an empty Connection header (proxy_set_header Connection ""). When a request arrives, NGINX reuses an existing connection rather than opening a new one. This reduces latency noticeably, especially for HTTPS backends.
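A rough model of such a pool, assuming a per-worker cache of idle connections with a fixed cap where the least recently used connection is closed once the cap is exceeded. `KeepalivePool`, `FakeConn`, and the `connect` callable are hypothetical names for this sketch; real connections would be sockets.

```python
class FakeConn:
    """Stand-in for a TCP connection to a backend (hypothetical, for the demo only)."""
    def __init__(self, backend):
        self.backend = backend
        self.closed = False

    def close(self):
        self.closed = True

class KeepalivePool:
    def __init__(self, connect, max_idle):
        self.connect = connect      # callable: backend -> new connection
        self.max_idle = max_idle    # cap on idle connections kept around
        self.idle = []              # (backend, conn); most recently released last
        self.opened = 0             # how many new connections were actually opened

    def acquire(self, backend):
        # Prefer the most recently released idle connection to this backend.
        for i in range(len(self.idle) - 1, -1, -1):
            if self.idle[i][0] == backend:
                return self.idle.pop(i)[1]
        self.opened += 1
        return self.connect(backend)

    def release(self, backend, conn):
        self.idle.append((backend, conn))
        if len(self.idle) > self.max_idle:
            _, oldest = self.idle.pop(0)   # evict the least recently used idle connection
            oldest.close()

pool = KeepalivePool(FakeConn, max_idle=2)
first = pool.acquire("app1")
pool.release("app1", first)
second = pool.acquire("app1")      # reused; no new handshake needed
```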

Timeout handling is critical. NGINX has separate timeout values for proxy_connect_timeout, proxy_read_timeout, and proxy_send_timeout. If a backend takes too long to respond, NGINX can close the connection, optionally retry on a different backend, and return an error to the client. This prevents a slow or hung backend from holding connections indefinitely and consuming resources.

Load Balancing System

NGINX’s load balancer is deliberately simple, and that simplicity is a feature. Complex load balancing algorithms are hard to reason about under failure conditions.

flowchart TD; A[Incoming Request]; B[NGINX Load Balancer]; C[Round Robin Selector]; D[Backend 1 healthy]; E[Backend 2 unhealthy]; F[Backend 3 healthy]; G[Health Check Process]; A --> B; B --> C; C -->|Request 1| D; C -->|Skip| E; C -->|Request 2| F; G -->|Check| E; G -->|Mark healthy or unhealthy| E;

Round robin is the default. Each request goes to the next backend in the list. It is simple, predictable, and fair. It works well when backends are homogeneous and requests are roughly equal in cost.

Least connections sends each request to the backend with the fewest active connections. This works better when requests have variable processing time. A backend that is slow on one request does not get buried with more requests while other backends are idle.

IP hash routes each client IP to the same backend consistently. This is used for sticky sessions when your application stores session state on the backend server rather than in a shared store like Redis. It has a significant failure mode: if that backend dies, all those clients have their sessions invalidated. For this reason, most modern architectures avoid relying on sticky sessions.

Weighted round robin lets you send more traffic to backends with more capacity. If one backend has twice the CPU as the others, you give it double the weight and it receives double the requests.
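The four selection strategies above fit in a few lines each. This is a sketch under stated assumptions, not NGINX's code: `crc32` is a stand-in hash for IP hash (keyed on the first three IPv4 octets, as the NGINX documentation describes), and the weighted variant uses the "smooth" weighted round robin scheme that, to my understanding, open source NGINX uses to interleave picks instead of sending bursts to the heaviest backend.

```python
import itertools
import zlib

def round_robin(backends):
    """Endless iterator over backends in list order."""
    return itertools.cycle(backends)

def least_connections(active):
    """active: dict of backend -> in-flight request count."""
    return min(active, key=active.get)

def ip_hash(client_ip, backends):
    # Key on the first three octets so a whole /24 maps to one backend.
    key = ".".join(client_ip.split(".")[:3])
    return backends[zlib.crc32(key.encode()) % len(backends)]

def smooth_weighted(weights):
    """weights: dict of backend -> integer weight. Yields backends forever."""
    current = {b: 0 for b in weights}
    total = sum(weights.values())
    while True:
        for b, w in weights.items():
            current[b] += w                        # every backend gains its weight
        chosen = max(current, key=current.get)     # highest running score wins
        current[chosen] -= total                   # winner pays back the total
        yield chosen

backends = ["a", "b", "c"]
rr_picks = list(itertools.islice(round_robin(backends), 4))
weighted_picks = list(itertools.islice(smooth_weighted({"a": 2, "b": 1, "c": 1}), 4))
```

With weights 2, 1, 1 the weighted picks over one full cycle are a, b, c, a: backend a gets half the traffic, but never two requests back to back within the cycle.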

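Health checks are either passive or active. Passive health checks work by observing failures on real traffic. If requests to a backend fail max_fails times within fail_timeout, NGINX marks it as unavailable and stops sending traffic to it for the fail_timeout period. Connection errors, timeouts, and invalid responses always count as failures; 5xx responses count only if proxy_next_upstream is configured to include them. Active health checks (available in NGINX Plus) send synthetic requests to backends on a schedule and remove them from rotation proactively.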
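Passive health checking boils down to a failure counter and a timer per backend. The sketch below assumes the semantics of the `max_fails` and `fail_timeout` server parameters: enough failures inside the window take the backend out of rotation for one `fail_timeout` period. The `PassiveHealth` class is a hypothetical simplification, and timestamps are passed in explicitly to keep it deterministic.

```python
class PassiveHealth:
    def __init__(self, max_fails=1, fail_timeout=10.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = 0
        self.window_start = None    # time of the first failure in the current window
        self.down_until = 0.0       # backend is skipped until this time

    def record_failure(self, now):
        # Start a fresh window if the previous one has expired.
        if self.window_start is None or now - self.window_start > self.fail_timeout:
            self.window_start, self.fails = now, 0
        self.fails += 1
        if self.fails >= self.max_fails:
            self.down_until = now + self.fail_timeout

    def record_success(self, now):
        self.fails, self.window_start = 0, None

    def available(self, now):
        return now >= self.down_until

health = PassiveHealth(max_fails=2, fail_timeout=10.0)
health.record_failure(now=0.0)
after_one_failure = health.available(now=1.0)     # still in rotation
health.record_failure(now=2.0)
after_two_failures = health.available(now=5.0)    # taken out of rotation
after_cooldown = health.available(now=12.0)       # eligible for traffic again
```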

| Algorithm | Best Use Case | Weakness | Session Affinity |
| --- | --- | --- | --- |
| Round Robin | Homogeneous backends, uniform request cost | Ignores backend load | No |
| Least Connections | Variable request processing time | More state to track | No |
| IP Hash | Session state on backend | Uneven load, hard failover | Yes |
| Weighted Round Robin | Heterogeneous backend capacity | Requires manual weight tuning | No |
| Random with two choices | Large backend pools | Less predictable | No |

Static File Serving Optimization

NGINX is extremely fast at serving static files, and the reason comes down to a Linux syscall called sendfile(). Normally, serving a file requires two copies: one from the kernel’s page cache into userspace memory, and another from userspace memory into the socket buffer. With sendfile(), the kernel copies data directly from the page cache to the socket buffer without ever bringing it into userspace. This is called zero-copy.

The benefit is twofold. First, you eliminate the copies through userspace, which saves CPU cycles and memory bandwidth. Second, a single sendfile() call replaces a read() and write() pair, so there are fewer transitions between user mode and kernel mode.
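Python exposes the same syscall, which makes for a compact demonstration. In this sketch, `socket.sendfile()` uses `os.sendfile()` where the platform supports it and quietly falls back to ordinary `send()` calls otherwise; the `serve_file` helper, the temporary file, and the socket pair standing in for a client connection are assumptions for the demo.

```python
import os
import socket
import tempfile

def serve_file(sock, path):
    """Send a whole file over a connected stream socket; returns bytes sent."""
    with open(path, "rb") as f:
        return sock.sendfile(f)    # kernel-side copy when os.sendfile is usable

# Demo: write a small file, "serve" it over a socket pair, read it back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
    demo_path = tmp.name

server_side, client_side = socket.socketpair()
sent = serve_file(server_side, demo_path)
server_side.close()

received = b""
while chunk := client_side.recv(65536):
    received += chunk
client_side.close()
os.unlink(demo_path)
```

Notice that the file contents never appear in a Python variable on the sending side; the application only names the file and the socket.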

The kernel’s page cache is also critical here. The first time NGINX reads a file, the kernel reads it from disk and caches the pages in RAM. Subsequent reads are served from the cache without touching the disk at all. For a server serving the same popular static assets repeatedly, the hot files tend to stay in the page cache as long as memory allows, and disk I/O is essentially zero.

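NGINX also exposes the tcp_nopush and tcp_nodelay directives, which set the TCP_NOPUSH (TCP_CORK on Linux) and TCP_NODELAY socket options to control how the kernel sends data. TCP_NOPUSH tells the kernel to batch data into full TCP segments rather than sending many small packets, which reduces network overhead when sending large responses. TCP_NODELAY disables Nagle’s algorithm so that small writes on keep-alive connections go out immediately.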

Caching System

NGINX implements a full reverse proxy cache that can dramatically reduce load on backend systems.

flowchart TD; A[Incoming Request]; B[Cache key generation]; C[Cache lookup]; D[Cache hit]; E[Cache miss]; F[Backend request]; G[Response received]; H[Cache store]; I[Send cached response]; J[Send fresh response]; A --> B; B --> C; C --> D; C --> E; D --> I; E --> F; F --> G; G --> H; H --> J;

The cache key is typically derived from the request URI and optionally other headers like the Host header. When a request arrives, NGINX hashes the key and looks it up in a shared memory zone that contains cache metadata. If there is a hit and the cached entry has not expired, NGINX serves the response directly from disk without touching the backend at all.
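The mapping from cache key to file on disk can be sketched as follows. It assumes the layout the NGINX documentation describes for `proxy_cache_path` with `levels=1:2`: the file name is the MD5 of the key, and the subdirectories are taken from the tail of that hash. The function names and the example key are hypothetical.

```python
import hashlib

def path_for_digest(cache_root, digest, levels=(1, 2)):
    """Build the on-disk path for an already-hashed key."""
    parts, end = [], len(digest)
    for width in levels:
        parts.append(digest[end - width:end])   # peel characters off the tail of the hash
        end -= width
    return "/".join([cache_root, *parts, digest])

def cache_file_path(cache_root, key, levels=(1, 2)):
    """Hash a cache key (for example scheme + host + URI) and map it to a path."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return path_for_digest(cache_root, digest, levels)

example = path_for_digest("/data/nginx/cache", "b7f54b2df7773722d382f4809d65029c")
# example == "/data/nginx/cache/c/29/b7f54b2df7773722d382f4809d65029c"
hashed = cache_file_path("/data/nginx/cache", "httpbackend.example/index.html")
```

Spreading files across hash-derived subdirectories keeps any single directory from growing to millions of entries.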

Microcaching is a technique where you cache responses for very short periods, sometimes as little as 1 second. For a high-traffic endpoint receiving a thousand requests per second, a 1-second cache lifetime means the backend sees roughly one request per second instead of a thousand. The response might be up to a second stale, but for most use cases that is completely acceptable.
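The arithmetic behind microcaching is easy to check with a toy model. The sketch below is an in-memory TTL cache with an explicit clock, not NGINX's disk-backed cache; `MicroCache` and its `fetch` callable (standing in for the backend) are assumptions for illustration.

```python
class MicroCache:
    def __init__(self, ttl, fetch):
        self.ttl = ttl
        self.fetch = fetch          # callable: key -> response (the "backend")
        self.store = {}             # key -> (expires_at, response)
        self.backend_hits = 0

    def get(self, key, now):
        entry = self.store.get(key)
        if entry and now < entry[0]:
            return entry[1]                         # fresh hit: backend not touched
        self.backend_hits += 1                      # miss or expired: go to the backend
        response = self.fetch(key)
        self.store[key] = (now + self.ttl, response)
        return response

cache = MicroCache(ttl=1.0, fetch=lambda key: f"body for {key}")
# 3000 requests spread evenly over 3 seconds, i.e. 1000 requests per second.
for i in range(3000):
    cache.get("/feed", now=i / 1000)
```

After the loop, `cache.backend_hits` is 3: one backend request per second of traffic, with the other 2997 requests served from cache.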

Cache invalidation is the hard part. NGINX does not support active cache invalidation by default in the open source version. You can work around this by using cache keys that include version numbers or content hashes, so a new version naturally gets a new cache key. Stale responses are handled by proxy_cache_use_stale, which lets NGINX serve a stale cached response if the backend is slow or returning errors. This dramatically improves resilience.

TLS and HTTPS Internals

TLS termination is one of the most computationally expensive things NGINX does. A TLS handshake requires asymmetric cryptography — RSA or ECDSA operations — that are orders of magnitude slower than symmetric operations like AES.

NGINX uses OpenSSL (or BoringSSL in some deployments) for TLS operations. Modern CPUs accelerate AES in hardware through the AES-NI instructions, which brings the cost of symmetric encryption to almost nothing. But the handshake is still expensive.

Session resumption is critical for TLS performance. NGINX supports both session tickets and session IDs. With session IDs, the server keeps the session state in its own cache and the client presents the ID to look it up. With session tickets, the server encrypts its session state and gives it to the client. On the next connection, the client presents the ticket, and the server decrypts it to restore the session without going through the full handshake. In TLS 1.2 this cuts the handshake from two round trips to one and skips the expensive asymmetric operations.

HTTP/2 multiplexes multiple requests over a single TLS connection. Instead of one request per connection, a client can have dozens of in-flight requests over a single TCP connection. NGINX supports HTTP/2 natively on the client side. On the backend side, the proxy module has traditionally spoken HTTP/1.0 or HTTP/1.1 (gRPC upstreams, which use HTTP/2, are the exception), so in practice most backend connections use HTTP/1.1 with keepalive. Backends usually sit on a fast internal network, where pooled HTTP/1.1 connections already avoid most of the per-request connection cost.

WebSocket and Streaming Support

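WebSocket is a protocol that starts as HTTP and then upgrades to a full-duplex persistent connection. Because Upgrade and Connection are hop-by-hop headers, NGINX does not forward them by default; WebSocket proxying requires proxy_http_version 1.1 and passing both headers to the backend explicitly with proxy_set_header. When the backend answers with 101 Switching Protocols, NGINX switches the connection into a transparent tunnel mode.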

Once in tunnel mode, NGINX does not interpret the data flowing through the connection. It simply reads from one side and writes to the other. The connection stays open until either side closes it or a proxy timeout expires. This means NGINX has to track long-lived connections differently from normal HTTP connections. Timeouts such as proxy_read_timeout (60 seconds by default) need to be raised, or the application needs to send periodic pings, because a WebSocket connection that is idle is not necessarily broken: it might just be waiting for an event.
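The two halves of WebSocket proxying, recognizing an upgrade request and then relaying bytes blindly, can be sketched like this. Both helpers are hypothetical simplifications: `is_websocket_upgrade` only inspects the two headers mentioned above, and `relay_once` copies one chunk in one direction, whereas a real tunnel relays both directions from the event loop.

```python
import socket

def is_websocket_upgrade(headers):
    """headers: dict of lower-cased header names to values."""
    tokens = {t.strip().lower() for t in headers.get("connection", "").split(",")}
    return "upgrade" in tokens and headers.get("upgrade", "").lower() == "websocket"

def relay_once(src, dst, bufsize=4096):
    """Copy whatever is readable from src to dst without interpreting it.
    Returns False when src has been closed by its peer."""
    data = src.recv(bufsize)
    if not data:
        return False
    dst.sendall(data)
    return True

# Demo: client <-> proxy front side, proxy back side <-> backend.
client, proxy_front = socket.socketpair()
proxy_back, backend = socket.socketpair()
payload = b"opaque frame bytes"     # the proxy never looks inside
client.sendall(payload)
relay_once(proxy_front, proxy_back)
forwarded = backend.recv(4096)
```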

Streaming HTTP responses, like server-sent events or long-running chunked responses, need different settings. By default NGINX buffers responses to protect backends from slow clients. For streaming, you typically want to disable buffering with proxy_buffering off so that chunks are sent to the client as soon as they arrive from the backend, rather than accumulating in a buffer first.

Rate Limiting and Security

NGINX implements rate limiting using the leaky bucket algorithm. You define a zone in shared memory that tracks request rates per key (usually the client IP address). Requests that arrive faster than the configured rate are either delayed or rejected. The rejection status is 503 by default; set limit_req_status 429 if you want clients to see Too Many Requests.
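A minimal leaky bucket, loosely modeled on the "excess" accounting that limit_req keeps per key: the excess drains at the configured rate, each request adds one, and a request is rejected when the excess would exceed the burst. This sketch covers rejection only, not delaying, and the `LeakyBucket` class with explicit timestamps is an assumption made for testability.

```python
class LeakyBucket:
    def __init__(self, rate, burst=0):
        self.rate = rate        # requests per second that drain out of the bucket
        self.burst = burst      # how much excess is tolerated before rejecting
        self.excess = 0.0
        self.last = None        # time of the last accepted request

    def allow(self, now):
        if self.last is None:
            self.last = now
            return True                              # first request for this key
        elapsed = now - self.last
        excess = max(self.excess - self.rate * elapsed + 1, 0.0)
        if excess > self.burst:
            return False                             # rejected requests do not fill the bucket
        self.excess, self.last = excess, now
        return True

strict = LeakyBucket(rate=1, burst=0)
strict_results = [strict.allow(0.0), strict.allow(0.5), strict.allow(1.0)]

bursty = LeakyBucket(rate=2, burst=2)
burst_results = [bursty.allow(0.0) for _ in range(4)]   # four requests at the same instant
after_drain = bursty.allow(0.5)                          # half a second later one slot has drained
```

With `rate=1` and no burst, the request at 0.5 s is rejected and the one at 1.0 s is accepted. With `burst=2`, three simultaneous requests pass and the fourth is rejected.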

The two main directives are limit_req and limit_conn. limit_req controls the rate of requests over time. limit_conn controls the number of simultaneous connections from a single client. Together, they are your first line of defense against abusive clients and simple DDoS attacks.

Rate limit state lives in shared memory zones that all workers can read and write. The shared memory is protected by a lightweight mutex. This is one of the few places where NGINX workers coordinate on the hot path, and it is engineered to be fast.

For DDoS mitigation at scale, NGINX rate limiting is useful but limited. It sees traffic after it has reached your server. Real DDoS protection happens further upstream at the CDN or anycast edge. But for protecting a backend against accidental traffic spikes or specific abusive clients, NGINX rate limiting is very effective.

NGINX Kernel Interaction

Understanding how NGINX interacts with the Linux kernel is what separates a deep understanding of NGINX from a surface-level one.

flowchart TD; A[NGINX Worker Process]; B[epoll instance]; C[Linux Kernel]; D[TCP Socket Buffers]; E[Network Interface]; F[Page Cache]; G[File System]; A -->|epoll_wait| B; B -->|Kernel monitors fds| C; C -->|Packet arrives| E; E -->|DMA to| D; D -->|Readable event| B; B -->|Wakeup worker| A; A -->|read syscall| D; A -->|sendfile syscall| F; F -->|Read from disk| G; F -->|Write to socket| D;

When a network packet arrives, the network interface card uses DMA (Direct Memory Access) to copy the packet into a receive ring buffer in kernel memory without CPU involvement. The NIC then raises an interrupt, and the kernel processes the packet, reassembles TCP segments, places the payload in the socket’s receive buffer, and marks the socket as readable. Epoll, which was watching that socket, adds it to the ready list. The next time the worker calls epoll_wait(), it receives the event and calls read() to pull the data out of the buffer.

The socket buffers are a critical resource. The kernel’s TCP receive buffer holds incoming data until the application reads it. The send buffer holds outgoing data until it can be transmitted. If your application is slow to read, the receive buffer fills up, which causes TCP flow control to kick in — the kernel tells the sender to slow down. NGINX avoids this by always reading data promptly in its event loop.

The kernel’s TCP stack also handles accept queues. When connections arrive faster than NGINX can call accept(), the kernel queues them. The backlog parameter of the listen directive controls how many connections can wait in this queue (511 by default on Linux), and the kernel caps it at net.core.somaxconn. If the queue fills up, the kernel starts dropping new connection attempts, which causes clients to see connection timeouts. In high-traffic deployments, tuning net.core.somaxconn and net.ipv4.tcp_max_syn_backlog, and raising backlog to match, is essential.

| Parameter | What It Controls | Production Consideration |
| --- | --- | --- |
| net.core.somaxconn | Maximum accept queue depth per socket | Increase to 65535 under heavy load |
| net.ipv4.tcp_max_syn_backlog | Maximum incomplete connections | Increase to handle connection bursts |
| net.ipv4.tcp_tw_reuse | Reuse TIME_WAIT sockets faster | Enable to reduce ephemeral port exhaustion |
| net.core.rmem_max | Maximum TCP receive buffer size | Increase for high-bandwidth connections |
| fs.file-max | Maximum open file descriptors system-wide | Must exceed total expected connections |
| worker_rlimit_nofile (NGINX directive) | Per-process file descriptor limit for NGINX | Set to match expected connections per worker |

The file descriptor limit deserves special attention. On Linux, every socket is a file descriptor. If NGINX handles 100,000 client connections, it needs at least 100,000 open file descriptors, plus more for open files, logs, and upstream connections (a proxied request holds one descriptor for the client and one for the backend). The default per-process limit on many systems is 1024, which is hopelessly inadequate. Production NGINX deployments set worker_rlimit_nofile to something like 100000 and size worker_connections to fit within it.
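A quick way to sanity-check a descriptor budget is to compute it and compare it with the limit of the current process. The `fds_needed` helper and its `overhead` allowance for logs and open files are assumptions for illustration; `resource.getrlimit` is the standard way to read the per-process limit on Unix-like systems.

```python
import resource

def fds_needed(client_connections, proxied=True, overhead=64):
    """Rough descriptor budget: one per client socket, one more per upstream
    socket when proxying, plus a small allowance for logs and open files."""
    per_connection = 2 if proxied else 1
    return client_connections * per_connection + overhead

budget_proxy = fds_needed(100_000)                   # reverse proxy workload
budget_static = fds_needed(100_000, proxied=False)   # static file workload

# Current soft and hard limits for this process (what `ulimit -n` reports).
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
```

If the budget exceeds the soft limit, new connections start failing with "too many open files" long before CPU or memory run out.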

Scaling NGINX

A single NGINX instance can handle an enormous amount of traffic, but there are limits. At some point, you need to scale horizontally.

flowchart TD; A[Anycast IP or DNS load balanced]; B[NGINX Edge Node Region 1]; C[NGINX Edge Node Region 2]; D[NGINX Edge Node Region 3]; E[Origin NGINX Cluster]; F[Backend Application Cluster]; G[Database Layer]; A --> B; A --> C; A --> D; B --> E; C --> E; D --> E; E --> F; F --> G;

The first scaling lever is multi-core optimization. NGINX worker processes are independent, so adding CPU cores and matching workers scales close to linearly up to the point where the network interface card becomes the bottleneck. For a 10Gbps NIC, that bottleneck arrives somewhere around 1 million requests per second for small responses, since 10 Gbps is about 1.25 GB/s, or roughly 1.25 KB on the wire per response at that rate.

Beyond a single machine, you add more NGINX instances behind a hardware load balancer or DNS round robin. For geographic distribution, you deploy NGINX at edge locations in different regions and use anycast or geographic DNS routing to send clients to the nearest edge.

Kernel-level features like SO_REUSEPORT (enabled with the reuseport parameter of the listen directive) allow multiple NGINX worker processes to each have their own accept queue for the same port. Without SO_REUSEPORT, all workers compete to call accept() on a single listening socket, which creates contention. With SO_REUSEPORT, the kernel distributes incoming connections across the workers’ separate accept queues, which removes that contention.
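The socket option itself is easy to try out. This sketch assumes a platform that defines `SO_REUSEPORT` (Linux 3.9 or later, and the BSDs); the `reuseport_listener` helper is a hypothetical name, and two sockets in one process stand in for two workers.

```python
import socket

def reuseport_listener(port, host="127.0.0.1"):
    """Create a listening socket that shares its port with other SO_REUSEPORT sockets."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(128)
    return s

first = reuseport_listener(0)              # port 0: let the kernel pick a free port
port = first.getsockname()[1]
second = reuseport_listener(port)          # binds to the same port without an error
same_port = second.getsockname()[1] == port
first.close()
second.close()
```

Without the `setsockopt` call, the second `bind()` would fail with "address already in use". With it, each socket gets its own accept queue and the kernel spreads new connections between them.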

Reliability and Availability

Zero-downtime operation is a core NGINX design goal. The hot-reload mechanism we discussed earlier handles configuration changes. Binary upgrades work similarly — you can replace the NGINX binary on disk and perform a live upgrade without dropping connections.

flowchart TD; A[Backend returns 5xx]; B[NGINX passive health check]; C[Failure threshold reached]; D[Backend marked unavailable]; E[Traffic redirected to healthy backends]; F[Retry timer expires]; G[Test request sent]; H[Backend healthy]; I[Backend returns to rotation]; A --> B; B --> C; C --> D; D --> E; D --> F; F --> G; G --> H; H --> I;

When a backend fails, NGINX’s passive health checking detects the failures and temporarily stops routing to that backend. The proxy_next_upstream directive controls what counts as a failure worth retrying. You can configure NGINX to retry on connection errors, timeouts, and specific HTTP status codes. It will try the next backend in the pool automatically.
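The retry behaviour can be modeled as a loop over candidate backends with a configurable set of retryable outcomes. This is a sketch, not NGINX's implementation: `proxy_with_retry`, the `send` callable, and the outcome labels are hypothetical, and the default `retry_on` mirrors the documented default of proxy_next_upstream (connection errors and timeouts only).

```python
def proxy_with_retry(backends, send, retry_on=("error", "timeout"), max_tries=None):
    """send(backend) returns (outcome, response), where outcome is a label such as
    "ok", "error", "timeout" or "http_502". Returns (backend, response), or
    (None, last_response) if every attempt ended in a retryable outcome."""
    tries = 0
    last_response = None
    for backend in backends:
        if max_tries is not None and tries >= max_tries:
            break
        tries += 1
        outcome, response = send(backend)
        last_response = response
        if outcome not in retry_on:
            return backend, response      # success, or a failure we must not retry
    return None, last_response

# Demo: b1 times out, b2 answers 502, b3 is healthy.
outcomes = {
    "b1": ("timeout", None),
    "b2": ("http_502", "bad gateway"),
    "b3": ("ok", "hello"),
}
send = outcomes.__getitem__

default_result = proxy_with_retry(["b1", "b2", "b3"], send)
with_502_retry = proxy_with_retry(["b1", "b2", "b3"], send,
                                  retry_on=("error", "timeout", "http_502"))
```

With the defaults, the 502 from b2 is passed straight to the client; only when `http_502` is added to the retry list does the request reach b3.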

For observability, NGINX logs every request with configurable detail in the access log. The error log captures anything unexpected. NGINX Plus adds a real-time metrics API. In open-source NGINX, you typically use tools like Prometheus with an nginx-prometheus-exporter to scrape metrics, or parse logs with a tool like Filebeat sending to Elasticsearch.

Engineering Tradeoffs

The event-driven architecture that makes NGINX so efficient also imposes constraints. NGINX worker processes cannot do blocking operations. If you need to run a slow computation, talk to a database, or call a blocking API, you cannot do it in the worker’s event loop without blocking the entire worker. This is why Lua scripting in NGINX (via OpenResty) is written to be non-blocking, using coroutines that yield when waiting for I/O.

| Decision | NGINX Choice | Tradeoff |
| --- | --- | --- |
| Concurrency Model | Event loop with epoll | Scales to millions of connections, but blocking code kills performance |
| Process vs Thread | Worker processes | Isolation and stability, but no shared state between workers |
| Response Buffering | Buffer by default | Protects backends, increases latency for streaming |
| Caching | Disk-based cache | Survives restarts, slower than in-memory |
| Load Balancing | Simple algorithms | Predictable and debuggable, not adaptive |
| Configuration | Static config files | Simple to audit, requires reload for changes |

Buffering versus streaming is a real tension. Buffering the full response from the backend before sending to the client protects the backend from slow clients and reduces backend connection hold time. But for streaming APIs, chatbots, or live video, buffering defeats the purpose. You want bytes flowing to the client as fast as they arrive. The right choice depends entirely on your use case.

The static configuration file model is both a strength and a weakness. It is auditable, version-controllable, and simple to reason about. But dynamic routing changes require a reload. For Kubernetes environments where backends change constantly, this creates friction. This is why tools like NGINX Kubernetes Ingress Controller exist — they watch the Kubernetes API and dynamically generate NGINX configurations and trigger reloads.

Real-World Technology Stack

NGINX is written in C, which gives it direct access to Linux system calls without the overhead of a managed runtime. There is no garbage collector pausing at inconvenient moments, no JIT compilation latency, and no abstraction layers between the code and the kernel.

OpenSSL handles the cryptography in most builds. It is battle-tested, continuously audited, and takes advantage of CPU instructions such as AES-NI where available. Some high-security deployments build against BoringSSL, Google’s fork, which has a smaller and more conservative API surface.

Lua, through the ngx_http_lua_module (part of OpenResty), provides a scripting layer that runs inside the NGINX event loop. You can write non-blocking Lua code that talks to Redis, makes HTTP subrequests, modifies headers, and implements custom authentication logic. OpenResty effectively turns NGINX into a programmable platform.

In Kubernetes, NGINX is widely used as an ingress controller. The controller watches Kubernetes Service and Ingress resources through the API server, then regenerates and reloads the NGINX upstream configuration as they change. This closes the gap between NGINX’s static configuration model and Kubernetes’s dynamic service discovery.

HTTP/2 and HTTP/3 (QUIC) are increasingly important. HTTP/2 multiplexing reduces the overhead of many small requests. HTTP/3 moves from TCP to QUIC, which handles packet loss more gracefully and eliminates head-of-line blocking at the transport layer. NGINX has experimental HTTP/3 support, and the ecosystem is moving in that direction.

System Design Interview Perspective

When an interviewer asks you to design an API gateway, a CDN edge node, or a high-throughput proxy, they are essentially asking you to design something NGINX-like. Here is how to think about it.

Start by clarifying the problem. How many requests per second? What is the acceptable latency? Do we need TLS termination? Is the backend pool static or dynamic? Do we need caching? Rate limiting? The answers to these questions drive the architecture.

When explaining concurrency, mention the event-driven model explicitly. Say that thread-per-connection does not scale to hundreds of thousands of connections because of memory and context-switching overhead. Explain that an event loop with non-blocking I/O handles this by never blocking — the process is always either handling a ready event or sleeping in epoll waiting for one.
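If it helps to have something concrete in mind, the loop described above can be sketched with Python's standard selectors module, which uses epoll on Linux. This is a teaching sketch, not NGINX's C implementation: the echo_upper and run_once names are invented, and a socketpair stands in for a real client connection.

```python
import selectors
import socket


def echo_upper(conn, sel):
    """Callback for a readable socket: reply with the uppercased bytes."""
    data = conn.recv(4096)
    if data:
        conn.send(data.upper())
    else:
        # An empty read means the peer closed the connection.
        sel.unregister(conn)
        conn.close()


def run_once(sel, timeout=1.0):
    """Sleep until at least one socket is ready, then dispatch its callback."""
    events = sel.select(timeout)
    for key, _mask in events:
        key.data(key.fileobj, sel)  # key.data holds the registered callback
    return len(events)


sel = selectors.DefaultSelector()
client, server_side = socket.socketpair()
server_side.setblocking(False)  # the loop must never block on one socket
sel.register(server_side, selectors.EVENT_READ, echo_upper)

client.sendall(b"ping")
handled = run_once(sel)   # one ready event: server_side is readable
reply = client.recv(4096)

# Clean up the demo sockets and the selector.
sel.unregister(server_side)
server_side.close()
client.close()
sel.close()
```

A real server runs run_once in an endless loop and registers thousands of sockets with the same selector. The process only does work when select reports that a socket is ready.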

Strong candidates discuss the full lifecycle: TCP acceptance, request buffering, location matching, upstream selection, connection pooling, response buffering, and keep-alive. They also discuss what happens when things go wrong — backend timeouts, health check failures, buffer exhaustion, file descriptor limits.

Weak answers focus only on the happy path. They describe NGINX as something that “forwards requests to backends” without discussing the engineering underneath. They do not mention the event loop, epoll, connection pooling, zero-copy file serving, or how configuration reloads work without downtime.

A question specifically about load balancing should cover at minimum: algorithm choices and their tradeoffs, health check mechanisms, what happens during backend failure, how you handle draining connections during a deploy, and the limits of session affinity.
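To make the algorithm tradeoff concrete, here is a toy Python simulation comparing round-robin with least-connections when request durations are uneven. It is an illustrative model under stated assumptions (one request arrives per tick, all backends are equal, durations are known), not a reproduction of NGINX's balancer code, and every name in it is hypothetical.

```python
import itertools


def round_robin(backends):
    """Picker that cycles through backends regardless of their load."""
    cycle = itertools.cycle(backends)
    return lambda active: next(cycle)


def least_connections(backends):
    """Picker that chooses the backend with the fewest active requests."""
    return lambda active: min(backends, key=lambda b: active[b])


def simulate(pick, backends, durations):
    """Send one request per tick; return the peak concurrency on any backend."""
    active = {b: 0 for b in backends}
    in_flight = []  # (finish_tick, backend) for requests still running
    peak = 0
    for tick, duration in enumerate(durations):
        # Retire requests that have finished by this tick.
        still_running = []
        for finish, backend in in_flight:
            if finish <= tick:
                active[backend] -= 1
            else:
                still_running.append((finish, backend))
        in_flight = still_running

        chosen = pick(active)
        active[chosen] += 1
        in_flight.append((tick + duration, chosen))
        peak = max(peak, max(active.values()))
    return peak


backends = ["a", "b"]
durations = [10, 1, 10, 1, 10, 1]  # slow and fast requests alternate

rr_peak = simulate(round_robin(backends), backends, durations)
lc_peak = simulate(least_connections(backends), backends, durations)
```

With this alternating workload, round-robin sends every slow request to the same backend and reaches a peak of 3 concurrent requests there, while least-connections keeps the peak at 2. This is one way round-robin breaks down: it balances request counts, not the work in flight.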

When discussing scaling, strong candidates layer the answer. Single server first: worker processes matching CPU cores, SO_REUSEPORT, kernel tuning. Then horizontal scaling with multiple NGINX instances. Then geographic distribution with anycast and edge caching. Then integration with a CDN. Each layer has a cost and a reason.

The most impressive thing you can do in a system design interview is explain the tradeoffs honestly. If you choose buffering over streaming, explain what you gain and what you give up. If you choose round-robin over least-connections, explain when that choice breaks down. Real engineers make principled tradeoffs, not perfect decisions.

NGINX is a masterclass in doing one set of things extremely well. It does not try to be a database, a message queue, or an application runtime. It sits on the network edge, handles connections efficiently, routes requests intelligently, and gets out of the way. That clarity of purpose, executed with deep engineering discipline, is what makes it the foundation of so much of the internet’s infrastructure.

The more you understand the systems underneath — epoll, socket buffers, the page cache, process isolation, zero-copy — the more natural NGINX’s design choices feel. They are not arbitrary. Every decision traces back to a real constraint in the underlying system. That is what good systems engineering looks like.
