How IndiGo Booking Works

IndiGo is India’s largest airline by market share, operating hundreds of flights daily across domestic and international routes. If you’ve ever booked a flight on goindigo.in or the IndiGo app, you have interacted with one of the most sophisticated transactional systems in the country: a system that has to handle millions of searches, coordinate real-time inventory across thousands of flights, process payments securely, and issue legally valid tickets, all within a few seconds.


Airline reservation systems are not just booking systems. They are distributed systems problems at scale, with real money, real people, and legally binding contracts involved. A failed payment that results in a confirmed seat is a liability. A double-booked seat is a disaster. A slow search experience sends users to a competitor. The stakes are incredibly high.

What makes airline booking particularly hard compared to, say, an e-commerce system? In e-commerce, you can slightly oversell and manage the backorder. In airline booking, there is no backorder. If seat 14A on flight 6E-205 departing Mumbai at 8:30 AM is sold, it is sold. You cannot manufacture another 14A. Inventory is finite, time-bounded, and irreplaceable.

Add to this the following complexity: prices change every few minutes based on demand, remaining seats, time to departure, competitor pricing, and internal revenue management algorithms. Two users searching the same route at the same time may see different prices. A seat that showed available when you started the booking flow may be gone by the time you hit confirm. And the system must handle all of this consistently, at scale, with zero tolerance for double bookings.

This is why airline reservation systems sit alongside banking and healthcare among the most critical transactional systems in the software world.

Core Components of an Airline Booking Platform

Before diving into each component, it helps to understand what the full system looks like from a bird’s eye view.

The Flight Search Service handles all user queries about available flights between origin and destination pairs across specific dates. It is read-heavy and latency-sensitive.

The Fare Engine calculates the price a specific user should see for a specific flight at a specific point in time. This is not a simple lookup. It involves revenue management rules, demand signals, fare class availability, taxes, surcharges, and promotional logic.

The Seat Inventory Service tracks exactly how many seats are available, held, reserved, or sold across every flight in the network. This is the most write-contended component in the system.

The Reservation Service orchestrates the process of temporarily locking a seat, coordinating payment, and confirming the booking. It manages the lifecycle from “user clicked book” to “ticket issued.”

The Payment Service handles the actual money movement — authorization, capture, refund, and reconciliation — across UPI, credit cards, debit cards, net banking, and wallets.

The Ticketing Service generates the official e-ticket after payment is confirmed. The ticket is a legally recognized travel document.

The PNR Generation System creates the unique booking reference that ties together the passenger, flight, seat, and payment into a single retrievable record.

The Check-In System manages the online and airport check-in process, including boarding pass generation and seat finalization.

The Notification Service delivers booking confirmations, reminders, boarding alerts, and delay notifications via SMS, email, and push notifications.

The Loyalty and Ancillary Systems handle frequent flyer points, baggage add-ons, meal preferences, and other revenue-generating services.

High-Level Booking Architecture

When you open the IndiGo app and search for a flight, your request travels through multiple services before you see results. Here is a simplified view of the overall architecture:

graph TD
    A[Mobile App] --> B[API Gateway]
    C[Web App] --> B
    B --> D[Flight Search Service]
    B --> E[Pricing Engine]
    B --> F[Inventory Service]
    B --> G[Reservation Service]
    B --> H[Payment Service]
    B --> I[Ticketing Service]
    B --> J[Notification Service]
    D --> K[Search Index]
    D --> F
    E --> L[Fare Rules DB]
    E --> F
    F --> M[Inventory DB]
    G --> F
    G --> H
    H --> N[Payment Gateway]
    I --> O[Ticket Store]
    G --> I
    I --> J
    classDef client fill:#1a73e8,color:#fff,stroke:none
    classDef gateway fill:#0d47a1,color:#fff,stroke:none
    classDef service fill:#1565c0,color:#fff,stroke:none
    classDef store fill:#0a3d62,color:#fff,stroke:none
    class A,C client
    class B gateway
    class D,E,F,G,H,I,J service
    class K,L,M,N,O store

The API Gateway is the entry point for all client traffic. It handles authentication, rate limiting, SSL termination, and request routing. Behind it, each service is independently deployable and independently scalable. This separation is intentional — the search service handles dramatically more traffic than the payment service, so they need to scale differently.

Flight Search System Deep Dive

Search is the highest volume operation in the entire booking platform. For every booking that completes, there are hundreds of searches. Users search, compare, abandon, search again, and eventually book. This means the search system must be extremely fast, extremely available, and must not create write load on the inventory system.

When you search for a flight from Delhi to Mumbai on a specific date, here is what actually happens:

The search service first checks a distributed cache (typically Redis or Memcached) for a pre-computed result set for that origin-destination-date combination. Popular routes like DEL-BOM are searched thousands of times per hour. Pre-computing and caching these results means the database is not hit for every search request.
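The cache-first lookup described above can be sketched as a cache-aside read. This is a minimal illustration, not IndiGo's implementation: `TTLCache` is a hypothetical in-process stand-in for Redis or Memcached, and `query_index` is an assumed callable representing the slower search-index query.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

CacheKey = Tuple[str, str, str]  # (origin, destination, date)


class TTLCache:
    """Hypothetical in-process stand-in for a distributed cache with expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[CacheKey, Tuple[float, List[dict]]] = {}

    def get(self, key: CacheKey) -> Optional[List[dict]]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: CacheKey, value: List[dict]) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def search_flights(cache: TTLCache,
                   query_index: Callable[[str, str, str], List[dict]],
                   origin: str, destination: str, date: str) -> List[dict]:
    """Serve from cache when possible; otherwise query the index and cache it."""
    key = (origin, destination, date)
    cached = cache.get(key)
    if cached is not None:
        return cached
    results = query_index(origin, destination, date)
    cache.set(key, results)
    return results
```

Repeated searches for the same route and date within the TTL never reach `query_index`, which is the property that protects the backing stores on popular routes.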

If the cache misses, the system queries an Elasticsearch or Solr-based search index that contains flight schedules, availability buckets, and price ranges. This index is periodically refreshed from the inventory system but is not a real-time view. The goal of search is to give the user a fast, accurate-enough result, not a millisecond-precise inventory snapshot.

Why not query live inventory for every search? Because inventory is stored in a transactional database optimized for writes and consistency. Hitting that database with millions of search queries simultaneously would destroy its performance and create lock contention that slows down actual bookings.

The search index is updated asynchronously. When inventory changes — a seat is booked, a hold expires, a fare bucket closes — an event is published to a message queue. Search index consumers read from this queue and update the index with a short lag, typically a few seconds. This means search results might not be perfectly real-time, but they are fast, scalable, and close enough for a user browsing flights.

Search results are returned with available seat counts and price ranges. The exact price is calculated only when a user selects a specific flight, because fare calculation is more expensive than a search index lookup.

Seat Inventory Management

This is arguably the most complex component in the entire airline system. Understanding seat inventory requires understanding how airlines think about their product.

An airline does not just have seats. It has fare classes. Fare classes are subdivisions of the cabin that allow the airline to sell the same physical seat at different prices to different customers. On a typical IndiGo flight, you might have fare classes like S, T, V, Q, L, K, M, B, H, Y — each representing a different price point and set of rules.

The inventory system tracks how many seats are available within each fare class on each flight. This is called a fare bucket. When you buy a cheap ticket, you are buying from a low-priced fare bucket. When that bucket empties, the cheapest available price jumps to the next bucket. This is why flight prices increase as you approach the travel date and as more seats are sold.

Here is a simplified view of what an inventory record looks like:

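| Fare Class | Total Seats | Available | Held | Sold | Price Range |
|---|---|---|---|---|---|
| S | 4 | 0 | 0 | 4 | Lowest |
| T | 6 | 0 | 0 | 6 | Low |
| V | 8 | 2 | 1 | 5 | Medium-Low |
| Q | 10 | 7 | 0 | 3 | Medium |
| Y | 20 | 18 | 0 | 2 | Highest |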
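To make the bucket mechanics concrete, here is a small sketch that picks the cheapest bucket that can still satisfy a request, using the illustrative counts from the table above. The `FareBucket` type and `cheapest_open_bucket` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FareBucket:
    fare_class: str
    total: int
    available: int
    held: int
    sold: int


# Buckets ordered from cheapest to most expensive, as in the table above.
BUCKETS: List[FareBucket] = [
    FareBucket("S", 4, 0, 0, 4),
    FareBucket("T", 6, 0, 0, 6),
    FareBucket("V", 8, 2, 1, 5),
    FareBucket("Q", 10, 7, 0, 3),
    FareBucket("Y", 20, 18, 0, 2),
]


def cheapest_open_bucket(buckets: List[FareBucket],
                         seats_needed: int = 1) -> Optional[FareBucket]:
    """Return the first (cheapest) bucket with enough available seats."""
    for bucket in buckets:
        if bucket.available >= seats_needed:
            return bucket
    return None
```

With these numbers a single traveler is quoted from class V, while a party of three skips V (only two seats available) and is quoted from Q. That is the price jump described above.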

The hardest problem in inventory management is race conditions. Imagine two users simultaneously trying to book the last seat in fare class V on the same flight. Both query inventory, both see one seat available, both initiate the booking flow. Without proper concurrency control, both bookings could complete, resulting in an oversold seat.

Airlines solve this with a technique called seat locking or inventory holding. When a user selects a flight and starts the booking flow, the system places a soft hold on inventory: it decrements the available count and increments the held count. This hold is temporary — typically 10 to 15 minutes. If payment is not completed within that window, the hold expires and inventory is restored.
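The hold lifecycle can be sketched in a few lines. This is a single-process illustration only, with no concurrency control or persistence; `HoldManager` and its method names are hypothetical, and the 15-minute default simply mirrors the window mentioned above.

```python
import time
from typing import Callable, Dict


class HoldManager:
    """Tracks soft holds for one fare bucket (illustrative, not thread-safe)."""

    def __init__(self, available: int, hold_ttl: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.available = available
        self.held = 0
        self.sold = 0
        self._ttl = hold_ttl
        self._clock = clock
        self._expiry: Dict[str, float] = {}  # reservation_id -> expiry time

    def release_expired(self) -> int:
        """Return expired holds to the available pool; report how many."""
        now = self._clock()
        expired = [rid for rid, t in self._expiry.items() if t <= now]
        for rid in expired:
            del self._expiry[rid]
            self.held -= 1
            self.available += 1
        return len(expired)

    def place_hold(self, reservation_id: str) -> bool:
        self.release_expired()
        if reservation_id in self._expiry:
            return True  # this reservation already holds a seat
        if self.available == 0:
            return False
        self.available -= 1
        self.held += 1
        self._expiry[reservation_id] = self._clock() + self._ttl
        return True

    def confirm(self, reservation_id: str) -> bool:
        """Convert a live hold into a sale; fails if the hold has expired."""
        self.release_expired()
        if reservation_id not in self._expiry:
            return False
        del self._expiry[reservation_id]
        self.held -= 1
        self.sold += 1
        return True
```

A production system would run the expiry sweep from a scheduler and make each state change atomic in the data store, as the following paragraphs discuss.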

Implementing this correctly in a distributed system is non-trivial. The inventory service cannot simply do a read-then-write in two separate database operations, because another process could read and modify the same row between those two operations. The update must be atomic.

A typical implementation uses optimistic locking with database-level compare-and-swap operations. The inventory update includes a version check: “Update inventory where available_seats = 1 and version = 42, set available_seats = 0, held_seats = 1, version = 43.” If another transaction has already modified this row, the version check fails, and the system retries or rejects the request.
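The version-checked update can be sketched with SQLite from the Python standard library standing in for the production database. The table layout and the function names (`make_db`, `read_inventory`, `try_hold_seat`) are assumptions made for this example.

```python
import sqlite3
from typing import Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory inventory table with one seat left in class V."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE inventory (
               flight_id TEXT, fare_class TEXT,
               available_seats INTEGER, held_seats INTEGER, version INTEGER,
               PRIMARY KEY (flight_id, fare_class))"""
    )
    conn.execute("INSERT INTO inventory VALUES ('6E-205', 'V', 1, 0, 42)")
    conn.commit()
    return conn


def read_inventory(conn: sqlite3.Connection, flight_id: str,
                   fare_class: str) -> Tuple[int, int, int]:
    """Return (available_seats, held_seats, version) for one bucket."""
    return conn.execute(
        "SELECT available_seats, held_seats, version FROM inventory "
        "WHERE flight_id = ? AND fare_class = ?",
        (flight_id, fare_class),
    ).fetchone()


def try_hold_seat(conn: sqlite3.Connection, flight_id: str, fare_class: str,
                  expected_version: int) -> bool:
    """Compare-and-swap: succeeds only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE inventory "
        "SET available_seats = available_seats - 1, "
        "    held_seats = held_seats + 1, "
        "    version = version + 1 "
        "WHERE flight_id = ? AND fare_class = ? "
        "  AND version = ? AND available_seats > 0",
        (flight_id, fare_class, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # zero rows updated means the version check failed
```

If two requests both read version 42, the first update succeeds and bumps the version to 43. The second matches zero rows, so the caller re-reads the row and then retries or rejects the request.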

For heavily contended rows, where optimistic retries would fail repeatedly, some systems use pessimistic locking with SELECT ... FOR UPDATE, which acquires a row-level lock as the row is read. This prevents concurrent modifications but serializes writers, which reduces write throughput on hot flights.

Another approach is to move inventory management to an in-memory system like Redis and use Redis’s atomic operations (DECR, WATCH/MULTI/EXEC) to manage seat counts. This provides much higher throughput than a relational database for hot inventory, with the relational database serving as the durable source of truth that Redis is synchronized with.

Dynamic Fare Calculation Engine

The fare engine is where airline economics becomes visible in the system. Understanding why it works the way it does requires understanding the concept of revenue management.

Airlines have a fixed inventory that expires at a specific time. An empty seat on a departed flight generates zero revenue. The goal of revenue management is to maximize total revenue across all seats by selling the right seat to the right customer at the right price at the right time.

This is why prices are dynamic. Early in the booking window, when a flight has many empty seats, the airline offers low prices to stimulate demand and fill the plane. As the flight fills up and the departure date approaches, prices rise because the remaining seats are scarce and late-booking travelers tend to be less price-sensitive business travelers who value flexibility.

The fare calculation engine takes multiple inputs:

  • The base fare for the route, which reflects operating costs including fuel, crew, airport fees, and overhead.
  • The fare class the user is eligible for, which depends on inventory availability at query time.
  • Taxes and surcharges, which include government-mandated passenger service fees, fuel surcharges, and airport development fees. These are not discretionary; they must be applied accurately.

On top of the base fare and taxes, the engine applies modifiers. A promotional code might apply a discount. Booking far in advance might unlock an early-bird fare. Booking for a group might apply group pricing rules. A user with IndiGo’s frequent flyer status might see a different fare structure.

The reason two users see different prices is often because they query at slightly different times, with different inventory states. User A queries and sees 3 seats available in fare class V. User B queries 30 seconds later, after those 3 seats have been placed on hold by other users, and now sees only Q class available, which is more expensive.

The fare engine itself is typically stateless — it takes inputs (flight, date, fare class, passenger type, applied promotions) and returns a price. State is managed by the inventory system. This makes the fare engine horizontally scalable. You can run dozens of instances behind a load balancer.

Reservation System Deep Dive

The reservation system orchestrates the most critical workflow in the platform: converting a user’s intent to buy into a confirmed booking.

The reservation lifecycle has several distinct states:

| State | Description | Duration | Trigger |
|---|---|---|---|
| INITIATED | User begins booking flow | Seconds | User selects flight |
| HELD | Inventory soft-locked | 10-15 min | Passenger details submitted |
| PENDING_PAYMENT | Awaiting payment confirmation | 5-10 min | Payment initiated |
| PAYMENT_CONFIRMED | Payment authorized | Seconds | Payment gateway callback |
| CONFIRMED | Booking complete, PNR issued | Permanent | Ticket generated |
| CANCELLED | Booking voided | Permanent | User or system action |
| EXPIRED | Hold timed out | Permanent | TTL expiry |
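One way to enforce this lifecycle is an explicit transition table that rejects any move the lifecycle does not permit. The state names come from the table above. The set of allowed transitions is an assumption inferred from it, not a documented IndiGo rule.

```python
from enum import Enum
from typing import Dict, Set


class State(Enum):
    INITIATED = "INITIATED"
    HELD = "HELD"
    PENDING_PAYMENT = "PENDING_PAYMENT"
    PAYMENT_CONFIRMED = "PAYMENT_CONFIRMED"
    CONFIRMED = "CONFIRMED"
    CANCELLED = "CANCELLED"
    EXPIRED = "EXPIRED"


# Assumed transition map; CANCELLED and EXPIRED are treated as terminal.
ALLOWED: Dict[State, Set[State]] = {
    State.INITIATED: {State.HELD, State.CANCELLED},
    State.HELD: {State.PENDING_PAYMENT, State.EXPIRED, State.CANCELLED},
    State.PENDING_PAYMENT: {State.PAYMENT_CONFIRMED, State.EXPIRED,
                            State.CANCELLED},
    State.PAYMENT_CONFIRMED: {State.CONFIRMED, State.CANCELLED},
    State.CONFIRMED: {State.CANCELLED},
    State.CANCELLED: set(),
    State.EXPIRED: set(),
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Centralizing the rules this way means a late payment callback for an EXPIRED reservation fails loudly instead of silently reviving it.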

The transition from HELD to CONFIRMED involves a distributed transaction across the inventory service, payment service, and ticketing service. This is where things get architecturally challenging.

In a traditional database world, you would wrap these operations in a single ACID transaction. But in a microservices architecture, each service has its own database. You cannot span a database transaction across service boundaries.

Airlines solve this with the Saga pattern. A saga is a sequence of local transactions, each with a compensating transaction that can undo it if a later step fails. The booking saga looks like this:

  • Step 1: Lock inventory (compensating action: release lock).
  • Step 2: Charge payment (compensating action: refund).
  • Step 3: Generate ticket (compensating action: void ticket).
  • Step 4: Issue PNR (compensating action: cancel PNR).

If payment succeeds but ticket generation fails, the saga triggers the compensating actions in reverse order: the payment is refunded, the inventory lock is released. This ensures the system reaches a consistent state even when individual steps fail.

The saga can be implemented in two ways. Choreography-based sagas use events: each service publishes an event after completing its step, and the next service listens for that event. Orchestration-based sagas use a central orchestrator service that explicitly calls each step and handles failures. For complex booking workflows, orchestration tends to be easier to reason about and debug.
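A minimal orchestration-style sketch of the booking saga follows. The `run_saga` function and the step tuples are hypothetical. Real orchestrators also persist saga state and retry failed compensations, which this example leaves out.

```python
from typing import Callable, List, Tuple

# Each step is (name, action, compensating_action).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            # Compensate only the steps that actually completed.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated {done_name}")
            return False
        completed.append((name, compensate))
        log.append(f"{name} ok")
    return True
```

If the steps are lock inventory, charge payment, generate ticket, and issue PNR, and ticket generation raises, the log records the refund first and the inventory release second. That matches the reverse-order compensation described above.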

Booking Workflow

Let us walk through the complete end-to-end flow, including what happens at each step and what can go wrong.

graph TD
    A[User Searches Flight] --> B[Search Service returns results]
    B --> C[User selects fare]
    C --> D[Fare Engine calculates price]
    D --> E[User enters passenger details]
    E --> F[Inventory Service places hold]
    F --> G{Hold successful?}
    G -->|No| H[Show seat unavailable]
    G -->|Yes| I[User initiates payment]
    I --> J[Payment Service calls gateway]
    J --> K{Payment authorized?}
    K -->|No| L[Release hold and show failure]
    K -->|Yes| M[Ticketing Service generates e-ticket]
    M --> N[PNR Generation Service creates PNR]
    N --> O[Notification Service sends confirmation]
    O --> P[Booking Complete]
    classDef step fill:#1565c0,color:#fff,stroke:none
    classDef decision fill:#0d47a1,color:#fff,stroke:none
    classDef terminal fill:#0a3d62,color:#fff,stroke:none
    classDef failure fill:#b71c1c,color:#fff,stroke:none
    class A,B,C,D,E,F,I,J,M,N,O step
    class G,K decision
    class P terminal
    class H,L failure

When the user submits passenger details, the reservation service immediately tries to place a soft hold on inventory. This is the first critical moment. If the hold fails — because another user got there first — the system surfaces a friendly “seat no longer available” message and sends the user back to search results.

If the hold succeeds, the user is given a time window to complete payment. A countdown timer in the UI reflects the hold expiry. On the backend, a scheduled job monitors held reservations and releases those that expire without payment.

After payment, the system receives a callback from the payment gateway. This callback must be processed idempotently — it could arrive more than once if the network is unreliable. The payment service records the payment authorization ID from the gateway and checks whether it has already processed a booking for this authorization ID before taking action.

Once payment is confirmed, the reservation transitions to PAYMENT_CONFIRMED and the ticketing workflow begins.

Payment Infrastructure

Payment is where real money moves, and where the consequences of failures are most visible to users. IndiGo’s payment infrastructure must handle a wide variety of payment methods with varying reliability characteristics.

| Payment Method | Latency | Reliability | Failure Mode |
|---|---|---|---|
| UPI | 1-3 seconds | High | Bank timeout, user rejection |
| Credit Card | 2-5 seconds | High | Decline, 3DS failure |
| Debit Card | 2-5 seconds | Medium | Insufficient funds, bank down |
| Net Banking | 5-30 seconds | Medium | Bank redirect failure |
| Wallet | 1-2 seconds | High | Insufficient balance |

The payment service does not directly integrate with every bank and wallet. It integrates with a payment aggregator (like Razorpay, PayU, or Cashfree in the Indian context) that handles the downstream integrations. The aggregator provides a unified API and handles the complexity of bank-specific protocols.

The trickiest scenario in payment processing is what engineers call the “dual write” problem: payment succeeds at the gateway, but the callback to the booking system fails or times out. Now the user’s money has been deducted, but the booking system does not know about it.

Airlines handle this with a reconciliation job. The payment service periodically queries the gateway for the status of all pending transactions. When it finds a transaction that the gateway shows as successful but the booking system has not confirmed, it triggers the booking confirmation flow retroactively.

This is why your bank sometimes shows a deduction before you receive the booking confirmation email. The payment was captured, but the confirmation flow was delayed.

Idempotency is critical throughout. Every payment request is assigned a unique idempotency key — typically a combination of the reservation ID and an attempt number. If the same payment request is submitted twice (due to a retry), the payment service checks whether it has already processed a request with this idempotency key and returns the cached result instead of creating a duplicate charge.
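The idempotency check can be sketched as follows. `PaymentService`, its key format, and the `charge_gateway` callable are hypothetical. A real service would keep the key-to-result mapping in a durable store and make the check-and-record step atomic; this in-memory sketch does neither.

```python
from typing import Callable, Dict


class PaymentService:
    """Illustrative idempotent payment wrapper (in-memory, single process)."""

    def __init__(self, charge_gateway: Callable[[int], dict]) -> None:
        self._charge = charge_gateway
        self._results: Dict[str, dict] = {}  # idempotency key -> cached result

    @staticmethod
    def idempotency_key(reservation_id: str, attempt: int) -> str:
        # Reservation ID plus attempt number, as described above.
        return f"{reservation_id}:{attempt}"

    def pay(self, reservation_id: str, attempt: int, amount_paise: int) -> dict:
        key = self.idempotency_key(reservation_id, attempt)
        if key in self._results:
            # Duplicate submission: return the earlier outcome, charge nothing.
            return self._results[key]
        result = self._charge(amount_paise)
        self._results[key] = result
        return result
```

A client retry with the same reservation ID and attempt number gets the cached result, while a deliberate new attempt (attempt number incremented) reaches the gateway again.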

PNR Generation System

PNR stands for Passenger Name Record. It is the six-character alphanumeric code that uniquely identifies your booking. When you call the airline, this is the first thing they ask for.

PNRs are not just user-facing identifiers. They are the primary key of the global booking record in airline systems. They are recognized by Global Distribution Systems (GDS) like Amadeus and Sabre, which airlines use to coordinate with travel agents and codeshare partners.

Generating a PNR has specific requirements. It must be unique across all bookings, not just current bookings but all historical bookings. It must be short enough to be communicated verbally. It must not contain characters that look similar (0 and O, 1 and I are typically excluded). It must be generated in a distributed system where multiple instances need to produce non-colliding IDs simultaneously.

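The simplest approach is to generate a random 6-character alphanumeric string from a safe character set (A-Z and 2-9, excluding confusable characters) and check for collisions before committing. With 34 symbols (26 letters plus the digits 2-9) there are 34^6 possible codes, about 1.5 billion; dropping O and I as well leaves 32^6, about 1.07 billion. The collision probability for any single new code is low in practice, though it grows as more codes are issued, which is why the collision check is still required.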
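A minimal sketch of the random-with-collision-check approach follows. It assumes the stricter 32-symbol alphabet (letters without O and I, digits 2-9), and an in-memory `existing` set stands in for the uniqueness check a real system would run against its booking store. `generate_pnr` is a hypothetical name.

```python
import secrets
import string

# 24 letters (A-Z without O and I) plus digits 2-9: 32 symbols in total.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + "23456789" if c not in "OI"
)


def generate_pnr(existing: set, length: int = 6, max_attempts: int = 10) -> str:
    """Draw random codes until one is unused, then reserve and return it."""
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing:
            existing.add(candidate)
            return candidate
    raise RuntimeError("could not find an unused PNR")
```

In a distributed deployment the membership check and the insert would need to be a single atomic operation, for example a unique-constraint insert, so that two instances cannot both claim the same code.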

A more sophisticated approach uses a distributed ID generation service — similar to Twitter’s Snowflake or Flickr’s ticket server — that guarantees uniqueness without requiring a collision check. The service encodes a timestamp, a machine ID, and a sequence number into a compact representation that maps to the PNR character space.

The PNR lookup system must be extremely fast. Airport agents, customer service representatives, and self-service kiosks all query PNRs constantly. The PNR record and its associated booking details are cached aggressively in a fast key-value store like Redis, with the relational database serving as the authoritative source.

Ticketing System

The e-ticket is a legally valid travel document. It proves that you have a confirmed reservation on a specific flight. Its generation is not optional — without a ticket, the booking is not complete.

E-ticket generation involves assembling a record that includes the passenger name exactly as it appears on their ID document, the flight details (number, date, departure, arrival, times), the seat assignment, the fare paid, applicable taxes, the ticket number (distinct from the PNR), the booking class, the ticket validity dates, and any ancillary services purchased.

Ticket numbers follow the IATA standard format: a three-digit airline code followed by a ten-digit number. IndiGo’s airline code is 6E, which corresponds to a specific numeric prefix in the IATA ticketing system. These numbers must be unique, sequentially issued, and globally unambiguous.

After generation, the ticket is stored in the ticketing database and the PDF or structured data is made available for download and email delivery. The ticket data is also transmitted to the Departure Control System (DCS), which airport staff use to manage boarding.

Ticket modification — a name correction, a date change — triggers a re-issuance workflow. The original ticket is voided and a new ticket is generated with a new ticket number. The fare difference, if any, is collected or refunded. This re-issuance must maintain the integrity of the audit trail.

Check-In Architecture

Online check-in typically opens 48 hours before departure and closes a few hours before boarding. The check-in system is architecturally simpler than the reservation system, but it has strict operational constraints.

When a passenger checks in online, the system validates that the reservation is in CONFIRMED state, that the check-in window is open, that the passenger’s travel documents are valid (for international flights), and that no operational holds exist on the booking (such as an unpaid fee).

If validation passes, the system moves the booking to the CHECKED_IN state and generates a boarding pass. The boarding pass contains the passenger name, flight number, seat number, boarding time, gate number, and a barcode or QR code encoded according to IATA’s BCBP (Bar Coded Boarding Pass) standard, which is used globally.

The boarding pass barcode must be scannable at airport security and at the gate. This requires the check-in system to be synchronized with the Departure Control System in real time. The DCS is the system of record during airport operations — it knows which passengers have checked in, which bags have been tagged, which passengers have boarded, and whether the flight is ready to close.

In the event of a flight delay or gate change, the DCS triggers notifications that flow through the airline’s notification infrastructure to reach passengers via SMS and app push notifications.

Notification Infrastructure

Notifications in an airline system are not a nice-to-have. A missed gate change notification can mean a passenger misses their flight. A delayed refund notification affects trust. A timely check-in reminder reduces no-shows.

The notification system consumes events from the main booking and operations event stream. When a booking is confirmed, a BOOKING_CONFIRMED event is published. The notification service listens for this event and dispatches confirmations via email, SMS, and push notification.

Different notification channels have different reliability characteristics. Email is reliable but slow. SMS is fast but can fail in poor network areas. Push notifications require the user to have the app installed with notifications enabled.

For critical notifications like booking confirmations, the system sends via multiple channels simultaneously and tracks delivery receipts where possible. For reminders, the system schedules notifications in a job queue with retry logic — if an SMS fails to deliver, the job retries with exponential backoff.
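Retry with exponential backoff can be sketched like this. The `send` callable, the default limits, and the use of random jitter are assumptions for illustration. A production system would usually reschedule the job in its queue rather than sleep inside a worker.

```python
import random
import time
from typing import Callable


def send_with_retry(send: Callable[[], bool],
                    max_attempts: int = 5,
                    base_delay: float = 1.0,
                    max_delay: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep,
                    jitter: Callable[[], float] = random.random) -> bool:
    """Call send() until it succeeds, doubling the wait ceiling after each failure."""
    for attempt in range(max_attempts):
        if send():
            return True
        if attempt < max_attempts - 1:
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Randomizing the wait spreads retries out so they do not all
            # hit the provider at the same instant.
            sleep(ceiling * jitter())
    return False
```

With the defaults, the wait ceilings after successive failures are 1, 2, 4, and 8 seconds, capped at `max_delay`.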

At scale, notification delivery is managed through dedicated providers: AWS SES or SendGrid for email, Twilio or Route Mobile for SMS, Firebase Cloud Messaging for push. The booking system itself does not directly call these providers — it publishes events, and the notification service handles the downstream dispatch. This decoupling means a transient SMS provider outage does not affect the booking flow.

Cancellation and Refund Systems

Cancellation is the unhappy path, and it involves undoing a completed distributed transaction — which is hard.

When a user cancels a booking, the system must: update the reservation state to CANCELLED, calculate the refund amount based on the fare rules and cancellation timing, release the inventory back to the flight, initiate the refund through the payment service, generate a cancellation ticket, and notify the passenger.

Refund calculation is non-trivial. IndiGo (like most airlines) has tiered cancellation penalties based on how many hours before departure the cancellation occurs. The fare rules engine must apply these penalties correctly, accounting for non-refundable components like certain taxes and surcharges, seat selection fees, and baggage fees.

| Cancellation Window | Refund Percentage | Processing Time |
|---|---|---|
| More than 7 days before departure | High (minus penalty) | 5-7 business days |
| 2-7 days before departure | Medium (minus penalty) | 5-7 business days |
| Less than 48 hours | Low or nil depending on fare type | 5-7 business days |
| Non-refundable fares | Taxes only | 5-7 business days |

Payment reversal is initiated through the payment gateway’s refund API. The refund takes days to reflect in the customer’s account because it must travel through the payment network back to the customer’s bank or card, which operates on T+2 or T+3 settlement cycles.

Reconciliation is a significant operational challenge. The airline’s internal accounting system must track every booking, payment, refund, and partial refund and ensure that the amounts match what the payment gateway and bank settlement reports show. Discrepancies must be investigated and resolved, which requires detailed audit logs at every step of the payment and refund flow.

Event-Driven Architecture

Modern airline platforms are deeply event-driven. Almost every significant action in the booking lifecycle produces an event, and those events drive subsequent processing asynchronously.

graph TD
    A[Reservation Service] -->|BookingConfirmed| B[Event Bus]
    C[Payment Service] -->|PaymentCaptured| B
    D[Inventory Service] -->|InventoryUpdated| B
    E[Ticketing Service] -->|TicketIssued| B
    F[Check-in Service] -->|PassengerCheckedIn| B
    B --> G[Notification Service]
    B --> H[Analytics Platform]
    B --> I[Loyalty Service]
    B --> J[Search Index Updater]
    B --> K[Audit Log Service]
    B --> L[Reconciliation Service]
    classDef producer fill:#1565c0,color:#fff,stroke:none
    classDef bus fill:#0d47a1,color:#fff,stroke:none
    classDef consumer fill:#0a3d62,color:#fff,stroke:none
    class A,C,D,E,F producer
    class B bus
    class G,H,I,J,K,L consumer

The event bus (typically Kafka in production systems of this scale) acts as the central nervous system. Services publish events without knowing who will consume them. New consumers — a new analytics pipeline, a new loyalty integration — can be added without modifying the producers.

Kafka’s durability guarantees mean that events are not lost even if a consumer is temporarily down. When the notification service restarts after an outage, it resumes consuming from where it left off and delivers any confirmations that were queued during downtime.

Kafka guarantees ordering within a partition, not across a whole topic. When events are keyed by booking ID and published to the same topic, all events for a given booking are routed to the same partition, so a consumer processes the TicketIssued event for booking X after the BookingConfirmed event for booking X.
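The key-to-partition routing can be illustrated without a broker. This sketch uses CRC32 as a stand-in hash (Kafka's default partitioner hashes key bytes with murmur2), and `partition_for` and `route` are hypothetical helper names.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (booking_id, event_type)


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; equal keys always map to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: List[Event], num_partitions: int) -> Dict[int, List[Event]]:
    """Append each event to its partition, preserving publish order within it."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for booking_id, event_type in events:
        partitions[partition_for(booking_id, num_partitions)].append(
            (booking_id, event_type)
        )
    return dict(partitions)
```

Because both events for booking X share a key, they land in the same partition in publish order, regardless of how events for other bookings interleave.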

Database Design

The core entities in an airline booking system and their rough schemas:

Flights table: flight_id, flight_number, origin, destination, departure_time, arrival_time, aircraft_type, status. Indexed on (origin, destination, departure_time) so that search queries can filter by route and departure date.

Inventory table: flight_id, fare_class, total_seats, available_seats, held_seats, sold_seats, version. The version column enables optimistic locking. This table sees extremely high write contention and may benefit from row-level sharding or in-memory replication.

Bookings table: booking_id, pnr, status, created_at, updated_at, user_id, total_fare, currency. Partitioned by created_at for query performance. Indexed on pnr for fast lookups.

Passengers table: passenger_id, booking_id, first_name, last_name, dob, passport_number, nationality, seat_number, meal_preference. Linked to bookings via booking_id.

Payments table: payment_id, booking_id, amount, currency, status, gateway_transaction_id, payment_method, created_at, updated_at. The gateway_transaction_id must have a unique index to prevent duplicate payment recording.

Tickets table: ticket_id, ticket_number, booking_id, passenger_id, flight_id, issued_at, voided_at, status. Ticket numbers are globally unique and indexed for fast retrieval.

Refunds table: refund_id, booking_id, payment_id, amount, reason, status, initiated_at, completed_at. Linked to both the booking and the original payment for reconciliation.

Partitioning strategy: bookings and payments tables are partitioned by date range. Older partitions are moved to cold storage or archived. Active partitions for recent months remain on fast SSD-backed storage. This keeps query performance high as the dataset grows over years.

Caching Systems

Caching is what allows an airline platform to serve millions of requests per day without melting its databases.

Flight schedule data — routes, flight numbers, departure times — changes infrequently. This is aggressively cached at the CDN level and in application-level Redis caches with TTLs of several hours.

Fare data changes frequently but not per-millisecond. Fare cache entries for a specific route-date-class combination have TTLs of one to five minutes. This means a user might see a price that is slightly stale, but the actual price is confirmed at checkout. If the price has changed, the user is informed and asked to confirm.

Inventory data is the most volatile. The available seat count for a hot route can change tens of times per minute. The inventory service uses a write-through cache: every inventory update writes to both Redis and the database. The cache is the primary read path for the booking flow (checking if seats are available), while the database is the authoritative write path (actually decrementing counts with optimistic locking).

Search results for popular routes like DEL-BOM or BOM-GOI are precomputed and cached at the CDN edge. When a user in Chennai searches for a popular domestic route, the response can be served from a CDN node in Chennai without any backend processing. This reduces both latency and backend load dramatically.

Cache invalidation for inventory is event-driven. When an inventory update event is published to Kafka, a cache invalidation consumer reads the event and invalidates or updates the relevant cache entries. This keeps the cache reasonably fresh without requiring the write path to manage cache coherence synchronously.

Scalability Deep Dive

IndiGo sees enormous traffic spikes during flash sales, long weekends, holiday announcements, and flight schedule releases. The system must handle 10x or 20x normal traffic during these events.

The search service is the easiest to scale horizontally. Since it is stateless and primarily read-oriented, adding more search service instances behind the load balancer immediately increases capacity. The search index (Elasticsearch) can be horizontally scaled by adding data nodes and increasing replica count.

The inventory service is harder to scale because it is write-heavy and requires strong consistency. Horizontal scaling of the inventory service requires careful partitioning — all inventory operations for a given flight must go to the same database shard, to prevent cross-shard distributed transactions. Flights can be partitioned by flight_id modulo the number of shards.
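As a small illustration of modulo partitioning, the hypothetical `shard_for_flight` function below routes a flight to a shard index. It assumes flight identifiers may be either integers or strings such as "6E-205"; string identifiers are hashed with a stable checksum first.

```python
import zlib


def shard_for_flight(flight_id, num_shards):
    """Map a flight to a shard index in the range [0, num_shards)."""
    if isinstance(flight_id, int):
        return flight_id % num_shards
    # Python's built-in hash() is salted per process, so use a stable checksum
    # to make sure every service instance routes a flight to the same shard.
    return zlib.crc32(flight_id.encode("utf-8")) % num_shards
```

One caveat of plain modulo routing is that changing the number of shards remaps most flights. Consistent hashing or a lookup table is a common alternative when shards are expected to be added over time.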

| Component | Scaling Strategy | Bottleneck | Mitigation |
| --- | --- | --- | --- |
| Search Service | Horizontal, stateless | Index refresh lag | Increase replicas, tune refresh interval |
| Fare Engine | Horizontal, stateless | Fare rules DB reads | Cache fare rules aggressively |
| Inventory Service | Sharded by flight | Write contention on hot flights | Redis atomic ops, retry with backoff |
| Reservation Service | Horizontal with saga orchestration | Payment gateway latency | Async payment confirmation |
| Payment Service | Horizontal, idempotent | Gateway rate limits | Multiple gateway integrations, circuit breaker |
| Notification Service | Horizontal consumers from Kafka | Provider rate limits | Queue and batch, multiple providers |

During flash sales, one additional technique is queue-based load leveling. Instead of allowing all users to hit the booking service simultaneously, the system places booking requests into a queue and processes them at a controlled rate. Users see a virtual waiting room UI while their request queues. This protects backend systems from being overwhelmed while ensuring all users are served in order.
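A virtual waiting room can be sketched as a FIFO queue that is drained at a fixed rate. The `WaitingRoom` class and its `admit_per_tick` parameter are hypothetical. A production system would keep the queue in a shared store rather than in process memory.

```python
from collections import deque


class WaitingRoom:
    """FIFO queue that admits a fixed number of users per tick."""

    def __init__(self, admit_per_tick):
        self.admit_per_tick = admit_per_tick
        self._queue = deque()

    def join(self, user_id):
        """Add a user to the line and return their position (1-based)."""
        self._queue.append(user_id)
        return len(self._queue)

    def tick(self):
        """Admit the next batch of users, in arrival order."""
        batch_size = min(self.admit_per_tick, len(self._queue))
        return [self._queue.popleft() for _ in range(batch_size)]
```

Calling `tick()` on a timer caps the rate at which users reach the booking service, regardless of how many joined the line at once.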

Reliability and Availability

An airline booking platform must be extremely available. Downtime directly costs revenue and damages brand trust. The target availability for the core booking path is typically 99.95% or higher.
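To make the target concrete, the short calculation below converts an availability percentage into a downtime budget. The function name is illustrative, and the default period assumes a 365-day year.

```python
def allowed_downtime_minutes(availability_percent, period_hours=365 * 24):
    """Return the downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return period_hours * 60 * unavailable_fraction
```

At 99.95%, the budget works out to about 262.8 minutes per year, or roughly 4.4 hours, and about 21.6 minutes over a 30-day month.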

Reliability is achieved through redundancy at every layer. All services run across multiple availability zones. Replicated database nodes serve reads, and automated failover takes over if the primary database fails. Circuit breakers guard all external dependencies: if the payment gateway starts responding slowly, the circuit breaker opens and surfaces a friendly error instead of allowing requests to pile up and cause cascading failures.
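A minimal circuit breaker, written here as a sketch with hypothetical names and thresholds, shows the three states involved: closed (calls pass through), open (calls fail fast), and half-open (one trial call is allowed after a cool-down).

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self.clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency unavailable, try again shortly")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

This version counts failures only. Detecting a slow gateway, as described above, would additionally require a timeout on the wrapped call so that slowness surfaces as an exception.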

Health checks and automated instance replacement ensure that unhealthy service instances are removed from the load balancer pool and replaced by the container orchestration platform (Kubernetes) automatically.

Chaos engineering — deliberately injecting failures into the system in controlled ways — is practiced by mature engineering teams to identify weaknesses before they manifest as production incidents.

Security Considerations

Payment security is governed by PCI-DSS compliance requirements. IndiGo’s payment infrastructure does not store raw card numbers — these are tokenized by the payment gateway and replaced with non-sensitive tokens that can be stored and used for refunds without exposing card data.

All communication between services is encrypted in transit using TLS. Service-to-service authentication uses mutual TLS or JWTs issued by a central identity service.

Fraud detection runs asynchronously on booking data. Signals like unusual booking velocity from a single IP, multiple bookings for the same passenger on overlapping flights, and card BIN country mismatch with billing address trigger fraud scoring. High-score bookings may be flagged for manual review or automatically blocked.

Bot attacks and fare scraping are countered with rate limiting at the API Gateway level, CAPTCHA challenges for suspicious traffic patterns, and behavioral analysis that distinguishes human users from automated scrapers by examining mouse movement patterns, request timing, and session behavior.
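Rate limiting at a gateway is often implemented with a token bucket. The sketch below is one possible version, assuming one bucket per client key; the class name and parameters are hypothetical, and this is not a description of any specific gateway product.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        """Consume one token if available; return whether the request may pass."""
        now = self.clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

A request that gets `False` would typically receive an HTTP 429 response or be routed to a CAPTCHA challenge.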

Engineering Tradeoffs

Every architectural decision in a booking system involves tradeoffs. Understanding these tradeoffs is what separates a good system designer from a great one.

Consistency vs availability in inventory: A strictly consistent inventory system ensures zero double bookings but may reject valid booking requests under high contention (for example, when optimistic locking retries are exhausted). A more available system might allow a tiny number of oversells and resolve them operationally. Most airlines choose strict consistency and accept the small percentage of failed booking attempts, because overselling a seat creates a far worse experience than a booking failure.

Reservation hold duration vs user experience: A longer hold gives users more time to complete payment, reducing checkout abandonment. But longer holds mean inventory is locked for longer, which can make flights appear more full than they are, discouraging other potential buyers. Airlines tune hold durations carefully — 10 to 15 minutes is a common sweet spot.
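The effect of hold duration on visible availability can be seen in a small sketch. `SeatHolds` is a hypothetical in-memory model of a single flight. The 600-second default reflects the 10-minute end of the range mentioned above, and a real system would rely on a store with built-in expiry rather than purging on read.

```python
import time


class SeatHolds:
    """Track sold seats and time-limited holds for one flight."""

    def __init__(self, total_seats, hold_seconds=600, clock=time.monotonic):
        self.total_seats = total_seats
        self.hold_seconds = hold_seconds
        self.clock = clock
        self.sold = 0
        self._holds = {}  # booking_id -> expiry time

    def _purge_expired(self):
        now = self.clock()
        self._holds = {b: exp for b, exp in self._holds.items() if exp > now}

    def available(self):
        """Seats that are neither sold nor currently held."""
        self._purge_expired()
        return self.total_seats - self.sold - len(self._holds)

    def hold(self, booking_id):
        """Place a hold if a seat is free; held seats look unavailable to others."""
        if self.available() < 1:
            return False
        self._holds[booking_id] = self.clock() + self.hold_seconds
        return True

    def confirm(self, booking_id):
        """Convert a live hold into a sale; fails if the hold has expired."""
        self._purge_expired()
        if self._holds.pop(booking_id, None) is None:
            return False
        self.sold += 1
        return True
```

While a hold is live, `available()` reports one seat fewer, which is exactly the "flight appears fuller than it is" effect that longer holds amplify.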

Fare caching vs freshness: Caching fares for five minutes means users occasionally see slightly stale prices. The alternative — computing fares in real time for every display — is far more expensive and slower. The solution is to cache the displayed fare but revalidate the price at checkout, which is the industry standard approach.

Dynamic pricing vs predictability: Revenue management systems that reprice frequently maximize airline revenue but create an unpredictable experience for users. This is a deliberate business decision, not a technical limitation. Some airlines offer price-lock features (often for a small fee) that let users lock a fare while they decide, which is itself a product built on top of the reservation hold mechanism.

Real-World Technology Stack

Large-scale airline booking platforms typically use a combination of the following technologies, each chosen for specific characteristics:

Java and Spring Boot form the backbone of most reservation and payment services. Java’s maturity, strong concurrency support, and vast ecosystem make it a reliable choice for financial-grade transaction processing.

Go is increasingly used for high-throughput, low-latency services like the search API and the notification dispatcher. Go’s lightweight goroutines and fast startup time make it excellent for services that handle massive concurrent request volumes.

Apache Kafka is the event streaming backbone. It provides durable, ordered, partitioned event streams with replay capability. For an airline platform, the ability to replay events is invaluable for debugging incidents and reprocessing failed operations.

Redis serves multiple roles: distributed cache, session store, inventory counter with atomic operations, and rate limiter. Its single-threaded event loop and in-memory speed make it ideal for high-throughput, low-latency operations.

PostgreSQL is the primary relational database for bookings, payments, and passengers. Its strong ACID guarantees, row-level locking, and mature replication features make it well-suited for financial transaction records.

Elasticsearch powers the flight search index. Its full-text search and aggregation capabilities, combined with horizontal scalability, make it the right tool for search workloads.

Kubernetes orchestrates all services. It provides automatic scaling, health management, rolling deployments, and resource isolation.

CDN systems (Akamai, Cloudflare) cache static assets and popular search results at edge nodes globally, reducing latency for end users and offloading traffic from origin servers.

System Design Interview Perspective

When an interviewer asks you to design an airline booking system like IndiGo, they are primarily testing your understanding of distributed systems, concurrency, and tradeoffs — not your knowledge of airlines specifically.

The most common opening question is: “Design a flight search and booking system.” Strong candidates immediately clarify requirements: read vs write ratio, expected scale, consistency requirements, and failure tolerance.

The areas that separate strong answers from weak ones are exactly those covered in this post. Weak candidates design a simple CRUD application with a single database. They propose a direct SELECT from the inventory table for every search query and ignore concurrency issues in the booking flow.

Strong candidates identify the read-write asymmetry immediately: search is orders of magnitude more frequent than booking, so the two paths deserve separate optimization strategies. They introduce caching for search, a dedicated inventory service with optimistic locking for booking, and an event-driven architecture for post-booking processes.

The inventory management section is the most important thing to get right in an interview. Explain fare buckets, soft holds, optimistic locking, and expiry TTLs. This shows you understand the real complexity of the problem.

The payment section is where candidates demonstrate understanding of distributed failure scenarios. Discuss idempotency keys, the dual-write problem, and the reconciliation job. Interviewers at companies like Flipkart, Razorpay, MakeMyTrip, and OYO ask exactly these kinds of questions.
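In an interview, a few lines of code can make the idempotency-key idea concrete. The sketch below uses hypothetical names (`PaymentService`, `charge_fn`) and an in-memory dictionary. A real service would persist the key under a unique constraint in the database so that concurrent retries are also covered.

```python
class PaymentService:
    """Charge at most once per idempotency key, even when the client retries."""

    def __init__(self, charge_fn):
        self._charge = charge_fn  # function that calls the payment gateway
        self._results = {}        # idempotency_key -> stored gateway result

    def pay(self, idempotency_key, amount):
        if idempotency_key in self._results:
            # Retry of a request already processed: return the stored result
            # instead of charging the card a second time.
            return self._results[idempotency_key]
        result = self._charge(amount)
        self._results[idempotency_key] = result
        return result
```

The client generates the key once per payment attempt and reuses it on every retry, so a timeout followed by a retry cannot produce a double charge.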

On scalability, go beyond “add more servers.” Discuss partitioning strategies for the inventory service, CDN caching for search, and Kafka-based decoupling for async processing. Explain the tradeoffs of each decision.

Common mistakes to avoid: not handling the inventory race condition, not discussing payment failure scenarios, ignoring the hold expiry mechanism, and not distinguishing between the search data model and the booking data model.

The best answers treat the booking system as what it really is: a distributed systems problem with financial consequences, where correctness and reliability matter more than raw throughput.

Closing Thoughts

Building a system like IndiGo’s booking platform is one of the genuinely hard engineering challenges in the software industry. It combines real-time inventory management, financial transaction correctness, distributed system coordination, and high availability into a single product that millions of people depend on daily.

The engineering principles that make this system work — event sourcing, idempotency, optimistic locking, saga-based distributed transactions, aggressive caching with event-driven invalidation, and graceful degradation — are the same principles that apply across any system that must be fast, reliable, and correct at scale.

If you understand how a flight booking platform is built, you understand distributed systems. And that understanding translates directly into better system design across every domain you work in.
