4  Load Balancing

4.1 What is Load Balancing?

A load balancer is a component that distributes incoming network traffic across multiple servers or databases. It acts as a traffic director, ensuring no single server bears too much load while maximizing speed and capacity utilization.

4.2 Why Load Balancing?

Without load balancing:

  • Single server becomes overwhelmed
  • Poor performance during traffic spikes
  • Single point of failure
  • Difficult to scale

With load balancing:

  • Improved availability and reliability
  • Better resource utilization
  • Horizontal scalability
  • Reduced response time
  • Easier maintenance (zero-downtime deployments)

4.3 Types of Load Balancers

4.3.1 Hardware Load Balancers

Characteristics:

  • Dedicated physical devices (F5, Citrix NetScaler)
  • High performance and throughput
  • Advanced features (SSL offloading, DDoS protection)
  • Expensive
  • Vendor lock-in

When to use:

  • Large enterprise environments
  • Very high traffic volumes
  • Strict performance requirements
  • Budget allows for premium solutions

4.3.2 Software Load Balancers

Characteristics:

  • Run on commodity hardware or virtual machines
  • Cost-effective
  • Flexible and customizable
  • Open-source options (NGINX, HAProxy)
  • Cloud-native (AWS ELB, Azure Load Balancer)

When to use:

  • Cloud deployments
  • Budget constraints
  • Need for flexibility
  • Modern microservices architectures

4.4 Load Balancer Configurations

4.4.1 Active-Active Configuration

Multiple load balancers actively serve traffic simultaneously.

  • All load balancers handle requests
  • Better resource utilization
  • Higher throughput
  • DNS round-robin or anycast routing

(Figure: active-active load balancing)

4.4.2 Active-Passive Configuration

One load balancer is active while the others stand by.

  • Primary handles all traffic
  • Secondary takes over on primary failure
  • Simpler to manage
  • Some capacity sits idle

(Figure: active-passive load balancing)

4.5 Load Balancing Algorithms

Choosing the right algorithm depends on your application’s characteristics and requirements.

4.5.1 Round Robin

How it works: Requests are distributed sequentially across servers in rotation.

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
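
A minimal Python sketch of this rotation (the pool and server names are illustrative):

import itertools

servers = ["server_a", "server_b", "server_c"]  # hypothetical backend pool
rotation = itertools.cycle(servers)

def pick_server():
    # Each call returns the next server in the fixed rotation.
    return next(rotation)

for i in range(4):
    print(f"Request {i + 1} -> {pick_server()}")
# Request 4 lands back on server_a: the cycle repeats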

Pros:

  • Simple to implement
  • Fair distribution
  • No state required

Cons:

  • Doesn’t consider server capacity
  • Ignores current load
  • Not ideal for long-lived connections

When to use:

  • Servers have similar capacity
  • Requests have similar processing time
  • Stateless applications

4.5.2 Least Connections

How it works: Routes traffic to the server with the fewest active connections.
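
A minimal sketch, assuming the balancer maintains an accurate count of open connections per backend (the counts here are made up):

servers = {"server_a": 12, "server_b": 3, "server_c": 7}  # active connections (illustrative)

def pick_server(active):
    # Route to the backend with the fewest open connections right now.
    return min(active, key=active.get)

chosen = pick_server(servers)
servers[chosen] += 1  # the balancer must increment on open and decrement on close
print(chosen)  # server_b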

Pros:

  • Better for long-lived connections
  • Considers current server load
  • More balanced than round robin

Cons:

  • Requires tracking connection state
  • More complex than round robin

When to use:

  • Long-lived connections (WebSockets, database connections)
  • Variable request processing time
  • Server capacities differ

4.5.3 Least Response Time (Least Loaded)

How it works: Routes to the server with the fastest response time and fewest active connections.
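
One way to combine the two signals is a score that penalizes both slow responses and open connections; this scoring scheme and its weighting constant are illustrative choices, not a standard formula:

backends = {
    "server_a": {"avg_response_ms": 40.0, "connections": 10},
    "server_b": {"avg_response_ms": 90.0, "connections": 2},
}

def score(stats):
    # Lower is better: smoothed latency plus a per-connection penalty.
    return stats["avg_response_ms"] + 50.0 * stats["connections"]

def pick_server(pool):
    return min(pool, key=lambda name: score(pool[name]))

print(pick_server(backends))  # server_b: slower per request, but far less loaded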

Pros:

  • Optimizes for performance
  • Adapts to server performance
  • Accounts for backend processing time

Cons:

  • Most complex to implement
  • Requires health monitoring
  • Higher overhead

When to use:

  • Performance-critical applications
  • Varying backend processing times
  • Heterogeneous server pool

4.5.4 IP Hash (Session-Based)

How it works: Client IP address is hashed to determine which server receives the request.

server_index = hash(client_ip) % server_count
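
A minimal sketch using a deterministic hash (Python's built-in hash() is randomized per process, so a real implementation needs a stable digest such as MD5):

import hashlib

servers = ["server_a", "server_b", "server_c"]

def pick_server(client_ip):
    # Deterministic hash, so the same client IP always maps to the same server.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(pick_server("203.0.113.7"))  # same output on every call...
print(pick_server("203.0.113.7"))  # ...until len(servers) changes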

Pros:

  • Session persistence (sticky sessions)
  • No session store required
  • Predictable routing

Cons:

  • Uneven distribution when traffic is concentrated among a few IPs
  • Adding or removing a server changes the modulus and remaps most clients
  • Many clients behind a shared NAT present one IP and all land on the same server

When to use:

  • Applications requiring session affinity
  • Stateful applications
  • When consistent hashing can’t be used

4.5.5 Weighted Round Robin

How it works: Like round robin, but servers are assigned weights based on capacity.

Server A (weight=3): receives 3 of every 6 requests
Server B (weight=2): receives 2 of every 6 requests
Server C (weight=1): receives 1 of every 6 requests
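
A naive sketch that repeats each server in the schedule once per unit of weight; production balancers usually interleave the picks ("smooth" weighted round robin) rather than sending these bursts:

import itertools

weights = {"server_a": 3, "server_b": 2, "server_c": 1}  # illustrative weights

# Expand each server proportionally to its weight, then rotate.
schedule = [name for name, w in weights.items() for _ in range(w)]
rotation = itertools.cycle(schedule)

for i in range(6):
    print(f"Request {i + 1} -> {next(rotation)}")
# server_a takes 3 of every 6 requests, server_b 2, server_c 1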

When to use:

  • Heterogeneous server capacities
  • Gradual rollout (blue-green deployment)
  • Testing new server versions

4.5.6 Random

How it works: Randomly selects a server for each request.
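
The entire algorithm is one standard-library call:

import random

servers = ["server_a", "server_b", "server_c"]

def pick_server():
    # Uniform choice; imbalances average out as request volume grows.
    return random.choice(servers)

print(pick_server())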

Pros:

  • Simple
  • No state required
  • Surprisingly effective at scale

Cons:

  • Can be uneven in short term
  • No optimization

When to use:

  • Very simple use cases
  • As a fallback method

4.6 Layer 4 vs Layer 7 Load Balancing

4.6.1 Layer 4 (Transport Layer)

Routes based on: IP address and TCP/UDP port

Characteristics:

  • Fast and efficient
  • Lower latency
  • No content inspection
  • Cannot make routing decisions based on content

Example:

Client → Load Balancer (checks IP:Port) → Backend Server

4.6.2 Layer 7 (Application Layer)

Routes based on: HTTP headers, URL path, cookies, request content

Characteristics:

  • Content-aware routing
  • URL-based routing
  • Header-based routing
  • SSL termination
  • Higher latency (content inspection)

Example:

/api/* → API Servers
/images/* → Image Servers
/static/* → CDN
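
A minimal sketch of this prefix-based routing; the prefixes come from the example above, while the pool names are invented:

ROUTES = [
    ("/api/",    ["api-1", "api-2"]),  # hypothetical backend pools
    ("/images/", ["img-1"]),
    ("/static/", ["cdn-edge"]),
]
DEFAULT_POOL = ["web-1", "web-2"]

def route(path):
    # First matching prefix wins; unmatched paths fall through to the default.
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL

print(route("/api/users"))     # ['api-1', 'api-2']
print(route("/images/a.png"))  # ['img-1']
print(route("/about"))         # ['web-1', 'web-2']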

4.7 Load Balancer Architecture

Typical flow (sketched in code after the list):

  1. Client sends request to load balancer
  2. Load balancer selects backend server using algorithm
  3. Load balancer forwards request to chosen server
  4. Server processes request
  5. Response flows back through load balancer to client
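
A minimal end-to-end sketch of these five steps using only the Python standard library; the backend addresses are placeholders, error handling is omitted, and only GET is proxied:

import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical backends; start real servers on these ports to try it out.
BACKENDS = itertools.cycle(["http://127.0.0.1:9001", "http://127.0.0.1:9002"])

class Balancer(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(BACKENDS)                        # step 2: select a server
        with urlopen(backend + self.path) as upstream:  # step 3: forward the request
            body = upstream.read()                      # step 4: server processes it
            status = upstream.status
        self.send_response(status)                      # step 5: relay the response
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Balancer).serve_forever()  # step 1: clients hit the LB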

4.8 Health Checks

Load balancers continuously monitor backend server health:

Types of health checks:

  • Passive: infer health from failures observed on real client requests
  • Active: send periodic synthetic probe requests

Health check methods:

  • HTTP endpoint (e.g., /health)
  • TCP connection test
  • Custom scripts

Configuration example:

health_check:
  interval: 10s
  timeout: 5s
  healthy_threshold: 2
  unhealthy_threshold: 3
  endpoint: /health
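
A sketch of an active checker that mirrors the thresholds above; the backend URLs and the /health endpoint are assumptions:

import time
import urllib.request

BACKENDS = ["http://127.0.0.1:9001", "http://127.0.0.1:9002"]  # hypothetical
INTERVAL, TIMEOUT = 10, 5
HEALTHY_THRESHOLD, UNHEALTHY_THRESHOLD = 2, 3

state = {b: {"healthy": True, "ok": 0, "fail": 0} for b in BACKENDS}

def probe(backend):
    # One active check: GET /health must return 200 within the timeout.
    try:
        with urllib.request.urlopen(backend + "/health", timeout=TIMEOUT) as resp:
            return resp.status == 200
    except OSError:
        return False

def run_checks():
    while True:
        for backend, s in state.items():
            if probe(backend):
                s["ok"], s["fail"] = s["ok"] + 1, 0
                if s["ok"] >= HEALTHY_THRESHOLD:
                    s["healthy"] = True    # re-admit after 2 consecutive passes
            else:
                s["fail"], s["ok"] = s["fail"] + 1, 0
                if s["fail"] >= UNHEALTHY_THRESHOLD:
                    s["healthy"] = False   # eject after 3 consecutive failures
        time.sleep(INTERVAL)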

4.9 Best Practices

4.9.1 Multiple Load Balancers

Never use a single load balancer (single point of failure). Deploy at least two in active-active or active-passive mode.

4.9.2 Health Monitoring

  • Implement comprehensive health checks
  • Monitor both infrastructure and application health
  • Automatic removal of unhealthy backends

4.9.3 SSL/TLS Termination

  • Terminate SSL/TLS at the load balancer to offload cryptographic work from backends
  • Re-encrypt traffic between the load balancer and backends when data is sensitive
  • Manage certificates centrally

4.9.4 Connection Draining

  • Gracefully remove servers from rotation
  • Allow existing connections to complete
  • Prevent new connections to the draining server (a minimal sketch follows)
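
A sketch of the draining state machine; the connection counting here is simulated, and the grace period is an arbitrary choice:

import time

server = {"state": "active", "open_connections": 5}  # hypothetical balancer view

def accepts_new_connections(srv):
    return srv["state"] == "active"

def drain(srv, grace_seconds=30):
    srv["state"] = "draining"         # immediately stop routing new connections here
    deadline = time.time() + grace_seconds
    while srv["open_connections"] > 0 and time.time() < deadline:
        time.sleep(1)
        srv["open_connections"] -= 1  # stand-in for connections finishing naturally
    srv["state"] = "removed"          # now safe to patch, restart, or terminate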

4.9.5 Logging and Monitoring

  • Access logs for debugging
  • Performance metrics (latency, throughput)
  • Error rates per backend
  • Connection pooling statistics

4.10 Common Load Balancers

Open Source:

  • NGINX: High-performance, Layer 7
  • HAProxy: Reliable, Layer 4 and Layer 7
  • Traefik: Cloud-native, microservices-focused

Cloud Providers:

  • AWS: Application Load Balancer (ALB), Network Load Balancer (NLB)
  • Azure: Azure Load Balancer, Application Gateway
  • GCP: Cloud Load Balancing

4.11 Summary

Load balancing is essential for building scalable, highly available systems. The choice of load balancing algorithm and configuration depends on your specific application requirements, traffic patterns, and infrastructure.

Key takeaways:

  • Use multiple load balancers (no single point of failure)
  • Choose algorithm based on application characteristics
  • Implement comprehensive health checks
  • Monitor performance and adjust configuration
  • Plan for horizontal scaling from the start