4  Load Balancing

4.1 What is Load Balancing?

A load balancer is a component that distributes incoming network traffic across multiple servers or databases. It acts as a traffic director, ensuring no single server bears too much load while maximizing speed and capacity utilization.

4.2 Why Load Balancing?

Without load balancing:

  • Single server becomes overwhelmed
  • Poor performance during traffic spikes
  • Single point of failure
  • Difficult to scale

With load balancing:

  • Improved availability and reliability
  • Better resource utilization
  • Horizontal scalability
  • Reduced response time
  • Easier maintenance (zero-downtime deployments)

4.3 Types of Load Balancers

4.3.1 Hardware Load Balancers

Characteristics:

  • Dedicated physical devices (F5, Citrix NetScaler)
  • High performance and throughput
  • Advanced features (SSL offloading, DDoS protection)
  • Expensive
  • Vendor lock-in

When to use:

  • Large enterprise environments
  • Very high traffic volumes
  • Strict performance requirements
  • Budget allows for premium solutions

4.3.2 Software Load Balancers

Characteristics:

  • Run on commodity hardware or virtual machines
  • Cost-effective
  • Flexible and customizable
  • Open-source options (NGINX, HAProxy)
  • Cloud-native (AWS ELB, Azure Load Balancer)

When to use:

  • Cloud deployments
  • Budget constraints
  • Need for flexibility
  • Modern microservices architectures

4.4 Load Balancer Configurations

4.4.1 Active-Active Configuration

Multiple load balancers actively serve traffic simultaneously.

  • All load balancers handle requests
  • Better resource utilization
  • Higher throughput
  • DNS round-robin or anycast routing

(Figure: active-active load balancing)

4.4.2 Active-Passive Configuration

One load balancer is active while the others stand by.

  • Primary handles all traffic
  • Secondary takes over on primary failure
  • Simpler to manage
  • Some capacity sits idle

(Figure: active-passive load balancing)

4.5 Load Balancing Algorithms

Choosing the right algorithm depends on your application’s characteristics and requirements.

4.5.1 Round Robin

How it works: Requests are distributed sequentially across servers in rotation.

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
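
A minimal Python sketch of this rotation (the pool and server names are illustrative):

import itertools

servers = ["server_a", "server_b", "server_c"]  # hypothetical backend pool
rotation = itertools.cycle(servers)

def pick_server():
    # Each call returns the next server in the fixed rotation.
    return next(rotation)

for i in range(4):
    print(f"Request {i + 1} -> {pick_server()}")
# Request 4 lands back on server_a: the cycle repeats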

Pros:

  • Simple to implement
  • Fair distribution
  • No state required

Cons:

  • Doesn’t consider server capacity
  • Ignores current load
  • Not ideal for long-lived connections

When to use:

  • Servers have similar capacity
  • Requests have similar processing time
  • Stateless applications

4.5.2 Least Connections

How it works: Routes traffic to the server with the fewest active connections.
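
A minimal sketch, assuming the balancer maintains an accurate count of open connections per backend (the counts here are made up):

servers = {"server_a": 12, "server_b": 3, "server_c": 7}  # active connections (illustrative)

def pick_server(active):
    # Route to the backend with the fewest open connections right now.
    return min(active, key=active.get)

chosen = pick_server(servers)
servers[chosen] += 1  # the balancer must increment on open and decrement on close
print(chosen)  # server_b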

Pros:

  • Better for long-lived connections
  • Considers current server load
  • More balanced than round robin

Cons:

  • Requires tracking connection state
  • More complex than round robin

When to use:

  • Long-lived connections (WebSockets, database connections)
  • Variable request processing time
  • Server capacities differ

4.5.3 Least Response Time (Least Loaded)

How it works: Routes to the server with the fastest response time and fewest active connections.
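
One way to combine the two signals is a score that penalizes both slow responses and open connections; this scoring scheme and its weighting constant are illustrative choices, not a standard formula:

backends = {
    "server_a": {"avg_response_ms": 40.0, "connections": 10},
    "server_b": {"avg_response_ms": 90.0, "connections": 2},
}

def score(stats):
    # Lower is better: smoothed latency plus a per-connection penalty.
    return stats["avg_response_ms"] + 50.0 * stats["connections"]

def pick_server(pool):
    return min(pool, key=lambda name: score(pool[name]))

print(pick_server(backends))  # server_b: slower per request, but far less loaded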

Pros:

  • Optimizes for performance
  • Adapts to server performance
  • Accounts for backend processing time

Cons:

  • Most complex to implement
  • Requires health monitoring
  • Higher overhead

When to use:

  • Performance-critical applications
  • Varying backend processing times
  • Heterogeneous server pool

4.5.4 IP Hash (Session-Based)

How it works: Client IP address is hashed to determine which server receives the request.

server_index = hash(client_ip) % server_count
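
A minimal sketch using a deterministic hash (Python's built-in hash() is randomized per process, so a real implementation needs a stable digest such as MD5):

import hashlib

servers = ["server_a", "server_b", "server_c"]

def pick_server(client_ip):
    # Deterministic hash, so the same client IP always maps to the same server.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(pick_server("203.0.113.7"))  # same output on every call...
print(pick_server("203.0.113.7"))  # ...until len(servers) changes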

Pros:

  • Session persistence (sticky sessions)
  • No session store required
  • Predictable routing

Cons:

  • Uneven distribution when traffic is concentrated among a few IPs
  • Adding or removing a server changes the modulus and remaps most clients
  • Many clients behind a shared NAT present one IP and all land on the same server

When to use:

  • Applications requiring session affinity
  • Stateful applications
  • When consistent hashing can’t be used

4.5.5 Weighted Round Robin

How it works: Like round robin, but servers are assigned weights based on capacity.

Server A (weight=3): receives 3 of every 6 requests
Server B (weight=2): receives 2 of every 6 requests
Server C (weight=1): receives 1 of every 6 requests
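
A naive sketch that repeats each server in the schedule once per unit of weight; production balancers usually interleave the picks ("smooth" weighted round robin) rather than sending these bursts:

import itertools

weights = {"server_a": 3, "server_b": 2, "server_c": 1}  # illustrative weights

# Expand each server proportionally to its weight, then rotate.
schedule = [name for name, w in weights.items() for _ in range(w)]
rotation = itertools.cycle(schedule)

for i in range(6):
    print(f"Request {i + 1} -> {next(rotation)}")
# server_a takes 3 of every 6 requests, server_b 2, server_c 1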

When to use:

  • Heterogeneous server capacities
  • Gradual rollout (blue-green deployment)
  • Testing new server versions

4.5.6 Random

How it works: Randomly selects a server for each request.
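
The entire algorithm is one standard-library call:

import random

servers = ["server_a", "server_b", "server_c"]

def pick_server():
    # Uniform choice; imbalances average out as request volume grows.
    return random.choice(servers)

print(pick_server())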

Pros:

  • Simple
  • No state required
  • Surprisingly effective at scale

Cons:

  • Can be uneven in short term
  • No optimization

When to use:

  • Very simple use cases
  • As a fallback method

4.6 Layer 4 vs Layer 7 Load Balancing

4.6.1 Layer 4 (Transport Layer)

Routes based on: IP address and TCP/UDP port

Characteristics:

  • Fast and efficient
  • Lower latency
  • No content inspection
  • Cannot make routing decisions based on content

Example:

Client → Load Balancer (checks IP:Port) → Backend Server

4.6.2 Layer 7 (Application Layer)

Routes based on: HTTP headers, URL path, cookies, request content

Characteristics:

  • Content-aware routing
  • URL-based routing
  • Header-based routing
  • SSL termination
  • Higher latency (content inspection)

Example:

/api/* → API Servers
/images/* → Image Servers
/static/* → CDN
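
A minimal sketch of this prefix-based routing; the prefixes come from the example above, while the pool names are invented:

ROUTES = [
    ("/api/",    ["api-1", "api-2"]),  # hypothetical backend pools
    ("/images/", ["img-1"]),
    ("/static/", ["cdn-edge"]),
]
DEFAULT_POOL = ["web-1", "web-2"]

def route(path):
    # First matching prefix wins; unmatched paths fall through to the default.
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL

print(route("/api/users"))     # ['api-1', 'api-2']
print(route("/images/a.png"))  # ['img-1']
print(route("/about"))         # ['web-1', 'web-2']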

4.7 Load Balancer Architecture

Typical flow (sketched in code after the list):

  1. Client sends request to load balancer
  2. Load balancer selects backend server using algorithm
  3. Load balancer forwards request to chosen server
  4. Server processes request
  5. Response flows back through load balancer to client
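
A minimal end-to-end sketch of these five steps using only the Python standard library; the backend addresses are placeholders, error handling is omitted, and only GET is proxied:

import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical backends; start real servers on these ports to try it out.
BACKENDS = itertools.cycle(["http://127.0.0.1:9001", "http://127.0.0.1:9002"])

class Balancer(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(BACKENDS)                        # step 2: select a server
        with urlopen(backend + self.path) as upstream:  # step 3: forward the request
            body = upstream.read()                      # step 4: server processes it
            status = upstream.status
        self.send_response(status)                      # step 5: relay the response
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Balancer).serve_forever()  # step 1: clients hit the LB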

4.8 Health Checks

Load balancers continuously monitor backend server health:

Types of health checks:

  • Passive: infer health from failures observed on real client requests
  • Active: send periodic synthetic probe requests

Health check methods:

  • HTTP endpoint (e.g., /health)
  • TCP connection test
  • Custom scripts

Configuration example:

health_check:
  interval: 10s
  timeout: 5s
  healthy_threshold: 2
  unhealthy_threshold: 3
  endpoint: /health
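
A sketch of an active checker that mirrors the thresholds above; the backend URLs and the /health endpoint are assumptions:

import time
import urllib.request

BACKENDS = ["http://127.0.0.1:9001", "http://127.0.0.1:9002"]  # hypothetical
INTERVAL, TIMEOUT = 10, 5
HEALTHY_THRESHOLD, UNHEALTHY_THRESHOLD = 2, 3

state = {b: {"healthy": True, "ok": 0, "fail": 0} for b in BACKENDS}

def probe(backend):
    # One active check: GET /health must return 200 within the timeout.
    try:
        with urllib.request.urlopen(backend + "/health", timeout=TIMEOUT) as resp:
            return resp.status == 200
    except OSError:
        return False

def run_checks():
    while True:
        for backend, s in state.items():
            if probe(backend):
                s["ok"], s["fail"] = s["ok"] + 1, 0
                if s["ok"] >= HEALTHY_THRESHOLD:
                    s["healthy"] = True    # re-admit after 2 consecutive passes
            else:
                s["fail"], s["ok"] = s["fail"] + 1, 0
                if s["fail"] >= UNHEALTHY_THRESHOLD:
                    s["healthy"] = False   # eject after 3 consecutive failures
        time.sleep(INTERVAL)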

4.9 Best Practices

4.9.1 Multiple Load Balancers

Never use a single load balancer (single point of failure). Deploy at least two in active-active or active-passive mode.

4.9.2 Health Monitoring

  • Implement comprehensive health checks
  • Monitor both infrastructure and application health
  • Automatic removal of unhealthy backends

4.9.3 SSL/TLS Termination

  • Terminate SSL/TLS at the load balancer to offload cryptographic work from backends
  • Re-encrypt traffic between the load balancer and backends when data is sensitive
  • Manage certificates centrally

4.9.4 Connection Draining

  • Gracefully remove servers from rotation
  • Allow existing connections to complete
  • Prevent new connections to the draining server (a minimal sketch follows)
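
A sketch of the draining state machine; the connection counting here is simulated, and the grace period is an arbitrary choice:

import time

server = {"state": "active", "open_connections": 5}  # hypothetical balancer view

def accepts_new_connections(srv):
    return srv["state"] == "active"

def drain(srv, grace_seconds=30):
    srv["state"] = "draining"         # immediately stop routing new connections here
    deadline = time.time() + grace_seconds
    while srv["open_connections"] > 0 and time.time() < deadline:
        time.sleep(1)
        srv["open_connections"] -= 1  # stand-in for connections finishing naturally
    srv["state"] = "removed"          # now safe to patch, restart, or terminate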

4.9.5 Logging and Monitoring

  • Access logs for debugging
  • Performance metrics (latency, throughput)
  • Error rates per backend
  • Connection pooling statistics

4.10 Common Load Balancers

Open Source:

  • NGINX: High-performance, Layer 7
  • HAProxy: Reliable, Layer 4 and Layer 7
  • Traefik: Cloud-native, microservices-focused

Cloud Providers:

  • AWS: Application Load Balancer (ALB), Network Load Balancer (NLB)
  • Azure: Azure Load Balancer, Application Gateway
  • GCP: Cloud Load Balancing

4.11 Summary

Load balancing is essential for building scalable, highly available systems. The choice of load balancing algorithm and configuration depends on your specific application requirements, traffic patterns, and infrastructure.

Key takeaways:

  • Use multiple load balancers (no single point of failure)
  • Choose algorithm based on application characteristics
  • Implement comprehensive health checks
  • Monitor performance and adjust configuration
  • Plan for horizontal scaling from the start