HTTP Load Balancing with nginx

While nginx performs great as a general-purpose web server, it can also be configured as a powerful HTTP load balancer quite easily. Load balancing allows you to optimise resource utilisation and maximise the throughput of your web services by distributing incoming traffic over multiple physical servers or VPSs. It also provides redundancy and fault tolerance: should one of the servers in your load-balanced group go offline, its traffic will be redirected to the healthy servers until it comes back online.

Reverse proxying in nginx includes load balancing for HTTP, HTTPS, FastCGI, uwsgi, SCGI, and memcached. It's also possible to use nginx to load balance email and other TCP or UDP services.

nginx Load Balancing Methods

nginx supports several load balancing methods:

Round Robin
Requests are distributed across the load-balanced servers in a round-robin fashion.
Least Connected
Incoming requests are assigned to the server with the fewest active connections.
IP Hash
With round-robin or least-connected load balancing, each request may be directed to a different upstream server. If your application requires session persistence (i.e. a user must log in, and session information is stored on each individual server), you can use nginx's IP hash load balancing to ensure that each subsequent request from a client reaches the same server (unless, of course, that upstream server goes offline).

Simple Example

Now, let's start with the basics. If you look at the virtual host configuration below, you'll see that there are 3 identical instances of our application running on app01-app03.example.com.
If you don't specify a load balancing method, nginx will default to round-robin. In this first example, all requests to example.com will be evenly distributed amongst the servers defined in the backend upstream block.

http {
    # The pool of application servers to balance requests across
    upstream backend {
        server app01.example.com;
        server app02.example.com;
        server app03.example.com;
    }

    server {
        listen 80;
        server_name www.example.com example.com;

        location / {
            # Proxy every request to the "backend" upstream group
            proxy_pass http://backend;
        }
    }
}

To configure load balancing for HTTPS instead of HTTP, simply use "https" as the protocol in the proxy_pass directive.
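
As a minimal sketch (assuming each application server terminates TLS on port 443, and substituting your own certificate paths for the placeholders below), the configuration differs from the first example only in the listener and the proxy_pass scheme:

http {
    upstream backend {
        server app01.example.com:443;
        server app02.example.com:443;
        server app03.example.com:443;
    }

    server {
        listen 443 ssl;
        server_name www.example.com example.com;

        # Placeholder paths for the certificate presented to clients
        ssl_certificate     /etc/nginx/ssl/example.com.crt;
        ssl_certificate_key /etc/nginx/ssl/example.com.key;

        location / {
            # "https" tells nginx to speak TLS to the upstream servers
            proxy_pass https://backend;
        }
    }
}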

Least Connected

If your application servers will be serving long-running requests, the least-connected method can help keep them from becoming overloaded. As the name suggests, least connected passes each request to the upstream server with the fewest active client connections.

upstream backend {
    least_conn;
    server app01.example.com;
    server app02.example.com;
    server app03.example.com;
}

Weighted Load Balancing

If your application servers have different hardware specifications, it may be desirable for some of them to serve more requests than others. You can specify a weight on each upstream server declaration: the higher a server's weight, the more requests nginx will proxy to it.

upstream backend {
    server app01.example.com weight=3;
    server app02.example.com;
    server app03.example.com;
}

In this example, for every 5 requests that nginx receives, it will proxy 3 to app01 and one each to app02 and app03.
The default weight for every server is 1, but you may change this as desired.

In recent versions of nginx it is also possible to combine server weights with the least-connected and IP hash load balancing methods.
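
For example, a weighted least-connected pool might look like this (a sketch; the weight value is purely illustrative):

upstream backend {
    least_conn;
    # app01 is treated as able to handle twice the connections of its peers
    server app01.example.com weight=2;
    server app02.example.com;
    server app03.example.com;
}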

Session Persistence / Sticky Sessions

As mentioned in the overview above, if your application servers are unable to share session data about connected clients, it's possible to ensure that each subsequent request from a given client reaches the same server every time. This is generally referred to as session persistence or 'sticky sessions', and can be enabled by adding the ip_hash directive to your upstream block.

upstream backend {
    ip_hash;
    server app01.example.com;
    server app02.example.com;
    server app03.example.com;
}

Excluding Servers

It is possible to exclude servers from your upstream by explicitly marking them as down.

upstream backend {
    server app01.example.com down;
    server app02.example.com;
    server app03.example.com;
}

This may be useful in several situations, such as a staggered upgrade of your application servers: as you upgrade each server, you can manually remove it from the upstream pool by marking it down and reloading nginx (e.g. nginx -s reload), and nginx will direct traffic to the remaining servers in the group.

Backup

With the backup flag, nginx will only direct traffic to a server (or set of servers) when the primary servers of the group are down.

upstream backend {
    server app01.example.com;
    server app02.example.com;
    server app03.example.com;
    # Receives traffic only while the primary servers above are unavailable
    server backup01.example.com backup;
}

Failover

If an upstream server becomes unresponsive for any reason, nginx will temporarily remove it from the pool and send requests to the next available server in the group. When this happens the client will not experience any downtime, though they may see a longer-than-usual response time, as nginx waits a set period for the failing server to respond before trying the next one.
You can explicitly specify the maximum number of failures and the failure timeout period, which dictate the circumstances under which nginx will mark a load-balanced server as 'down'.

upstream backend {
    # After 3 failed attempts within 5 seconds, mark app01 unavailable for 5 seconds
    server app01.example.com max_fails=3 fail_timeout=5s;
    server app02.example.com;
    server app03.example.com;
}

Note: if left unspecified, max_fails defaults to 1 and fail_timeout to 10 seconds. It is also possible to set max_fails to 0, which disables failure accounting entirely; nginx will never mark that server as unavailable.
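
For instance, to exempt a single server from this failure accounting (a sketch):

upstream backend {
    # Never marked unavailable, regardless of failed attempts
    server app01.example.com max_fails=0;
    server app02.example.com;
    server app03.example.com;
}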

Further Reading

If there's anything that I've missed, or you'd like more information on this topic, you might be interested in taking a look at the official "Using nginx as HTTP load balancer" page from nginx.org and the ngx_http_upstream_module documentation.