16C. Reverse proxy servers and load balancers (Nginx)
Reverse proxies and load balancers are often confused because both sit between clients and servers, accepting requests from clients and delivering responses from servers.
The most common use of a reverse proxy is to provide load balancing for web applications and to improve performance through SSL acceleration, compression, and caching.
The basic features are:
- A reverse proxy accepts a request from a client, forwards it to a server, and returns the server's response to the client.
- A load balancer distributes incoming requests to a group of servers, in each case returning the response from the selected server to the appropriate client.
Note: a load balancer can balance traffic anywhere from layer 3 up to layer 7, whereas a reverse proxy is HTTP-specific.
For implementation details, please visit Load Balancing with HAProxy (High Availability Proxy).
A load balancer acts as the "traffic cop" sitting in front of our servers, routing client requests across all servers capable of fulfilling them in a manner that maximizes speed and capacity utilization while ensuring that no one server is overworked (which would degrade performance). If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the group, the load balancer automatically starts sending requests to it.
Load balancers are most commonly deployed when a site needs multiple servers because the volume of requests is too much for a single server to handle efficiently. Deploying multiple servers also eliminates a single point of failure, making the website more reliable. Most commonly, the servers all host the same content, and the load balancer's job is to distribute the workload in a way that makes the best use of each server's capacity, prevents overload on any server, and results in the fastest possible response to the client.
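As a concrete sketch, a minimal Nginx configuration that spreads requests across a group of identical backend servers might look like this (the `backend` upstream name and server addresses are hypothetical):

```nginx
http {
    # Hypothetical pool of identical application servers.
    upstream backend {
        server 10.0.0.1:8080;
        server 10.0.0.2:8080;
        server 10.0.0.3:8080;
    }

    server {
        listen 80;

        location / {
            # Each incoming request is forwarded to one server in the
            # upstream group (round robin by default) and the response
            # is returned to the client.
            proxy_pass http://backend;
        }
    }
}
```

With no other directives, Nginx distributes requests round robin; the other distribution methods are selected with additional directives in the upstream block.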
A load balancer can also enhance the user experience by reducing the number of error responses the client sees. It does this by detecting when servers go down, and diverting requests away from them to the other servers in the group. In the simplest implementation, the load balancer detects server health by intercepting error responses to regular requests. Application health checks are a more flexible and sophisticated method in which the load balancer sends separate health-check requests and requires a specified type of response to consider the server healthy.
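In open-source Nginx, the simplest form of this is a passive health check: server entries in the upstream block can carry `max_fails` and `fail_timeout` parameters, so a server that fails repeatedly is temporarily taken out of rotation. (Active health checks, where separate probe requests are sent, are a feature of the commercial NGINX Plus.) A sketch with hypothetical addresses:

```nginx
upstream backend {
    # If 3 requests to a server fail within 30 seconds, Nginx marks that
    # server unavailable and diverts traffic to the others; after another
    # 30 seconds it cautiously tries the server again.
    server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
}
```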
Another useful function provided by some load balancers is session persistence, which means sending all requests from a particular client to the same server. Even though HTTP is stateless in theory, many applications must store state information just to provide their core functionality - think of the shopping basket on an e-commerce site. Such applications underperform, or can even fail, in a load-balanced environment if the load balancer distributes requests in a user session to different servers instead of directing them all to the server that responded to the initial request.
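One way to approximate session persistence in open-source Nginx is the `hash` directive, which maps a request key consistently to the same upstream server (the `sessionid` cookie name and addresses here are hypothetical; full cookie-based `sticky` persistence is an NGINX Plus feature):

```nginx
upstream backend {
    # Requests carrying the same "sessionid" cookie value are routed to
    # the same server; "consistent" (ketama) hashing limits how many keys
    # are remapped when servers are added or removed.
    hash $cookie_sessionid consistent;

    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}
```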
Load balancing methods:
- Round robin: Requests are distributed across the group of servers sequentially.
- Weighted round robin: This builds on the simple Round Robin method. In the weighted version, each server in the pool is given a static numerical weighting, and servers with higher weightings receive proportionally more requests.
- Least connections: Neither Round Robin nor Weighted Round Robin takes the current server load into consideration when distributing requests. With this method, a new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections.
- Weighted least connections: Builds on the Least Connections method. As in Weighted Round Robin, each server is given a numerical weighting, which the load balancer uses when allocating requests. If two servers have the same number of active connections, the server with the higher weighting is allocated the new request.
- IP hash: The IP address of the client is used to determine which server receives the request. Source IP hash load balancing uses an algorithm that takes the source and destination IP addresses of the client and server and combines them to generate a unique hash key, which is used to allocate the client to a particular server. Because the key can be regenerated if the session is broken, this method can ensure that the client's requests are directed to the same server it was using previously. This is useful when a client needs to reconnect to a session that is still active after a disconnection - for example, to retain items in a shopping cart between sessions.
- Software Defined Networking (SDN) Adaptive: SDN Adaptive combines knowledge of the upper networking layers with information about the state of the network at the lower layers. Information about data from layers 4 and 7, and information about the network from layers 2 and 3, is combined when deciding how to allocate requests. This allows the status of the servers, the status of the applications running on them, the health of the network infrastructure, and the level of congestion on the network to all play a part in the load-balancing decision making.
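Several of the methods above map directly onto Nginx upstream directives (addresses are hypothetical; weighted least connections is simply `least_conn` combined with `weight`, and SDN-adaptive balancing is not something plain Nginx provides):

```nginx
# Weighted round robin: round robin is the default; adding weights sends
# 10.0.0.1 three requests for every one sent to 10.0.0.2.
upstream weighted_rr {
    server 10.0.0.1:8080 weight=3;
    server 10.0.0.2:8080 weight=1;
}

# Least connections (server weights, if present, are also considered).
upstream least_connections {
    least_conn;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# IP hash. Note that Nginx's ip_hash keys on the client's (source)
# address only, rather than a combined source/destination hash.
upstream ip_hash_pool {
    ip_hash;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}
```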
While deploying a load balancer makes sense only when we have multiple servers, it often makes sense to deploy a reverse proxy even with just one web server or application server. We can think of the reverse proxy as a website's "public face." Its address is the one advertised for the website, and it sits at the edge of the site's network to accept requests from web browsers and mobile apps for the content hosted at the website. The benefits include:
- Increased security - No information about our backend servers is visible outside our internal network, so malicious clients cannot access them directly to exploit any vulnerabilities. Many reverse proxy servers include features that help protect backend servers from distributed denial-of-service (DDoS) attacks, for example by rejecting traffic from particular client IP addresses (blacklisting), or limiting the number of connections accepted from each client.
- Increased scalability and flexibility - Because clients see only the reverse proxy's IP address, we are free to change the configuration of our backend infrastructure. This is particularly useful in a load-balanced environment, where we can scale the number of servers up and down to match fluctuations in traffic volume.
- Web acceleration - reducing the time it takes to generate a response and return it to the client. Techniques for web acceleration include the following:
- Compression - Compressing server responses before returning them to the client (for instance, with gzip) reduces the amount of bandwidth they require, which speeds their transit over the network.
- SSL termination - Encrypting the traffic between clients and servers protects it as it crosses a public network like the Internet. But decryption and encryption can be computationally expensive. By decrypting incoming requests and encrypting server responses, the reverse proxy frees up resources on backend servers which they can then devote to their main purpose, serving content.
- Caching - Before returning the backend server's response to the client, the reverse proxy stores a copy of it locally. When the client (or any client) makes the same request, the reverse proxy can provide the response itself from the cache instead of forwarding the request to the backend server. This both decreases response time to the client and reduces the load on the backend server.
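The three techniques above can be sketched together in a single Nginx server block (the certificate paths, cache zone name, and backend address are all hypothetical):

```nginx
http {
    # Cache zone backing proxy_cache below (path and sizes hypothetical).
    proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m max_size=1g;

    upstream backend {
        server 10.0.0.1:8080;
    }

    server {
        # SSL termination: TLS ends here; traffic to the backend stays
        # unencrypted inside the internal network.
        listen 443 ssl;
        ssl_certificate     /etc/nginx/certs/example.crt;
        ssl_certificate_key /etc/nginx/certs/example.key;

        # Compression: gzip responses before returning them to clients.
        gzip on;
        gzip_types text/plain text/css application/json application/javascript;

        location / {
            # Caching: serve repeat requests from the local cache instead
            # of forwarding them to the backend.
            proxy_cache app_cache;
            proxy_cache_valid 200 10m;
            proxy_pass http://backend;
        }
    }
}
```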