Load Balancing the Load Balancer

Sriram Ganesan
Sep 10, 2020 · 5 min read

The stories of high availability and horizontal scaling, of services handling millions of transactions, all depend on one component: the load balancer.

The load balancer can sit at the edge (L4 or L7) or be internal (ILB). Traffic Manager services are considered load balancers as well, but for geographically spread targets. The load balancer fronts any service that needs to scale out, aka horizontal scaling. The load balancer itself is again a system in the cloud that does one specific job: efficient routing based on a load-balancing algorithm. Have you ever thought along the lines of: "The load balancer can handle only so much. It's just a machine, an IO-optimized one, maybe. What would happen if the traffic is beyond what it can handle?"

This is one of the many scenarios where going cloud-native saves you a lot of configuration effort and money. The Cloud Service Provider (CSP) scales out the load balancer instances to handle the traffic :). Yes, the load balancers get scaled out too.

Image source: https://docs.microsoft.com/en-us/azure/application-gateway/application-gateway-autoscaling-zone-redundant

The picture shows an Azure Application Gateway, a Layer 7 load balancer. Notice how the scaled-out instances are represented; this illustrates the Gateway's ability to auto-scale under load. Now, the LB was fronting the scaled-out services, so what fronts the scaled-out LBs? Each LB at the edge is created with a public IP. So how does this work?

Note: These again are my findings from exploring a bunch of basics; I certainly did not get my hands on Azure LB source code or infrastructure configuration details. :)

Let's touch upon a few methods by which this scenario can be addressed. To avoid duplicating information, I have left the detailed explanation of the methodologies to the source paper: download "webscale-elasticity-with-modern-load-balancer-white-paper" from avinetworks.com.

Approach 1: Tiered Load Balancers

The first approach is straightforward: you front the LBs with another LB in a tiered fashion.

The user's requests reach the Tier-1 load balancer. The Tier-2 load balancers form the backend pool of the Tier-1 LB, which is usually configured at Layer 4. The obvious downside is the performance bottleneck at the Tier-1 LB. There is also the usual question: does this become a single point of failure? A minimal sketch of the idea follows.
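To make the tiering concrete, here is a minimal Python sketch of the selection logic a Tier-1 L4 LB might run. The Tier-2 pool addresses are made up for illustration, and a real LB works on packets and connection tables, not function calls.

```python
import itertools

# Hypothetical Tier-2 LB pool; these addresses are placeholders.
TIER2_LBS = ["10.0.1.4", "10.0.1.5", "10.0.1.6"]
_pool = itertools.cycle(TIER2_LBS)

def pick_tier2_for_new_connection() -> str:
    """Round-robin new connections across the Tier-2 pool, one simple
    policy a Tier-1 L4 load balancer could apply."""
    return next(_pool)

# Every incoming connection is forwarded to the next Tier-2 instance.
for _ in range(4):
    print(pick_tier2_for_new_connection())
```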

Approach 2: DNS + Proxy Load Balancer

Excerpt from the white paper →

When users access an application, a DNS lookup of the domain name occurs first — e.g. when a user accesses www.facebook.com, the domain name resolves to an IP address, and the user’s browser or app connects to the IP address.

Usually, a domain name resolves to a single IP address. However, multiple IP addresses can be associated with a domain name, and the DNS resolver can step through the list of IP addresses and return a different IP address for each DNS query.

This requires you to add multiple A records for your domain name in your DNS provider's zone, one per load balancer public IP.
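You can observe this from any client. As a hedged illustration (example.com is a placeholder domain), Python's standard library can list every A record a name resolves to:

```python
import socket

# Resolve all A records for a name. With DNS round-robin, each LB
# instance gets its own public IP and its own A record, and resolvers
# rotate through the list on successive queries.
hostname, aliases, addresses = socket.gethostbyname_ex("example.com")
print(addresses)  # one entry per A record, i.e. per LB public IP
```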

This approach looks pretty straightforward and addresses the performance and single-point-of-failure problems of the previous approach. However, for production workloads of sites such as amazon.com and netflix.com, the number of load balancer instances can exceed 100; Azure Application Gateway offers a maximum of 125 LB instances in a scale-out scenario. One caveat, in my opinion, is that creating 100+ public IP addresses is a cost overhead. Also, the DNS cache Time to Live (TTL) can hold stale endpoints for a while, which can become tricky to handle. Not an elegant way of handling scenarios involving huge volumes of requests!

Approach 3: Anycast Load Balancer

In this approach we have just one public IP, shared by all N load balancer instances. The simplicity of this approach lies entirely in the network configuration. Excerpt from the white paper →

“In this approach, the domain name resolves to a single IP address. The IP address is added to its upstream router with multiple physical load balancers as the next hop. The router performs flow-based equal cost multi-pathing (ECMP) and sends every user flow to a different next hop/physical load balancer”.

When the user hits an FQDN in the browser, the request traverses multiple routers before it reaches the target server. The router upstream of the N load balancers has a routing configuration with N entries (one per load balancer) as next hops for the same target IP.
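Here is a toy Python model of flow-based ECMP, under the simplified assumption that the router hashes the flow 5-tuple to pick among equal-cost next hops; all addresses are invented for illustration.

```python
import hashlib

# Hypothetical next hops: one route per LB, all for the same anycast IP.
NEXT_HOPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

def ecmp_next_hop(src_ip, src_port, dst_ip, dst_port, proto="tcp") -> str:
    """Hash the 5-tuple so every packet of a flow takes the same path,
    while different flows spread across all next hops."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    bucket = int(hashlib.md5(key).hexdigest(), 16) % len(NEXT_HOPS)
    return NEXT_HOPS[bucket]

# Two client flows to the same anycast IP can land on different LBs,
# but each individual flow always maps to the same one.
print(ecmp_next_hop("198.51.100.7", 51000, "203.0.113.10", 443))
print(ecmp_next_hop("198.51.100.8", 51001, "203.0.113.10", 443))
```

Flow-based hashing matters here: if the router sprayed packets per-packet instead of per-flow, the packets of one TCP connection would land on different load balancers and break the connection.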

Networking has many fascinating things happening behind the scenes, and we rarely know enough to appreciate the technical concepts. At least I don't! :)

A basic understanding of a router's route table can help in understanding this approach. Another gem worth reading is the presentation "Best Practices in IPv4 Anycast Routing"; download and read through it for detailed information.

Comparing the approaches, Approach 3 has none of the limitations of the other two and scales well. You can add the required number of load balancer instances (VMs or physical machines); updating the route table of the upstream router(s) is all it takes to accomplish a load balancer scale-out.

On a finishing note: a small mind-tickling question.

With load balancers scaled out (up to 125 instances for Azure Application Gateway), how would the concept of "rate limiting" be designed and implemented? Each load balancer instance handles thousands of requests per second, yet all of these requests target the same endpoint/FQDN. How do we get a unified counter? :)
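One plausible answer, purely my assumption and not how any CSP necessarily does it, is to keep the counter in a shared store that every LB instance increments atomically. A minimal sketch using Redis (assuming the redis-py client and a reachable Redis server; the limit value is hypothetical):

```python
import time

import redis  # assumes the redis-py package is installed

r = redis.Redis(host="localhost", port=6379)  # shared by all LB instances

LIMIT_PER_SECOND = 10_000  # hypothetical per-endpoint limit

def allow_request(endpoint: str) -> bool:
    """Fixed-window rate limit shared across every LB instance:
    all instances increment the same per-second counter."""
    window = int(time.time())   # current 1-second window
    key = f"ratelimit:{endpoint}:{window}"
    count = r.incr(key)         # atomic increment across all instances
    if count == 1:
        r.expire(key, 2)        # let old windows expire automatically
    return count <= LIMIT_PER_SECOND
```

The trade-off is an extra network hop per request; real systems often loosen this with per-instance local counters that are synced to the shared store periodically, trading strict accuracy for latency.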

As always, happy reading, folks. That's all for today.
