Tuesday, May 23, 2017

domain name system - Is Round-Robin DNS "good enough" for load balancing static content?




We have a set of shared, static content that we serve up between our websites at http://sstatic.net. Unfortunately, this content is not currently load balanced at all -- it's served from a single server. If that server has problems, all the sites that rely on it are effectively down because the shared resources are essential shared javascript libraries and images.



We are looking at ways to load balance the static content on this server, to avoid the single server dependency.



I realize that round-robin DNS is, at best, a low end (some might even say ghetto) solution, but I can't help wondering -- is round robin DNS a "good enough" solution for basic load balancing of static content?



There is some discussion of this in the [dns] [load-balancing] tags, and I've read through some great posts on the topic.



I am aware of the common downsides of DNS load balancing through multiple round-robin A records:





  • there's typically no heartbeats or failure detection with DNS records, so if a given server in the rotation goes down, its A record must manually be removed from the DNS entries

  • the time to live (TTL) must necessarily be set quite low for this to work at all, since DNS entries are cached aggressively throughout the internet

  • the client computers are responsible for seeing that there are multiple A records and picking the correct one



But, is round robin DNS good enough as a starter, better than nothing, "while we research and implement better alternatives" form of load balancing for our static content? Or is DNS round robin pretty much worthless under any circumstances?


Answer



Jeff, I disagree, load balancing does not imply redundancy, it's quite the opposite in fact. The more servers you have, the more likely you'll have a failure at a given instant. That's why redundancy IS mandatory when doing load balancing, but unfortunately there are a lot of solutions which only provide load balancing without performing any health check, resulting in a less reliable service.




DNS roundrobin is excellent to increase capacity, by distributing the load across multiple points (potentially geographically distributed). But it does not provide fail-over. You must first describe what type of failure you are trying to cover. A server failure must be covered locally using a standard IP address takeover mechanism (VRRP, CARP, ...). A switch failure is covered by resilient links on the server to two switches. A WAN link failure can be covered by a multi-link setup between you and your provider, using either a routing protocol or a layer2 solution (eg: multi-link PPP). A site failure should be covered by BGP : your IP addresses are replicated over multiple sites and you announce them to the net only where they are available.



From your question, it seems that you only need to provide a server fail-over solution, which is the easiest solution since it does not involve any hardware nor contract with any ISP. You just have to setup the appropriate software on your server for that, and it's by far the cheapest and most reliable solution.



You asked "what if an haproxy machine fails ?". It's the same. All people I know who use haproxy for load balancing and high availability have two machines and run either ucarp, keepalived or heartbeat on them to ensure that one of them is always available.



Hoping this helps!


No comments:

Post a Comment

linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...