high availability - Understanding the nameserver aspect of a DNS based failover system

Tuesday, November 28, 2017


As part of a project I'm involved in, a system is required with as close to 99.999% uptime as possible (the system involves healthcare). The solution I am investigating involves having multiple sites, each with its own load balancers, multiple internal servers, and a replicated database that is synchronised with every other site. In front of all of this sits a DNS based failover system that redirects traffic if a site goes down (or is manually taken down for maintenance).
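To put the 99.999% target in perspective, here is a quick back-of-the-envelope calculation of the downtime it allows. The function name and the 365-day year are my own assumptions for illustration.

```python
def downtime_budget_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Return the minutes of downtime allowed over `days` at the given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * days * 24 * 60


if __name__ == "__main__":
    # "Five nines" over a 365-day year comes to a little over five minutes.
    print(f"{downtime_budget_minutes(99.999):.2f} minutes per year")
```

So any failover mechanism has to detect a failure and take effect within a few minutes per year in total, which is why DNS caching behaviour matters so much here.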

What I'm struggling with, however, is how the DNS aspect functions without introducing a single point of failure. I've seen talk of floating IPs (which present that point of failure) and various managed services such as DNSMadeEasy (which don't provide the ability to fully test their failover process during their free trial, so I can't verify whether it's right for the project), among much else. I have also been playing around with simple solutions such as assigning multiple A records to a domain name (which I understand falls far short, given the discrepancies in how different browsers interact with such a setup).

For a more robust DNS based approach, do you simply specify a nameserver for each location on the domain, run a nameserver at each location, and update each nameserver's independent records whenever a failure is detected at another site (using scripts run on each nameserver to check all other sites)? If so, aren't there still the same issues that are found with regularly changed A records (browsers not updating to the new records, or ignoring very low TTLs)?
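To make the idea concrete, this is roughly the kind of script I imagine running on each nameserver. It is only a minimal sketch under my own assumptions: the site names, the IP addresses (documentation ranges), the TCP check on port 443, and the function names are all hypothetical, and the step that actually rewrites the zone is left out because it depends on the nameserver software.

```python
import socket
from typing import Dict, List

# Hypothetical site table; the addresses are placeholders from documentation ranges.
SITES: Dict[str, str] = {
    "site-a": "192.0.2.10",
    "site-b": "198.51.100.10",
    "site-c": "203.0.113.10",
}


def is_reachable(ip: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(sites: Dict[str, str]) -> Dict[str, bool]:
    """Probe every site once and return a name -> healthy mapping."""
    return {name: is_reachable(ip) for name, ip in sites.items()}


def records_to_serve(sites: Dict[str, str], health: Dict[str, bool]) -> List[str]:
    """Pick the A record addresses this nameserver should hand out."""
    healthy = [ip for name, ip in sites.items() if health.get(name, False)]
    # If every probe failed, the checker itself is probably cut off,
    # so keep serving all sites rather than publishing an empty answer.
    return healthy if healthy else list(sites.values())


if __name__ == "__main__":
    # Assumption: run periodically (e.g. from cron); the zone update is not shown.
    print(records_to_serve(SITES, check_all(SITES)))
```

Because each nameserver runs its own checks, no single checker has to be up for the records to change, but the answers handed out are still subject to whatever caching resolvers and browsers do with the TTL.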

Here's a visual representation of how I understand the system would work.

I have been reading around this subject for several days now (including plenty of Q&As on here), but feel like I'm missing a fundamental piece of the puzzle.

Thanks in advance!
