Sunday, July 31, 2016

web server - Transparent geographical DR website failover

We've already got web servers that are load-balanced. And even though outages shouldn't happen, they do, for a variety of reasons (central switch failure, misconfigured ISP routers, backbone failures, a DoS attack on shared infrastructure). I want to put a second set of servers in a completely different geographical location with entirely different connections. I can sync the SQL servers with a number of different techniques, so that's not a problem. What I don't know is how to transparently redirect existing user web sessions to the backup servers when the primary site goes down or becomes unreachable.

AFAIK, the three most common ways of dealing with this are:

  • DNS load balancing, which uses a very low TTL to intelligently
    resolve DNS requests to server IPs in the best environment.

  • Intelligent redirection, which uses a third site to authoritatively
    redirect users to well-known but secondary DNS names like
    na1.mysite.com and eu.mysite.com.

  • A minimal, intelligent proxy server, hosted somewhere in the cloud, that relays requests to the different sites.
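The first approach comes down to a health-check-driven choice of which A record the authoritative DNS server hands out. A minimal sketch of that decision, assuming two sites listed in priority order and a health checker running outside both of them; the `Site` type, the `dns_answer` function, the documentation-range IPs and the 30-second TTL are all hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    ip: str        # hypothetical documentation-range address
    healthy: bool  # latest result from an external health checker


def dns_answer(sites: list[Site], ttl: int = 30) -> tuple[str, int]:
    """Return the (A-record IP, TTL) a failover DNS server would answer
    with: the first healthy site in priority order."""
    for site in sites:
        if site.healthy:
            return site.ip, ttl
    # If every check fails, keep answering with the primary rather than
    # returning nothing; the health checker itself may be what's broken.
    return sites[0].ip, ttl


if __name__ == "__main__":
    dr = Site("dr", "198.51.100.10", healthy=True)
    print(dns_answer([Site("primary", "192.0.2.10", healthy=True), dr]))
    print(dns_answer([Site("primary", "192.0.2.10", healthy=False), dr]))
```

The first call returns the primary's address and the second the DR site's. Nothing in this logic helps a client that has already cached the primary's address, which is exactly the TTL problem with this method.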

But in the case of a site failure, the first method either leaves users unable to reach the server until the TTL expires and clients re-query DNS and resolve to the DR site, or, if the TTL is kept very short, causes an excessive number of extra DNS requests. The second method still leaves us with a potential single point of failure (although I could see multiple A records being used to duplicate the master "login" role between environments), and it still doesn't redirect users when the site they're currently using goes down. And the third isn't redundant at all if the cloud provider goes down (as they all have from time to time).
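The TTL objection can be checked against the 60-second budget in the clarification at the end of the question. A rough upper bound, assuming resolvers and browsers honour the TTL (not all do) and ignoring the time to push the new record to the authoritative servers, is the health checker's detection time plus the TTL. The function name and the parameter values are hypothetical:

```python
def worst_case_outage(check_interval_s: float, failures_to_trip: int, ttl_s: float) -> float:
    """Rough upper bound (seconds) on how long a client keeps using the dead
    site's IP: time to declare the site down plus the lifetime of a cached answer."""
    detection_s = check_interval_s * failures_to_trip
    return detection_s + ttl_s


BUDGET_S = 60  # maximum interruption allowed by the question

if __name__ == "__main__":
    for interval, failures, ttl in [(10, 3, 30), (10, 3, 60), (5, 2, 20)]:
        total = worst_case_outage(interval, failures, ttl)
        print(f"check every {interval}s, trip after {failures}, TTL {ttl}s: "
              f"{total:.0f}s, within budget: {total <= BUDGET_S}")
```

The middle combination misses the budget even with prompt detection. Lowering the TTL is the cheap lever, but an active client then has to re-resolve the name roughly once per TTL, which is the extra DNS traffic the question wants to avoid for mobile users.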

From what I know about networking, isn't there a way to give two different servers in two geographically separated environments the same IP address and let IP routing take over and deliver traffic to the server that is accepting requests? Is this only feasible with IPv6? What is it called, and why don't DR site failovers currently use such a technique? Update: This is called anycast. How do I make this happen? And is it worth the trouble?
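Anycast works over IPv4 as well as IPv6; DNS root server operators and CDNs use it. Each site announces the same prefix over BGP, every router forwards to whichever announcement it prefers, and when a site withdraws the prefix, routers fall back to the other site. A toy model of that selection, reduced to the shortest-AS-path step of BGP's best-path algorithm (real routers weigh local preference and several other attributes first). The `Announcement` type, the site names and the AS numbers (taken from the 64496 to 64511 documentation range) are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Announcement:
    site: str
    as_path: tuple[int, ...]  # ASes the route crossed on its way to this router


def best_route(announcements: list[Announcement],
               withdrawn: frozenset[str] = frozenset()) -> Optional[str]:
    """Site this router sends anycast traffic to: shortest AS path among the
    announcements still in place, ties broken by site name."""
    candidates = [a for a in announcements if a.site not in withdrawn]
    if not candidates:
        return None
    return min(candidates, key=lambda a: (len(a.as_path), a.site)).site


if __name__ == "__main__":
    # As seen from one hypothetical router: the primary is two AS hops away,
    # the DR site three.
    routes = [
        Announcement("primary", (64496, 64497)),
        Announcement("dr", (64498, 64499, 64500)),
    ]
    print(best_route(routes))                                    # primary
    print(best_route(routes, withdrawn=frozenset({"primary"})))  # dr
```

To make this happen in practice, you need your own address block and AS number (or a provider willing to announce anycast space for you), a BGP session with an upstream at each site, and a health check at each site that withdraws the announcement when the local web servers fail. IPv4 announcements longer than a /24 are widely filtered, so the anycast block is typically a whole /24. Two caveats bear on "is it worth it": a route change moves a client's packets to the other site mid-connection, so its open TCP connections are reset and the browser has to reconnect, and the new site can only continue the user's session if session state is available at both sites.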

To clarify: this question is specific to HTTP traffic, and a service interruption of up to 60 seconds is acceptable. Users should not need to close their browser, go back to the login page, or refresh anything. Mobile users cannot afford an extra DNS query for every page request.
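Not having to log in again means the DR site must accept sessions the primary issued. The question only mentions syncing SQL, so one option is to keep sessions in that replicated database; another is a signed session token that either site can verify with a shared key, so no session lookup has to survive the failover. A minimal sketch of the latter using HMAC-SHA256 from Python's standard library; `SHARED_KEY`, the token format and both function names are hypothetical:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical secret, provisioned identically at both sites.
SHARED_KEY = b"replace-with-a-secret-shared-by-both-sites"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(user_id: str, ttl_s: int = 3600, now: Optional[float] = None) -> str:
    """Create '<payload>.<signature>' for a session that expires after ttl_s."""
    now = time.time() if now is None else now
    payload = json.dumps({"uid": user_id, "exp": int(now) + ttl_s}).encode()
    sig = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(sig)


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong shape or invalid base64 (binascii.Error)
        return None
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(payload)
    return claims["uid"] if claims["exp"] >= now else None


if __name__ == "__main__":
    token = issue_token("alice", now=1_000_000)  # issued by the primary
    print(verify_token(token, now=1_000_100))    # checked by the DR site: alice
    print(verify_token(token, now=1_010_000))    # past expiry: None
```

The trade-off is that a stateless token can't be revoked before it expires without some shared state, so logout and forced sign-out still need a replicated list or a short expiry.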
