Tuesday, August 25, 2015

domain name system - Global high availability setup question

I own and operate visualwebsiteoptimizer.com. The app provides a code snippet that my customers insert into their websites to track certain metrics. Since the snippet is external JavaScript (at the top of the site code), a visitor's browser contacts our app server before showing a customer's website. If our app server goes down, the browser keeps trying to establish the connection until it times out (typically 60 seconds). As you can imagine, we cannot afford to have our app server down in any scenario, because it would hurt the experience of not just our own website's visitors but our customers' website visitors too!



We are currently using a DNS failover mechanism with one backup server located in a different data center (actually on a different continent). That is, we monitor our app server from 3 separate locations, and as soon as it is detected to be down, we change the A record to point to the backup server's IP. This works fine for most browsers (as our TTL is 2 minutes), but IE caches DNS for 30 minutes, which might be a deal killer. See this recent post of ours: visualwebsiteoptimizer.com/split-testing-blog/maximum-theoretical-downtime-for-a-website-30-minutes/
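To make the current setup concrete, here is a minimal sketch of the failover decision and of the worst-case window during which a client may still hold the dead IP. The quorum rule (at least 2 of 3 probes failing), the function names, the location labels, and the documentation-range IP addresses are all assumptions for illustration; the real monitoring and the DNS provider's update call are not shown.

```python
from typing import Mapping


def choose_a_record(
    probes: Mapping[str, bool],
    primary_ip: str,
    backup_ip: str,
    quorum: int = 2,
) -> str:
    """Return the IP the A record should point to.

    `probes` maps a monitoring location to True if the primary answered
    from there. The quorum threshold is an assumed policy that avoids
    failing over on a single flaky probe.
    """
    failures = sum(1 for ok in probes.values() if not ok)
    return backup_ip if failures >= quorum else primary_ip


def worst_case_stale_seconds(detection_seconds: int, cache_seconds: int) -> int:
    """Upper bound on how long a client may keep using the old IP:
    time to detect the outage plus how long the client caches DNS."""
    return detection_seconds + cache_seconds


# Hypothetical probe results: two of three locations cannot reach the primary.
target = choose_a_record(
    {"us": False, "eu": False, "asia": True},
    primary_ip="192.0.2.10",
    backup_ip="198.51.100.20",
)
```

With an assumed 60-second detection time, `worst_case_stale_seconds(60, 2 * 60)` covers a browser that honors the 2-minute TTL, while `worst_case_stale_seconds(60, 30 * 60)` covers the IE case described above.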



So, what kind of setup can we use to ensure an almost instant failover in case the app data center suffers a major outage? I read here www.tenereillo.com/GSLBPageOfShame.htm that having multiple A records is a solution, but we can't afford session synchronization (yet). Another strategy that we are exploring is having two A records, one pointing to the app server and the second to a reverse proxy (located in a different data center) which forwards to the main app server if it is up and to the backup server if the main one is down. Do you think this strategy is reasonable?
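The reverse proxy idea above boils down to an upstream selection rule. The following is a rough sketch of that rule only, assuming a plain TCP connect as the health check; `tcp_is_up`, `pick_upstream`, and the 1-second timeout are hypothetical names and values, and a real proxy would also need to forward the request itself.

```python
import socket
from typing import Callable, Optional, Tuple

Address = Tuple[str, int]


def tcp_is_up(addr: Address, timeout: float = 1.0) -> bool:
    """Treat a server as up if a TCP connection succeeds within `timeout`."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def pick_upstream(
    primary: Address,
    backup: Address,
    is_up: Callable[[Address], bool] = tcp_is_up,
) -> Optional[Address]:
    """Prefer the main app server, fall back to the backup,
    and return None when neither is reachable."""
    if is_up(primary):
        return primary
    if is_up(backup):
        return backup
    return None
```

The health check is passed in as a parameter so the rule can be exercised without real servers. When `pick_upstream` returns `None`, the proxy would have to answer on its own rather than leave the browser waiting.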



Just to be clear about our priorities: we can afford to keep our own website or app down, but we can't let our customers' websites slow down because of our downtime. So, in case our app servers are down, we don't intend to respond with the default application response. Even a blank response will suffice; we just need the browser to complete that HTTP connection (and nothing else).
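As an illustration of the "blank response" requirement, here is a minimal sketch of a fallback endpoint built on Python's standard `http.server`. It answers every GET with an empty 200 so the browser finishes the request immediately. The handler name, the `make_fallback_server` helper, and the choice of headers are assumptions, not part of our actual setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class BlankResponseHandler(BaseHTTPRequestHandler):
    """Answer every GET with an empty body so the script tag resolves at once."""

    def do_GET(self) -> None:
        self.send_response(200)
        # An empty JavaScript file is valid, so the page simply continues loading.
        self.send_header("Content-Type", "application/javascript")
        self.send_header("Content-Length", "0")
        # Assumed choice: avoid caching the blank file once the app is back.
        self.send_header("Cache-Control", "no-store")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        # Keep the fallback quiet; request logging is not needed here.
        pass


def make_fallback_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Create the fallback server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), BlankResponseHandler)
```

Calling `make_fallback_server("0.0.0.0", 80).serve_forever()` would run it on the standard HTTP port, which typically requires elevated privileges.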



Reference: I read this thread, which was useful: serverfault.com/questions/69870/multiple-data-centers-and-http-traffic-dns-round-robin-is-the-only-way-to-assure
