Saturday, June 6, 2015

domain name system - Is there a way to add another redundant DNS along with CloudFlare?



My question stems from the recent incident on July 2nd, when CloudFlare deployed bad software to their systems and caused downtime for most of their clients, including us. https://blog.cloudflare.com/cloudflare-outage/




Is there a way to have CloudFlare manage our DNS, but also keep a redundant nameserver on standby alongside it in case CloudFlare goes down again?
Or some kind of solution that adds redundancy outside of the CloudFlare network but will work while we have our service still on CloudFlare?
I'm open to anything along these lines.



The only thing that comes to mind is to have an external observer app monitor the state of our setup. If there is a major issue at the DNS level like the recent one, the observer would use the registrar API to change the registrar's nameservers to a standby DNS server maintained by us. That standby DNS server would provide some fail-safe records that allow our project to be reached, while lacking the CDN and traffic load-balancing capabilities of CloudFlare.
Would this be a possible and even preferred solution to my question?
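
For illustration, the observer idea above could be sketched roughly as follows. This is only a sketch under stated assumptions: `check_dns` and `switch_nameservers` are hypothetical callables (a DNS health probe and a wrapper around whatever API the registrar offers), and the threshold of three consecutive failures is an arbitrary choice, not something from our setup.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverObserver:
    """Counts consecutive DNS check failures and triggers one nameserver switch."""

    check_dns: Callable[[], bool]            # hypothetical probe: True if DNS answers correctly
    switch_nameservers: Callable[[], None]   # hypothetical wrapper around the registrar API
    threshold: int = 3                       # assumed: consecutive failures before failing over
    failures: int = 0
    failed_over: bool = False

    def tick(self) -> bool:
        """Run one health check; return True only if this call triggered the failover."""
        if self.failed_over:
            return False  # already switched; switching back is left to manual intervention
        if self.check_dns():
            self.failures = 0  # any successful check resets the failure streak
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.switch_nameservers()
            self.failed_over = True
            return True
        return False
```

Requiring several consecutive failures keeps a single lost probe from triggering a switch. Note that resolvers cache the delegation for its TTL, so a nameserver change made at the registrar may take a while to reach all clients.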



Some background on our project's setup:
It is of paramount importance that our project has zero downtime, and we work really hard for this.
At the server/service level our project is fully redundant: we have redundant servers for all services (haproxy, application, database) in three zones in the US. The database is handled by Galera Cluster multi-master, which on top of its internal negotiation mechanism is monitored by a custom external observer app that can reconfigure one of the DB servers to act as master even if all three DB servers somehow become split from one another. So even if two zones fail, the remaining zone will have its database promoted to master, and the other two will remove themselves from the cluster, awaiting manual intervention. This is a worst-case scenario.
At the front of this setup sits CloudFlare, which does load balancing at the DNS level for us. Traffic is distributed among the three zones, and then from one service to the next depending on server load and connection time, which also allows for cross-zone distribution of requests.
Since CloudFlare is so massive we have wrongly not regarded it as a single-point-of-failure, which it actually is, as we saw with the recent downtime event caused by them.




It is worth mentioning that we cannot and do not want to part ways with CloudFlare. They are great for us 99.9% of the time, their Argo service gives us a massive speed increase, and besides, we would just end up with another CDN that would still be a single-point-of-failure and prone to the same issues.


Answer



Of interest will be: https://blog.serverfault.com/2017/01/09/surviving-the-next-dns-attack/



You will have to give up managing your DNS from the Cloudflare console. But https://www.cloudflare.com/dns/ offers several alternatives for when you don't want to let them (completely) manage your DNS.




Cloudflare requires users to change their DNS when signing up for Cloudflare. If you’re not able to move or change your DNS to Cloudflare, you can set up Cloudflare via CNAME with an Enterprise subscription. You can also set up Cloudflare as a secondary DNS provider ...






Primary-Secondary DNS



An existing DNS provider acts as the primary DNS, including management of records and resolution. Record updates are made to the primary DNS provider. Once configured, the primary DNS provider automatically updates Cloudflare’s DNS. Both Cloudflare’s DNS and the primary DNS provider see DNS traffic, with recursive servers deciding which DNS to use.
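
In a primary-secondary arrangement, a secondary conventionally decides whether its copy of the zone is out of date by comparing SOA serial numbers. A minimal sketch of that comparison follows, using the serial number arithmetic from RFC 1982. The function names are hypothetical, and how the two serials are fetched from each provider is deliberately left out.

```python
SERIAL_BITS = 32
MOD = 2 ** SERIAL_BITS          # SOA serials are unsigned 32-bit values
HALF = 2 ** (SERIAL_BITS - 1)   # half the serial space, per RFC 1982


def serial_gt(a: int, b: int) -> bool:
    """Return True if serial a is newer than serial b under RFC 1982 arithmetic."""
    # The modular difference handles wraparound: a serial just past zero
    # still counts as newer than one just below 2**32.
    return a != b and ((a - b) % MOD) < HALF


def secondary_is_stale(primary_serial: int, secondary_serial: int) -> bool:
    """Return True if the secondary has not yet picked up the primary's latest zone."""
    return serial_gt(primary_serial, secondary_serial)
```

A monitoring job could call `secondary_is_stale` with the serials reported by each provider and alert if the secondary stays behind for longer than expected.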


