Monday, February 6, 2017

Some DNS servers in the world giving wrong IP address for our domain?



Our domain, grahamhancock.com, is being resolved incorrectly for a few people around the world, but it resolves correctly for most people.



When I run through a list of free open DNS providers, about 90% resolve correctly and give information consistent with our zone file. The other 10%, however, return an IP address belonging to an Amazon EC2 instance that we have never owned or used. Here are some example DNS servers giving the wrong information:




dig www.grahamhancock.com @173.84.127.88
dig www.grahamhancock.com @209.222.18.222


How could these servers have the wrong information, and how can we get back control of the situation?



Could this be something malicious, or a misconfiguration? We're a 1-million-hits-a-month site with good search rankings, so we're probably a target for something malicious. The wrong IP address that the erroneous servers are returning to some people points to a get-rich-quick site on an AWS EC2 instance.



What should we do?


Answer




Drifter is correct: you have a nameserver configuration problem. Here's the tail end of the output from dig +trace +additional www.grahamhancock.com:



grahamhancock.com. 172800 IN NS ns1.grahamhancock.com.
grahamhancock.com. 172800 IN NS ns2.grahamhancock.com.
grahamhancock.com. 172800 IN NS server.grahamhancock.com.
ns1.grahamhancock.com. 172800 IN A 199.168.117.67
ns2.grahamhancock.com. 172800 IN A 199.168.117.67
server.grahamhancock.com. 172800 IN A 199.168.117.67
;; Received 144 bytes from 192.35.51.30#53(f.gtld-servers.net) in 92 ms


www.grahamhancock.com. 14400 IN CNAME grahamhancock.com.
grahamhancock.com. 14400 IN A 199.168.117.67
grahamhancock.com. 86400 IN NS ns2.grahamhancock.com.com.
grahamhancock.com. 86400 IN NS ns1.grahamhancock.com.com.
;; Received 123 bytes from 199.168.117.67#53(ns2.grahamhancock.com) in 17 ms


Your glue records point to the IP address 199.168.117.67, which returns the correct response. Your zone, however, defines nameserver records ending in com.com. If we +trace one of those nameservers instead...



com.com. 172800 IN NS ns-180.awsdns-22.com.
com.com. 172800 IN NS ns-895.awsdns-47.net.
com.com. 172800 IN NS ns-1084.awsdns-07.org.
com.com. 172800 IN NS ns-2015.awsdns-59.co.uk.
;; Received 212 bytes from 192.26.92.30#53(c.gtld-servers.net) in 22 ms

ns1.grahamhancock.com.com. 30 IN A 54.201.82.69
com.com. 172800 IN NS ns-1084.awsdns-07.org.
com.com. 172800 IN NS ns-180.awsdns-22.com.
com.com. 172800 IN NS ns-2015.awsdns-59.co.uk.
com.com. 172800 IN NS ns-895.awsdns-47.net.
;; Received 196 bytes from 205.251.195.127#53(ns-895.awsdns-47.net) in 16 ms


...we end up at someone's AWS hosted nameservers.



Your problem is known as a glue record mismatch. Remote nameservers initially learn about your domain via the glue records, but once those remote servers perform a refresh, they end up querying the bogus nameservers that your zone defines with an extra .com at the end.
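To make the mismatch concrete, here is a minimal Python sketch that compares the NS set handed out by the parent (the delegation) with the NS set the zone itself publishes. It only parses record lines copied from the dig output quoted above and makes no network queries. The function names `ns_targets` and `compare_ns` are hypothetical helpers introduced for illustration, not part of any DNS library.

```python
# NS records copied from the dig +trace output quoted in this answer.
# Delegation = what the .com servers hand out; authoritative = what the
# zone on 199.168.117.67 publishes for itself.
DELEGATION = """\
grahamhancock.com. 172800 IN NS ns1.grahamhancock.com.
grahamhancock.com. 172800 IN NS ns2.grahamhancock.com.
grahamhancock.com. 172800 IN NS server.grahamhancock.com.
"""

AUTHORITATIVE = """\
grahamhancock.com. 86400 IN NS ns2.grahamhancock.com.com.
grahamhancock.com. 86400 IN NS ns1.grahamhancock.com.com.
"""


def ns_targets(dig_text, zone):
    """Return the set of NS targets for `zone` found in dig-style record lines."""
    targets = set()
    for line in dig_text.splitlines():
        fields = line.split()
        # Expected layout: owner, TTL, class, type, rdata
        if len(fields) == 5 and fields[0].lower() == zone and fields[3] == "NS":
            targets.add(fields[4].lower())
    return targets


def compare_ns(delegation_text, authoritative_text, zone):
    """Report NS names that appear on only one side of the delegation."""
    parent = ns_targets(delegation_text, zone)
    child = ns_targets(authoritative_text, zone)
    return {
        "only_in_delegation": sorted(parent - child),
        "only_in_zone": sorted(child - parent),
    }


if __name__ == "__main__":
    diff = compare_ns(DELEGATION, AUTHORITATIVE, "grahamhancock.com.")
    for side, names in diff.items():
        print(side, names)
```

Both lists should be empty for a healthy zone. Here every name differs, and the names that appear only in the zone are the ones ending in com.com.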



This is not your only problem. You are listing the same IP address three times in your glue records, which is extremely fragile. You should always have multiple nameservers; they should never share a subnet or upstream network peer, and they should never be located at the same physical location. As matters currently stand, any brief routing problem between remote DNS servers and your single server will cause your domain to be temporarily unreachable.
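The redundancy problem can be checked mechanically as well. The sketch below, using only Python's standard `ipaddress` module, flags nameservers that share an address or fall in the same network. The /24 grouping is an assumption made for illustration: it is only a rough stand-in for "same subnet", and it cannot detect a shared upstream peer or a shared physical location. `group_by_network` and `redundancy_warnings` are hypothetical helper names.

```python
import ipaddress
from collections import defaultdict

# Glue records copied from the dig +trace output quoted in this answer.
GLUE = {
    "ns1.grahamhancock.com.": "199.168.117.67",
    "ns2.grahamhancock.com.": "199.168.117.67",
    "server.grahamhancock.com.": "199.168.117.67",
}


def group_by_network(glue, prefix_len=24):
    """Group nameserver names by the network their address falls in.

    prefix_len=24 is an illustrative assumption, not a DNS requirement.
    """
    groups = defaultdict(list)
    for name, addr in glue.items():
        # strict=False lets us derive the network from a host address.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(net)].append(name)
    return dict(groups)


def redundancy_warnings(glue, prefix_len=24):
    """Return human-readable warnings about missing nameserver diversity."""
    warnings = []
    if len(set(glue.values())) < 2:
        warnings.append("all nameservers resolve to a single IP address")
    for net, names in group_by_network(glue, prefix_len).items():
        if len(names) > 1:
            joined = ", ".join(sorted(names))
            warnings.append(f"{len(names)} nameservers share {net}: {joined}")
    return warnings


if __name__ == "__main__":
    for warning in redundancy_warnings(GLUE):
        print("WARNING:", warning)
```

With the glue records above, this reports both a single shared address and three names inside 199.168.117.0/24. A set of nameservers on unrelated networks produces no warnings.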







Update:



This Q&A has been featured on the front page and is getting lots of comments. Unfortunately, that includes people who are just a little too eager to reply to this answer without checking to see if their points have already been addressed in the expanded comments.



The detail that most people seem to be overlooking is the comment that I'm quoting here:




  • [...] geo-redundant DNS servers prevent scenarios where a brief routing interruption results in temporary negative caching of nameservers. However brief the negative caching period ends up being, it will almost certainly exceed the amount of time that there was a connectivity interruption. [...] the number of scenarios where lack of DNS geo-redundancy won't create sporadic and difficult to troubleshoot availability problems is exactly zero.




If you think my understanding of negative caching of nameservers is wrong, that's fair game for discussion, but outside of that you need to bring something to the table other than "it's a small site and who cares if both the website and DNS server are down at the same time". If you're saying this, you don't understand the topic nearly as well as you think you do.



Second Update:



I went ahead and wrote a canonical Q&A that we can link to whenever the single DNS server topic comes up in the future. Hopefully this puts the matter to rest.

