Monday, June 5, 2017

domain name system - Why does this DNS lookup fail for me but work for others?

Day One



I have to hide the actual host names, so I'm hoping there is still enough information to answer this question...




I'm trying to resolve a certain host name (let's pretend it's www.example.com, but this is not the actual host name). A simple dig request works, but when I try a series of dig queries starting from a root nameserver, I hit a dead end. Here's an example:



# Starting with arbitrarily-chosen root nameserver
$ dig @198.41.0.4 www.example.com
(returns the usual list of TLD .com nameservers)

# Using a.gtld-servers.net
$ dig @192.5.6.30 www.example.com
(returns a list of 5 example.com authorities)



At this point, I tried each of the 5 example.com authorities. Three of them fail with status SERVFAIL, and the remaining two time out. Here's a SERVFAIL example:



;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 33577
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.example.com. IN A


;; Query time: 74 msec
;; SERVER:
;; WHEN: Tue Mar 8 10:10:33 2011
;; MSG SIZE rcvd: 37
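
When checking several authorities in turn, it can help to pull just the status code out of each response. The following is a minimal sketch, assuming dig's header line looks like the one shown above; the function name dig_status is made up for illustration and is not part of dig.

```shell
# dig_status: read dig output on stdin and print the status field
# (e.g. NOERROR, SERVFAIL) from the ->>HEADER<<- line.
# The function name is hypothetical, not part of dig.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Example usage (needs network access; ns1.example.com is a placeholder):
#   dig @ns1.example.com www.example.com | dig_status
```

If a query times out, dig prints no header line, so the function prints nothing. That makes a timeout easy to tell apart from a SERVFAIL.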


I tried this multiple times, from my own machine at home and from a remote machine in our co-lo, and both machines consistently get the same results.



However,





  • As I mentioned above, dig www.example.com (without specifying an @server) works fine.

  • This DNS trace utility is able to resolve the host name, and it clearly shows that it's using one of the name servers that times out for me!



Can anybody help me figure out what's going on?



EDIT 1: In case it helps, what should happen is that this host name should ultimately resolve to a CNAME record pointing to www.example.com.edgesuite.net, which should in turn resolve to another CNAME record pointing to an Akamai edge server.



EDIT 2: Per Joris's recommendation, I ran dig +trace www.example.com, and it actually failed to find a result. It gets to the same list of example.com authorities that I found before, and stops there.




Caching seems like a very likely culprit (and I did think of this earlier), but the weird part is that the actual host name isn't that popular. Would it be cached on two different ISP local nameservers if I'm the first person to request it? :-)
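
One way to test the caching theory is to look at the TTL in the answer. A resolver answering from its cache normally reports the remaining TTL, which counts down between queries, while an authoritative server reports the full value each time. This is a rough sketch; the helper name answer_ttl is hypothetical, and it assumes the usual five-column answer format (name, TTL, class, type, data).

```shell
# answer_ttl: read "dig +noall +answer" output on stdin and print the TTL
# (second column) of the first record. Hypothetical helper name.
answer_ttl() {
  awk 'NF >= 5 { print $2; exit }'
}

# Example usage (needs network access):
#   dig +noall +answer www.example.com | answer_ttl
#   sleep 5
#   dig +noall +answer www.example.com | answer_ttl   # lower if served from cache
```

If the second number is lower than the first, the local resolver is probably serving the record from its cache rather than fetching it again.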






Day Two



OK, I've discovered a few things:





  1. The two example.com authorities that I thought were timing out (as opposed to the other three, which were returning SERVFAIL) are not actually timing out. They just need a much longer timeout. If I use dig +time=10, for example, then I do eventually get back a result.

  2. I've tried this from several servers around the U.S., and the story is the same -- using dig www.example.com returns a result very quickly, but dig @ns1.example.com (or @ns2.example.com) requires using a large timeout parameter.
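
For reference, the slow direct query can be reproduced with something like the sketch below. The wrapper name query_auth and the DIG override variable are made up for illustration. The +norecurse flag is an assumption on my part: it asks the question the way a resolver would ask an authoritative server, and may or may not change the result here.

```shell
# Allow the dig binary to be overridden (e.g. for a dry run); defaults to dig.
DIG=${DIG:-dig}

# query_auth SERVER NAME [TIMEOUT]
# Ask one authoritative server directly, with a single attempt and a
# configurable per-try timeout in seconds (default 10).
# Hypothetical wrapper; only the dig flags themselves are real.
query_auth() {
  "$DIG" +norecurse +tries=1 +time="${3:-10}" "@$1" "$2" A
}

# Example usage (needs network access; host names are placeholders):
#   time query_auth ns1.example.com www.example.com
#   time query_auth ns2.example.com www.example.com 20
```

Limiting dig to one try makes the elapsed time reflect a single query, not the sum of several retries.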



So my new questions are:




  1. Could the result really be cached on a variety of proxying DNS servers, even though it's not a commonly-used host name? The TTL is 54,000 seconds (or 15 hours, if I understand correctly).

  2. If not, then is it possible that ns1.example.com is somehow configured to return a result more quickly to proxying DNS servers than to my own dig queries (some kind of white list)? Or is that just crazy talk?
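
As a quick check of the TTL arithmetic in question 1, DNS TTLs are expressed in seconds, so the conversion can be sketched like this (the helper name ttl_to_hm is hypothetical):

```shell
# ttl_to_hm SECONDS
# Print a DNS TTL as whole hours plus leftover minutes.
ttl_to_hm() {
  printf '%dh %dm\n' "$(($1 / 3600))" "$((($1 % 3600) / 60))"
}

ttl_to_hm 54000   # prints "15h 0m"
```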
