Thursday, May 7, 2015

domain name system - BIND9 DNSSEC Fails at New Colo

I'm working on setting up a new site, including new infrastructure and services. I have a BIND DNS server running on CentOS 7.3, and I recently noticed that recursive lookups for external resources are failing. This doesn't happen in our legacy infrastructure.



Both the new colo and the legacy colo have internet access, and I'm able to route to external resources from both (8.8.8.8, for instance), though the ISP and route differ.



To troubleshoot and eliminate config differences between the new and old DNS servers themselves, I set up two fresh CentOS 7 VMs, one in the old infra and one in the new infra. I used the same image and build method for both so that they would be identical (minus hostname/IP) post build.



I installed BIND (version 9.9.4) and configured both as simple recursive DNS servers (no zone-specific configuration or otherwise). Both have the default CentOS configs at:
/etc/named.conf
/var/named/




The only changes I made were in /etc/named.conf:




  • removed 'listen-on port 53 { 127.0.0.1; };' (removing it makes named listen on port 53 on all interfaces)

  • set 'listen-on-v6 port 53 { none; };' (do not listen on IPv6)

  • set 'allow-query { any; };' (allow any host to query)



This results in an /etc/named.conf that looks like:




//
// named.conf
//
// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS
// server as a caching only nameserver (as a localhost DNS resolver only).
//
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//
// See the BIND Administrator's Reference Manual (ARM) for details about the
// configuration located in /usr/share/doc/bind-{version}/Bv9ARM.html


options {
    listen-on-v6 port 53 { none; };
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named_stats.txt";
    memstatistics-file "/var/named/data/named_mem_stats.txt";
    allow-query { any; };

    /*
     - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
     - If you are building a RECURSIVE (caching) DNS server, you need to enable
       recursion.
     - If your recursive DNS server has a public IP address, you MUST enable access
       control to limit queries to your legitimate users. Failing to do so will
       cause your server to become part of large scale DNS amplification
       attacks. Implementing BCP38 within your network would greatly
       reduce such attack surface
    */
    recursion yes;

    dnssec-enable no;

    managed-keys-directory "/var/named/dynamic";

    pid-file "/run/named/named.pid";
    session-keyfile "/run/named/session.key";
};

logging {
    channel default_debug {
        file "data/named.run";
        severity dynamic;
    };
};

zone "." IN {
    type hint;
    file "named.ca";
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
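After editing, it's worth sanity-checking the config and confirming the listener before testing any queries. A rough sketch of the checks I run (assuming the stock CentOS 7 bind package and systemd):

```shell
# Syntax-check the config; named-checkconf prints nothing and exits 0 on success.
named-checkconf /etc/named.conf

# Restart named and confirm it is now bound to port 53 on all IPv4 interfaces.
systemctl restart named
ss -ulpn 'sport = :53'
```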


I also set each respective server to resolve against itself, and only itself, in /etc/resolv.conf.
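For reference, resolv.conf on each VM is just the loopback entry (a sketch; the point is that each box queries only its own named):

```
# /etc/resolv.conf on each test VM
nameserver 127.0.0.1
```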



From my perspective this should eliminate all other differences except:





  • physical server/hypervisor

  • colo/network/ISP



I tested recursive DNS queries on both (against resources like google.com, amazon.com, dropbox.com, etc.).
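The tests themselves were plain dig queries against each VM. A sketch of the comparison (the old-infra address here is a placeholder; only 10.50.60.111 is from my actual lab):

```shell
# Old-infra resolver: recursive lookup succeeds and returns an answer.
dig @10.40.60.111 google.com

# New-infra resolver, same image and config: the same query fails.
dig @10.50.60.111 google.com
```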



As in the production environment, recursive queries in the test environment work from the old infra but not from the new. The dig +trace from the server in the new infra indicates that it can't get the IP address for the root NS:



dig @10.50.60.111 google.com +trace +additional


; <<>> DiG 9.9.4-RedHat-9.9.4-38.el7_3.3 <<>> @10.50.60.111 google.com +trace +additional
; (1 server found)
;; global options: +cmd
. 518400 IN NS b.root-servers.net.
. 518400 IN NS d.root-servers.net.
. 518400 IN NS i.root-servers.net.
. 518400 IN NS g.root-servers.net.
. 518400 IN NS j.root-servers.net.
. 518400 IN NS a.root-servers.net.
. 518400 IN NS l.root-servers.net.
. 518400 IN NS k.root-servers.net.
. 518400 IN NS e.root-servers.net.
. 518400 IN NS h.root-servers.net.
. 518400 IN NS f.root-servers.net.
. 518400 IN NS c.root-servers.net.
. 518400 IN NS m.root-servers.net.
dig: couldn't get address for 'b.root-servers.net': no more


The answer to this should be served by the local BIND server itself, since we are using the default root hints packaged at /var/named/named.ca.
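As a quick check that the hints file itself is intact, you can ask the local server directly for the root NS set and the glue; this data should come from named.ca rather than from the network:

```shell
# Both answers should be served from the packaged root hints file on the
# local server; if these also fail, named is rejecting its own hints data.
dig @10.50.60.111 . NS
dig @10.50.60.111 b.root-servers.net A
```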




A quick look at the log (/var/named/data/named.run) revealed that the server in the new infra is disregarding these responses because it received an 'insecure response':



validating @0x7fc3c8055510: . NS: got insecure response; parent indicates it should be secure
error (insecurity proof failed) resolving './NS/IN': 198.41.0.4#53


But the server in the old infra does not have this issue. I tried disabling DNSSEC (in /etc/named.conf) and also passing +nodnssec to dig. This gets one step further into the recursion, but we still fail to get the NS for the com. domain, for what appears to be the same reason, though in that case the answer would come from the root servers.
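One thing worth double-checking on the disabling front: as far as I can tell, dnssec-enable only controls whether named serves DNSSEC records to its own clients, while validation of upstream answers is governed by the separate dnssec-validation option, which defaults to yes. So a sketch of what I believe would fully switch validation off (to be confirmed against the BIND 9.9 ARM):

```
options {
    ...
    dnssec-enable no;
    dnssec-validation no;    // stop validating upstream responses entirely
};
```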



I've been looking for the answer and will continue to do so, but I don't understand what factors would cause this error in one colo/network and not the other when the server/BIND configuration is otherwise the same. If anyone has ideas about what would cause this, or where I should look next, I'd love to hear them.
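Since the network path is the main variable left, one avenue I plan to probe is whether a firewall or other middlebox at the new colo is mangling DNS on the way back in, e.g. stripping DNSSEC records or dropping large/fragmented UDP responses. A rough way to test from the new VM, querying a root server directly (198.41.0.4 is the address from the error above):

```shell
# Ask a root server for DNSSEC material directly. If RRSIG/DNSKEY records are
# missing from the reply, something on the path is stripping them; if the large
# response times out while a small-buffer one works, suspect EDNS/fragmentation.
dig @198.41.0.4 . NS +dnssec +bufsize=4096
dig @198.41.0.4 . DNSKEY +dnssec
dig @198.41.0.4 . DNSKEY +dnssec +bufsize=512 +ignore
```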




In general, I'm trying to understand the following:

  • Could this still be a simple BIND misconfiguration?

  • Could this be caused by local network configuration?

  • Do I need to configure something on the ISP or external DNS side for this to work with the new ISP/IPs?
