I've been reviewing BIND/DNS documentation and I've been unable to find
a clear answer. tl;dr - querying a secondary nameserver for an A record
in a delegated zone does not work with recursion enabled. And, by
definition, it doesn't work with recursion disabled either, since all that
is defined in the zone from our point of view is the NS and glue record.
Software stack: bind-9.3.6-4 on CentOS 5.4 x86 for the secondary nameserver;
bind-9.2.4-30 on CentOS 4.7 x86 for the primary nameserver.
I will use master and primary as synonyms, and likewise slave and
secondary.
Our setup is as follows ( names/IPs changed to protect the innocent ):
ns.pr.example.com == primary nameserver, 10.10.0.1, 192.168.0.1
ns1.pr.example.com == secondary nameserver, 10.11.0.1, 192.168.0.2
ns2.pr.example.com == secondary nameserver, 10.11.0.2, 192.168.0.3
delegated.pr.example.com == delegated sub-zone
nsdelegated.pr.example.com == authoritative NS for the
delegated.pr.example.com sub-domain, 10.11.0.5, NOT under our control!
You'll notice that ns1 and ns2 can talk to ns.pr.example.com over a
shared network - 192.168.0.0/24. However, ns.pr.example.com cannot
talk to the nsdelegated.pr.example.com host, which only has a
10.11.0.0/24 address.
The 192.168 network is a stand-in for our public-IP space; but the 10.10
and 10.11 networks are private, closed networks used for cluster
computing. Connecting ns.pr.example.com to the 10.11 network, either
directly or through a static route, is out of the question.
On the primary nameserver, ns.pr.example.com, the following definition is
added to the zone file, along with an updated serial:
/etc/named.conf:
zone "pr.example.com" {
type master;
file [db.filename];
};
db.filename:
delegated.pr.example.com. IN NS nsdelegated.pr.example.com.
nsdelegated.pr.example.com. IN A 10.11.0.5 ; glue record
This is replicated to the slave servers, ns1 and ns2. The record can be
seen, both in the flat files, and confirmed with dig:
slave example
dig -t ns +short @ns1 delegated.pr.example.com
nsdelegated.pr.example.com IN A 10.11.0.5
master example
dig -t ns +short @ns delegated.pr.example.com
nsdelegated.pr.example.com IN A 10.11.0.5
The nsdelegated server itself is responsive:
dig -t a +short @nsdelegated.pr.example.com randomhost.delegated.pr.example.com
10.11.0.222
But, a lookup on the secondary nameserver with the recursion-desired bit
set ( the default ) fails.
dig +recurse +short -t a @ns1 randomhost.delegated.pr.example.com
[no output]
It also fails on the primary server, ns, but that would be expected
since there is no way for ns.pr.example.com to contact 10.11.0.5 and
answer the request. Non-recursive queries also fail, since the relevant
information must be fetched from the nsdelegated.pr.example.com server.
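One way to check what ns1 itself returns for the delegation is to ask it without recursion and look at the header flags plus the authority and additional sections. This is only a sketch, assuming dig from the BIND tools is on the path; show_referral is a hypothetical helper name, not part of BIND.

```shell
# Hypothetical helper: query a server with recursion turned off and print
# only the header comments (status and flags), the authority section and
# the additional section.
#   $1 = server to query, $2 = name to look up
show_referral() {
    dig +norecurse +noall +comments +authority +additional -t a "@$1" "$2"
}
```

Run as: show_referral ns1 randomhost.delegated.pr.example.com. If ns1 treats delegated.pr.example.com as a delegation, I would expect status NOERROR without the aa flag, the NS record in the authority section, and the glue A record in the additional section. The same header also shows whether the ra (recursion available) flag is set for the querying client.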
My question is: why are the recursive queries to the secondary
nameservers failing? They have the correct delegation information, an NS
record and a glue record, and they are able to contact the delegated
nameserver.
My hunch is that, as a secondary nameserver, it may somehow be 'passing
on' the recursive query to the primary nameserver, where it then
fails. But I can't find any documentation to this effect, and it doesn't
make intuitive sense.
Any ideas, or debugging suggestions? I turned on maximal logging for
named, as well as query logging, but I couldn't get good information.
There wasn't an obvious "show me the lookups you do on behalf of
clients" log.
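It is not a named log as such, but one way to see the lookups a server makes on behalf of clients is to watch the wire on the secondary while repeating the failing dig. A rough sketch, assuming tcpdump is installed on ns1 and run as root on Linux (where the "any" pseudo-interface exists); watch_upstream is a hypothetical helper name.

```shell
# Hypothetical helper: show DNS traffic between this host and one
# upstream nameserver, without resolving addresses to names.
#   $1 = IP address of the upstream server to watch
watch_upstream() {
    tcpdump -n -i any host "$1" and port 53
}
```

Run as: watch_upstream 10.11.0.5 on ns1 to see whether any query ever leaves for nsdelegated, and watch_upstream 192.168.0.1 to see whether ns1 is sending the query to the primary instead.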
Thanks.
Of course you need to specify the delegated zone in named.conf; otherwise BIND will treat the name as just a dotted record it should hold itself, since it is authoritative for the pr.example.com zone.
What you want is something like this. In the master's named.conf you specify a new zone (and, accordingly, a new zone on the slaves):
zone "delegated.pr.example.com." { type master; file [db.filename]; };
and the zone file should be:
delegated.pr.example.com. NS nsdelegated.pr.example.com.
nsdelegated.pr.example.com. IN A 10.11.0.5 ; glue record
Now the main DNS server and its slaves know about the new zone and things should work.
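Before reloading named, it may be worth validating the changed configuration and the new zone file with the checkers that ship with BIND 9. A minimal sketch; check_delegated_zone is a hypothetical helper name, and the paths you pass in are whatever your installation uses.

```shell
# Hypothetical helper: syntax-check named.conf, then check the zone file.
#   $1 = path to named.conf, $2 = zone name, $3 = path to the zone file
# The zone check only runs if the configuration check succeeds.
check_delegated_zone() {
    named-checkconf "$1" && named-checkzone "$2" "$3"
}
```

Run as: check_delegated_zone /etc/named.conf delegated.pr.example.com followed by the path to the zone file. named-checkzone exits non-zero if the zone would not load, for example when the SOA record is missing.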
==== EDIT
Mistake: the SOA record is not in pr.example.com; it is in the zone definition of delegated.pr.example.com. Fixed that.