Tuesday, November 11, 2014

domain name system - Need help understanding Windows DNS, DHCP and dynamic PTRs

I have inherited management of a set of AD servers running AD-backed DNS. One of them is running DHCP.

For IP ranges which are served by this Windows DHCP server, the in-addr.arpa zones are being served authoritatively by the AD DNS servers to allow AD to be happy with itself and allow dynamic DNS to work properly.

For each of these in-addr.arpa zones, I am also pulling secondary from my unix/linux name servers running BIND9.

I have begun frequently seeing errors like this in my logs on the *nix servers:

Jun  8 13:40:07 ns1 named[6083]: general: warning: '': TTL differs in rdataset, adjusting 900 -> 1200

I understand what is happening here from a technical, DNS perspective. There is more than one PTR record for the same name, and they have different TTLs. BIND is normalizing the TTLs and informing me about it. I have confirmed this by grabbing a manual AXFR of the zone:

ns1 0 /home/jj33 ># xfer 37.19.172.in-addr.arpa ad-dns | grep '^3\.'
900 IN PTR 0509-l3-tmbxt.example.ad.
1200 IN PTR 0402-3p2jf41.example.ad.
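The grep above checks one address by hand. The rough Python sketch below applies the same check to a whole zone dump. It assumes `dig axfr`-style lines (`owner ttl class type rdata`), because the output format of the `xfer` tool isn't shown. The owner name `3.37.19.172.in-addr.arpa.` is inferred from the zone and the grep pattern, and every record other than the two PTR targets above is hypothetical. The sketch flags owner names that hold more than one PTR and, separately, those whose TTLs disagree. RFC 2181 section 5.2 says all records in an RRset must share one TTL, and that mismatch is what BIND is warning about.

```python
from collections import defaultdict

# Hypothetical dig-style AXFR excerpt; only the two 3.* PTR targets come from the post.
SAMPLE = """\
; hypothetical excerpt of: dig @ad-dns 37.19.172.in-addr.arpa axfr
3.37.19.172.in-addr.arpa. 900 IN PTR 0509-l3-tmbxt.example.ad.
3.37.19.172.in-addr.arpa. 1200 IN PTR 0402-3p2jf41.example.ad.
4.37.19.172.in-addr.arpa. 1200 IN PTR host-a.example.ad.
5.37.19.172.in-addr.arpa. 1200 IN PTR host-b.example.ad.
5.37.19.172.in-addr.arpa. 1200 IN PTR host-c.example.ad.
"""


def multi_ptr_owners(zone_text):
    """Map each owner name holding more than one PTR to its [(ttl, target), ...]."""
    groups = defaultdict(list)
    for line in zone_text.splitlines():
        fields = line.split()
        # Skip blanks, dig comments, and anything that is not a 5-field PTR line.
        if len(fields) < 5 or fields[0].startswith(";") or fields[3].upper() != "PTR":
            continue
        owner, ttl, _cls, _rtype, target = fields[:5]
        groups[owner.lower()].append((int(ttl), target))
    return {owner: rrs for owner, rrs in groups.items() if len(rrs) > 1}


def ttl_mismatches(dupes):
    """Keep only the RRsets whose members disagree on TTL."""
    return {owner: rrs for owner, rrs in dupes.items() if len({ttl for ttl, _ in rrs}) > 1}


dupes = multi_ptr_owners(SAMPLE)
mismatched = ttl_mismatches(dupes)
for owner in sorted(dupes):
    status = "TTL mismatch" if owner in mismatched else "same TTL"
    print(f"{owner} {len(dupes[owner])} PTRs, {status}")
```

Separating the two checks matters. A name with two PTRs but matching TTLs never triggers the BIND warning, so the log alone would undercount double records.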

In looking at the Windows DNS and DHCP tools, it seems very likely based on the lease time that 0402-3p2jf41.example.ad either returned its lease or went away and never came back, allowing the lease to expire. 0509-l3-tmbxt.example.ad came along, picked up the IP, and inserted its name for the record.

So, with all the explanation done, I have several questions:

  • In the DDNS process, who actually defines the TTL of the reverse record? The DNS server, the DHCP server, or the DHCP client?

  • Why are the stale in-addr.arpa records not being deleted? Should the DNS server know to delete existing DDNS names when a new DDNS name is submitted?

  • Why would this suddenly start occurring? This system has been running for years with this only happening a couple of times before, and always with the same IP address. In the last few days it has happened multiple times with multiple IPs.

  • Would scavenging help? We don't currently have it turned on. While I'm going to look into it for its own merits, I'm not sure it would help in this situation since the default seems to be 7 days before scavenging.

  • Does this situation require action, or if I ignore it will "something" purge the stale records eventually (I really, really hate waiting for divine intervention, but it seems odd that we wouldn't have faced this before).
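On the question of stale records: in a standard DNS dynamic update (RFC 2136), an "add" appends a record to whatever RRset already exists at that name. Nothing removes the old PTR unless the updater also sends a delete, either of that specific record or of the whole RRset. The toy Python model below shows both outcomes. It is not how Windows DNS or BIND actually store zone data. The address 172.19.37.3 is assumed from the zone name and grep pattern in the AXFR above.

```python
import ipaddress
from collections import defaultdict


class ReverseZone:
    """Toy reverse zone: owner name -> {ptr_target: ttl}. Illustrative only."""

    def __init__(self):
        self.rrsets = defaultdict(dict)

    def add_rr(self, owner, ttl, target):
        # RFC 2136 add: the PTR joins any PTRs already at this name.
        # Re-adding an identical PTR replaces it, so only its TTL can change.
        self.rrsets[owner][target] = ttl

    def delete_rrset(self, owner):
        # RFC 2136 "delete an RRset": drop every PTR at this owner name.
        self.rrsets.pop(owner, None)

    def ptrs(self, owner):
        return sorted(self.rrsets.get(owner, {}).items())


owner = ipaddress.ip_address("172.19.37.3").reverse_pointer  # assumed address

# Plain add: the new client's PTR lands next to the stale one.
naive = ReverseZone()
naive.add_rr(owner, 1200, "0402-3p2jf41.example.ad.")
naive.add_rr(owner, 900, "0509-l3-tmbxt.example.ad.")

# Delete-then-add (in practice, both in one update message): one PTR remains.
careful = ReverseZone()
careful.add_rr(owner, 1200, "0402-3p2jf41.example.ad.")
careful.delete_rrset(owner)
careful.add_rr(owner, 900, "0509-l3-tmbxt.example.ad.")

print(owner)
print(naive.ptrs(owner))
print(careful.ptrs(owner))
```

The naive zone ends up with exactly the pair seen in the AXFR. So whether the stale PTR survives depends on what the updating party sends, not on the DNS server inferring that the old name is obsolete.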

Aside from the questions, can anyone share any hard-won experience related to this issue? Thanks.

UPDATE 1: It appears that, on a per-RR level, scavenging is actually turned on. It is turned on at the DNS server level and turned off at the zone level, and now that I have the advanced view turned on I can see that it's turned on for dynamically inserted records as well. So it's not an issue of scavenging not working; it's an issue of the 7 day lag plus the (apparently new?) TTL differences.

UPDATE 2: The DHCP scope also has "Discard A and PTR records when lease is deleted" checked. This feels like a failure on the part of the DHCP server since the lease for the original PTR is gone from the DHCP server...

Thanks for your answers so far. I'm digesting them and gathering information to see whether the noise I'm seeing is a lot of records generating a few logs each or a few records generating a lot of logs each. I'm also auditing my PTRs to see whether this double-record thing is common and I'm only noticing it now because the TTLs have started to mismatch. I'm leaning toward this being a non-issue that scavenging will address, but I'd still like to understand where the differing TTLs come from.
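One way to settle the many-records-versus-few-records question is to tally the warnings by record name. The minimal Python sketch below assumes that the quoted field in each warning identifies the record. That field is empty in the log excerpt in the question, and its exact format may differ between BIND versions. The sample log lines and record identifiers are hypothetical.

```python
import re
from collections import Counter

# Captures the quoted record identifier and the two TTLs from BIND's warning.
# The quoted text may be empty or formatted differently in other BIND versions.
WARNING = re.compile(r"'([^']*)': TTL differs in rdataset, adjusting (\d+) -> (\d+)")

# Hypothetical syslog lines modelled on the one quoted in the question.
LOG = [
    "Jun  8 13:40:07 ns1 named[6083]: general: warning: '3.37.19.172.in-addr.arpa/PTR/IN': TTL differs in rdataset, adjusting 900 -> 1200",
    "Jun  8 13:55:07 ns1 named[6083]: general: warning: '3.37.19.172.in-addr.arpa/PTR/IN': TTL differs in rdataset, adjusting 900 -> 1200",
    "Jun  8 14:02:11 ns1 named[6083]: general: warning: '9.37.19.172.in-addr.arpa/PTR/IN': TTL differs in rdataset, adjusting 1200 -> 900",
    "Jun  8 14:03:00 ns1 named[6083]: general: info: some unrelated message",
]


def warnings_per_record(lines):
    """Count TTL-mismatch warnings per quoted record identifier."""
    counts = Counter()
    for line in lines:
        match = WARNING.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = warnings_per_record(LOG)
print(f"{sum(counts.values())} warnings across {len(counts)} records")
for record, n in counts.most_common():
    print(f"{n:4d}  {record}")
```

A high warning count spread over only a few identifiers points to a few records being re-transferred repeatedly. A count close to the number of identifiers points to many records with one mismatch each.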


Here's an article from Microsoft that describes the dynamic DNS process with their DHCP server: http://technet.microsoft.com/en-us/library/cc787034(WS.10).aspx

The stock behaviour of W2K and up is for the client to request that the DHCP server register the PTR record on its behalf, while the client registers the A record itself. The DHCP server can also be made to register both the A record and the PTR record (including for pre-Windows 2000 clients that can't make DDNS registrations themselves).
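The division of labour described above also bears on the TTL question. A dynamic update carries its own TTL, so whichever party sends the update for a record is the one whose TTL setting applies. The small Python model below summarizes who sends each update. The `DhcpDnsPolicy` fields are hypothetical stand-ins for the DHCP server's DNS settings, not a Microsoft API.

```python
from dataclasses import dataclass


@dataclass
class DhcpDnsPolicy:
    # Hypothetical stand-ins for the DHCP server's DNS settings.
    server_registers_both: bool = False  # server registers A and PTR for DDNS-capable clients
    update_legacy_clients: bool = False  # server registers for clients that cannot do DDNS


def registrars(client_does_ddns: bool, policy: DhcpDnsPolicy) -> dict:
    """Return who sends the update for each record type (None means nobody)."""
    if not client_does_ddns:
        # Pre-Windows 2000 clients cannot register anything themselves.
        who = "dhcp-server" if policy.update_legacy_clients else None
        return {"A": who, "PTR": who}
    if policy.server_registers_both:
        return {"A": "dhcp-server", "PTR": "dhcp-server"}
    # Stock Windows 2000+ behaviour: client registers A, asks the server to do the PTR.
    return {"A": "client", "PTR": "dhcp-server"}


print(registrars(True, DhcpDnsPolicy()))
print(registrars(False, DhcpDnsPolicy(update_legacy_clients=True)))
```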

There is an optional setting to have the DHCP server delete the A and PTR records when a lease is discarded. If the lease hasn't timed out, though, the records won't be deleted.
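The minimal Python sketch below models that rule. It assumes a simplified lease table mapping IP to (hostname, expiry), and the names, addresses, and times are hypothetical. Only leases that have already expired yield records to discard, and nothing is discarded when the option is off.

```python
from datetime import datetime, timedelta


def records_to_discard(leases, now, discard_on_delete=True):
    """Return (ip, hostname) pairs whose A/PTR records would be removed at `now`.

    `leases` maps IP -> (hostname, expiry). A lease that has not yet expired
    keeps its records, even if the client has long since left the network.
    """
    if not discard_on_delete:
        return []
    return sorted((ip, host) for ip, (host, expiry) in leases.items() if expiry <= now)


now = datetime(2014, 6, 8, 13, 40)
leases = {
    "172.19.37.3": ("host-a.example.ad.", now - timedelta(hours=1)),  # expired
    "172.19.37.9": ("host-b.example.ad.", now + timedelta(days=3)),   # still active
}
print(records_to_discard(leases, now))
print(records_to_discard(leases, now, discard_on_delete=False))
```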

You absolutely should be aging and scavenging your DDNS zones. If you're aging and scavenging, this will eventually "purge". If you're not, it won't.
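The Python sketch below models the timing. It assumes Windows DNS's default no-refresh and refresh intervals of 7 days each. Under those defaults, a dynamic record whose timestamp is never refreshed becomes eligible for scavenging 14 days after its last refresh. It is actually removed on the next scavenging run after that. Both the server and the zone must have aging/scavenging enabled, and static records (no timestamp) are never scavenged. The function names are hypothetical.

```python
from datetime import datetime, timedelta

NO_REFRESH = timedelta(days=7)  # assumed Windows DNS default no-refresh interval
REFRESH = timedelta(days=7)     # assumed Windows DNS default refresh interval


def eligible_from(timestamp, no_refresh=NO_REFRESH, refresh=REFRESH):
    """Earliest moment an unrefreshed record becomes eligible for scavenging."""
    return timestamp + no_refresh + refresh


def is_scavengeable(timestamp, now, server_enabled=True, zone_enabled=True,
                    no_refresh=NO_REFRESH, refresh=REFRESH):
    if timestamp is None:  # static record: no timestamp, never scavenged
        return False
    if not (server_enabled and zone_enabled):  # both levels must allow it
        return False
    return now >= eligible_from(timestamp, no_refresh, refresh)


last_refresh = datetime(2014, 6, 1)
print(eligible_from(last_refresh))                           # 2014-06-15 00:00:00
print(is_scavengeable(last_refresh, datetime(2014, 6, 10)))  # False: inside the 14 days
print(is_scavengeable(last_refresh, datetime(2014, 6, 16)))  # True
print(is_scavengeable(last_refresh, datetime(2014, 6, 16), zone_enabled=False))  # False
```

In this model, a zone with aging turned off never scavenges, whatever the server-level setting says.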

This Microsoft support article explains how to set the TTL value for DNS resource records registered by DHCP servers (originally in a hotfix, now just built-in to the OS): http://support.microsoft.com/kb/322989

To alter the behaviour of client computers in DNS registrations, have a look in Group Policy in the DNS Client node under the Network subnode of the Administrative Templates node of the Computer Configuration. In there, you'll find that you can force the clients to register their PTR records, rather than having it done by the DHCP server (if you so desire), and you can set the TTL on records registered by clients.

I'm not sure why this would suddenly start occurring. Some configuration had to change, but I'm at a loss to tell you where. Start talking to your co-admins about any changes they might have made to the DHCP server configuration or to the Group Policy settings for clients' dynamic DNS behaviour.

I can't say I've seen the behaviour of multiple clients registering the same PTR record. That's odd, and I'll have to defer to someone else on it. I will say that all of my reverse zones are AD-integrated and require secure updates, but I don't know whether that would have an effect on this.

In my experience, just having aging and scavenging turned on makes a world of difference in eliminating stale records. The default 7 day interval has worked well for me.
