Tuesday, November 1, 2016

RFC that requires DNS servers to respond to unknown domain requests



My domain registrar and DNS provider currently ignores DNS requests for unknown domains. By "ignores" I mean it black-holes them and never responds, which causes my DNS clients and resolver libraries to retry, back off, and finally time out.



dig @NS3.DNSOWL.COM somedomainthatdoesntexist.org
...

;; connection timed out; no servers could be reached


In surveying other popular domain name services, I see that this behavior is pretty unusual, since other providers return an RCODE of 5 (REFUSED) or 3 (NXDOMAIN):



dig @DNS1.NAME-SERVICES.COM somedomainthatdoesntexist.org
dig @NS-284.AWSDNS-35.COM somedomainthatdoesntexist.org
dig @NS21.DOMAINCONTROL.COM somedomainthatdoesntexist.org



All return something like the following:



;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 64732


or



;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 31219



Returning REFUSED or NXDOMAIN immediately is appropriate IMHO as opposed to just dropping the request on the server room floor.
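To make the difference between an immediate REFUSED/NXDOMAIN and a dropped query concrete, here is a minimal Python sketch that sends one UDP query and reports either the RCODE or a timeout. The helper names (build_query, rcode_of, probe) and the 2-second timeout are illustrative assumptions, not part of dig or any resolver library, and the sketch skips checks a real client would do, such as matching the response ID.

```python
import socket
import struct

# RCODE values from RFC 1035 section 4.1.1 (only the ones discussed here).
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, txid=0x1234, qtype=1):
    """Build a minimal non-recursive DNS query (QTYPE 1 = A, QCLASS 1 = IN)."""
    # Header: ID, flags (all zero: standard query, RD clear), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT and ARCOUNT all zero.
    header = struct.pack("!HHHHHH", txid, 0, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def rcode_of(response):
    """Return the RCODE name carried in the header of a raw DNS response."""
    _txid, flags = struct.unpack("!HH", response[:4])
    code = flags & 0x000F  # RCODE is the low 4 bits of the flags word
    return RCODE_NAMES.get(code, f"RCODE{code}")


def probe(server, name, timeout=2.0):
    """Send one UDP query; return the RCODE name, or 'TIMEOUT' if dropped."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        try:
            response, _addr = sock.recvfrom(512)
        except socket.timeout:
            return "TIMEOUT"
    return rcode_of(response)
```

Calling probe("NS3.DNSOWL.COM", "somedomainthatdoesntexist.org") against a server that black-holes the query would be expected to return "TIMEOUT", while a server that answers would yield a name such as "REFUSED" or "NXDOMAIN".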



When I complain to my provider about their servers not responding, they ask me to quote the RFC that their servers are violating. I know it's strange that they are asking me to prove that their servers should respond to all requests but so be it.



Questions:




  • It is my assumption that unless there are duplicate request IDs or some sort of DoS response, a server should always respond to the request. Is this correct?

  • What RFC and specific section should I quote to support my assumption?




To me, it is bad to not respond to a DNS query. Most clients will back off and then retransmit the same query to either the same DNS server or another server. Not only are they slowing clients down but they are causing the same query to be done again by their own or other servers depending on the authoritative name servers and NS entries.
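As a rough illustration of what silence costs a client, the following Python sketch adds up the worst-case wait when every query is dropped. The retry model (a per-query timeout multiplied by a backoff factor after each full pass over the servers) and all parameter values are assumptions for illustration; real resolvers differ in their exact retry schedules.

```python
def worst_case_wait(initial_timeout, attempts, backoff=2.0, servers=1):
    """Seconds a client waits before giving up if every query is dropped.

    Assumed model: each attempt tries every server once, waiting the
    current timeout for each, then multiplies the timeout by `backoff`.
    """
    total = 0.0
    timeout = initial_timeout
    for _ in range(attempts):
        total += timeout * servers
        timeout *= backoff
    return total
```

With a hypothetical 5-second timeout, three attempts, and no backoff (backoff=1.0), the client stalls for 15 seconds, whereas an immediate REFUSED or NXDOMAIN costs a single round trip.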



In RFC 1536 and RFC 2308 I see a lot of information about negative caching, both for performance reasons and to stop retransmission of the same query. In RFC 4074 I see information about returning an empty answer with an RCODE of 0 so the client knows there is no IPv6 information, which should cause the client to ask for A RRs instead; this is another example of an empty response.
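The distinction between a name error and an empty answer with RCODE 0 is visible in the first 12 bytes of a response. The Python sketch below classifies a raw response header; the function name and the category labels are hypothetical, and using the AA bit to separate an authoritative empty answer from a referral is a simplification of the rules in RFC 2308.

```python
import struct


def classify_header(response):
    """Classify a raw DNS response by RCODE, AA bit and answer count."""
    _txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F           # low 4 bits of the flags word
    authoritative = bool(flags & 0x0400)  # AA bit
    if rcode == 3:
        return "NXDOMAIN"            # the name does not exist
    if rcode == 0 and ancount == 0:
        # Empty answer: the name exists but has no data of the requested
        # type when authoritative (simplified), otherwise likely a referral.
        return "NODATA" if authoritative else "REFERRAL_OR_EMPTY"
    if rcode == 0:
        return "ANSWER"
    return "ERROR"                   # e.g. SERVFAIL or REFUSED
```

Every one of these categories is still a response the client can act on or cache, which is the contrast with a query that is silently dropped.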



But I can't find an RFC which says that a DNS server should respond to a request, probably because it is implied.



The specific problem happens when I migrate my domain (and the associated DNS records) to their servers, or during the first X minutes after I register a new domain with their service. There is a lag between the time the authoritative name servers change (which is pretty damn fast these days) and the time their servers start serving my DNS records. During this lag, DNS clients think that their servers are authoritative, but the servers never respond to a request, not even with a REFUSED. I understand the lag, which is fine, but I disagree with the decision to not respond to the DNS requests. For the record, I understand how to work around these limitations in their system, but I'm still working with them to improve their services to be more in line with the DNS protocol.



Thanks for the help.







Edit:



Within a couple of months of posting this and following up with my provider, they changed their servers to return NXDOMAIN for unknown domains.


Answer



Shane's advice is correct. Failure to migrate data from one authoritative server to another prior to initiating a cutover is an invitation for an outage. Regardless of what happens from that point onward, this is an outage initiated by the person who swung the NS records. This explains why more people are not making this complaint to your provider.



That said, this is still an interesting question to answer so I'm going to take my crack at it.







Basic functionality of DNS servers is covered by documents RFC 1034 and RFC 1035, which collectively form STD 13. The answer must either come from these two RFCs, or be clarified by a later RFC which updates it.



Before we continue, there's a massive pitfall here outside the scope of DNS that needs to be called out: both of these RFCs predate BCP 14 (1997), the document which clarified the language of MAY, MUST, SHOULD, etc.




  • Standards which were authored before this language was formalized MAY have used clear language, but in several cases did not. This led to divergent implementations of software, mass confusion, etc.

  • STD 13 is unfortunately guilty of being interpretive in several
    areas. If language is not firm on an area of operation, it is
    frequently necessary to find a clarifying RFC.



With that out of the way, let's start with what RFC 1034 §4.3.1 has to say:





  • The simplest mode for the server is non-recursive, since it
    can answer queries using only local information: the response
    contains an error, the answer, or a referral to some other
    server "closer" to the answer. All name servers must
    implement non-recursive queries.




...




If recursive service is not requested or is not available, the non-recursive response will be one of the following:




  • An authoritative name error indicating that the name does not
    exist.


  • A temporary error indication.


  • Some combination of:



    RRs that answer the question, together with an indication
    whether the data comes from a zone or is cached.




    A referral to name servers which have zones which are closer
    ancestors to the name than the server sending the reply.


  • RRs that the name server thinks will prove useful to the
    requester.





The language here is reasonably firm. There is no "should be", but a "will be". This means that the end result must either be 1) defined in the list above, or 2) allowed for by a later document on the Standards Track which amends the functionality. I am not aware of any such verbiage existing for ignoring the request and I would say that the onus is on the developer to find language which disproves the research.




Given the frequent role of DNS in network abuse scenarios, let it not be said that DNS server software doesn't provide the knobs to selectively drop traffic on the floor, which would technically be a violation of this. That said, these knobs are either not enabled by default or ship with very conservative defaults; examples of each would be the user configuring the software to drop a specific name (rpz-drop.), or certain numerical thresholds being exceeded (BIND's max-clients-per-query). It is almost unheard of in my experience for the software to completely alter the default behavior for all packets in a way that violates the standard, unless the option is one that increases tolerance for older products violating a standard. That is not the case here.



In short, this RFC can be and is violated at the discretion of operators, but usually this is done with some manner of precision. It is extremely uncommon to completely disregard sections of the standard as is convenient, especially when the professional consensus (example: BCP 16 §3.3) leans toward it being undesirable to generate unnecessary load on the DNS system as a whole. With this in mind, unnecessary retries caused by dropping all requests for which no authoritative data is present are less than desirable.






Update:



Regarding it being undesirable to drop queries on the floor as a matter of course, @Alnitak shared with us that there is currently a Draft BCP covering this topic in detail. It's a bit premature to use this as a citation, but it does help to reinforce that community consensus aligns with what is being expressed here. In particular:





Unless a nameserver is under attack, it should respond to all
queries directed to it as a result of following delegations.
Additionally code should not assume that there isn't a delegation
to the server even if it is not configured to serve the zone.
Broken delegations are a common occurrence in the DNS and receiving
queries for zones that the server is not configured for is not
necessarily an indication that the server is under attack. Parent
zone operators are supposed to regularly check that the delegating
NS records are consistent with those of the delegated zone and to
correct them when they are not [RFC1034]. If this was being done
regularly, the instances of broken delegations would be much lower.




This answer will be updated when the status of this document changes.

