Wednesday, October 14, 2015

domain name system - Why did this website DNS change fail in some parts of the US



Background



We switched shared hosts from a local mom and pop to crystaltech. The site was all copied over and ready for the DNS change. The mom and pop site offered no tools to manage DNS ourselves, so we had to make a phone call and time the final move of the database with their changing of the DNS.



What happened
I'm in Oregon on Comcast, and the old host is also in Oregon, on a regional fiber provider. The old host updated their DNS servers to point at our new IP address, changing it from 207.x.x.x to 67.x.x.x.



In Oregon, I flushed the DNS cache on two Windows 7 machines, and in less than 30 seconds the domain name site.com was working as expected. Ping and tracert confirmed the domain name was resolving to the new IP at the new host.



In Florida, the client was reporting the domain name was resolving to the old site. (We put up "we're moving" pages on the old host server.) Her Mac pinged the domain name and it resolved to the old host. We rebooted, then flushed the DNS cache in the terminal. Six hours later her computer still cannot resolve the domain to the new IP address.



One machine in the office in FL can access the new site correctly. Other machines in the South FL area cannot access the site. A call to the AT&T tech confirmed that ping and tracert from his area (same network) were resolving properly.



What the heck is wrong?



Why doesn't my client's Mac use the nameservers to get the IP address? The IP address is different and only the nameserver on record knows it.




Is it AT&T? Is it OS X? Is it the old host's DNS? Is it the new host's DNS? Is it because the old host's TTL on their DNS server was 72 hours?



I have since changed the name servers with the domain registrar, so in a couple of days this will all be over, but I'm still curious what the heck happened.



related questions



How to change web host and have minimal downtime for email



How to change web host for my small site with minimal downtime




https://superuser.com/questions/96425/convince-my-bosss-mac-the-website-has-moved


Answer



If you left the DNS TTLs at 3 days, naturally it's going to take up to three days for the new records to reach everyone, as any resolver that requested the record a second before the change will (correctly) cache it for the next three days. You should definitely drop the TTL to something like five minutes next time you do a move (and I'm a little surprised that your new hosting company didn't mention this to you -- one of the many reasons to get someone experienced in this sort of thing to manage a move).
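To make the caching behaviour concrete, here is a minimal sketch of a toy resolver that honors the TTL it was given. It is only a simulation, not how any real resolver is implemented: the class names, the `zone` dictionary, and the two IP addresses (documentation-range placeholders standing in for the old and new hosts) are all assumptions made for illustration.

```python
from dataclasses import dataclass

HOUR = 3600

# Placeholder addresses from the documentation ranges, NOT the real hosts.
OLD_IP = "192.0.2.10"
NEW_IP = "198.51.100.20"


@dataclass
class CachedRecord:
    ip: str
    expires_at: float  # seconds, on the same clock as `now`


class CachingResolver:
    """Toy recursive resolver that honors the authoritative TTL."""

    def __init__(self, authoritative):
        # `authoritative` is a callable: name -> (ip, ttl_seconds)
        self.authoritative = authoritative
        self.cache = {}

    def resolve(self, name, now):
        record = self.cache.get(name)
        if record is None or now >= record.expires_at:
            # Cache miss or expired entry: ask the authoritative server again.
            ip, ttl = self.authoritative(name)
            record = CachedRecord(ip, now + ttl)
            self.cache[name] = record
        return record.ip


# Hypothetical zone data with the 72-hour TTL from the question.
zone = {"site.com": (OLD_IP, 72 * HOUR)}
resolver = CachingResolver(lambda name: zone[name])

resolver.resolve("site.com", now=0)        # cached just before the change
zone["site.com"] = (NEW_IP, 72 * HOUR)     # the old host updates the record

after_6h = resolver.resolve("site.com", now=6 * HOUR)
after_72h = resolver.resolve("site.com", now=72 * HOUR)
print(after_6h, after_72h)
```

In this sketch the lookup six hours after the change still returns the old address, and only the lookup at the 72-hour mark goes back to the authoritative data. A resolver that had never looked the name up before the change would get the new address immediately, which is one plausible reason some machines see the new site while others don't.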



However, even if you had dropped the TTL, there are some ISPs who operate non-standards-compliant recursive resolvers, which override the TTLs on all the records they serve so that they don't match up with the TTLs of the records set by the authoritative DNS servers. This is evil, and I'd love to do unpleasant things to the people who set them up that way, but they are what they are. The best solution to this problem is to either:




  1. Have a separate hostname that the new web host responds for, and reconfigure the webhost you're moving from to do a 302 (temporary) redirect to this name, so that any requests that do go to the old server get redirected to the new server. This is neat, but you need to leave the temporary name forever (because people may have linked to the temporary name) which gets ugly, and it doesn't work real well for HTTPS (you need a wildcard or separate SSL cert for the other name).

  2. Use DNAT to redirect traffic from the old webhost to the new one. This requires significant cooperation from your old webhost, and you pay for all the traffic to the old webhost twice (one going into the server and one going back out to the new server) but it is completely transparent to users and works perfectly with HTTPS.
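As a rough illustration of option 1, the old server could answer every request with a 302 pointing at the temporary hostname. The sketch below uses Python's standard `http.server` purely as a stand-in for whatever the old host actually runs; `new.example.com`, the port, and the helper names are hypothetical, and it only handles plain HTTP GET and HEAD requests.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical temporary hostname served by the new web host.
NEW_HOST = "new.example.com"


def redirect_target(path, new_host=NEW_HOST, scheme="http"):
    """Build the Location value, keeping the requested path and query string."""
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{new_host}{path}"


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with a 302 (temporary) redirect to the new host."""

    def _redirect(self):
        self.send_response(302)
        self.send_header("Location", redirect_target(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect


def serve(port=8080):
    """Run the redirector until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling `serve()` on the old machine would send a request for `/about?x=1` on to `http://new.example.com/about?x=1`. In practice the same thing is usually done with a redirect rule in the old host's existing web server configuration rather than a separate process.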


