Monday, November 13, 2017

lamp - Troubleshooting mysterious server freezes on Amazon EC2

I have an Amazon EC2 instance running LAMP on Ubuntu Natty/11.04. On three separate occasions in the last few months, two of them in the last two weeks, the server has just... stopped. It stops responding to connection attempts (SSH or otherwise), but the EC2 control panel still reports it as running. Each time I have had to reboot the instance through the console, with ensuing data loss.




So now I'm trying to diagnose the issue, but I'm coming up blank and need advice on what else to check. Syslog contains nothing suspicious -- on each occasion, the last entry before the freeze was munin running its regular five-minute cronjob, although since I don't know exactly when the machine stopped working, I can't say how close that cron entry is to the moment of freezing. After that, it's as if the machine simply was not running until it was restarted, at which point syslog resumes with what looks to me like normal dmesg output.
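
One way to narrow down the freeze window is to look for the silent stretch in syslog itself. The following is only a rough sketch in Python, assuming the classic syslog line format that starts with a "Mon DD HH:MM:SS" stamp and carries no year; the function names, the `year` argument and the ten-minute threshold are all made up for illustration. It reports every pair of consecutive entries that are further apart than the threshold, so with a five-minute munin cronjob the start of each gap is the last time the machine was known to be alive.

```python
from datetime import datetime, timedelta


def parse_timestamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line.

    Classic syslog stamps omit the year, so the caller supplies it
    (an assumption of this sketch). Returns None for lines that do not
    start with a parseable stamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, year, threshold=timedelta(minutes=10)):
    """Return (last_seen, next_seen) pairs for every silence longer than threshold."""
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_timestamp(line, year)
        if stamp is None:
            continue  # skip lines without a recognisable timestamp
        if previous is not None and stamp - previous > threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

It could be run as `find_gaps(open("/var/log/syslog"), 2017)`, assuming the log lives at that path. It does not handle a log that spans a year boundary, and it only bounds the freeze to within one cron interval.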



There seems to be no correlation between traffic volume and the time of these freezes. Each occasion has been far removed from peak traffic times.



What else can I look at to attempt to figure out what has been causing these issues? What might the issue be?



ADDENDUM: The server was not under heavy load on any of the occasions when it went down. CPU and memory use were both safely under their limits, and there was plenty of free disk space (tens of gigabytes). There is nothing strange in the Apache or MySQL logs either; they simply stop at that time. This is a medium/high-CPU instance.
