I have an Amazon EC2 instance running a LAMP stack on Ubuntu 11.04 (Natty). On three separate occasions within the last few months, two of them in the last two weeks, the server has simply stopped. It becomes unresponsive and refuses all connection attempts (SSH or otherwise), yet the EC2 control panel still reports it as running. Each time I had to reboot the instance through the console, with resulting data loss.
So now I'm trying to diagnose the issue, but I'm coming up blank and need advice on what else to check. Syslog contains nothing suspicious: on each occasion, the last entry was munin running its regular five-minute cron job. However, since I don't know exactly when the machine stopped working, I can't say how close that cron entry is to the moment of the freeze. After that, it's as if the machine simply wasn't running until it was restarted, after which syslog contains what looks to me like normal dmesg (boot) output.
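One way to narrow down when the freeze happened is to find the longest silence in syslog around the reboot. Below is a minimal Python sketch, assuming the classic `Mmm dd HH:MM:SS` prefix that Ubuntu's rsyslog writes by default. Because those timestamps omit the year, the function takes one as a parameter. The name `largest_gap` and its default year are illustrative, not part of any tool.

```python
from datetime import datetime


def largest_gap(lines, year=2011):
    """Return (before, after, gap) for the largest time gap between
    consecutive syslog entries, or None if fewer than two entries parse.

    Assumes the classic 'Mmm dd HH:MM:SS' prefix (e.g. 'Jun  3 14:05:01');
    syslog omits the year, so it is supplied as a (hypothetical) parameter.
    """
    stamps = []
    for line in lines:
        prefix = line[:15]  # the timestamp occupies the first 15 characters
        try:
            stamps.append(datetime.strptime(f"{year} {prefix}", "%Y %b %d %H:%M:%S"))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    best = None
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if best is None or gap > best[2]:
            best = (prev, cur, gap)
    return best
```

Calling it as `largest_gap(open("/var/log/syslog"))` brackets the outage: the machine stopped somewhere between the last entry before the gap and the first boot message after it. Because munin's cron job logs every five minutes, a freeze that follows a munin entry most likely happened within roughly five minutes of it.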
There seems to be no correlation between traffic volume and the timing of these freezes; each one occurred well outside peak traffic hours.
What else can I look at to attempt to figure out what has been causing these issues? What might the issue be?
ADDENDUM: The server was not under heavy load on any of the occasions when it went down. CPU and memory use were both comfortably within limits, and there was plenty of free disk space (tens of gigabytes). There is nothing strange in the Apache or MySQL logs either; they simply stop at that point. This is a High-CPU Medium instance.