linux - How to stop Apache from crashing my entire server?

Wednesday, February 4, 2015

I maintain a Gentoo server with a few services, including Apache. It's fairly low-end (2 GB of RAM and a low-end CPU with 2 cores). My problem is that, despite my best efforts, an overloaded Apache crashes the entire server. In fact, at this point I'm close to being convinced that Linux is a horrible operating system that isn't worth the time of anyone looking for stability under load.

Things I tried:

Adjusting oom_adj for the root Apache process (and thus all its children). That had close to no effect. When Apache was overloaded it would grind the system to a halt, as the system paged out everything else before it got around to killing anything.

Turning off swap. Didn't help: without swap, the kernel would instead evict file-backed pages (the binaries of running processes and other files on /), which caused the same effect.

Putting it in a memory-limited cgroup (limited to 512 MB of RAM, a quarter of the total). This "worked", at least in my own stress tests, but the server still keeps crashing under real load (all other processes stall, SSH becomes inaccessible, etc.).

Running it with idle I/O priority. This wasn't a very good idea in the end, because it just caused the system load to climb indefinitely (into the thousands) with almost no visible effect, until you tried to access a part of the disk that wasn't already cached. At that point the task accessing it would freeze. (So much for good I/O scheduling, eh?)

Limiting the number of concurrent connections to Apache. Setting the number too low caused web sites to become unresponsive due to most slots being occupied with long requests (file downloads).

I tried various Apache MPMs without much success (prefork, event, itk).

Switching from prefork/event+php-cgi+suphp to itk+mod_php. This improved performance, but didn't solve the actual problem.

Switching I/O schedulers (cfq to deadline).
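
For reference, the OOM-priority and memory-cgroup attempts in the list above roughly correspond to the sketch below. It assumes the cgroup v1 memory controller (whose control files are `memory.limit_in_bytes` and `tasks`) and uses `oom_score_adj`, the newer replacement for `oom_adj`. The function names, the group directory, and the PID are hypothetical and only there for illustration; this is not necessarily how the limits were actually applied on my server.

```python
from pathlib import Path

MIB = 1024 * 1024


def confine_to_memory_cgroup(cgroup_dir, pid, limit_bytes):
    """Cap a cgroup v1 memory group and move one task into it.

    cgroup_dir is an assumed location such as
    /sys/fs/cgroup/memory/apache (hypothetical group name).
    """
    group = Path(cgroup_dir)
    # On a mounted cgroup filesystem, creating the directory creates the group.
    group.mkdir(parents=True, exist_ok=True)
    # Hard limit on the memory charged to this group.
    (group / "memory.limit_in_bytes").write_text(f"{limit_bytes}\n")
    # Writing an ID to "tasks" moves that task into the group;
    # children forked afterwards start out in the same group.
    (group / "tasks").write_text(f"{pid}\n")


def prefer_as_oom_victim(proc_dir, pid, score=1000):
    """Ask the OOM killer to pick this process first.

    oom_score_adj ranges from -1000 (never kill) to 1000 (kill first).
    proc_dir is normally /proc; it is a parameter so the sketch can be
    tried against a scratch directory.
    """
    if not -1000 <= score <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    Path(proc_dir, str(pid), "oom_score_adj").write_text(f"{score}\n")
```

On a real system this would be called as root with something like `confine_to_memory_cgroup("/sys/fs/cgroup/memory/apache", apache_pid, 512 * MIB)`, where the mount point and `apache_pid` are again assumptions.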

Just to stress this: I don't care if Apache itself goes down under load, I just want the rest of my system to remain stable. Of course, having Apache recover quickly after a brief period of intensive load would be great to have, but one step at a time.

Right now I am mostly dumbfounded by how humanity can, in this day and age, design an operating system where such a seemingly simple task (don't allow one system component to crash the entire system) seems practically impossible, or at least very hard to do.

Please don't suggest things like VMs or "BUY MORE RAM".

Some more information gathered with a friend's help:
The processes hang when the cgroup oom killer is invoked. Here's the call trace:


[] ? prepare_to_wait+0x70/0x7b
[] mem_cgroup_handle_oom+0xdf/0x180
[] ? memcg_oom_wake_function+0x0/0x6d
[] __mem_cgroup_try_charge+0x32d/0x478
[] mem_cgroup_charge_common+0x48/0x73
[] ? __lru_cache_add+0x60/0x62
[] mem_cgroup_newpage_charge+0x3b/0x4a
[] handle_mm_fault+0x305/0x8cf
[] ? schedule+0x6ae/0x6fb
[] do_page_fault+0x214/0x22b
[] page_fault+0x1f/0x30

At this point, the apache memory cgroup is practically deadlocked, and burning CPU in syscalls (all with the above call trace). This seems like a problem in the cgroup implementation...
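
One way to confirm from user space that the group is sitting in this state is to read its `memory.oom_control` file, which in cgroup v1 reports `oom_kill_disable` and `under_oom` as `name value` lines. The following is only a minimal sketch of such a check; the helper names are made up for illustration, and the path in the usage note is an assumption about where the controller is mounted.

```python
def parse_oom_control(text):
    """Parse the 'name value' lines of a cgroup v1 memory.oom_control file."""
    state = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "key <integer>" lines; ignore anything else.
        if len(parts) == 2 and parts[1].isdigit():
            state[parts[0]] = int(parts[1])
    return state


def is_under_oom(text):
    """Return True if the group reports that it is currently in an OOM state."""
    return parse_oom_control(text).get("under_oom", 0) == 1
```

For example, `is_under_oom(open("/sys/fs/cgroup/memory/apache/memory.oom_control").read())` could be polled while the trace above is being produced (the group name `apache` and the mount point are assumptions).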
