capacity - Apache: "Server seems busy", but lots of idle processes

Tuesday, September 23, 2014

I should note that I'm not a sysadmin. You'll figure that out very shortly. :)

In a nutshell: Apache keeps taking a breather during heavy loads and all processes go idle. This is a polling server that is used by applications. The polls come from a lot of different endpoints. Every 4-5 minutes, when I'm watching top, the httpd processes all go idle at the same time, stalling traffic for 10 seconds or so. It then recovers. The delay is problematic.

The server is serving a lot of traffic. These are application polls via HTTPS, not web pages (though I doubt Apache knows the difference).

The pauses noted above cause the traffic to become lopsided: after some time, I get a WHOLE BUNCH OF TRAFFIC, then a lull, then a WHOLE BUNCH OF TRAFFIC again.

Each poll requires a small database dip.

Apache logs

Sometimes, but not always (mostly after a restart), I get these messages in error_log. Most of the time when a stall happens, I see nothing in error_log.


[Mon Jun 30 17:55:17 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 31 idle, and 98 total children
[Mon Jun 30 17:55:18 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 14 idle, and 98 total children
[Mon Jun 30 17:55:44 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 74 idle, and 99 total children
[Mon Jun 30 17:55:54 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 61 idle, and 99 total children
[Mon Jun 30 17:56:00 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 97 total children
[Mon Jun 30 17:56:02 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 36 idle, and 99 total children
[Mon Jun 30 17:56:03 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 39 idle, and 99 total children
[Mon Jun 30 18:08:17 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 18 idle, and 99 total children
[Mon Jun 30 18:08:18 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 63 idle, and 98 total children
[Mon Jun 30 18:08:19 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 74 idle, and 97 total children
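
For anyone who wants to see how the idle count moves over time, here is a minimal Python sketch that pulls the numbers out of lines like the ones above. The function name parse_busy_line and the regular expression are my own, written only against the log format quoted here, so treat it as an illustration rather than a general Apache log parser.

```python
import re
from datetime import datetime

# Matches the "server seems busy" info lines quoted above.
BUSY_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[info\] server seems busy, .*?"
    r"spawning (?P<spawning>\d+) children, "
    r"there are (?P<idle>\d+) idle, and (?P<total>\d+) total children"
)


def parse_busy_line(line):
    """Return a dict with time, spawning, idle and total, or None if no match."""
    m = BUSY_RE.search(line)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y"),
        "spawning": int(m.group("spawning")),
        "idle": int(m.group("idle")),
        "total": int(m.group("total")),
    }


# One of the lines from the error_log excerpt above.
sample = (
    "[Mon Jun 30 17:56:00 2014] [info] server seems busy, "
    "(you may need to increase StartServers, or Min/MaxSpareServers), "
    "spawning 8 children, there are 0 idle, and 97 total children"
)
row = parse_busy_line(sample)
print(row["time"].isoformat(), row["spawning"], row["idle"], row["total"])
```

Running it over the whole excerpt shows the idle count swinging between 0 and 74 within a few seconds while the total stays near 100.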

Apache Config (old config commented out)

Just showing the config items that I suspect are relevant.


#Timeout 60
Timeout 20
KeepAlive on
MaxKeepAliveRequests 1000
KeepAliveTimeout 2


<IfModule prefork.c>
        StartServers            85
        MinSpareServers         85
        MaxSpareServers         100
        ServerLimit             100
        MaxClients              100
        #StartServers       60
        #MinSpareServers    60
        #MaxSpareServers    85
        #ServerLimit        85
        #MaxClients         85
        MaxRequestsPerChild 1000
</IfModule>

Note that the old and new configs behave the same.

Environment
EC2, c1.medium, mod_perl, persistent database connections, separate RDS server, no errors showing in MySQL error logs and no errors showing in Apache logs

As an aside, I've seen suggestions to install mod_status, but I haven't figured out how to do so, and I don't know what to look for if I do.
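
On the "what to look for" part: once mod_status is enabled, it serves a machine-readable report at /server-status?auto that includes a Scoreboard line with one character per worker slot. Below is a rough Python sketch that counts workers per state from that text. The sample report is made up for illustration, and summarize_status is a hypothetical helper name; only the scoreboard letters come from the mod_status documentation.

```python
from collections import Counter

# Scoreboard key as documented by mod_status.
SCOREBOARD_KEY = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def summarize_status(auto_text):
    """Count workers per state from the text of /server-status?auto."""
    fields = {}
    for line in auto_text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    counts = Counter(fields.get("Scoreboard", ""))
    return {SCOREBOARD_KEY.get(ch, ch): n for ch, n in counts.items()}


# Made-up sample report, not taken from the server in this question.
sample_report = "BusyWorkers: 3\nIdleWorkers: 2\nScoreboard: _W_WK.."
summary = summarize_status(sample_report)
print(summary)
```

If most workers sit in "sending reply" during a stall while top shows them using no CPU, that would suggest they are blocked on something external, such as the database, rather than truly idle.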

Answer

Mystery solved.

In case this happens to anyone else:
The network connection (inside the VPC, via the LAN interface) between Apache and the database server was getting congested. Upgrading the database server to a larger instance solved the problem (for the time being).

Background: Amazon takes snapshots of your database every 5 minutes for its point-in-time restore feature. It downloads the binary log on your RDS instance to do so.

Every 5 minutes, the binary log gets transmitted (presumably to another EBS volume), and in my case that transmission congested the LAN interface. Apache stalled while it waited for the network connection every five minutes, and connections would pile up, with some ultimately aborting.
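
One way to check for this kind of periodic stall, sketched in Python under the assumption that you have already logged timestamped latency samples for the database connection (for example, one small probe query every few seconds). The function name stall_intervals, the 1-second threshold, and the sample data are all made up for illustration.

```python
def stall_intervals(samples, threshold_s=1.0):
    """Return the gaps, in seconds, between the starts of successive stalls.

    samples: list of (unix_time, latency_seconds) tuples sorted by time.
    A stall starts at the first sample whose latency reaches threshold_s
    after a sample that was below it.
    """
    starts = []
    in_stall = False
    for t, latency in samples:
        slow = latency >= threshold_s
        if slow and not in_stall:
            starts.append(t)
        in_stall = slow
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Made-up data: one probe every 10 s, slow at t=300, t=310 and t=600.
samples = [(t, 5.0 if t in (300, 310, 600) else 0.02) for t in range(0, 700, 10)]
gaps = stall_intervals(samples)
print(gaps)  # [300]
```

Gaps that cluster tightly around 300 seconds would point at a fixed five-minute job, like the binary log transfer described above, rather than at load from the polling clients.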
