I am investigating strange behavior on three CentOS machines serving as a MongoDB replica set; one of them also hosts a PHP web app that uses the replica set. The basic setup is as follows:
- Node 1: CentOS 5.8, MongoDB 2.6.10 (acting as PRIMARY), Apache 2.2.23 (running a PHP web app with MongoDB driver 1.6.10)
- Node 2: CentOS 5.11, MongoDB 2.6.10 (acting as SECONDARY), Apache 2.2.23 (serving nothing but an empty index.html, polled by Nagios every few minutes)
- Node 3: CentOS 5.11, MongoDB 2.6.10 (acting as SECONDARY), Apache 2.2.23 (serving nothing but an empty index.html, polled by Nagios every few minutes)
Now, all of them are experiencing a nearly constant 100% CPU load. The load is caused by a large number of httpd processes being launched, even on nodes 2 and 3, which receive almost no HTTP traffic. The CPU usage of the mongod process itself is negligible on each machine.
This is what the top output on node 2 looks like:
The output looks very similar on nodes 1 and 3.
This is what the httpd access log on node 2 looks like:
Having a large number of httpd processes but only a very small number of actual HTTP requests seems strange to me.
When I check netstat -p on node 2, I see something like this:
The open mongod sockets should be the replication workers or replica set heartbeats, but what is really striking in the netstat -p output is the additional number of open httpd (?!) sockets connected to MongoDB port 27017 on their counterpart (node 3).
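To quantify those suspicious sockets, the netstat output can be filtered by process name and port. A minimal sketch against a hypothetical sample file (the host names, ports, and PIDs below are made up for illustration; on a real host you would pipe the live netstat -p output instead):

```shell
# Hypothetical netstat -p excerpt, written to a temp file for illustration.
cat <<'EOF' > /tmp/netstat_sample.txt
tcp  0  0  node2:45210  node3:27017  ESTABLISHED  1234/httpd
tcp  0  0  node2:45211  node3:27017  ESTABLISHED  1235/httpd
tcp  0  0  node2:38011  node1:27017  ESTABLISHED  900/mongod
EOF
# Count sockets owned by httpd that involve the MongoDB port 27017.
grep 'httpd' /tmp/netstat_sample.txt | grep ':27017' | wc -l   # prints 2
```

Watching that count over time (e.g. from a cron job) makes it easy to see whether the httpd-owned MongoDB sockets accumulate between restarts.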
As a result, logging on to the machines (e.g. via SSH) becomes tremendously slow after a while. Restarting httpd helps in the short term: the number of httpd processes and the CPU load instantly drop back to a normal level. But after a few hours, the httpd processes/sockets fill up again and the machine is back at 100% CPU load. Restarting httpd does not have any impact on the replica set's operation.
As far as I can tell, there is nothing special about Apache's prefork/worker configuration:
<IfModule prefork.c>
    StartServers          8
    MinSpareServers       5
    MaxSpareServers      20
    ServerLimit         256
    MaxClients          256
    MaxRequestsPerChild 4000
</IfModule>

<IfModule worker.c>
    StartServers          2
    MaxClients          150
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxRequestsPerChild   0
</IfModule>
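For context on why this config can saturate the box: under the prefork MPM, every simultaneous connection pins an entire httpd process, so connections that get stuck (rather than genuine request volume) are enough to exhaust the server. A back-of-the-envelope sketch using the ServerLimit/MaxClients value quoted above (the per-child memory figure is an assumption, not a measurement from these hosts):

```shell
serverlimit=256   # ServerLimit / MaxClients from the prefork settings above
avg_rss_mb=15     # assumed RSS per httpd child; measure with ps on the real host
echo "worst case: ${serverlimit} processes, ~$((serverlimit * avg_rss_mb)) MB RSS"
```

At 256 processes the machine is busy forking, swapping, and context-switching regardless of how few real requests arrive, which matches the observed symptom of high CPU with a nearly empty access log.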
To cut a long story short...
- Is it really the MongoDB replica set that is somehow affecting the web server, and if so, why?
- Why is Apache (httpd) even involved with the sockets on these random ephemeral ports (37000 ~ 60999)? Shouldn't it just be handling ports 80/443?
- What can I do to solve, or at least isolate, the problem?
Answer
It turned out that the huge number of httpd processes, the open connections, and the resulting high CPU load were somehow caused by an old, broken SSL configuration (duplicate VirtualHost entries with different expired certificates) in Apache's conf.d folder.
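One way to hunt for such duplicates is to look for a ServerName that is declared more than once across the conf.d fragments. A sketch against hypothetical files (the file names and certificate paths are invented; on CentOS the real directory would be /etc/httpd/conf.d):

```shell
# Two hypothetical conf.d fragments reproducing the problem shape:
# the same vhost defined twice, each pointing at a different (expired) cert.
mkdir -p /tmp/conf.d
cat <<'EOF' > /tmp/conf.d/ssl-old.conf
<VirtualHost *:443>
    ServerName app.example.com
    SSLCertificateFile /etc/pki/tls/certs/app-2012.crt
</VirtualHost>
EOF
cat <<'EOF' > /tmp/conf.d/ssl.conf
<VirtualHost *:443>
    ServerName app.example.com
    SSLCertificateFile /etc/pki/tls/certs/app-2013.crt
</VirtualHost>
EOF
# A ServerName appearing more than once is a candidate duplicate vhost.
grep -rh 'ServerName' /tmp/conf.d | sort | uniq -d
```

On a live host, httpd -S dumps the parsed virtual-host configuration, which also makes overlapping vhost definitions visible directly.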
Since there were no error messages at all, I triple-checked the configurations and made some trial-and-error attempts. Removing the SSL configuration led to a significant decline in launched httpd processes as well as in connections being left open. The CPU load dropped to a normal level.
The problem is solved, wherever this strange phenomenon came from and whatever the broken SSL configuration had to do with it. Nonetheless, I am still wondering about the odd relationship between the MongoDB replica set members and the httpd sockets that still appear in netstat -p.