Sunday, September 27, 2015

apache 2.2 - Excessive number of sleeping processes in CentOS - howto diagnose?




I have a large number of sleeping processes, about 600, the majority of which are Apache processes.



Should I kill all these sleeping processes, or will that make Apache fail completely?



Why are these processes sleeping in the first place?




The server is running CentOS 6 with Apache 2.2.


Answer



Killing all of your system's sleeping processes isn't going to solve any problem, let alone the problem you're having.
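Before killing anything, it helps to see what those processes actually are. On Linux, a process in state S (interruptible sleep) is simply waiting for an event, which is the normal state for an idle Apache worker waiting for a connection. The following is a minimal sketch, assuming the text output of `ps -eo stat,comm` has already been captured; the function name `count_states` is made up for illustration and is not part of any tool.

```python
from collections import Counter


def count_states(ps_output):
    """Count processes per (state letter, command) from `ps -eo stat,comm` text."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        # Skip blank lines and the "STAT COMMAND" header row.
        if len(fields) != 2 or fields[0] == "STAT":
            continue
        stat, command = fields
        # The first character of STAT is the state (S = sleeping, R = running,
        # D = uninterruptible sleep, Z = zombie); the rest are modifier flags.
        counts[(stat[0], command.strip())] += 1
    return counts
```

Calling `count_states(...).most_common(10)` on the captured text gives a quick view of which commands dominate each state. A large count of `("S", "httpd")` on its own says only that Apache has many idle workers, not that anything is wrong.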






Being structured and methodical in your approach is much better than flapping around wildly.



Personally, I find the scientific method (others call it something different) a wonderful tool to pull out of the system administration kitbag when diagnosing problems.





  1. What is the actual problem you're trying to solve?

A service stops responding. [1]





  2. So, now that we know what the actual problem is, we have some direction. Let's gather some information to help us figure out a solution.

    • Is the problem time related? Does it happen regularly or randomly?

    • Check your logs, all of them, not just the particular service's logs, as something else may be causing the problem. Log entries generally have timestamps to help you correlate events across multiple applications and services, so use them. If necessary, increase the log verbosity too.

    • Watch what your system is doing. Use tools like top, vmstat, iostat, sar, ps, tcpdump or even full-blown monitoring systems.

  3. Analyse the information you have gathered. What is actually happening on the system when the service stops responding? What is the state of the system's resources?

  4. Take appropriate action to remediate. Hopefully it's pretty obvious what's going on: you're running out of memory and the OOM killer comes out to play, your swap activity is too high, your run queue is too long, you're I/O bound, etc. If it's not obvious then you're probably not gathering the correct data. You know what to do: go back to 2.

  5. Monitor what the changes introduced at 4. do.

  6. Did the changes fix the problem? Is it better? Is it worse? Is there no difference? Where you go from here depends on what you find. You may need to go back to 2. and gather more pertinent data, or 3. to reanalyse what data you have, or 4. because you identified a number of potential solutions.

  7. Document your findings and the changes you made.

  8. Go back to bed/home from work/to the pub.
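The advice about using timestamps to correlate events across logs can be sketched in a few lines. This is only an illustration under a loud assumption: every log line is taken to start with a timestamp in the form `YYYY-MM-DD HH:MM:SS`. Real Apache and syslog files use other formats, so `TS_FORMAT` and the slicing would need adapting per log. The function name `events_near` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each line; adapt per log file.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LENGTH = 19  # length of a timestamp in the assumed layout


def events_near(incident, logs, window_seconds=60):
    """Return (timestamp, source, message) tuples within the window, in time order.

    `logs` maps a source name to an iterable of log lines.
    """
    window = timedelta(seconds=window_seconds)
    hits = []
    for source, lines in logs.items():
        for line in lines:
            try:
                stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
            except ValueError:
                continue  # line does not start with the assumed timestamp
            if abs(stamp - incident) <= window:
                hits.append((stamp, source, line[TS_LENGTH:].strip()))
    return sorted(hits)
```

Given the time the service stopped responding, this lists what every log recorded around that moment, side by side, which is often enough to see whether something other than the service itself acted first.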




[1] This could be anything, though: 'My server is slow', 'My server is using too much memory' ...
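Several of the symptoms named in the remediation step (swap activity, run queue length, being I/O bound) show up as columns in vmstat output. The following is a rough sketch, assuming the text of a run such as `vmstat 1 5` has been captured; it reads the column names from vmstat's own header row. The function names and the thresholds (`cpus`, `wa_limit`) are illustrative assumptions, not recommended values.

```python
def parse_vmstat(text):
    """Parse vmstat text into a list of {column: int} dicts, one per sample."""
    columns, samples = None, []
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["r", "b"]:
            columns = fields  # header row naming each column
        elif (columns and len(fields) == len(columns)
              and all(f.isdigit() for f in fields)):
            samples.append(dict(zip(columns, map(int, fields))))
    return samples


def flag_pressure(samples, cpus=1, wa_limit=20):
    """Return warnings for the samples; the thresholds are assumptions."""
    warnings = []
    if any(s["si"] or s["so"] for s in samples):
        warnings.append("swapping")
    if any(s["r"] > cpus for s in samples):
        warnings.append("run queue longer than CPU count")
    if any(s["wa"] > wa_limit for s in samples):
        warnings.append("high I/O wait")
    return warnings
```

Note that the first sample vmstat prints reports averages since boot, so it is usually worth dropping it (`samples[1:]`) before drawing conclusions.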

