Tuesday, July 21, 2015

top - How can I find the source of my high load issues on Ubuntu server?

We have an Ubuntu 10.04 VPS serving a Rails site which often shows fairly high load, but doesn't show high CPU or memory usage. Reading a lot of other questions here suggests to me that this is an I/O issue (i.e. there are processes stuck in the I/O wait state, and they are driving up the load). I'm trying to track down those processes, but not having much luck. I'd appreciate help with (a) ways to identify the guilty processes, and/or (b) confirmation that I'm asking the right question.



Here's a snapshot of top:





top - 18:28:49 up 5 days, 3:07, 2 users, load average: 1.79, 1.83, 1.73
Tasks: 82 total, 1 running, 81 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 99.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.1%st
Mem: 1794980k total, 1780384k used, 14596k free, 13356k buffers
Swap: 524284k total, 3116k used, 521168k free, 1012272k cached


Note the low swap usage and the mostly idle CPUs; that's why I think we're I/O bound rather than memory or CPU bound.




Here's iostat (I've obfuscated the server name):




$ iostat -x 1 3
Linux 2.6.35.2-xenU (our.server.com) 03/25/11 _x86_64_ (2 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
1.75 0.19 0.50 0.31 0.01 97.24


Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvdap1 0.01 11.52 2.19 3.18 145.12 117.55 48.97 0.08 15.60 1.67 0.90
xvdap9 0.01 0.01 0.00 0.00 0.10 0.14 62.62 0.00 13.20 6.09 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 0.00 0.00 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvdap1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdap9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 0.00 0.00 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvdap1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdap9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


iotop won't run on this box:





$ iotop
Could not run iotop as some of the requirements are not met:
- Linux >= 2.6.20 with I/O accounting support (CONFIG_TASKSTATS, CONFIG_TASK_DELAY_ACCT, CONFIG_TASK_IO_ACCOUNTING): Not found
- Python >= 2.5 or Python 2.4 with the ctypes module: Found


ps seldom finds any processes in the D state:





$ sudo ps -eo pid,user,state,cmd | awk '$3 ~ /D/ { print $0 }'
976 root D [kjournald]
$ sudo ps -eo pid,user,state,cmd | awk '$3 ~ /D/ { print $0 }'
$ sudo ps -eo pid,user,state,cmd | awk '$3 ~ /D/ { print $0 }'
$
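A single ps call only sees one instant, so short-lived blockers are easy to miss. Since the Linux load average counts both runnable (R) and uninterruptible (D) tasks, one option is to sample ps repeatedly and tally which commands show up in either state. The following is a minimal sketch, not a tested diagnostic: the function names `tally_rd` and `sample_rd` are made up for illustration, and it assumes a procps-style ps that accepts `--no-headers`.

```shell
#!/bin/sh
# Tally processes seen in state R (runnable) or D (uninterruptible sleep).
# Reads lines of "pid user state cmd..." on stdin, i.e. the output of:
#   ps -eo pid,user,state,cmd --no-headers
# Prints "count state command", most frequent first.
tally_rd() {
    awk '$3 ~ /^[RD]/ { key = $3 " " $4; seen[key]++ }
         END { for (k in seen) print seen[k], k }' | sort -rn
}

# Hypothetical helper: take repeated ps samples and tally them.
# Arguments: number of samples (default 60), seconds between samples (default 1).
sample_rd() {
    samples=${1:-60}
    interval=${2:-1}
    i=0
    while [ "$i" -lt "$samples" ]; do
        ps -eo pid,user,state,cmd --no-headers
        sleep "$interval"
        i=$((i + 1))
    done | tally_rd
}
```

Running `sample_rd 300 1` would collect five minutes of samples. The ps process itself will show up in state R in every sample, so its entry can be ignored; anything else that appears often is a candidate for what is driving the load.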


What's my next troubleshooting step?




ETA: I ran vmstat:




$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 3116 509372 22880 773232 0 0 18 15 24 14 2 0 97 0


That wa value of 0 makes me wonder if I/O is really the problem.
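One caveat on that reading: when vmstat is run without an interval, its single report gives averages since the last reboot, so it says little about what is happening right now. A possibly more useful check is to sample at an interval and average the r (runnable), b (blocked) and wa columns while discarding the first row. The sketch below assumes the column layout shown above (wa in the 16th column); `summarize_vmstat` is a name invented for this example.

```shell
#!/bin/sh
# Average the r, b and wa columns of interval-mode vmstat output.
# Header lines are skipped because their first field is not numeric.
# The first data row is dropped because it holds averages since boot.
summarize_vmstat() {
    awk '$1 ~ /^[0-9]+$/ {
             rows++
             if (rows == 1) next
             r += $1; b += $2; wa += $16; n++
         }
         END {
             if (n) printf "samples=%d avg_r=%.2f avg_b=%.2f avg_wa=%.2f\n",
                           n, r / n, b / n, wa / n
         }'
}
```

For example, `vmstat 1 30 | summarize_vmstat` would summarize roughly 30 seconds of activity. If avg_b and avg_wa stay near zero while avg_r is close to the load average, that would point away from I/O and toward runnable processes.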




Also, yes, I know load in the 1.x range isn't really a problem - but this app has a history of ramping up load until it chokes, and if I can track the source while it still has a low fever I might spare a fatality (to torture a metaphor).
