Load average counts processes that are running, runnable, or in uninterruptible sleep. So do the processes in uninterruptible sleep correspond to %wa as shown by the top command? Both refer to tasks waiting on I/O, so it seems intuitive to assume that if one increases, the other will as well.
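On Linux, the two task states that feed the load average can be counted directly with `ps` and compared against the load average itself; a minimal sketch (assumes standard procps output, where `R` = running/runnable and `D` = uninterruptible sleep):

```shell
#!/bin/sh
# Count the two task states that feed the Linux load average:
# R = running/runnable, D = uninterruptible sleep (usually blocked on I/O).
runnable=$(ps -eo stat= | grep -c '^R')
dstate=$(ps -eo stat= | grep -c '^D')
# The first field of /proc/loadavg is the 1-minute load average.
read load1 rest < /proc/loadavg
echo "runnable=$runnable dstate=$dstate load1=$load1"
```

If `dstate` stays near zero while `load1` is high, the load is coming from runnable tasks, not from I/O waiters.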
However, I'm seeing quite the opposite: %wa doesn't increase, %idle is high, and yet the load average is also high. I've read other questions on this but haven't found a satisfactory answer, because they don't explain this behaviour.
- If %wa does not include uninterruptible sleep, then what is it exactly? Is it that %wa simply does not correspond to the load (e.g. the load could be 10 on a 2-CPU machine but contribute only 30% to %wa)?
- And how is this I/O different from the I/O implied by uninterruptible sleep? What is a possible remedy in this case?
Clearly adding CPU wouldn't help, because there are tasks in the queue that the CPU is not picking up.
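For reference, %wa in top comes from the iowait counter in `/proc/stat`: it accrues only while the CPU is idle *and* at least one task is blocked on outstanding disk I/O. That is why it can stay low even with many tasks in uninterruptible sleep: if the CPU has other runnable work, the time is charged to user/system time instead of iowait. A minimal sketch reading the raw counters (Linux-specific; field positions as documented in proc(5)):

```shell
#!/bin/sh
# /proc/stat "cpu" line: user nice system idle iowait irq softirq ...
# iowait ($6) only grows while the CPU is otherwise idle AND some task
# is blocked on disk I/O; busy CPUs accrue user/system time instead.
idle=$(awk '/^cpu /{print $5}' /proc/stat)
iowait=$(awk '/^cpu /{print $6}' /proc/stat)
echo "idle jiffies=$idle iowait jiffies=$iowait"
```

So a D-state task always raises the load average, but only raises %wa if the CPU would otherwise have been idle.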
Another situation where load average and CPU utilisation don't seem to add up:
This situation is a bit different. The CPU idle time is high, the load average is high (often double the number of CPUs), there is no disk I/O, no swap usage, and some network I/O. There are no processes in uninterruptible sleep, and the run queue frequently spikes. How is the CPU still idle, though? Shouldn't I expect the CPU to be at 100% utilisation? Is it that the high number of tasks can't be put on the CPU because they are waiting on the network (or something else)? It only seems reasonable to assume that those tasks each consume very little CPU time. Is that correct? What is the bottleneck in this case? Is it correct to say that adding CPU will not help? How can I find out what to configure or which resources to increase in order to reduce the load average?
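One cheap way to test that hypothesis is the fourth field of `/proc/loadavg`, which snapshots the number of currently runnable scheduling entities against the total. Sampling it alongside the load average shows whether tasks are mostly off-CPU waiting; a Linux-specific sketch:

```shell
#!/bin/sh
# /proc/loadavg format: load1 load5 load15 running/total last-pid
read load1 load5 load15 queue lastpid < /proc/loadavg
running=${queue%/*}   # tasks runnable at this instant
total=${queue#*/}     # total scheduling entities on the system
echo "load1=$load1 currently runnable=$running of $total"
# A high load1 with a persistently small "running" count suggests tasks
# spend most of their time blocked (e.g. on the network), not on-CPU.
```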
Tools worth checking:
- uptime
- top
- vmstat
- iostat
- sar -n TCP,ETCP,DEV 1
- netstat (number of connections)
- nicstat
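Some of these (nicstat and sar in particular) are often not installed by default; a quick sketch to check which are on PATH before starting triage:

```shell
#!/bin/sh
# Report which of the suggested triage tools are available on PATH.
out=$(for cmd in uptime top vmstat iostat sar netstat nicstat; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: available"
  else
    echo "$cmd: not installed"
  fi
done)
echo "$out"
```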