ubuntu 10.04 - What would cause an average load of 10-30 (rather than 10-30%)

Friday, December 4, 2015

I'm not sure whether this would be better titled "Why would Nagios need to monitor a load reaching 30".

Situation:
I am setting up Nagios for our network and have reached the stage of setting up NRPE on the *nix boxes. I had already (on paper) worked out a rough idea of the thresholds at which I wanted notifications. For one particular server, as an example, it looks like this:

1 minute: warn at 90%, crit at 100%
5 minutes: warn at 80%, crit at 90%
15 minutes: warn at 60%, crit at 70%

The server runs two virtual CPUs, so I plan to use the -r parameter to get a per-CPU result (yeah, I know this isn't really per CPU: it's the total load divided by the number of CPUs, and I am OK with that).
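The per-CPU check described above can be sketched roughly as follows. This is a minimal Python approximation, not the plugin's actual source: it assumes per-CPU load is simply each load average divided by the CPU count, and the names `per_cpu_load`, `status`, `WARN` and `CRIT` are hypothetical. The return values 0/1/2 follow the Nagios OK/WARNING/CRITICAL convention; whether the real `check_load` compares with `>` or `>=` is an assumption here. `os.getloadavg()` is only available on Unix-like systems.

```python
import os

# Planned per-CPU thresholds from the list above, ordered 1, 5, 15 minutes.
WARN = (0.9, 0.8, 0.6)
CRIT = (1.0, 0.9, 0.7)


def per_cpu_load():
    """Return the 1/5/15-minute load averages divided by the CPU count
    (an approximation of what check_load -r reports)."""
    cpus = os.cpu_count() or 1
    return tuple(avg / cpus for avg in os.getloadavg())


def status(loads, warn, crit):
    """Nagios-style status: 2 (CRITICAL) if any average reaches its critical
    threshold, else 1 (WARNING) if any reaches its warning threshold,
    else 0 (OK). Using >= rather than > is an assumption."""
    if any(load >= c for load, c in zip(loads, crit)):
        return 2
    if any(load >= w for load, w in zip(loads, warn)):
        return 1
    return 0


if __name__ == "__main__":
    loads = per_cpu_load()
    print("per-CPU load:", loads, "status:", status(loads, WARN, CRIT))
```

Note that any one of the three averages crossing its threshold is enough to raise the status, which is why the 15-minute thresholds can sit lower than the 1-minute ones.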

So I was all ready to set this up when I saw the defaults in the NRPE config file:

command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

This put me off and made me wonder whether I really understand load averages. I see that the -r parameter is not used, so load averages above 1 are normal, but does this suggest the defaults are meant for a 30-CPU system? I saw this question, for which the answer suggests using [number of CPUs] * 10 for the critical 5-minute notification (one minute maybe?), which further supports using values far higher than I planned. I mean, without seeing the defaults there I would have gone with

command[check_load]=/usr/lib/nagios/plugins/check_load -r -w 0.9,0.8,0.6 -c 1.0,0.9,0.7

but now I am doubtful. I know that no one on the internet can tell me the correct values for our situation, and I do not expect anyone to, but I would be very thankful if someone could tell me whether or not I grossly misunderstand load and need to start my detective work on useful values again. For what it's worth, I got those values just from running top every once in a while over the past 6 months on the server in question. The 1-minute average usually sits between 0.4 per CPU (0.8 total) and 0.55 per CPU (1.1 total).
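To put the two threshold sets on the same footing for this two-CPU server, the defaults (which use absolute load, no -r) can be divided by the CPU count, and the planned per-CPU values multiplied by it. This is just arithmetic under the same assumption as -r (per-CPU load = total load / number of CPUs); the helper names `to_per_cpu` and `to_absolute` are hypothetical.

```python
# Hypothetical helpers: convert 1/5/15-minute load thresholds between
# absolute terms (check_load without -r) and per-CPU terms (check_load -r),
# assuming per-CPU load = total load / number of CPUs.


def to_per_cpu(thresholds, cpus):
    """Divide absolute thresholds by the CPU count."""
    return tuple(t / cpus for t in thresholds)


def to_absolute(thresholds, cpus):
    """Multiply per-CPU thresholds by the CPU count."""
    return tuple(t * cpus for t in thresholds)


CPUS = 2                        # two virtual CPUs, as in the question
DEFAULT_WARN = (15, 10, 5)      # NRPE default: -w 15,10,5 (no -r)
DEFAULT_CRIT = (30, 25, 20)     # NRPE default: -c 30,25,20 (no -r)
PLANNED_WARN = (0.9, 0.8, 0.6)  # planned: -r -w 0.9,0.8,0.6
PLANNED_CRIT = (1.0, 0.9, 0.7)  # planned: -r -c 1.0,0.9,0.7

if __name__ == "__main__":
    print("defaults, per CPU:",
          to_per_cpu(DEFAULT_WARN, CPUS), to_per_cpu(DEFAULT_CRIT, CPUS))
    print("planned, absolute:",
          to_absolute(PLANNED_WARN, CPUS), to_absolute(PLANNED_CRIT, CPUS))
```

On two CPUs, the default 1-minute warning of 15 works out to 7.5 per CPU, far above the 0.4 to 0.55 per CPU observed with top, while the planned 1-minute warning of 0.9 per CPU corresponds to an absolute load of 1.8.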
