Sunday, December 13, 2015

ubuntu - Uneven CPU core utilization



tl;dr: The first CPU core is consistently saturated, while all the other cores are consistently under-loaded.




This is a VM running under Ubuntu-based Xen XCP:




$ uname -a
Linux MYHOST 2.6.38-15-virtual #59-Ubuntu SMP Fri Apr 27 16:40:18 UTC 2012 i686 i686 i386 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 11.04
Release: 11.04
Codename: natty


This VM has 8 CPU cores.



There are 10 single-threaded worker processes running on this VM, connected to the nginx server over the FastCGI interface (listening on a local network port).
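
The exact wiring between nginx and the workers is not shown here; assuming the workers listen on local TCP ports that nginx forwards requests to via fastcgi_pass, a quick way to confirm who is listening where is:

$ # list listening TCP sockets together with the owning processes
$ sudo netstat -tlnp | grep -E 'nginx|luajit'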



Under synthetic load from ab (ApacheBench), only the first of the eight cores is ever loaded to 100% (as seen in htop). It stays under very high load more or less constantly, while the other cores are loaded anywhere from 0 to 100%, their load jumping around more or less randomly.




Here is what I typically see under load in htop:





1 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||99.3%] Tasks: 70, 35 thr; 11 running
2 [||||||||||||||| 15.0%] Load average: 3.86 1.05 0.39
3 [||||||||||||||||||||||||||||||||||| 36.7%] Uptime: 22 days, 06:31:57
4 [|||||||||||||||| 15.7%]
5 [||||||||||||||||||||| 22.4%]
6 [||||||||||||||||||| 19.9%]
7 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 71.2%]
8 [|||||||||||||||||||||||||||||| 31.3%]
Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||394/4028MB]
Swp[ 0/5362MB]

PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
26213 www-data 20 0 49748 26952 2448 R 29.0 0.7 10:42.61 /usr/bin/luajit2
26227 www-data 20 0 50172 27412 2452 R 27.0 0.7 10:43.53 /usr/bin/luajit2
26221 www-data 20 0 50736 27948 2452 R 27.0 0.7 10:39.02 /usr/bin/luajit2
26234 www-data 20 0 50128 27232 2452 R 27.0 0.7 10:36.36 /usr/bin/luajit2
26218 www-data 20 0 50232 27376 2452 R 26.0 0.7 10:39.32 /usr/bin/luajit2
26214 www-data 20 0 51268 28496 2452 R 26.0 0.7 10:58.15 /usr/bin/luajit2
26232 www-data 20 0 50420 27588 2452 R 25.0 0.7 10:39.21 /usr/bin/luajit2
26217 www-data 20 0 50236 27348 2452 R 25.0 0.7 10:34.44 /usr/bin/luajit2
26219 www-data 20 0 50748 27960 2448 R 23.0 0.7 10:45.30 /usr/bin/luajit2
26239 www-data 20 0 49772 27188 2452 R 22.0 0.7 10:39.39 /usr/bin/luajit2
26368 www-data 20 0 10856 3796 968 S 15.0 0.1 1:12.62 nginx: worker process
26369 www-data 20 0 10652 3504 968 S 2.0 0.1 1:12.75 nginx: worker process
26372 www-data 20 0 10520 3504 968 S 0.0 0.1 1:18.64 nginx: worker process
...



During the load test all worker processes are in the R (running) state. The test runs for about 10-15 minutes, at roughly 700-900 hits/second. Traffic is, of course, generated from external machines.
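
The exact ab parameters are not listed above; a run of roughly this shape (the request count, concurrency and URL are placeholders, not the real test values) produces the kind of sustained load described:

$ # ~500k requests at 100 concurrent connections against the tested endpoint
$ ab -n 500000 -c 100 http://MYHOST/some/endpoint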



It looks like this CPU core load imbalance is the main performance bottleneck; if all cores were loaded evenly, throughput could be higher.



Any clues on how to troubleshoot this issue?



Please let me know if I can provide more information.


Answer




It looks like CPU0 receives every eth1 interrupt, and there are a lot of them.




$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
283: 113720624 0 0 0 0 0 0 0 xen-dyn-event eth1
284: 1 0 0 0 0 0 0 0 xen-dyn-event eth0
285: 2254 0 0 3873799 0 0 0 0 xen-dyn-event blkif
286: 23 0 0 0 0 0 0 0 xen-dyn-event hvc_console
287: 492 42 0 0 0 0 0 295324 xen-dyn-event xenbus
288: 0 0 0 0 0 0 0 222294 xen-percpu-ipi callfuncsingle7
289: 0 0 0 0 0 0 0 0 xen-percpu-virq debug7
290: 0 0 0 0 0 0 0 151302 xen-percpu-ipi callfunc7
291: 0 0 0 0 0 0 0 3236015 xen-percpu-ipi resched7
292: 0 0 0 0 0 0 0 60064 xen-percpu-ipi spinlock7
293: 0 0 0 0 0 0 0 12355510 xen-percpu-virq timer7
294: 0 0 0 0 0 0 803174 0 xen-percpu-ipi callfuncsingle6
295: 0 0 0 0 0 0 0 0 xen-percpu-virq debug6
296: 0 0 0 0 0 0 60027 0 xen-percpu-ipi callfunc6
297: 0 0 0 0 0 0 5374762 0 xen-percpu-ipi resched6
298: 0 0 0 0 0 0 64976 0 xen-percpu-ipi spinlock6
299: 0 0 0 0 0 0 15294870 0 xen-percpu-virq timer6
300: 0 0 0 0 0 264441 0 0 xen-percpu-ipi callfuncsingle5
301: 0 0 0 0 0 0 0 0 xen-percpu-virq debug5
302: 0 0 0 0 0 79324 0 0 xen-percpu-ipi callfunc5
303: 0 0 0 0 0 3468144 0 0 xen-percpu-ipi resched5
304: 0 0 0 0 0 66269 0 0 xen-percpu-ipi spinlock5
305: 0 0 0 0 0 12778464 0 0 xen-percpu-virq timer5
306: 0 0 0 0 844591 0 0 0 xen-percpu-ipi callfuncsingle4
307: 0 0 0 0 0 0 0 0 xen-percpu-virq debug4
308: 0 0 0 0 75293 0 0 0 xen-percpu-ipi callfunc4
309: 0 0 0 0 3482146 0 0 0 xen-percpu-ipi resched4
310: 0 0 0 0 79312 0 0 0 xen-percpu-ipi spinlock4
311: 0 0 0 0 21642424 0 0 0 xen-percpu-virq timer4
312: 0 0 0 449141 0 0 0 0 xen-percpu-ipi callfuncsingle3
313: 0 0 0 0 0 0 0 0 xen-percpu-virq debug3
314: 0 0 0 95405 0 0 0 0 xen-percpu-ipi callfunc3
315: 0 0 0 3802992 0 0 0 0 xen-percpu-ipi resched3
316: 0 0 0 76607 0 0 0 0 xen-percpu-ipi spinlock3
317: 0 0 0 16439729 0 0 0 0 xen-percpu-virq timer3
318: 0 0 876383 0 0 0 0 0 xen-percpu-ipi callfuncsingle2
319: 0 0 0 0 0 0 0 0 xen-percpu-virq debug2
320: 0 0 76416 0 0 0 0 0 xen-percpu-ipi callfunc2
321: 0 0 3422476 0 0 0 0 0 xen-percpu-ipi resched2
322: 0 0 69217 0 0 0 0 0 xen-percpu-ipi spinlock2
323: 0 0 10247182 0 0 0 0 0 xen-percpu-virq timer2
324: 0 393514 0 0 0 0 0 0 xen-percpu-ipi callfuncsingle1
325: 0 0 0 0 0 0 0 0 xen-percpu-virq debug1
326: 0 95773 0 0 0 0 0 0 xen-percpu-ipi callfunc1
327: 0 3551629 0 0 0 0 0 0 xen-percpu-ipi resched1
328: 0 77823 0 0 0 0 0 0 xen-percpu-ipi spinlock1
329: 0 13784021 0 0 0 0 0 0 xen-percpu-virq timer1
330: 730435 0 0 0 0 0 0 0 xen-percpu-ipi callfuncsingle0
331: 0 0 0 0 0 0 0 0 xen-percpu-virq debug0
332: 39649 0 0 0 0 0 0 0 xen-percpu-ipi callfunc0
333: 3607120 0 0 0 0 0 0 0 xen-percpu-ipi resched0
334: 348740 0 0 0 0 0 0 0 xen-percpu-ipi spinlock0
335: 89912004 0 0 0 0 0 0 0 xen-percpu-virq timer0
NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts
LOC: 0 0 0 0 0 0 0 0 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 0 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 IRQ work interrupts
RES: 3607120 3551629 3422476 3802992 3482146 3468144 5374762 3236015 Rescheduling interrupts
CAL: 770084 489287 952799 544546 919884 343765 863201 373596 Function call interrupts
TLB: 0 0 0 0 0 0 0 0 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 0 0 0 0 0 0 0 0 Machine check polls
ERR: 0
MIS: 0
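
Based on that, a sketch of a possible fix, assuming the guest honours IRQ affinity masks for Xen event channels (IRQ 283 is eth1's line in the listing above; whether smp_affinity is respected here depends on the Xen/kernel combination):

$ # move eth1's interrupt from CPU0 to CPU1 (bitmask 0x2 selects CPU1)
$ echo 2 | sudo tee /proc/irq/283/smp_affinity
$ # or install irqbalance and let it spread interrupts across cores
$ sudo apt-get install irqbalance

If the interrupt cannot be moved, another common option on kernels of this vintage is RPS, i.e. echoing a CPU mask into /sys/class/net/eth1/queues/rx-0/rps_cpus so that packet processing is steered to other cores in software.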
