Sunday, July 22, 2018

debian - HP DL380 G7 + Smart Array P410i + sysbench -> poor RAID 10 performance



I have a running system with low I/O utilization:




  1. HP DL380 G7 (24 GB RAM)

  2. Smart Array P410i with 512 MB battery-backed write cache

  3. 6x 146 GB 10k RPM SAS drives in RAID 10

  4. Debian Squeeze, ext4 + LVM, hpacucli installed




iostat (cciss/c0d1 = RAID 10 array, dm-7 = 60 GB LVM partition used for the test):




Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0,00 101,20 0,00 6,20 0,00 0,42 138,58 0,00 0,00 0,00 0,00
cciss/c0d1 0,00 395,20 3,20 130,20 0,18 2,05 34,29 0,04 0,26 0,16 2,08
dm-0 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
dm-2 0,00 0,00 3,20 391,00 0,18 1,53 8,87 0,04 0,11 0,05 1,84
dm-3 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
dm-4 0,00 0,00 0,00 106,80 0,00 0,42 8,00 0,00 0,00 0,00 0,00
dm-5 0,00 0,00 0,00 0,60 0,00 0,00 8,00 0,00 0,00 0,00 0,00
dm-6 0,00 0,00 0,00 2,80 0,00 0,01 8,00 0,00 0,00 0,00 0,00
dm-1 0,00 0,00 0,00 132,00 0,00 0,52 8,00 0,00 0,02 0,01 0,16
dm-7 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
dm-8 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
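
For anyone reproducing these numbers: the extended per-device statistics above come from sysstat's iostat with the -x and -m flags (the 5-second interval below is just an example):

iostat -xm 5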


hpacucli "ctrl all show config"





Smart Array P410i in Slot 0 (Embedded) (sn: 5001438011FF14E0)

array A (SAS, Unused Space: 0 MB)


logicaldrive 1 (136.7 GB, RAID 1, OK)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

array B (SAS, Unused Space: 0 MB)


logicaldrive 2 (410.1 GB, RAID 1+0, OK)

physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 146 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 146 GB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 146 GB, OK)
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 146 GB, OK)
physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 146 GB, OK)
physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 146 GB, OK)

SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 (WWID: 5001438011FF14EF)


hpacucli "ctrl all show status"





Smart Array P410i in Slot 0 (Embedded)
Controller Status: OK
Cache Status: OK
Battery/Capacitor Status: OK


Sysbench command




sysbench --init-rng=on --test=fileio --num-threads=16 --file-num=128 --file-block-size=4K --file-total-size=54G --file-test-mode=rndrd --file-fsync-freq=0 --file-fsync-end=off run --max-requests=30000
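
(If you want to reproduce this: sysbench's fileio test expects the file set to already exist, so a matching prepare step has to be run first with the same file options, for example:)

sysbench --test=fileio --file-num=128 --file-total-size=54G prepare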



Sysbench results




sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.



Extra file open flags: 0
128 files, 432Mb each
54Gb total file size
Block size 4Kb
Number of random requests for random IO: 30000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test

Threads started!
Done.

Operations performed: 30000 Read, 0 Write, 0 Other = 30000 Total
Read 117.19Mb Written 0b Total transferred 117.19Mb (935.71Kb/sec)
233.93 Requests/sec executed

Test execution summary:
total time: 128.2455s
total number of events: 30000

total time taken by event execution: 2051.5525
per-request statistics:
min: 0.00ms
avg: 68.39ms
max: 2010.15ms
approx. 95 percentile: 660.40ms

Threads fairness:
events (avg/stddev): 1875.0000/111.75
execution time (avg/stddev): 128.2220/0.02



iostat during test




avg-cpu: %user %nice %system %iowait %steal %idle
0,00 0,01 0,10 31,03 0,00 68,86

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0,00 0,10 0,00 0,60 0,00 0,00 9,33 0,00 0,00 0,00 0,00
cciss/c0d1 0,00 46,30 208,50 1,30 0,82 0,10 8,99 29,03 119,75 4,77 100,00
dm-0 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
dm-2 0,00 0,00 0,00 51,60 0,00 0,20 8,00 49,72 877,26 19,38 100,00
dm-3 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
dm-4 0,00 0,00 0,00 0,70 0,00 0,00 8,00 0,00 0,00 0,00 0,00
dm-5 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
dm-6 0,00 0,00 0,00 0,00 0,00 0,00 0,00 7,00 0,00 0,00 100,00
dm-1 0,00 0,00 0,00 0,00 0,00 0,00 0,00 7,00 0,00 0,00 100,00
dm-7 0,00 0,00 208,50 0,00 0,82 0,00 8,04 25,00 75,29 4,80 100,00
dm-8 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00



Bonnie++ v1.96




cmd: /usr/sbin/bonnie++ -c 16 -n 0

Writing a byte at a time...done
Writing intelligently...done
Rewriting...done

Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 16 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
seo-db 48304M 819 99 188274 17 98395 8 2652 78 201280 8 265.2 1
Latency 14899us 726ms 15194ms 100ms 122ms 665ms

1.96,1.96,seo-db,16,1337541936,48304M,,819,99,188274,17,98395,8,2652,78,201280,8,265.2,1,,,,,,,,,,,,,,,,,,14899us,726ms,15194ms,100ms,122ms,665ms,,,,,,



Questions



So, sysbench showed 234 random reads per second, while I expected at least 400.
What could the bottleneck be? LVM?
Another system with mdadm RAID 1 on 2x 7200 RPM drives shows over 200 random reads per second...



Thanks for any help!


Answer



Your system is definitely underperforming relative to its hardware specifications. I loaded the sysbench utility onto a couple of idle HP ProLiant DL380 G6/G7 servers running CentOS 5/6 to check their performance. Those servers use normal fixed partitions instead of LVM (I don't typically use LVM, given the flexibility the HP Smart Array controllers already provide).
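
As a rough sanity check (using the commonly quoted figure of roughly 130-150 random-read IOPS per 10k RPM SAS spindle), a 6-disk RAID 1+0 array can service reads from all six spindles, so the raw ceiling is on the order of 6 x 140 ≈ 840 IOPS before the controller cache helps at all; your 234 requests/sec is well under a third of that.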




The DL380 G6 has a 6-disk RAID 1+0 array on a Smart Array P410 controller with 512MB of battery-backed cache. The DL380 G7 has a 2-disk enterprise SLC SSD array. The filesystems are XFS. I used the same sysbench command line as you did:



sysbench --init-rng=on --test=fileio --num-threads=16 --file-num=128 --file-block-size=4K --file-total-size=54G --file-test-mode=rndrd --file-fsync-freq=0 --file-fsync-end=off --max-requests=30000 run


My result was 1595 random reads per second across the 6-disk array.

On the SSD array, the result was 39047 random reads per second. Full results are at the end of this post...




  • As for your setup, the first thing that jumps out at me is the size of your test partition. You're nearly filling the 60 GB partition with 54 GB of test files. I'm not sure whether ext4 has trouble performing at 90%+ utilization, but that's the quickest thing for you to modify and retest, either by enlarging the volume or by using a smaller set of test data (see the first sketch after this list).



  • Even with LVM, there are some tuning options available for this controller/disk setup. Checking the read-ahead value and changing the I/O scheduler from the default cfq to deadline or noop are both helpful (see the second sketch after this list). Please see the question and answers at: Linux - real-world hardware RAID controller tuning (scsi and cciss)


  • What is your RAID controller's cache ratio? I typically use a 75%/25% write/read balance (also covered in the second sketch). This should be a quick test: the 6-disk array completed the run in 18 seconds, while yours took over two minutes.


  • Can you run a bonnie++ or iozone test on the partition/array in question? It would be helpful to see if there are any other bottlenecks on the system. I wasn't familiar with sysbench, but I think these other tools will give you a better overview of the system's capabilities.


  • Filesystem mount options may make a small difference, but I think the problem could be deeper than that...
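
On the test-size point, here is a minimal retest sketch, assuming sysbench 0.4 and a scratch directory on the same 60 GB LVM volume. The 32G size and the /mnt/test path are illustrative rather than taken from your setup, and the file set is kept larger than your 24 GB of RAM so the page cache doesn't hide the disks:

cd /mnt/test   # hypothetical mount point of the 60 GB test LV

# sysbench 0.4 fileio creates its file set in a separate prepare step
sysbench --test=fileio --file-num=128 --file-total-size=32G prepare

# same random-read workload as before, now filling only about half the volume
sysbench --init-rng=on --test=fileio --num-threads=16 --file-num=128 \
    --file-block-size=4K --file-total-size=32G --file-test-mode=rndrd \
    --file-fsync-freq=0 --file-fsync-end=off --max-requests=30000 run

# remove the test files afterwards
sysbench --test=fileio --file-num=128 --file-total-size=32G cleanup

For the scheduler, read-ahead and cache-ratio points, the commands below are a sketch assuming the array is still /dev/cciss/c0d1 and the controller is in slot 0, as in your hpacucli output. The read-ahead value is only a starting point, and the exact cacheratio syntax can vary between hpacucli versions, so check hpacucli help before running it:

# show the current I/O scheduler and switch to deadline (not persistent across
# reboots; cciss devices appear in sysfs with '!' in place of '/')
cat '/sys/block/cciss!c0d1/queue/scheduler'
echo deadline > '/sys/block/cciss!c0d1/queue/scheduler'

# check and raise the block-device read-ahead (the value is in 512-byte sectors)
blockdev --getra /dev/cciss/c0d1
blockdev --setra 4096 /dev/cciss/c0d1

# inspect the current cache ratio and set 25% read / 75% write
hpacucli ctrl slot=0 show | grep -i 'cache ratio'
hpacucli ctrl slot=0 modify cacheratio=25/75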




hpacucli output...



Smart Array P410i in Slot 0 (Embedded)    (sn: 50123456789ABCDE)


array A (SAS, Unused Space: 0 MB)

logicaldrive 1 (838.1 GB, RAID 1+0, OK)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 300 GB, OK)
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)


SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 (WWID: 50123456789ABCED)


sysbench DL380 G6 6-disk results...



sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16

Initializing random number generator from timer.

Extra file open flags: 0
128 files, 432Mb each
54Gb total file size
Block size 4Kb
Number of random requests for random IO: 30000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test

Threads started!
Done.

Operations performed: 30001 Read, 0 Write, 0 Other = 30001 Total
Read 117.19Mb Written 0b Total transferred 117.19Mb (6.2292Mb/sec)
1594.67 Requests/sec executed

Test execution summary:
total time: 18.8133s
total number of events: 30001

total time taken by event execution: 300.7545
per-request statistics:
min: 0.00ms
avg: 10.02ms
max: 277.41ms
approx. 95 percentile: 25.58ms

Threads fairness:
events (avg/stddev): 1875.0625/41.46
execution time (avg/stddev): 18.7972/0.01



sysbench DL380 G7 SSD results...



sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.



Extra file open flags: 0
128 files, 432Mb each
54Gb total file size
Block size 4Kb
Number of random requests for random IO: 30000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!

Done.

Operations performed: 30038 Read, 0 Write, 0 Other = 30038 Total
Read 117.34Mb Written 0b Total transferred 117.34Mb (152.53Mb/sec)
39046.89 Requests/sec executed

Test execution summary:
total time: 0.7693s
total number of events: 30038
total time taken by event execution: 12.2631

per-request statistics:
min: 0.00ms
avg: 0.41ms
max: 1.89ms
approx. 95 percentile: 0.57ms

Threads fairness:
events (avg/stddev): 1877.3750/15.59
execution time (avg/stddev): 0.7664/0.00

