I am configuring a server for a small but read-I/O-intensive database. It serves as a master index for public access to a larger Oracle RAC database. When we looked at the I/O requirements, we determined that an array of SSDs would provide the required performance at a lower cost than a large number of 15K SAS spindles. I have an HP server with a Smart Array P400 that will be connected only to the SSDs. The controller has 256MB of BBWC. The SSDs are (I believe) Samsung-manufactured 60GB SLC-based 2.5" SATA drives.
I'm wondering if anyone has insight into the best stripe sizes for RAID 10 or RAID 5, or any filesystem recommendations. We're going to be running Oracle 11g, so I believe I'm required to use a filesystem rather than raw block devices. The server will be running RHEL 5.5.
I've done a ton of reading in the past few months about SSDs, and I'm not opposed to doing a lot more, but my google-fu has begun to fail me in moving forward. The majority of the docs I'm finding on SSD RAID are for people doing a RAID 0 of consumer-level SSDs as the boot drive in their home PC to make Windows 7 boot and load games faster. What I'm saying is that I'm not looking for someone to do my job, just to share any experience they've had or a link to a doc they found somewhere.
Thanks in advance!
EDIT for some additional info rather than replying to every individual comment:
The drive space is not a concern at all, as the database is small enough to fit on one of the SSDs without a problem.
Since this is a very read-heavy DB (95%+ random reads of 4-8K), I thought I might get better performance from RAID 5, simply because reads can be serviced by the N-1 data drives in the array rather than only the active disks in the mirror; I've read things indicating that the Smart Array P400 does not support reading from both sides of the mirror in a RAID 10 set. That said, I'm fairly certain that the controller will end up being a bottleneck before I have to worry about that.
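Just to show the back-of-envelope math behind that thinking (the per-drive figure below is an assumed round number, not something I've measured):
# Assumed ~5,000 random-read IOPS per drive; purely illustrative, not a measurement.
PER_DRIVE_IOPS=5000
DRIVES=6
# RAID 10 if the controller only reads one side of each mirror: half the spindles serve reads
echo "RAID 10 (single-sided reads): $((PER_DRIVE_IOPS * DRIVES / 2)) IOPS"
# RAID 5 with distributed parity: reads can be spread across all of the drives
echo "RAID 5: $((PER_DRIVE_IOPS * DRIVES)) IOPS"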
On TRIM: I'm fairly certain that even if these drives supported TRIM (I don't believe they do), then it'd be somewhat difficult getting the TRIM commands pushed through the RAID controller to the individual drives. OS support is also dicey, as Red Hat Enterprise Linux 5 is still based on the 2.6.18 Kernel tree, albeit with a lot of customization to bring in features from later kernel releases. EXT4 is also not officially supported yet, and being a Production box, I need to keep myself in the realm where Red Hat and HP will help me if something goes wrong. I do believe that there is some kind of garbage collection going on at the drive level, though. I have filled the disks several times in the course of different benchmarking, and haven't seen a marked decrease in write speed that I would expect if I were having to wait for the Erase/Program cycle rather than just the Program cycle.
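(If I really wanted to verify TRIM support, I'd have to pull a drive and hang it off a plain SATA port on another box, since hdparm can't query a drive sitting behind the Smart Array; the device name below is just an example:)
hdparm -I /dev/sdb | grep -i trim
# a drive that supports it reports a "Data Set Management TRIM supported" line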
Here is some benchmark data for a 6-drive RAID 10 array using a 256KB stripe size. The partition is EXT3, aligned at 64 sectors. The NOOP scheduler is used, and the noatime option is given at mount. I also increased the OS read cache to 8MB (I believe the default is 512K). I used Iozone 3.347 for this test, with a record size of 4KB and a benchmark file size of 25GB, to hopefully take cache out of the picture and measure the actual performance of the drives. I also ran it with four threads (4x25GB files are written by 4 child processes to stress the drives).
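For anyone wanting to reproduce the setup, the scheduler, read cache, and mount tuning look roughly like this; the cciss device name and mount point are placeholders for whatever your array shows up as, and I'm treating the 8MB read cache as block-device read-ahead:
echo noop > '/sys/block/cciss!c0d0/queue/scheduler'   # per-device I/O scheduler
blockdev --setra 16384 /dev/cciss/c0d0                # read-ahead in 512-byte sectors (16384 = 8MB)
mount -o noatime /dev/cciss/c0d0p1 /oradata           # noatime at mount time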
Run began: Mon Aug 30 12:09:57 2010
Record Size 4 KB
File size set to 26214400 KB
Command line used: /opt/iozone/bin/iozone -b /root/4k25g4t.xls -r 4k -s 25g -t 4 -i 0 -i 1 -i 2
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Throughput test with 4 processes
Each process writes a 26214400 Kbyte file in 4 Kbyte records
Children see throughput for 4 initial writers = 253416.93 KB/sec
Parent sees throughput for 4 initial writers = 229461.66 KB/sec
Min throughput per process = 61416.07 KB/sec
Max throughput per process = 64604.90 KB/sec
Avg throughput per process = 63354.23 KB/sec
Min xfer = 24924492.00 KB
Children see throughput for 4 rewriters = 259375.90 KB/sec
Parent sees throughput for 4 rewriters = 234136.11 KB/sec
Min throughput per process = 63879.16 KB/sec
Max throughput per process = 65675.30 KB/sec
Avg throughput per process = 64843.97 KB/sec
Min xfer = 25497648.00 KB
Children see throughput for 4 readers = 490873.09 KB/sec
Parent sees throughput for 4 readers = 490830.09 KB/sec
Min throughput per process = 119007.65 KB/sec
Max throughput per process = 124878.35 KB/sec
Avg throughput per process = 122718.27 KB/sec
Min xfer = 24984912.00 KB
Children see throughput for 4 re-readers = 477533.65 KB/sec
Parent sees throughput for 4 re-readers = 477503.03 KB/sec
Min throughput per process = 115802.55 KB/sec
Max throughput per process = 121579.46 KB/sec
Avg throughput per process = 119383.41 KB/sec
Min xfer = 24973364.00 KB
Children see throughput for 4 random readers = 35728.62 KB/sec
Parent sees throughput for 4 random readers = 35728.53 KB/sec
Min throughput per process = 8926.97 KB/sec
Max throughput per process = 8937.35 KB/sec
Avg throughput per process = 8932.16 KB/sec
Min xfer = 26183936.00 KB
Children see throughput for 4 random writers = 23527.42 KB/sec
Parent sees throughput for 4 random writers = 20701.37 KB/sec
Min throughput per process = 5757.43 KB/sec
Max throughput per process = 6035.68 KB/sec
Avg throughput per process = 5881.86 KB/sec
Min xfer = 25011236.00 KB
"Throughput report Y-axis is type of test X-axis is number of processes"
"Record size = 4 Kbytes "
"Output is in Kbytes/sec"
" Initial write " 253416.93
" Rewrite " 259375.90
" Read " 490873.09
" Re-read " 477533.65
" Random read " 35728.62
" Random write " 23527.42
Answer
Some points I didn't see in other answers so far:
- A high-end server SSD will be good for about 30,000 IOPS. RealSSDs go up to 50,000.
- As such, you CAN use RAID 5. Period. Your bottleneck is very likely going to be the RAID controller, which simply is not made with SSD-level IOPS in mind, so it will max out its CPU.
In general, SSDs are about 100 times as fast as SAS drives in random I/O, sometimes more. Depending on your requirements, it is totally feasible to replace a RAID 10 of SAS drives with a RAID 5 of SSDs and still come out ahead, significantly, both in IOPS and in price.
The optimal stripe size is typically a multiple of 64K, especially as SSDs read/write in segments of that size anyway. TRIM is not strictly needed then (no partial writes), but it would still be really nice to have.
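If you build the array from the HP CLI, the stripe size can (as far as I know) be set at creation time with the ss= parameter of hpacucli; the slot number and drive bays below are examples, not your actual layout:
hpacucli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2,1I:1:3,1I:1:4,1I:1:5,1I:1:6 raid=5 ss=64
hpacucli ctrl slot=0 show config detail    # verify the stripe size the controller actually used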
MS has some papers on SSDs in databases which apply to Oracle as well (same principle: optimizing IOPS). Oracle should have some, too.