Tuesday, April 2, 2019

zfs error behind LSI RAID controller



ZFS is reporting some read errors, so it would seem that this disk is failing, even though none of the failure scenarios described in the ZFS-8000-9P document have occurred as far as we are aware. These disks are fairly new; the only issue we had recently was the pool running completely full.



The pool runs on top of an LSI MegaRAID 9271-8i, with every disk exported as a single-drive "RAID 0" virtual drive. I am not very familiar with this RAID card, so I found a script that returns data derived from the megacli command-line tool. I added the output for one drive to show the setup; they are all set up the same (the system disks are different).
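
For reference, a script like that typically just wraps a couple of MegaCli invocations. The exact binary name and adapter number depend on the installation (MegaCli, MegaCli64 or storcli), so treat this as a sketch rather than the script that was actually used:

# list all virtual drives (the per-disk RAID 0 volumes) on adapter 0
MegaCli64 -LDInfo -Lall -a0

# list all physical drives, including their media/other error counters
MegaCli64 -PDList -a0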



zpool status output



  pool: data
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            br0c2   ONLINE       0     0     0
            br1c2   ONLINE       0     0     0
            br2c2   ONLINE       0     0     0
            br0c3   ONLINE       0     0     0
            br1c3   ONLINE       0     0     0
            br2c3   ONLINE       0     0     0
            r2c1    ONLINE       0     0     0
            r1c2    ONLINE       0     0     0
            r5c3    ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            r3c1    ONLINE       0     0     0
            r4c1    ONLINE       2     0     0
        ... cut raidz2-1 ...

errors: No known data errors



The output of the LSI script for one of the drives



Virtual Drive: 32 (Target Id: 32)
Name                 :
RAID Level           : Primary-0, Secondary-0, RAID Level Qualifier-0
Size                 : 3.637 TB
Sector Size          : 512
Is VD emulated       : No
Parity Size          : 0
State                : Optimal
Strip Size           : 512 KB
Number Of Drives     : 1
Span Depth           : 1
Default Cache Policy : WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy : WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy    : Disk's Default
Encryption Type      : None
PI type              : No PI
Is VD Cached         : No


The script doesn't report any faulty disk, nor does the RAID controller mark the drive as faulty. I found some other topics about zpool errors that advised clearing the error and running a scrub. So my questions are: at what threshold should I run a scrub, how long will it take (assuming the pool takes a performance hit while scrubbing), and if this disk really is faulty, will hot-swapping it trigger a rebuild?
All the disks are "Western Digital RE 4TB, SAS II, 32MB, 7200rpm, enterprise 24/7/365". Is there a system that will check for ZFS errors automatically? This was just a routine manual check.



ZFS version: 0.6.4.1 (ZFS on Linux)




I know 2 read errors are not a lot, but I'd rather replace a disk too early than too late.


Answer



zpool scrub is the "system that will check for ZFS errors". It will take as long as it takes to read all the data stored in the pool (it walks the data roughly in transaction-group order, so it can seek a lot, depending on how full the pool is and how the data was written). Once started, zpool status will show a progress estimate, and a running scrub can be stopped.
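
A minimal sketch of that workflow, assuming the pool is named data as in the output above:

# start a scrub of the pool
zpool scrub data

# check progress; the scan line shows an estimate once the scrub is running
zpool status data

# stop a running scrub if the performance hit is too big
zpool scrub -s data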



If you want something to periodically check zpool status, the simplest way would be to run something like zpool status | grep -C 100 Status periodically (say, once every 6 hours) and email the output if there is any. You could probably find a plugin for your favourite monitoring system, like Nagios, or it would be pretty straightforward to write one yourself.
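
A minimal cron-style sketch of such a check, using the -x flag of zpool status (which only reports pools that have problems) and assuming local mail delivery works; the recipient address is of course a placeholder:

#!/bin/sh
# report-zpool-errors.sh - run from cron, e.g. every 6 hours:
# 0 */6 * * * /usr/local/sbin/report-zpool-errors.sh

OUT=$(zpool status -x)
# "all pools are healthy" means there is nothing to report
if [ "$OUT" != "all pools are healthy" ]; then
    echo "$OUT" | mail -s "zpool status warning on $(hostname)" admin@example.com
fi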



Just hot-swapping the drive will not trigger a resilver. You will have to run zpool replace for that to happen.
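
For example, if r4c1 did have to be replaced, the replacement would look roughly like this (the new device path is a placeholder, and behind this controller you would first have to create a new single-drive RAID 0 virtual drive for the fresh disk):

# tell ZFS to rebuild the vdev member onto the new device; this starts the resilver
zpool replace data r4c1 /dev/disk/by-id/NEW-DISK

# watch the resilver progress
zpool status data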



The read errors you are seeing may just as well be some kind of controller mishap. Even though it's enterprise hardware, these (HW RAID) controllers sometimes behave strangely, and such errors may, for example, be the result of a command taking too long because the controller was busy with something else. That's why I try to stay away from them unless necessary.




I'd go with checking the SMART data on the drive (see man smartctl) and scrubbing the pool. If both look OK, clear the errors and don't mess with the pool any further, because if the pool is nearly full, reading all the data during a resilver can actually trigger another error. Start panicking once you see errors on the same drive again ;).
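
Since the disks sit behind a MegaRAID controller, smartctl needs the megaraid device type plus the physical device ID of the drive; the ID and the /dev node below are placeholders for whatever MegaCli reports for the suspect disk:

# read SMART data for physical drive 12 behind the MegaRAID controller
smartctl -a -d megaraid,12 /dev/sda

# if SMART and a scrub both look clean, clear the error counters on the pool
zpool clear data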



By the way, for best performance you should use 2^n + 2 drives in a RAIDZ2 vdev, i.e. a power-of-two number of data drives plus the two parity drives (4, 6, 10, 18, ...).
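
As an illustration, a 6-disk RAIDZ2 vdev (4 data + 2 parity) follows that rule; the pool name and device names below are just placeholders:

# create a RAIDZ2 vdev from six whole disks: four data drives plus two parity drives
zpool create tank raidz2 sdb sdc sdd sde sdf sdg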

