Tuesday, April 2, 2019

zfs error behind LSI RAID controller



ZFS is reporting some read errors, so it would seem that this disk is failing, even though none of the failure scenarios described in the ZFS-8000-9P document have occurred as far as we are aware. These disks are fairly new; the only issue we had recently was the pool running completely full.



The pool runs on top of an LSI MegaRAID 9271-8i, with every disk exported as a single-drive "RAID 0" virtual drive. I am not very familiar with this RAID card, so I found a script that returns data derived from the megacli command-line tool. I added the output for one drive to show the setup; they are all set up the same (the system disks are different).
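
For reference, a script like that typically just wraps a couple of MegaCli invocations. The exact binary name and adapter number depend on the installation (MegaCli, MegaCli64 or storcli), so treat this as a sketch rather than the script that was actually used:

# list all virtual drives (the per-disk RAID 0 volumes) on adapter 0
MegaCli64 -LDInfo -Lall -a0

# list all physical drives, including their media/other error counters
MegaCli64 -PDList -a0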



zpool status output



  pool: data
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            br0c2   ONLINE       0     0     0
            br1c2   ONLINE       0     0     0
            br2c2   ONLINE       0     0     0
            br0c3   ONLINE       0     0     0
            br1c3   ONLINE       0     0     0
            br2c3   ONLINE       0     0     0
            r2c1    ONLINE       0     0     0
            r1c2    ONLINE       0     0     0
            r5c3    ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            r3c1    ONLINE       0     0     0
            r4c1    ONLINE       2     0     0
        ... cut raidz2-1 ...

errors: No known data errors



The output of the LSI script for one of the drives



Virtual Drive: 32 (Target Id: 32)
Name                 :
RAID Level           : Primary-0, Secondary-0, RAID Level Qualifier-0
Size                 : 3.637 TB
Sector Size          : 512
Is VD emulated       : No
Parity Size          : 0
State                : Optimal
Strip Size           : 512 KB
Number Of Drives     : 1
Span Depth           : 1
Default Cache Policy : WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy : WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy    : Disk's Default
Encryption Type      : None
PI type              : No PI
Is VD Cached         : No


The script doesn't report any faulty disk, nor does the RAID controller mark the drive as faulty. I found some other topics about zpool errors that advised clearing the error and running a scrub. So my questions are: at what threshold should I run a scrub, how long will it take (assuming the pool takes a performance hit while scrubbing), and if this disk really is faulty, will hot-swapping it trigger a rebuild?
All the disks are "Western Digital RE 4TB, SAS II, 32MB, 7200rpm, enterprise 24/7/365". Is there a system that will check for ZFS errors automatically? This was just a routine manual check.



ZFS version: 0.6.4.1 (ZFS on Linux)




I know 2 read errors are not a lot, but I'd rather replace a disk too early than too late.


Answer



zpool scrub is the "system that will check for ZFS errors". It will take as long as it takes to read all the data stored in the pool (it walks the data roughly in transaction-group order, so it can seek a lot, depending on how full the pool is and how the data was written). Once started, zpool status will show a progress estimate, and a running scrub can be stopped.
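
A minimal sketch of that workflow, assuming the pool is named data as in the output above:

# start a scrub of the pool
zpool scrub data

# check progress; the scan line shows an estimate once the scrub is running
zpool status data

# stop a running scrub if the performance hit is too big
zpool scrub -s data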



If you want something to periodically check zpool status, the simplest way would be to run something like zpool status | grep -C 100 Status periodically (say, once every 6 hours) and email the output if there is any. You could probably find a plugin for your favourite monitoring system, like Nagios, or it would be pretty straightforward to write one yourself.
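
A minimal cron-style sketch of such a check, using the -x flag of zpool status (which only reports pools that have problems) and assuming local mail delivery works; the recipient address is of course a placeholder:

#!/bin/sh
# report-zpool-errors.sh - run from cron, e.g. every 6 hours:
# 0 */6 * * * /usr/local/sbin/report-zpool-errors.sh

OUT=$(zpool status -x)
# "all pools are healthy" means there is nothing to report
if [ "$OUT" != "all pools are healthy" ]; then
    echo "$OUT" | mail -s "zpool status warning on $(hostname)" admin@example.com
fi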



Just hot-swapping the drive will not trigger a resilver. You will have to run zpool replace for that to happen.
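
For example, if r4c1 did have to be replaced, the replacement would look roughly like this (the new device path is a placeholder, and behind this controller you would first have to create a new single-drive RAID 0 virtual drive for the fresh disk):

# tell ZFS to rebuild the vdev member onto the new device; this starts the resilver
zpool replace data r4c1 /dev/disk/by-id/NEW-DISK

# watch the resilver progress
zpool status data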



The read errors you are seeing may just as well be some kind of controller mishap. Even though it's enterprise hardware, these (HW RAID) controllers sometimes behave strangely, and such errors may, for example, be the result of a command taking too long because the controller was busy with something else. That's why I try to stay away from them unless necessary.




I'd go with checking the SMART data on the drive (see man smartctl) and scrubbing the pool. If both look OK, clear the errors and don't mess with the pool any further, because if the pool is nearly full, reading all the data during a resilver can actually trigger another error. Start panicking once you see errors on the same drive again ;).
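
Since the disks sit behind a MegaRAID controller, smartctl needs the megaraid device type plus the physical device ID of the drive; the ID and the /dev node below are placeholders for whatever MegaCli reports for the suspect disk:

# read SMART data for physical drive 12 behind the MegaRAID controller
smartctl -a -d megaraid,12 /dev/sda

# if SMART and a scrub both look clean, clear the error counters on the pool
zpool clear data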



By the way, for best performance you should use 2^n + 2 drives in a RAIDZ2 vdev, i.e. a power-of-two number of data drives plus the two parity drives (4, 6, 10, 18, ...).
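
As an illustration, a 6-disk RAIDZ2 vdev (4 data + 2 parity) follows that rule; the pool name and device names below are just placeholders:

# create a RAIDZ2 vdev from six whole disks: four data drives plus two parity drives
zpool create tank raidz2 sdb sdc sdd sde sdf sdg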

