Monday, April 18, 2016

raid - ProLiant DL180 G6 with Smart Array P410 failed logical drive (keeps failing and needing rebuild)

I have an issue with a bunch of DL180 servers, each with a Smart Array P410 controller and 2 logical drives: one for the root filesystem, and the other a large-ish 10TB filesystem that is exported over NFS.



The boxes are primarily NFS servers; they are frequently maxed out and are the bottleneck in the processing chain.



Every so often one of these 10TB logical drives fails and needs to be rebuilt. This happens about once a month, and it is a pain.



The message is "Message: This logical drive has failed and cannot be used. All data on this logical drive has been lost."



We have tried updating the firmware on the array controller and the kernel module. We have also tried various flavours of Linux for the host OS (Debian, CentOS), and both xfs and ext3 as filesystem types. However, the logical drives still regularly need rebuilding from backups.




I have attached the hpacucli diagnostic output for one of the failed drives: http://pastebin.com/9zTiuSAN



Some interesting output items:



Smart Array P410 in slot 1 : Identify Controller
RAM Firmware Revision 2.00
ROM Firmware Revision 2.00



Any suggestions on what might be the problem, or how I might go about instrumenting these arrays/disks to get an idea of what is causing the drive to fail?




# cat output.txt  | grep -B 2 'Drive Firmware Rev'
Drive Model ATA GB1000EAMYC
Drive Serial Number WMATV2509266
Drive Firmware Revision HPG2
--
Drive Model ATA GB1000EAMYC
Drive Serial Number WMATV1739564
Drive Firmware Revision HPG2
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ456MN
Drive Firmware Revision HPG8
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ45RS3
Drive Firmware Revision HPG8
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ460P0
Drive Firmware Revision HPG8
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ454YN
Drive Firmware Revision HPG8
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ4664M
Drive Firmware Revision HPG8
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ457M9
Drive Firmware Revision HPG8
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ46Q9E
Drive Firmware Revision HPG8
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ4630X
Drive Firmware Revision HPG8
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ454PD
Drive Firmware Revision HPG8
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ45Z0Y
Drive Firmware Revision HPG8
--
Drive Model HP DF0146B8052
Drive Serial Number 3QN1KS7H00009949SQ4M
Drive Firmware Revision HPD5
--
Drive Model HP DF0146B8052
Drive Serial Number 3QN1KNFS00009949UX4F
Drive Firmware Revision HPD5
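
To compare the drive mix across boxes, the listing above can be tallied by model and firmware revision. This is only a rough sketch: it assumes the report has `Drive Model` and `Drive Firmware Revision` lines laid out exactly as in the excerpt, and `tally_drives` and `sample` are names made up for illustration, not part of hpacucli.

```python
from collections import Counter


def tally_drives(report_text):
    """Count drives per (model, firmware) pair in diagnostic report text.

    Assumes each drive appears as a "Drive Model ..." line followed later
    by a "Drive Firmware Revision ..." line, as in the excerpt above.
    """
    counts = Counter()
    model = None
    for raw in report_text.splitlines():
        line = raw.strip()
        if line.startswith("Drive Model"):
            model = line[len("Drive Model"):].strip()
        elif line.startswith("Drive Firmware Revision") and model is not None:
            firmware = line[len("Drive Firmware Revision"):].strip()
            counts[(model, firmware)] += 1
            model = None  # wait for the next "Drive Model" line
    return counts


# A short sample copied from the grep output above.
sample = """\
Drive Model ATA GB1000EAMYC
Drive Serial Number WMATV2509266
Drive Firmware Revision HPG2
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ456MN
Drive Firmware Revision HPG8
--
Drive Model ATA GB1000EAFJL
Drive Serial Number 9QJ45RS3
Drive Firmware Revision HPG8
"""

for (model, firmware), n in sorted(tally_drives(sample).items()):
    print(f"{n:2d} x {model} (firmware {firmware})")
```

Run against the full listing above, this should report two GB1000EAMYC drives on HPG2, ten GB1000EAFJL drives on HPG8, and two DF0146B8052 drives on HPD5. Running the same tally on each box's report would show whether the failing arrays share a particular model or firmware combination.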
