Monday, April 6, 2015

hardware raid - HP Code 341 "Physical Drive State: Predictive failure. This physical drive is predicted to fail soon."

This is my first post, so bear with me here :-)



Background:
I have a RAID5 setup with 4 disks that has been working perfectly for years.

After one drive failed, I installed new drives. They started rebuilding, but
the controller marks the new drives with a failed SMART status.



Hardware:
HP ProLiant ML350 G6, 6 GB RAM, Xeon E5620, BIOS D22
Windows Server 2008 R2 (updated)
Smart Array P410i
4x 300 GB 10k SAS disks



Situation:

I bought 6 new hard drives, and during the array rebuild each one gets flagged with a failed SMART status. I have tried 5 of the 6 drives and all get the same error.



What I have done:
Updated the RAID controller and a couple of the hard drives to the latest firmware.



I set up another logical drive with RAID1 using two of these new drives to test them, and it worked without any problem.



All drives were inserted and removed while the server was powered on.



Thoughts and questions:

Could all 5 really be broken?



Is there any way to clear the SMART failure status?



Here is a freshly generated report from the ACU:
https://dl.dropboxusercontent.com/u/15772069/report-4bef4a9c-00000cf8-00000000.zip , the important-looking bit of which says:



ACU Version 8.70.9.0
Diagnostic Module Version 5.2.64.0
INFOMGR Version 6.0.1.0

Time Generated Monday November 23, 2015 9:22:40AM

Device Summary:
Smart Array P410i in Embedded Slot

Consolidated Error Report:
Controller: Smart Array P410i in Embedded Slot
Device: Physical Drive 1I:1:2
Message: The physical drive has failed.

Controller: Smart Array P410i in Embedded Slot
Device: Physical Drive 2I:1:6
Message: Physical Drive State: Predictive failure. This physical drive is predicted to fail soon.

Controller: Smart Array P410i in Embedded Slot
Device: Physical Drive 2I:1:7
Message: Physical Drive State: Predictive failure. This physical drive is predicted to fail soon.

Report for Smart Array P410i in Embedded Slot
---------------------------------------------

Smart Array P410i in Embedded Slot : Device Error Report


Device                Severity Error
--------------------- -------- ----------------------------------------------------------------------------------------
Physical Drive 1I:1:2 Critical The physical drive has failed.
Physical Drive 2I:1:6 Warning  Physical Drive State: Predictive failure. This physical drive is predicted to fail soon.
Physical Drive 2I:1:7 Warning  Physical Drive State: Predictive failure. This physical drive is predicted to fail soon.






I replaced the drives with another batch, and now the SMART status seems to be OK, but Windows sometimes complains about bad blocks. :S



The 6 new drives with the failed SMART status are now in another server. It is an ML350 G6 with the latest firmware on the P410i controller and also on the drives (I think). I put all 6 drives into a new RAID5 logical drive, and initialization has just finished. It seems to be working very well, but the SMART status still shows as failed.
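
As a sanity check on the new array, the RAID5 figures in the controller output (Size: 1.4 TB, Used Space: 1.6 TB, Strip Size: 64 KB, Full Stripe Size: 320 KB) can be reproduced with a little arithmetic. This is only a rough sketch: it assumes each "300 GB" drive holds 300 * 10^9 bytes and that the tool prints sizes in binary units (TiB labelled as TB), neither of which is stated in the output. The function name `raid5_figures` is made up for illustration.

```python
def raid5_figures(drives, drive_bytes, strip_kib):
    """Rough RAID5 capacity and stripe arithmetic (assumptions noted above)."""
    data_drives = drives - 1  # RAID5 spends one drive's worth of space on parity
    return {
        "raw_tib": drives * drive_bytes / 2**40,
        "usable_tib": data_drives * drive_bytes / 2**40,
        "full_stripe_kib": data_drives * strip_kib,
    }


# Assumed inputs: 6 drives of 300 * 10**9 bytes each, 64 KB strips.
figures = raid5_figures(drives=6, drive_bytes=300 * 10**9, strip_kib=64)

if __name__ == "__main__":
    print(round(figures["usable_tib"], 1))  # compare with "Size: 1.4 TB"
    print(round(figures["raw_tib"], 1))     # compare with "Used Space: 1.6 TB"
    print(figures["full_stripe_kib"])       # compare with "Full Stripe Size: 320 KB"
```

Under those assumptions the numbers line up with what the controller reports, so the logical drive itself looks like it was built as expected; the open question is only the per-drive SMART status.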



I ran CTRL ALL SHOW CONFIG DETAIL, and here is the output. Is there any way to reset the SMART status?



Smart Storage Administrator CLI 2.60.18.0


Detecting Controllers...Done.

Type "help" for a list of supported commands.
Type "exit" to close the console.



`
=> ctrl all show config detail



Smart Array P410i in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: 5001438005EDDF40
Cache Serial Number: PACCQ9SY70JU
Controller Status: OK
Hardware Revision: C
Firmware Version: 6.64-0
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 15 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: No
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: Permanently Disabled
Cache Status Details: The cache is disabled because one or more attached batteries are not supported by the currently running firmware.
Cache Ratio: 25% Read / 75% Write
Drive Write Cache: Disabled
Total Cache Size: 256 MB
Total Cache Memory Available: 144 MB
No-Battery Write Cache: Disabled
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: Failed (Replace Batteries)
SATA NCQ Supported: True
Number of Ports: 2 Internal only
Driver Name: HpSAMD.sys
Driver Version: 8.0.4.0
PCI Address (Domain:Bus:Device.Function): 0000:04:00.0
Host Serial Number: CZJ941003H
Sanitize Erase Supported: False
Primary Boot Volume: None
Secondary Boot Volume: None



Port Name: 1I
Port ID: 0
Port Connection Number: 0
SAS Address: 5001438005EDDF40
Port Location: Internal



Port Name: 2I
Port ID: 1
Port Connection Number: 1
SAS Address: 5001438005EDDF44
Port Location: Internal




Internal Drive Cage at Port 1I, Box 1, OK



  Power Supply Status: Not Redundant
Drive Bays: 4
Port: 1I
Box: 1
Location: Internal


Physical Drives

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 300 GB, Predictive Failure)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 300 GB, Predictive Failure)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS HDD, 300 GB, Predictive Failure)



Internal Drive Cage at Port 2I, Box 1, OK



  Power Supply Status: Not Redundant
Drive Bays: 4
Port: 2I
Box: 1
Location: Internal


Physical Drives
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS HDD, 300 GB, Predictive Failure)
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS HDD, 300 GB, Predictive Failure)
physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS HDD, 300 GB, Predictive Failure)



Array: A
Interface Type: SAS
Unused Space: 0 MB (0.0%)
Used Space: 1.6 TB (100.0%)
Status: OK
Array Type: Data



  Logical Drive: 1
Size: 1.4 TB
Fault Tolerance: 5
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Strip Size: 64 KB
Full Stripe Size: 320 KB
Status: OK
Caching: Disabled
Parity Initialization Status: Initialization Completed
Unique Identifier: 600508B1001030354544444634300500
Disk Name: \\.\PhysicalDrive0 (Disk 0) (Bus: 0,Target: 4,Lun: 0)
Mount Points: Offline 500 MB Partition Number 1, C:\ 146.0 GB Partition Number 2, D:\ 1.2 TB Partition Number 3
Logical Drive Label: A0017FDF5001438005EDDF40CEA0
Drive Type: Data
LD Acceleration Method: All disabled

physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: Predictive Failure
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPDG
Serial Number: 6SE52A2P0000B213CFP2
WWID: 5000C500437249DD
Model: HP EG0300FAWHV
Current Temperature (C): 33
Maximum Temperature (C): 62
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: Predictive Failure
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPDG
Serial Number: 6SE51A8S0000B213BAGM
WWID: 5000C5004371B7C5
Model: HP EG0300FAWHV
Current Temperature (C): 33
Maximum Temperature (C): 68
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: Predictive Failure
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPDG
Serial Number: 6SE519840000B212DHLM
WWID: 5000C500437278E1
Model: HP EG0300FAWHV
Current Temperature (C): 31
Maximum Temperature (C): 63
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 2I:1:5
Port: 2I
Box: 1
Bay: 5
Status: Predictive Failure
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPDG
Serial Number: 6SE521XZ0000B213B62Z
WWID: 5000C50043760E91
Model: HP EG0300FAWHV
Current Temperature (C): 32
Maximum Temperature (C): 63
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None


physicaldrive 2I:1:6
Port: 2I
Box: 1
Bay: 6
Status: Predictive Failure
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPDG
Serial Number: 6SE519140000B213A6SG
WWID: 5000C5004371FB05
Model: HP EG0300FAWHV
Current Temperature (C): 33
Maximum Temperature (C): 67
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 2I:1:7
Port: 2I
Box: 1
Bay: 7
Status: Predictive Failure
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPDG
Serial Number: 6SE5194L0000B213BC7W
WWID: 5000C50043720255
Model: HP EG0300FAWHV
Current Temperature (C): 31
Maximum Temperature (C): 60
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None


SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250
Device Number: 250
Firmware Version: RevC
WWID: 5001438005EDDF4F
Vendor ID: PMCSIERA
Model: SRC 8x6G



=>`
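
With this many drives it is easy to misread the dump, so here is a minimal sketch of how the one-line drive summaries under "Physical Drives" could be pulled out of a saved copy of the output. It assumes the summary format shown above (`physicaldrive <id> (port ..., <type>, <size>, <status>)`) and that a healthy drive is reported with the status `OK`; the helper names `drive_statuses` and `unhealthy` are made up for illustration and are not part of any HP tool.

```python
import re

# Matches the one-line summaries, for example:
# physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 300 GB, Predictive Failure)
# The per-drive detail headers ("physicaldrive 1I:1:1" with no parentheses)
# are deliberately not matched.
SUMMARY = re.compile(
    r"physicaldrive\s+(?P<id>\S+)\s+\("
    r"port [^,]+,\s*(?P<kind>[^,]+),\s*(?P<size>[^,]+),\s*(?P<status>[^)]+)\)"
)


def drive_statuses(report):
    """Return {drive id: status} for every summary line in the report text."""
    return {m["id"]: m["status"].strip() for m in SUMMARY.finditer(report)}


def unhealthy(report):
    """Return the sorted ids of drives whose status is anything other than OK.

    Treating "OK" as the only healthy status is an assumption.
    """
    return sorted(d for d, s in drive_statuses(report).items() if s != "OK")


if __name__ == "__main__":
    sample = (
        "physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 300 GB, Predictive Failure)\n"
        "physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS HDD, 300 GB, Predictive Failure)\n"
    )
    for drive in unhealthy(sample):
        print(drive)
```

Run against the full output above, this should list all six bays, which matches what the controller is showing: every one of the new drives is flagged, not just one or two.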



Here is a fresh ADU report too, taken after initialization finished:
https://dl.dropboxusercontent.com/u/15772069/ADUReport%20after%20init.zip
