Tuesday, June 2, 2015

Replacing hard drive in RAID 5 array on HP ProLiant DL380p Gen8 Server



I'm a relatively new (and the only) system admin at my organization. We have an HP ProLiant DL380p Gen8 Server that is no longer under any sort of support contract from HP. We're using it as a Hyper-V host for 4 virtual servers. The virtual host itself isn't being backed up, but the virtual servers running on it are backed up to Azure. (We only need the physical server to last a few more months until I move the last remaining app server to the cloud and switch all our users/machines from on-premises AD to Azure AD.) The server's RAID controller is a Smart Array P420i Controller.



Yesterday, one of the 300 GB drives in the server's RAID 5 array (there are three drives in the array in total) started to alternately flash green and amber. According to page 102 of the manual and the server's iLO interface, this drive is in a "Degraded (Predictive failure)" state.




This is literally my first time ever replacing a RAID drive on a production server, and I want to make sure I don't screw it up. As the only admin, I don't have anyone that I can ask for help.




  1. Do I have to wait for the drive to actually fail before swapping it out? Or can I swap it out now, pre-emptively?


  2. Can the drive simply be hot swapped out (as in push the eject button, pull it out, and pop the new drive in)? Will the RAID array begin to rebuild automatically, or do I need to tell the controller/Windows about the existence of the new drive?


  3. Is there any risk/benefit to cold swapping the drive instead? The server technically doesn't need to stay up during off hours, so I could stay behind to cold swap it. BUT, this answer says that there's a danger to cold swapping and "that this must be done while the system is running"... It's an older server model, but I don't understand why there would be a problem cold swapping.


  4. I've read about additional drives failing when trying to rebuild a RAID 5 array. Since this drive technically isn't failed, but is only "predicted to fail", does this in any way lessen the likelihood of another drive failing (since if they were to fail soon, they would be in the same state as this one, and not in a healthy state)? This is more for my own peace of mind lol...





Thanks for all your help!


Answer




  1. No, you don't have to wait. Replace it as soon as possible: another drive might fail, so swapping it out early is the best course of action.


  2. Yes, it can be hot swapped. Since the host runs Hyper-V on Windows, you can check the status of the array from the RAID utility, if you have it installed. You can also start a rebuild from there, which avoids a restart.


  3. No, there is no particular risk or benefit, but I would do it while the server is running.


  4. Yes, that can happen. When the new drive is inserted, the rebuild puts extra stress on the remaining drives, which is why it's best to do it early. The idea is that if one drive fails, the others might fail soon too. To illustrate, it's like car tires: they all have the same treadwear, so once one has failed, the others can logically be expected to fail soon as well.
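
To see why the rebuild leans on every remaining drive, here is a minimal Python sketch of RAID 5 style XOR parity for a single stripe on a three-drive array. It is only an illustration of the general principle, not how the Smart Array P420i is actually implemented; the block contents and the function names (`xor_blocks`, `rebuild_missing`) are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recompute the one missing block of a stripe from ALL surviving blocks.

    Every surviving drive has to be read in full, which is where the
    extra load during a rebuild comes from.
    """
    return xor_blocks(surviving_blocks)


# One stripe on a three-drive array: two data blocks plus one parity block.
# (Hypothetical contents, chosen only for illustration.)
data_a = b"HYPERV01"
data_b = b"AZUREBKP"
parity = xor_blocks([data_a, data_b])

# The drive holding data_b is pulled and replaced with a blank one.
# Its contents are recomputed from the two survivors.
rebuilt_b = rebuild_missing([data_a, parity])
print(rebuilt_b == data_b)
```

With single parity, each missing block is derived from all of the others, so if a second drive drops out before the rebuild finishes, there is nothing left to reconstruct the stripe from.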


