Testing & deliberately rebuilding PERC 6/i array


moonslug

New Member
Aug 14, 2013
6
0
0
Hey guys, I wanted to run some PERC 6/i behavior by the forum and see if it's to be expected.

I have my 6/i running in my ESXi whitebox. It currently has two WD Reds in RAID1. I have the OMSA VIB installed on the host and OMSA Manage Node installed on my PC, which is where I configured the RAID. I was curious how exactly it would treat a failure, so to test, I deliberately failed one of the drives by pulling SATA power while it was online. OMSA reported a degraded state and one inaccessible physical drive, and the volume stayed online, as expected.

At this point, I reconnected power to the drive, and shortly after OMSA detected both drives as 'foreign'. It did NOT show the non-failed disk as healthy. The only OMSA tasks I could execute at that point were under "foreign configuration options". As soon as I began the repair, the volume went offline in ESXi. OMSA showed that it began to rebuild the RAID. The volume remained offline for the duration of the rebuild, which took a few hours. When it was done, I had to reconnect the storage adapter in ESXi, but no data was lost. During this process, the disk that I did not fail was 'OK', while the disk that I failed was the one that went through the rebuilding process.

Here are a couple screenshots I took during this process.

Is this expected? SHOULD the virtual disk go offline during a rebuild? I thought array rebuilding could be performed while leaving the disk online. Is that not true?

I also feel like I should have more options available to recover the array, something like a "Rebuild" option such as described here. I don't seem to have that (see image three in my gallery with the "no tasks available"); I could only perform operations in the "foreign configuration" category. Maybe this is a result of the way in which I failed the drive? Would a "real drive failure" prompt the controller to behave differently than my physically disconnecting SATA power?

I'm just trying to understand my controller better; the manual was not too clear in this regard. Thank you for any help.
 

bwillcox

Member
Jan 20, 2013
32
0
6
Tejas
Do you have the latest firmware from Dell on that PERC? The latest is version 6.3.3-002.

Doing this isn't really a valid test, because the drive you pulled still has the RAID configuration metadata on it.
If you truly had a drive die and inserted a new clean drive, it would have stayed online and likely would have started a rebuild by itself.

Knocking the "good" drive offline is a bug in the firmware. Older PERC firmware (especially on the PERC5s) is pretty buggy in my experience.

What it should have done is leave the good drive online as degraded, see the reconnected drive as foreign, and allow you to import or clear the foreign configuration in OMSA.
In your case you could have imported it since you knew it was from the same array.

If you know it is not from the same array, or are unsure, you will always want to clear it. Clearing destroys the metadata on the drive and makes it the same as a clean drive, and then you can use it to rebuild if the controller doesn't start the rebuild itself.
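To make the import-versus-clear rule above concrete, here is a minimal Python sketch. The function names are made up for illustration, and the omconfig command strings are my assumption of the OMSA CLI syntax (controller 0 is a placeholder), so check them against the OMSA CLI guide before running anything.

```python
from enum import Enum


class ForeignAction(Enum):
    # Values are the assumed omconfig action names; verify against OMSA docs.
    IMPORT = "importforeignconfig"
    CLEAR = "clearforeignconfig"


def choose_foreign_action(same_array_confirmed: bool) -> ForeignAction:
    """Import only when the disk is known to come from this array; else clear."""
    if same_array_confirmed:
        return ForeignAction.IMPORT
    return ForeignAction.CLEAR


def omconfig_command(action: ForeignAction, controller_id: int = 0) -> str:
    """Build the (assumed) OMSA command line; controller_id is a placeholder."""
    return (
        "omconfig storage controller "
        f"action={action.value} controller={controller_id}"
    )
```

For the pulled-and-reinserted drive in this thread, `choose_foreign_action(True)` picks the import path; a drive of unknown origin gets the clear path.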

This situation is why we have a policy at work where any not-broken drive that is reclaimed from a system gets secure wiped in another system on a motherboard controller or an HBA, as inserting a drive with foreign metadata is known to badly confuse RAID controllers. The Adaptecs we use a ton of at the office would have the same issue.

-b-
 

moonslug

New Member
Aug 14, 2013
6
0
0
bwillcox said:
"Do you have the latest firmware from Dell on that PERC? The latest is version 6.3.3-002.

Doing this isn't really a valid test, because the drive you pulled still has the RAID configuration metadata on it.
If you truly had a drive die and inserted a new clean drive, it would have stayed online and likely would have started a rebuild by itself.

..."
Thank you for your input. Yes, in retrospect I realize it was a pretty hamhanded way to test a drive "failure", and I'm concerned about my testing methodology.

I do not have the latest firmware. This is a whitebox, not a PowerEdge, and the firmware that was on the 6/i when I bought it is 6.1.1-0047. OMSA is alerting me that it's out of date, but isn't offering a clear way of updating it. Would you or anyone else know how to properly update the firmware on the controller? I have pored over the internet, the user's manual from Dell [pdf link], and the FTP repository [link] offered for the controller, but I can't find any clear way to do this. I'm gathering that if I had a real PowerEdge server, I could run a model-specific updating program to handle it. Do you know if this is possible in a custom setup like mine?
 

mrkrad

Well-Known Member
Oct 13, 2012
1,244
52
48
You can change much of this with MegaSCU. On HP LSI controllers you can do a pull/push and a rebuild will commence by default.
 

moonslug

New Member
Aug 14, 2013
6
0
0
bwillcox said: "Do you have the latest firmware from Dell on that PERC? The latest is version 6.3.3-002."
Here is version 6.3.1-0003, A14. It's the most recent firmware I could find on Dell's site, published eight months ago. Is there one newer than that? I was unable to find it when I searched for it directly.
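For comparing the version strings quoted in this thread, a small Python sketch follows. It assumes PERC firmware versions are dot- and dash-separated integers that can be compared field by field; that matches the strings seen here but is not a documented Dell rule, and the function names are hypothetical.

```python
import re
from typing import Tuple


def parse_firmware(version: str) -> Tuple[int, ...]:
    """Split a string such as '6.1.1-0047' into (6, 1, 1, 47)."""
    return tuple(int(part) for part in re.split(r"[.-]", version.strip()))


def is_older(installed: str, available: str) -> bool:
    """True when the installed version sorts before the available one."""
    return parse_firmware(installed) < parse_firmware(available)
```

With the numbers from this thread, `is_older("6.1.1-0047", "6.3.1-0003")` is true, so the A14 package would be an upgrade over what shipped on the card.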

Also, it looks like this driver suite contains a collection of .exe utilities and the actual firmware .ROM. Can I pass the controller through to a VM and run them directly, or do I need to use some utility like MegaSCU/MegaCli?

Thanks.
 

agent0

New Member
May 28, 2013
15
0
1
Did you ever get the firmware updated on your PERC 6/i? If not, the process is really easy.

1. You need to create a DOS-bootable USB key. I always use the Rufus utility to create mine.
Rufus - Create bootable USB drives the easy way

2. Go to Dell's site, download the latest firmware, and copy it over to your USB key.

Driver Details | Dell US

3. Boot your server off the key, then run update.bat.

Also, for firmware I don't pass controllers through to VMs; I just reboot the physical server and update it from my USB key.
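Before rebooting into the key, it can help to confirm the files actually landed on it. This is only a rough Python sketch: it assumes update.bat and a .ROM image sit at the top level of the key (the thread mentions both files but not their layout), and `check_update_key` is a made-up helper name.

```python
from pathlib import Path
from typing import List


def check_update_key(key_root: str) -> List[str]:
    """Return a list of problems; an empty list means the key looks ready."""
    root = Path(key_root)
    if not root.is_dir():
        return [f"{key_root} is not a directory"]

    # DOS file names are case-insensitive, so compare in lower case.
    names = [p.name.lower() for p in root.iterdir() if p.is_file()]

    problems = []
    if "update.bat" not in names:
        problems.append("update.bat not found")
    if not any(name.endswith(".rom") for name in names):
        problems.append("no .rom firmware image found")
    return problems
```

Run it against the mounted key's path (for example a drive letter on Windows) from the PC where the key was built; it only inspects file names and does not validate the firmware image itself.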