LSI 3008 - "Unrecoverable medium error during rebuild", anyone?


jcl333

Active Member
May 28, 2011
Hello, here is my situation.

Supermicro X10SRH-CF with onboard 3008 flashed to IT mode (FW 16.00.01.00/BIOS 8.37.00.00/EFI 18.00.00.00)
8x HUSMM144CLAR400 SSDs in R10
I create the array, and within a few minutes the background initialization starts throwing these errors.
Here it is in LSI Storage Authority; in MegaRAID Storage Manager it shows up as error 109:
[Screenshot: drive errors]

I have replaced the drive on "deviceid 2" twice, and the same thing happens each time (I delete and recreate the array to test).
Also, the errors always occur at the same locations, between 0x80 and 0x86, and they keep repeating.

However, the background initialization continues regardless, and the VD is accessible the whole time. If it weren't for these errors, nothing would appear to be wrong, and all the drives show as online.

Other things I will try:
- Noticed there is a newer 16.00.10.00 firmware from 7/2020; I will try that
- Swap the cable with another one (using breakout cable in this case, not a backplane)

I have two identical servers with this configuration, and this is the second one. I did not have this problem at all on the first one, with everything else the same.
The only difference is that the other one is in a SM 16-bay case (not using the bays for these though).

I suppose I could have a bad port. I hope not; that would certainly suck.

But what I am really wondering is: what does this error mean? I looked it up in the user guide and it was not helpful, so I am wondering if others here have run into this and can tell me what the proper context of this error is. It says "fatal", so it probably should not be ignored.

If it were a bad drive, I would expect it to mark it as such.

Any help greatly appreciated.

Thanks

-JCL
 

andrewbedia

Well-Known Member
Jan 11, 2013
Oh, I have that board! However, I don't think I use it at all the way you do (CentOS 8, IT mode, ZFS).

> what does this error mean?
Probably unrecoverable read errors or unrecoverable write errors.

> Other things I will try:
These are good ideas.

I'm assuming you mean IR (Integrated RAID) mode, since you're using the controller to define the array rather than some software-defined solution?

Is this on Windows? Have you checked the drives with a tool like Hard Disk Sentinel to see whether they are actually any good? Be aware that interpreting SAS SMART data is a little strange if all you've ever seen is SMART data from SATA drives. I would be looking for errors reported by the drive itself (e.g. UREs/URWs) and for errors in transit (called UDMA CRC errors on SATA drives; I can't remember what SAS calls them).
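If you want to script the drive-side part of that SMART check on a SAS drive, here is a minimal Python sketch. It assumes the text comes from smartmontools' `smartctl -a`, whose SAS "Error counter log" table typically has read/write/verify rows ending in a "Total uncorrected errors" column, followed by a "Non-medium error count" line. The `SAMPLE` text (including its non-zero read count) and the `parse_sas_error_log` helper are hypothetical illustrations, not output from the drives in this thread, and the exact layout may vary between smartctl versions and drive vendors.

```python
import re

# Hypothetical sample of the SAS "Error counter log" section as smartctl -a
# typically prints it; real output may differ by smartctl version or vendor.
SAMPLE = """\
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0       1234.567           3
write:         0        0         0         0          0        987.654           0
verify:        0        0         0         0          0          0.000           0

Non-medium error count:        5
"""

# Rows that start with read:/write:/verify: at the beginning of a line.
ROW_RE = re.compile(r"^(read|write|verify):\s+(.*)$", re.MULTILINE)
NON_MEDIUM_RE = re.compile(r"^Non-medium error count:\s+(\d+)", re.MULTILINE)


def parse_sas_error_log(text: str) -> dict:
    """Return uncorrected error counts per operation plus the non-medium count.

    Assumes each read/write/verify row has seven whitespace-separated fields
    and that the last one is "Total uncorrected errors".
    """
    result = {}
    for op, rest in ROW_RE.findall(text):
        fields = rest.split()
        if len(fields) != 7:
            continue  # unexpected layout: skip the row rather than guess
        result[op] = int(fields[-1])
    match = NON_MEDIUM_RE.search(text)
    if match:
        result["non_medium"] = int(match.group(1))
    return result


if __name__ == "__main__":
    counts = parse_sas_error_log(SAMPLE)
    print(counts)
    # Non-zero uncorrected read/write counts are the drive-side errors
    # (UREs/URWs); non-medium errors are counted separately by the drive.
```

Parsing a captured text block rather than invoking `smartctl` directly keeps the sketch runnable anywhere; in practice you would feed it the output of `smartctl -a` for each drive and compare the counts across devices, paying particular attention to the drive in the slot that keeps erroring.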