IBM ServeRAID M5014 issues..


not_technical

New Member
Jun 6, 2015
19
1
3
54
I have now noticed that the drive in slot #1 was making faint, odd, repetitive noises. The other drives are not making any unusual noises.
I connected drive #1 to a PC and I'm not hearing any odd noises there.
 

not_technical

With the disk in slot #1 truly disconnected, I am still getting unexpected sense errors:



Disk #1 works just fine in the PC.
 

snazy2000

New Member
Jan 28, 2014
22
14
3
I had this issue, and it was due to not enough power going to my backplane. Oddly, if I plugged in one of the fans, all the HDDs would error. I replaced the PSU and all is good now. I also had it when the drives weren't pushed far enough into the backplane :)
 

neo

Well-Known Member
Mar 18, 2015
672
363
63
I just took another look at your logs. Your controller is being reset every second. That is not a drive error.

Could be PSU issues, could be overheating.
 

not_technical

With the drive in slot #1 removed, errors are less frequent:

MegaRAID Storage Manager 14.08.01.02 Event Log - Generated on Mon Jun 08 17:59:30 CEST 2015
-------------------------------------------------------------------------------------------
ID = 113
SEQUENCE NUMBER = 44904
TIME = 08-06-2015 17:54:10
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x00 0x55 0xe1 0xb6 0x60 0x00 0x00 0x00 0xa0 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44903
TIME = 08-06-2015 17:54:09
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x00 0x55 0xe1 0xb4 0x00 0x00 0x00 0x00 0x60 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44902
TIME = 08-06-2015 17:52:44
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x01 0x54 0x79 0x1a 0x00 0x00 0x00 0x00 0x30 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44901
TIME = 08-06-2015 17:52:44
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x01 0x54 0x78 0xea 0x30 0x00 0x00 0x00 0xd0 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44900
TIME = 08-06-2015 03:03:58
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x00 0xf7 0x2e 0xa8 0x10 0x00 0x00 0x00 0x80 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44899
TIME = 08-06-2015 03:03:58
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x00 0xf6 0xf8 0x59 0x20 0x00 0x00 0x00 0x40 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44898
TIME = 08-06-2015 03:00:34
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x33 0x8c 0x60 0x00 0x00 0x00 0x08 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44897
TIME = 08-06-2015 03:00:33
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x00 0xf7 0x5f 0x82 0x00 0x00 0x00 0x00 0xe0 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44896
TIME = 07-06-2015 20:53:47
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x01 0x45 0xde 0xb8 0x00 0x00 0x00 0x01 0x00 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44895
TIME = 07-06-2015 18:08:42
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x32 0xd6 0x40 0x00 0x00 0x00 0x08 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44894
TIME = 07-06-2015 18:08:36
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x32 0xda 0x50 0x00 0x00 0x00 0x08 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00

ID = 113
SEQUENCE NUMBER = 44893
TIME = 07-06-2015 17:37:37
LOCALIZED MESSAGE = Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x00 0xfd 0x31 0xf3 0x00 0x00 0x00 0x01 0x00 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00
At this point I'm beginning to think that the PSU is the most likely cause.
The system does have active cooling, 2x 120mm intake fans and 1x 120mm exhaust. The PSU is bottom mounted so it has its own isolated airflow.
Could the overall airflow be insufficient? I can't add more fans, but I can replace them with more powerful units.
I assume the card has a temperature sensor, but I don't know how to check the reading..
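For reference, the CDB and sense bytes in the log entries above can be decoded by hand. Below is a minimal sketch, assuming the standard SCSI READ(16) CDB layout and fixed-format sense data; the function names are made up for illustration and are not part of any LSI/IBM tool.

```python
import re

# Subset of standard SCSI sense keys (low nibble of sense byte 2).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}


def hex_bytes(text):
    """Turn a string like '0x88 0x00 ...' into a bytes object."""
    return bytes(int(tok, 16) for tok in re.findall(r"0x([0-9a-fA-F]{2})", text))


def split_message(message):
    """Extract the raw CDB and sense bytes from one LOCALIZED MESSAGE line."""
    match = re.search(r"CDB = (.*?), Sense = (.*)", message)
    if match is None:
        raise ValueError("no CDB/Sense fields found")
    return hex_bytes(match.group(1)), hex_bytes(match.group(2))


def decode_read16(cdb):
    """READ(16): opcode 0x88, LBA in bytes 2-9, block count in bytes 10-13."""
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    blocks = int.from_bytes(cdb[10:14], "big")
    return lba, blocks


def decode_fixed_sense(sense):
    """Fixed-format sense: key in byte 2, ASC in byte 12, ASCQ in byte 13."""
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return sense[2] & 0x0F, sense[12], sense[13]


# First entry from the log above (sequence number 44904).
message = (
    "Controller ID: 0 Unexpected sense: PD = -:-:3Power on, reset, or bus "
    "device reset occurred, CDB = 0x88 0x00 0x00 0x00 0x00 0x00 0x55 0xe1 "
    "0xb6 0x60 0x00 0x00 0x00 0xa0 0x00 0x00 , Sense = 0x70 0x00 0x06 0x00 "
    "0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00"
)
cdb, sense = split_message(message)
lba, blocks = decode_read16(cdb)
key, asc, ascq = decode_fixed_sense(sense)
print(f"READ(16) lba={lba:#x} blocks={blocks}")
print(f"sense key={SENSE_KEYS.get(key, hex(key))} asc={asc:#04x} ascq={ascq:#04x}")
```

Every entry in the log decodes the same way: a read command answered with sense key 6 (UNIT ATTENTION) and ASC/ASCQ 0x29/0x00, which is the "power on, reset, or bus device reset occurred" condition named in the message text. In other words, the drive is reporting that it was reset or lost power, not that it failed to read a sector.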
 

not_technical

Right, I took a nice powerful Delta fan out of my parts bin and pointed it at the card. Temperature dropped fast to the point where the heatsink was only mildly warm.
I waited a few minutes for good measure, then launched a benchmarking tool (CrystalDiskMark) to stress the drives. The controller is still logging unexpected sense errors like there's no tomorrow.
 

not_technical

I managed to test with another PSU... it has the same specifications (450W max output, etc.) but is from a different brand (XFX).
Still getting unexpected sense errors.
I should mention that these PSUs, even though they are relatively new products, have only one cable with 3x Molex power connectors, so I use 3x Molex Y-splitters to connect all 7 drives on the same power rail. Could this be a problem? Surely the entire system doesn't even come close to 450W!
 

not_technical

"Why are you using Molex if you have SATA power and aren't using a backplane?"
Because my LSI mini SAS fanout cables actually have Molex connectors:



I now have both my RAID groups online again, I may have solved my problem. I will update the thread once I'm sure.
 

not_technical

Success. All my drives are online and stable with no more errors. Because of this, performance seems to have also improved slightly.
I took one of my two fanout cables and sawed (!) the SAS connectors apart so that I could separate data and power. This way I was able to feed the 3 drives of the second RAID group directly from the PSU's SATA power connectors.
The 4 drives of the first RAID group are still powered through Molex, and each uses one of the 4 Molex connectors of the PSU.
Finally, I connected all the fans to the motherboard. The result is that I eliminated all the Molex splitters I had before and the drives are nicely distributed on both rails of the PSU instead of just one.
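A rough back-of-the-envelope check of why this may have helped, as a sketch only: the per-drive current and the per-cable limit below are assumed round numbers for illustration, not measurements or datasheet values for these drives or PSUs. The point is that total wattage can be far below the PSU rating while a single cable with splitters is still carrying more than it comfortably can.

```python
# All constants are ASSUMPTIONS for illustration, not measured values.
SPINUP_AMPS_12V = 2.0     # assumed peak 12 V draw per 3.5" drive
CABLE_LIMIT_AMPS = 10.0   # assumed safe limit for one cable plus splitters
PSU_RATED_WATTS = 450.0   # PSU rating quoted earlier in the thread


def chain_load(drive_count, amps_per_drive=SPINUP_AMPS_12V):
    """Peak 12 V current drawn through a single cable feeding the drives."""
    return drive_count * amps_per_drive


def within_limit(drive_count, limit=CABLE_LIMIT_AMPS):
    """True if the assumed peak load stays under the assumed cable limit."""
    return chain_load(drive_count) <= limit


# Before: all 7 drives hang off one cable via Y-splitters.
before = chain_load(7)
# After: 4 drives on the Molex cable, 3 on the SATA power cable.
after_molex, after_sata = chain_load(4), chain_load(3)

print(f"before: {before:.1f} A on one cable ({before * 12:.0f} W of {PSU_RATED_WATTS:.0f} W)")
print(f"after:  {after_molex:.1f} A and {after_sata:.1f} A on separate cables")
print("single cable ok:", within_limit(7))
print("split cables ok:", within_limit(4) and within_limit(3))
```

With these assumed figures, seven drives on one cable exceed the per-cable limit even though the 12 V load is well under half of the PSU's rating, while the 4 + 3 split keeps each cable under it. Voltage drop or poor contact in the splitters themselves would have a similar effect and cannot be ruled out from this estimate.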

I re-imported my virtual drive configurations and rebuilt the physical disk in slot #1, which had gone offline right in the middle of the previous rebuild (as mentioned in the first posts). Next I ran a consistency check on both VDs, and found only a small number of read errors on the first VD. No unexpected sense/reset errors are logged anymore, not even under stress from benchmarking tools: