Strange problem with two LSI SAS MegaRAID controllers


story97

New Member
Aug 31, 2020
4
0
1
Not sure where to start. I'm getting errors reading things from everywhere.

I actually have two controllers in this workstation:
1. LSI MegaRAID 9271-8i
2. LSI MegaRAID 9265-8i


The first controller has eight 2.5" disks in RAID5 as a working production drive.
The second controller runs RAID0 with two 3.5" drives as handoff storage. I copy the Acronis backups there so there are duplicates, then send those to an external 8 TB drive for further redundancy.

I started noticing that MegaRAID Storage Manager would drop one of the 2.5" drives. Then another. What was strange is that it wasn't the same drives each time.

When this happened, the drives would come up as unconfigured bad, foreign configuration, etc.; it varied. With RAID5, losing two drives should mean total data loss, i.e. the whole array. But I never had to rebuild the arrays; I just put the drives back online or imported the configurations again and resumed. Once in a great while it would rebuild one of the drives, but that was rare.
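To make the "two lost drives in RAID5 means total loss" point concrete, here is a minimal sketch of single-parity recovery. It assumes plain XOR parity over one stripe and uses made-up block contents; it is an illustration of the principle, not how the MegaRAID firmware actually lays out data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe (hypothetical contents) plus their parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One block missing: XOR of the survivors and the parity gives it back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Two blocks missing: what is left only yields the XOR of the two missing
# blocks, which is not enough to recover either one on its own.
leftover = xor_blocks([data[0], parity])
assert leftover == xor_blocks([data[1], data[2]])
```

If the array comes back intact after two drives "drop", that suggests the drives were only marked offline by the controller and the data on them never actually went missing.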

As time went on I couldn't figure it out. I ended up replacing cables (including power), reseating cards and drives, and moving drives from one box to another, and I still can't get a consistent read.

Finally I took the card out and left the drives offline.

I started using just the RAID0 with the two 3.5" drives. After a month or so the same thing happened. Same exact problem.

I had narrowed it down to the card with the bigger array on it. Then, after I pulled that card and the drives, the same issue showed up on my smaller drives on the 9265.

Can anyone think of what I should be looking at? I find it hard to believe my cards have both gone bad, but I'm lost. Now both cards are not working and I'm running without backups.

Thoughts?
 

story97

New Member
Aug 31, 2020
4
0
1
Overheating?
I don't think so. Not only have I attached a fan to the heatsink fins on both cards, I also have a 1-ton portable A/C that blows directly into the case to keep temps low.

Generally my CPUs idle around 37 to 40 °C, and when things get hot they still only run up to 50 °C. I've got empty slots on both sides of the existing card too.
 

billc.cn

Member
Oct 6, 2017
49
9
8
Could also be power or signal interference.

Most motherboards provide a way to read out the input voltage levels. The 12V and 5V are hopefully on the same bus as the one feeding the drives, so you can do a continuous log of those to correlate.
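A rough sketch of the correlation step, assuming the readings have already been exported to a CSV with `timestamp`, `v12` and `v5` columns (that layout and the sample values are made up for illustration; your monitoring tool's export will differ). The limits are the ATX tolerance of ±5% on the +12 V and +5 V rails.

```python
import csv
import io

# ATX allows +/-5% on the +12 V and +5 V rails.
LIMITS = {"v12": (11.40, 12.60), "v5": (4.75, 5.25)}

def out_of_tolerance(csv_text):
    """Return (timestamp, rail, volts) for every reading outside LIMITS."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for rail, (low, high) in LIMITS.items():
            volts = float(row[rail])
            if not low <= volts <= high:
                hits.append((row["timestamp"], rail, volts))
    return hits

# Hypothetical log excerpt; the column names are an assumption.
sample = """timestamp,v12,v5
2020-08-31T10:00:00,12.10,5.02
2020-08-31T10:00:05,11.20,5.01
2020-08-31T10:00:10,12.05,4.70
"""
for hit in out_of_tolerance(sample):
    print(hit)
```

The flagged timestamps can then be lined up against the times the controller log shows drives dropping out. Motherboard sensors are not very precise, so treat a borderline reading as a hint rather than proof.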

Interference should cause command errors which will be logged in the SMART data of the drives.
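One way to check this is to take two snapshots of the SMART counters some time apart and see which ones went up. A minimal sketch, assuming the raw values have already been collected into dictionaries keyed by drive (the drive labels and numbers below are invented). `UDMA_CRC_Error_Count` and `Command_Timeout` are attribute names as smartmontools reports them for many SATA drives; SAS drives report errors differently, so adjust the watched list to whatever your drives expose.

```python
# Counters that tend to rise with cabling, backplane, or power trouble.
WATCHED = ("UDMA_CRC_Error_Count", "Command_Timeout")

def rising_counters(before, after, watched=WATCHED):
    """Return {drive: {attribute: increase}} for watched counters that went up."""
    changes = {}
    for drive, attrs in after.items():
        old = before.get(drive, {})
        grew = {
            name: attrs[name] - old.get(name, 0)
            for name in watched
            if name in attrs and attrs[name] > old.get(name, 0)
        }
        if grew:
            changes[drive] = grew
    return changes

# Hypothetical snapshots taken a week apart.
before = {
    "slot0": {"UDMA_CRC_Error_Count": 0, "Command_Timeout": 2},
    "slot1": {"UDMA_CRC_Error_Count": 5, "Command_Timeout": 0},
}
after = {
    "slot0": {"UDMA_CRC_Error_Count": 0, "Command_Timeout": 2},
    "slot1": {"UDMA_CRC_Error_Count": 19, "Command_Timeout": 1},
}
print(rising_counters(before, after))
```

If the counters rise on many different drives at once, that points at something shared (power, cabling, backplane, controller) rather than at the individual disks.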
 

story97

New Member
Aug 31, 2020
4
0
1
Strangely, I hadn't considered the voltage, but honestly I'd be really surprised if that's it. I'm running a beast (albeit older technology), and I had both cards in here before, one running RAID0 with 2 drives and the other running RAID5 with 8 drives, and it ran fine for 2 years.

I started seeing degrading performance with the controller that had the RAID5. MegaRAID Storage Manager would show me 2 dropped drives (rarely ever the same drives), and if I left it, it would go to 4 drives showing as missing.

The other card (9265-8i) ran clean with no issues. Then, as soon as I started losing everything in the bigger RAID and pulled that card out, the same thing showed up on the 9265. So both lost half the drives in the array, which should be enough for total data loss. But I can re-import them into the array and everything just sort of pops back in.

Damn, now I've got the voltage in the brain.

My biggest problem is that my best friend (of over 30 years) and business partner died Saturday. He was BY FAR the more technical guy. I'm just lost now, not even sure where to start. Right now the RAIDs are intact, and I'm terrified of screwing that up; losing all my data (around 15 years' worth) is beyond scary.

this really sucks
 

Stephan

Well-Known Member
Apr 21, 2017
920
697
93
Germany
@story97: Buy two JMicron JMB585 PCIe 3.0 x2 cards; those are 5-port SATA 6 Gb/s. Available since 2018, fast, compatible. Should not cost more than 50 USD per card. There are multiple varieties; get one or two with a largish, usually black, heatsink.

This will give you 10 non-RAID ports. Of course, you then need OS support to get some kind of RAID going; you could use Windows' built-in features. It can do RAID1 (mirror) or RAID5 (striping with parity) there. You would need to move Windows itself to a separate boot drive, though. Maybe get a simple SATA SSD with Power-Loss Protection (PLP) for that. Those are usually more reliable than spinning rust, so if you keep up with Acronis backups you don't need more than this one drive. Everything else goes onto the motherboard SATA ports or the JMB585 card(s).

Scrap those LSIs; they're not worth it unless you run external LTO drives. And even then I would look for a lower-power alternative. You will likely not figure out what is wrong with the LSIs. Maybe driver problems coming in with OS updates, bad hardware (age), bad firmware on those LSIs, etc.

If you look at the history of LSI Corporation as a company, it was first LSI, then Avago, now Broadcom. I can only imagine how many of the 1st rate engineers with intimate knowledge of their SAS product line got lost on that company trajectory. Industry is also moving more and more to NVMe interfaces. Maybe to be expected in the end.
 

story97

New Member
Aug 31, 2020
4
0
1
OK, this makes sense. I already do part of what you're talking about. I have two cards in: one 500 GB M.2 and one 1 TB NVMe, both on one card (the problem is my BIOS is older and won't recognize the NVMe to boot). I have Windows installed on the 500 GB drive, and I use the NVMe for a specific purpose.

So it sounds like all I need are two new cards. That makes sense, I suppose. I was about to add 8 more drives to the RAID5 and max out the card. Getting new ones seems cheap enough.

It's also what I was thinking: that it's the drivers or something internal to Windows, since both cards run on the same drivers.

Thanks!