Urgent help and advice needed please, power supply suffered massive failure and now some of my drives are gone

Road Hazard

Member
Feb 5, 2018
30
2
8
53
THANKFULLY, this wasn't my main server but it was a secondary server that I back the main one up to and it needs fixed ASAP.

It was a 4U Rosewill server (RSV-L4412U), using a Gigabyte ATX board and a Seasonic power supply. It's been humming along for years with zero problems and then today, the power supply suffered a massive failure. When I shake the PSU, there's something loose in there. UPS flipped out, LOUD pop, etc., etc.

I put a new PSU in the system and turned it back on to see what all was damaged. While Debian 11 is booting, I see this at the top of the screen for a while:
'mpt2sas_cm1: overriding NVDATA EEDPTagMode setting' ...... but eventually the OS loads and I'm at the desktop.

Looking in the 'Disks' app, I see that my MDADM array is offline. That's expected, because of my eleven 8TB drives, only 8 are showing up. I have two LSI 9207-8i HBAs installed. Card 1 has 2 SFF cables going to it (controlling a total of 8 drives) and card 2 has 1 SFF cable (controlling 3 drives).

At this point you're probably thinking....."well duh, the card with the 3 drives is the culprit" but that's not so. Of the 3 drives that card 2 is controlling, 2 show up. And on card 1, 2 of the 8 drives it's controlling aren't showing up.
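For anyone wanting to double-check which controller each visible disk sits behind, here is a minimal Python sketch. It assumes a Linux box where each /sys/block/sdX symlink resolves to a path containing a "hostN" component (the SCSI host the disk hangs off). The function names and the example paths in the usage note are made up for illustration.

```python
import os
import re
from collections import defaultdict

# Matches the SCSI host component ("host0", "host7", ...) in a sysfs path.
HOST_RE = re.compile(r"/(host\d+)/")


def scsi_host_of(sysfs_path):
    """Return the 'hostN' component of a resolved sysfs path, or None."""
    match = HOST_RE.search(sysfs_path)
    return match.group(1) if match else None


def group_by_host(resolved_paths):
    """Map each SCSI host name to the sorted list of disk names behind it."""
    groups = defaultdict(list)
    for disk, path in resolved_paths.items():
        groups[scsi_host_of(path) or "unknown"].append(disk)
    return {host: sorted(disks) for host, disks in groups.items()}


def resolve_block_devices(sys_block="/sys/block"):
    """Resolve /sys/block/sdX symlinks to full device paths (Linux only)."""
    if not os.path.isdir(sys_block):
        return {}
    return {
        name: os.path.realpath(os.path.join(sys_block, name))
        for name in os.listdir(sys_block)
        if name.startswith("sd")
    }


if __name__ == "__main__":
    for host, disks in sorted(group_by_host(resolve_block_devices()).items()):
        print(host, disks)
```

Each HBA should show up as one host with all of its visible disks listed under it, so a card that is missing drives stands out by count. Onboard SATA ports usually appear as one host per port.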

I pulled the drives from their disk trays and checked them individually in a USB dock on another system. The other Linux box was able to see that they're all part of an array, so I'm fairly confident (crossing my fingers) that the data on them is intact.
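A rough sketch of how that per-drive check could be scripted, assuming the `mdadm --examine <device>` output uses "Key : Value" lines including "Array UUID" and "Events" (needs root to run against real devices). The helper names are hypothetical, and comparing event counters is an extra sanity check, not something done above.

```python
import subprocess


def parse_examine(text):
    """Turn 'Key : Value' lines from mdadm --examine output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def same_array(members):
    """True if every examined member reports the same Array UUID."""
    uuids = {fields.get("Array UUID") for fields in members.values()}
    return len(uuids) == 1 and None not in uuids


def stale_members(members):
    """Return devices whose event counter is behind the highest one seen."""
    events = {
        dev: int(fields["Events"])
        for dev, fields in members.items()
        if "Events" in fields
    }
    newest = max(events.values(), default=0)
    return sorted(dev for dev, count in events.items() if count < newest)


def examine(device):
    """Run mdadm --examine on one device (requires root) and parse it."""
    result = subprocess.run(
        ["mdadm", "--examine", device],
        capture_output=True, text=True, check=True,
    )
    return parse_examine(result.stdout)
```

Calling `examine()` on each docked drive and passing the collected dict to `same_array()` and `stale_members()` would show whether every disk still claims membership in the same array and whether any of them missed writes before the failure.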

After some troubleshooting and sitting down to think, I don't believe the motherboard or HBAs are damaged. It's looking more and more like the back plane in the Rosewill is at fault. I think this because, as I mentioned above, one of the HBAs can see 2 of its 3 drives. If the PCIe slots were damaged, I don't think EITHER HBA would work or see ANY of its attached drives. Sure, I could do some more swapping of cards/drives but I don't want to spend any more time on it and I need to get the backup system up and running fast.

Would appreciate your thoughts on all that but my big question is this..... what's the best way out of this mess?

Option 1. $360 - Buy a new Rosewill RSV-L4412U and a new power supply :) and put everything back together.

Option 2. $219 + $80 + $100 = about $400 - A SuperMicro CSE 826 12x, plus 3 quiet replacement fans (FAN-0104L4), plus $100 (or so) for the quiet version of a power supply because stock SuperMicro power supplies are...... noisy. (Any problems with 8TB SATA drives with the SAS826A back plane?)

Option 3. $600 - Buy a CSE-847 (BPN-SAS3-846EL1 back plane). The reason I was thinking of this route is that I currently have 20-ish 4TB drives in my main server and I was thinking about expanding. I like using 4TB drives to reduce resilver times, so I could move my existing motherboard into this new unit, then take the guts from the Rosewill case and put them in my existing SC-846 chassis. BUT..... the CPU cooler I'm using in my 24 bay SC-846 won't fit in the SC-847, so I'll need a low profile cooler for my i7 7700K. Not a huge deal, just mo' money. :(

So there you have it............ would greatly appreciate any and all comments!
 

DavidWJohnston

Active Member
Sep 30, 2020
265
221
43
Well that's an unfortunate situation! I know you're sick of trying stuff, but you might be close to the answer:

Can you plug in the 3 drives directly into the HBA without the backplane(s), and run them out of the drive cages temporarily?

At least that would get your system working again right away while you wait for parts. (And eliminate the HBA lanes and SATA cable as the problem)

If you have time, a link to a pic of your specific backplane would be useful - Plus which cages were the problematic drives in.

Good Luck
 

Road Hazard

Thanks for the reply!

I could maybe do that.... each cage holds 4 drives and there are 3 cages. The only thing that worries me is..... I'd be turning the backup server on and off a few times and I'm worried that the already damaged back plane might catch on fire as a result. Or is the damage done and there's no possibility of things getting worse?

In theory, that could work (great idea BTW).... my motherboard has 3 SATA ports free and I could hook the ones that belong to the busted cages in there...... just worried about energizing an already damaged part.

I'll try and get some pictures tomorrow morning. Been screwing with this all day.....and dealing with other life surprises (first world problems and all that) and I'm mentally drained.... lol

I'm leaning towards option 2 though. I think used enterprise gear is better than off the shelf/consumer stuff but it would be awesome to be able to 100% verify the drives and HBAs are in perfect working order before buying anything so I know exactly what needs replaced and how much damage was done. Your idea could do the trick...... if it doesn't lead to a fire :)
 

Road Hazard

For anyone keeping score at home.......

Silly me, I thought the Rosewill had one long, side-to-side back plane but nope.... each cage has its own, self-contained back plane. I discovered this as I was taking it apart to grab some pictures. I ended up going with option 2 and ordered a CSE 826 12x chassis for $284 shipped. It comes with two 920W Platinum SQ power supplies.

I booted the Rosewill with a thumb drive (Arch Linux), removed the SATA breakout cables from the cages, and used a spare SSD to test each individual connection. The SSD was accessible each time. It doesn't look like anything else is damaged except those cages.
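A minimal sketch of the kind of "is it accessible" check that could be run after moving the test SSD to each connection, assuming the SSD shows up as a readable block device (needs root for real devices). The function name, the 1 MiB read size, and the device path in the usage note are arbitrary choices for illustration.

```python
def can_read(path, nbytes=1024 * 1024):
    """Try to read the first nbytes from a device or file.

    Returns True if at least some data came back without an I/O error,
    False if the path is missing, unreadable, or returned nothing.
    """
    try:
        with open(path, "rb") as dev:
            data = dev.read(nbytes)
    except OSError:
        return False
    return len(data) > 0
```

For example, `can_read("/dev/sdX")` with the real device name filled in, repeated once per breakout cable lead. A short read like this only shows that the link and the port respond, not that the connection is solid under sustained load.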

Kicking around the idea of ordering 3 new cages for the Rosewill ($200 total) plus a new ATX power supply ($100) and at least get it operational in case this happens again, which hopefully it won't. :)

The only sucky thing is the 826 is a 2U chassis and I had to order some low profile brackets for the LSI 9207s that won't be here for probably another week and a half. The CSE 826 should be here on Thursday so question..... how dangerous is it to use PCIe cards without a bracket for a week or so? Is there some life hack way I can wedge them in there so they don't move? The server is on a heavy duty shelf and of course doesn't move, so should I be OK for a while or not risk it?
 

Road Hazard

Got the 826 chassis but it has 3 fans behind the back plane and each one has a 4-pin connector. The ATX board I'm installing in there only has one 4-pin connector free. I think my back plane is a SAS826A and I noticed it has three 4-pin connectors labeled I2C across the back. Will the fans work being powered off those?