DS4243 not getting along with UPS battery power

ItsaMeDS4243

New Member
May 11, 2021
9
0
1
Howdy

Not too long ago I picked up a NetApp DS4243 full of drives to finally stop running out of space on my desktop. I hooked it up to an older Dell Precision workstation I had kicking around, via an LSI (et al) 9300-8e card, all running the latest TrueNAS stable. I've also got an old APC Back-UPS BR1000G capable of running the whole stack for a few minutes. All has been smooth sailing for the most part, and the last big checkbox before using everything properly is getting UPS shutdown functional. Of course, this is where it's all fallen apart a bit.

At first I thought I was having issues with NUT and the other components of TrueNAS that handle the UPS shutdown, as the system would throw many SCSI bus errors and eventually hang the controller PC, requiring a hard reboot. Doing a forced shutdown from software (upsmon -c fsd) shut down everything just fine, but actually yanking power to the UPS would cause everything to barf. While troubleshooting I decided to try a shutdown without the data line from the UPS connected and, to my horror, I had the same issue. So it's not related to the software at all, but a hardware problem.

I did the test again with the controller PC hooked up to the wall and only the DS4243 on the UPS, and had the same error. So it seems to just be very upset about the input power, for whatever reason. That particular scenario also causes the shelf to shut the disks off after a period of time, I think 5 minutes or so. The shelf itself keeps running so far as I can tell, but the disks all power down, lights out and everything. They can be started back up by removing and reinserting them, but again I imagine the power changes are making things very upset.

I don't know of any way to get info directly out of the shelf's IOMs for troubleshooting, but I think it's safe to say that the UPS needs to be changed. My question is, has anyone else had a problem like this with a disk shelf freaking out on UPS power? Were you able to resolve it with a different UPS? The newer version of that same APC model (BR1000MS) specifically touts having "Sine wave output"; perhaps that would solve the issue? I don't have a different UPS around to test, so all roads lead to buying things. If it wasn't obvious, this has very much been a penny-pinching setup, but if there's a place to spend money, I have to concede that a fully functioning UPS is where to spend it.

tl;dr my disk shelf freaks out when I switch to battery power and if I spend money on a new UPS I want to be darn diddly sure it fixes the issue.
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
1,320
800
113
I would guess that it's either due to too long a transfer time when mains drops out (APC lists 8-12ms, but maybe the unit is too old or defective) or due to the "Stepped approximation to a sinewave" output. Many SMPS with active PFC do not respond well to non-sine input.

But debugging the issue is not going to be easy... Maybe try to boot the shelf with UPS power and see whether the issues are still there? That might rule out the transition time issue.

P.S. APC lists a "Lifetime data recovery warranty" on their product page. You can try to poke them, and maybe they will swap your UPS for the newer model instead of risking having to pay for data recovery on a fully loaded disk shelf.
 

ItsaMeDS4243

For whatever it's worth, I don't hear any audible fan speed change when the power is pulled, and it does take 1-5 seconds for errors to show up in TrueNAS. Based on that I'd THINK it's unlikely that it's a transition issue, but that's not much to go by.

I don't think the UPS would last long enough to do a full boot on UPS power, it might but it'd be very tight. Honestly with how it's been acting I'm kinda scared of losing a drive with it on battery, too.

The lifetime warranty might be useful in a pinch. The particular units I have now are used, not sure if it'd be worth the trouble of pushing that vs getting a new one and knowing that I can absolutely make a push for a replacement if I have the same issues. Getting a new one would be faster, too.
 

Layla

Game Engine Developer
Jun 21, 2016
215
177
43
40
Hard drives have motors in them - I'm guessing those are more sensitive to sine wave vs square wave input power. Maybe the disk shelf's power supplies don't have enough internal capacitance to keep all the spinning disk motors happy on square wave power?
 

ItsaMeDS4243

Quite possibly. Other UPS units in this house, including this exact one, have been hooked up to PCs running hard drives, both internal ones powered by a normal PC PSU and externals running off the cheapo bricks that came with them. In theory it should be fine, but the obvious conclusion is still that the shelf PSUs do not like the "stepped sine wave" of the BR1000G.

I went ahead and bit the bullet and ordered a BR1000MS, will update when it gets in.
 

ItsaMeDS4243

Update: The new UPS has resolved most of the issues. The bad news is that I still get some SCSI bus errors when I switch to UPS power, seemingly from the amount of time it takes for the UPS to kick in: "Commands cleared by power loss notification", followed by some retries later on.

The good news is that none of this seems to have any impact on the functionality of the array: data still flows merrily along, and I don't even think TrueNAS throws a notification about it in the GUI. The shutdown process now behaves exactly as one would expect: once the timer is reached, the controller does a soft shutdown and sets the UPS on a shutdown timer to turn off the shelf.

The other bad news is that TrueNAS doesn't seem to enjoy booting up at the same time as the shelf, I imagine because the shelf takes a wee bit more time to boot than the controller PC does. But this doesn't seem difficult to fix, compared to before.
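One way to work around that boot race would be to have the controller wait until the shelf's disks are visible before touching the pool. A rough sketch in Python, not something tested on this setup: `count_disks` is a hypothetical callable returning how many shelf disks the OS currently sees (how that is measured depends on the platform and is not shown), 24 is simply the bay count of a DS4243, and the timeout and poll interval are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_disks(
    count_disks: Callable[[], int],
    expected: int,
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until at least `expected` disks are visible or the timeout expires.

    Returns True if the disks showed up in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if count_disks() >= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Stand-in for a real disk count: pretend the shelf comes up gradually.
    readings = iter([0, 8, 24])
    ready = wait_for_disks(lambda: next(readings), expected=24, sleep=lambda s: None)
    print("shelf ready" if ready else "gave up waiting")
```

The `sleep` and `clock` parameters are only there so the loop can be exercised without actually waiting; a real boot script would leave them at their defaults.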

I'm still curious as to how exactly the power loss errors are happening. My uninformed guess is that the PSUs either have very little buffer in them, or that they're able to sense a quick power drop and notify the controllers to start making arrangements.
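To put a rough number on the "very little buffer" guess: a supply's hold-up time can be estimated from the energy its bulk capacitor gives up as the bus voltage sags, t = C * (V_start^2 - V_min^2) / (2 * P). The sketch below does only that arithmetic. Every value in it (470 uF, a 390 V bus sagging to 300 V, a 300 W load) is an assumption picked for illustration, not a measurement or spec of the DS4243 supplies, and it ignores converter efficiency.

```python
def holdup_time_s(cap_f: float, v_start: float, v_min: float, load_w: float) -> float:
    """Seconds a bulk capacitor can carry `load_w` while sagging from v_start to v_min."""
    usable_energy_j = 0.5 * cap_f * (v_start**2 - v_min**2)
    return usable_energy_j / load_w


if __name__ == "__main__":
    # Assumed values, NOT taken from a DS4243 power supply.
    t = holdup_time_s(cap_f=470e-6, v_start=390.0, v_min=300.0, load_w=300.0)
    print(f"estimated hold-up: {t * 1000:.1f} ms")
```

With those made-up numbers the result is a few tens of milliseconds, which would comfortably cover the 8-12ms transfer time mentioned earlier in the thread. A smaller capacitor or a heavier load shrinks that margin quickly, so the real answer depends on figures I don't have.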
 

maes

Active Member
Nov 11, 2018
102
69
28
ItsaMeDS4243 said:
I'm still curious as to how exactly the power loss errors are happening. My uninformed guess is that the PSUs either have very little buffer in them, or that they're able to sense a quick power drop and notify the controllers to start making arrangements.

Possible on both. One way around the problem would be to get your hands on a full 'on-line' UPS (aka 'double conversion'), preferably used, as new ones are significantly more expensive than basic models like the BR1000. The really nice thing with double-conversion UPSes is that they never 'switch' from line to inverter; they always fully convert line voltage to DC, through the battery, then back to a full sine wave, and provide complete isolation from the power grid.
 

Layla

maes said:
Possible on both. One way around the problem would be to get your hands on a full 'on-line' UPS (aka 'double conversion'), preferably used, as new ones are significantly more expensive than basic models like the BR1000. The really nice thing with double-conversion UPSes is that they never 'switch' from line to inverter; they always fully convert line voltage to DC, through the battery, then back to a full sine wave, and provide complete isolation from the power grid.

Yes, but you pay by the second for that extra "protection" in added conversion losses (a few extra % wasted energy) on your power bill - which adds up quickly over time.
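
For a sense of scale, here is the arithmetic as a small Python sketch. The load, the extra loss percentage and the electricity price are all assumptions chosen for illustration (plug in your own), and treating the extra loss as a fixed fraction of a constant load is a simplification.

```python
HOURS_PER_YEAR = 24 * 365


def extra_cost_per_year(load_w: float, extra_loss_fraction: float, price_per_kwh: float) -> float:
    """Yearly cost of the additional conversion losses for a constant load."""
    extra_w = load_w * extra_loss_fraction          # watts wasted on top of the load
    extra_kwh = extra_w * HOURS_PER_YEAR / 1000.0   # energy wasted over a year
    return extra_kwh * price_per_kwh


if __name__ == "__main__":
    # Assumed figures: 400 W load, 5% extra losses, 0.15 per kWh.
    print(round(extra_cost_per_year(400, 0.05, 0.15), 2))
```

With those assumed figures the extra draw is 20 W around the clock, so the yearly cost scales directly with both the load and the local electricity price.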