Power Supply Issue? Weird "GHOST" Issue...


T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
So I've mentioned some really weird issues in my build and Icy Dock threads, and thought they deserved their own thread since it may help others and educate me at the same time.

I've been running this setup for over 2 weeks without any issues.
- 550w EVGA G2 Power Supply
- 8x 3.5" HDD (6x WD RED 5TB, 2x WD RED Pro 2TB)
- 4x Intel 730 240GB (in icy dock)
- 2x Intel S3700 100GB (in icy dock)
- 2x Intel S3500 80GB
- 2x HGST 200GB SAS 12Gb/s SSD

- 2x P3700 800GB 2.5" NVME

The above are running off:
- 2x M1015 PCIE HBA
- 1x LSI 3008 OnBoard HBA
- Onboard SATA (S3500 -- OmniOS/Napp-IT File Server)
- 1x SuperMicro NVME AOC

The rest of the system is just a Supermicro 1P board, an E5-2670 v3 (idle, no other VMs running), 96GB DDR4 RDIMM, and every fan header on the motherboard in use, driving fans from 40mm to 140mm -- some 3-pin and some 4-pin.

The issue:

Yesterday I added a 2nd Icy Dock for 4x 15mm SAS SSDs, utilizing 1 channel of the LSI 3008.

The 1st drive I installed was NOT detected. The other 3 ports worked fine and detected the drives. Hot-removal did not work; I had to reboot the OmniOS VM to remove the drives. Thinking it was a bad cable (Adaptec brand), I swapped to a 2nd (new) cable I got from Newegg (LSI brand). The same thing occurred with this cable, except it was a different port that failed. Hot-removal did not work with this cable either. At this point one of my M1015s completely errored out and crashed the VM, and I had to remove it from pass-through to get the VM to boot again. (I thought it had overheated due to the case being open, and would fix it later after finding the issue with the new Icy Dock/cable.)

Thinking I maybe got 2 bad cables, I put in another new cable (Adaptec brand) and it mirrored the exact problem of the other cable, except hot-removal worked for 1 port. Now, thinking maybe hot-add wasn't working properly and that was why the same port on the same cable failed, I shut down the VM, installed all 4x SSDs, and booted up. To my surprise the ENTIRE LSI 3008 stopped working, and my 12Gb/s drives were now gone, as well as EVERY drive in the new Icy Dock.

At this point I talked to @coolrunnings82 and he urged me to investigate power usage on the 5V rail, as it sounded like I was really pushing the limits with this many drives on a consumer ATX power supply. I'm kicking myself for not using the 750W or 850W EVGA G2 I already had -- I figured this system was well under 550W even at load, so I saw no point... well, as it turns out the 750 and 850 have the same 5V rail power as each other, and the 550W is only 10 or 15W less... likely enough to matter with what I had under load, but still not enough (if this is the problem) for adding 4x more SSDs (and I want to add 2x more to the other 3008 channel).

The questions / thoughts:

Does this sound like a 5V RAIL issue? I can see the SSDs/HDDs not being detected due to not enough power, but why would the HBAs completely error out and not allow the VM to boot? Do the PCIe slots require 5V for something too, so that I drew too much when I tried to boot with the 4x SSDs in addition to all the other drives AT ONCE (vs. adding 4x SSDs to a running system after boot)? At this point power is the only thing I can think of that would cause this.

For the record: I removed the new Icy Dock, the 4x new SAS SSDs, and the 2x 12Gb/s SSDs, then re-added the M1015 HBA that had stopped working earlier to the VM, and it booted just fine and started working again like nothing had happened. I did not add extra cooling; in fact, it likely got warmer while I was working on this. I did NOT power down the system when I did this -- I just slid out the Icy Dock and undid the cables, and the LSI 3008 is onboard, so there was nothing to remove there. The 2x SSDs that had been working weren't in a location to heat the M1015 either, and I didn't wait for it to cool down either way.


What are your thoughts on this being a power issue?
Other opinions?

I've had no problems in the 5+ hrs since this was 'fixed' by removing the 6x SSDs and the Icy Dock (2x 40mm fans). For the record, I saw no voltage 'error' in vSphere either. Not too sure how accurate that is.
 

pricklypunter

Well-Known Member
Nov 10, 2015
1,709
517
113
Canada
I would say the power supply is a likely candidate -- it's not faulty, just being loaded down heavily. It sounds like it may be teetering on the edge when you have everything connected. The HBAs are probably just erroring out because the clocks are getting all skewed by the voltage drop. A gazillion weird things will begin to happen when voltages are just a smidge below what the semis can cope with. Some chips cope better than others in this regard, but all will get very twitchy, causing havoc machine-wide. If you have a larger supply on hand to test the theory with, that's certainly the first thing I would do :)

One of those cheap digital power supply testers would at least give you a visual on the actual voltages, or a multimeter if you're stuck, if you have one or can get your hands on one. I have one in my tool kit for those times when I'm looking for a clear go/no-go; otherwise it's just a compass, but it does give me a decent margin to make a sensible determination 95% of the time.
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,142
594
113
New York City
www.glaver.org
What are your thoughts on this being a power issue?
Seems likely to me. Modern systems that aren't designed for loads and loads of high-performance drives are normally pretty wimpy on the other voltages.

This system has IPMI, right? What are the voltage sensors reporting? If it is this supply, it claims to provide 20A of 5V power. One thing to note is that the combined output for the 3.3V and 5V rails is limited to 100W. Without seeing a schematic of the power supply I can't say for sure, but that definitely suggests both of those voltages are produced by a single section in the power supply.
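If you want to poll those sensors from the OS rather than the BMC web page, something along these lines should work -- a minimal sketch, assuming ipmitool is installed and that the board labels its rails "5V"/"12V"/"3.3V" (sensor names vary by board):

```python
# Minimal sketch: poll the BMC's voltage sensors via ipmitool and flag
# readings that stray from nominal. Assumes ipmitool is installed and
# that sensors are labeled "5V"/"12V"/"3.3V" -- names vary by board.
import subprocess

NOMINALS = {"5V": 5.0, "12V": 12.0, "3.3V": 3.3}
TOLERANCE = 0.05  # flag anything more than 5% off nominal

out = subprocess.run(["ipmitool", "sensor"],
                     capture_output=True, text=True, check=True).stdout

for line in out.splitlines():
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 2:
        continue
    name, reading = fields[0], fields[1]
    for label, nominal in NOMINALS.items():
        if name.startswith(label):
            try:
                value = float(reading)
            except ValueError:
                continue  # unreadable sensor ("na")
            status = "CHECK" if abs(value - nominal) / nominal > TOLERANCE else "ok"
            print(f"{name}: {value:.2f}V (nominal {nominal}V) {status}")
```

Bear in mind the BMC only samples the rails occasionally, so a brief sag under load can slip between readings -- but a rail sitting consistently low would show up here.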
Other opinions?
You (and the others building servers in regular cases) probably don't want to hear this, but if you're putting that many drives in a system, you should use a case designed for a server-type system. It could be desktop or rackmount, but it will (hopefully) be designed with real-world usage requirements in mind. Supermicro does a good job, at least based on the SC836 chassis I'm using*. Not all other brands have such a good reputation. How much time did you spend just getting to this point, even if no hardware was damaged?

* Yes, I modify the SC836. But I limit my modifications to re-doing [all of] the cabling. I don't change fans or use outboard PWM controllers to try to make the chassis quieter (at the expense of cooling) and I don't mess with the airflow design. And I'm willing to live with the consequences of any mistakes I make - they'll be obvious right away, like if I incorrectly put the new 24-pin connector on the power supply cable. There is likely to be an unpleasant noise and smell as expensive components go up in smoke.

I don't understand the thinking where people are disconnecting fans in switches to make the switch quieter. Once it goes "kaboom", it'll be very quiet - assuming there are no flames. Sure, the manufacturer designs the switch for the worst possible situation they could see happening - one or more fans failing, or loss of datacenter cooling. But probably not all of that at once, which is what running enterprise switches in a home environment with fan mods gets you - the equivalent of a couple of bad fans and a higher-than-design-assumption ambient temperature.

An integrated solution (chassis, power supply and cooling) is also designed for the manufacturer's estimate of worst-case scenarios. Sure, if you put an Atom motherboard and one drive in there, it'll probably work fine with no fans. But that isn't the intended use.

A desktop style chassis and an unrelated power supply and an assortment of fans from someplace else is just asking for problems once you get to a certain size. My general rule-of-thumb is "if you need a power cable or fan cable splitter, you're doing something wrong."

Other people certainly disagree with me on this, and they have systems that are working. My personal opinion is it is too much of a hassle to deal with this stuff when something goes wrong.
 

pricklypunter

Well-Known Member
Nov 10, 2015
1,709
517
113
Canada
@Terry Kennedy I hear ya, this would be my approach also when dealing with production deployments, and I'm sure most here would agree with you. However, what a lot of us here are doing often has one foot in the production/enterprise camp and one foot in the home PC camp. In order to be affordable for the layman to "get in the game", some compromises have to be made as to how things are sourced and built. To be able to actually live with the equipment in your bedroom or home office, as a large proportion of us do, as opposed to having it racked in a data center or server room, some corners need to be shaved close to the bone. Noise reduction is high on that list, as is power efficiency. I agree that this may well have consequences, like early equipment failure.

This constant-compromise approach works reasonably well most of the time, when applied in a sensible manner. It would not work very well in a production environment, as you simply do not have the time to troubleshoot poor design decisions. In the case of @T_Minus's server, it is a classic case of something production servers rarely ever encounter: expansion. Expansion is a fairly clear line of separation between production/enterprise and the home lab. 99% of production servers never see the type and level of expansion applied in the home lab, and therein lies the issue. Expansion, like all design problems, has to be taken into consideration either at build time or at some later juncture, and in this case it's the latter :)
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,142
594
113
New York City
www.glaver.org
In order to be affordable for the layman to "get in the game", some compromises have to be made as to how things are sourced and built. To be able to actually live with the equipment in your bedroom or home office, as a large proportion of us do, as opposed to having it racked in a data center or server room, some corners need to be shaved close to the bone. Noise reduction is high on that list, as is power efficiency.
I know what you mean. However, there's some good equipment available on eBay (or elsewhere) if you wait for it to come around - I got a Dell PowerEdge R710 out of a dumpster, and it still had a year of warranty left. That box had dual X5680s, 48GB of RAM and 6 * 15K disk drives, and its average power consumption is 205 Watts. And that is nowhere near the efficiency of even newer units (it is now 2 generations back). It isn't particularly quiet, though. I think that's one of those classic "choose any two of inexpensive, quiet, or reliable" things, though.
Expansion is a fairly clear line of separation between production/enterprise and the home lab. 99% of production servers never see the type and level of expansion applied in the home lab, and therein lies the issue. Expansion, like all design problems, has to be taken into consideration either at build time or at some later juncture, and in this case it's the latter.
A man's GOT to know his chassis' limitations - Clint Eastwood

On a more serious note, at some point in the expansion it is time to re-evaluate and be prepared to replace some / all of the existing system when it gets to be too unreliable.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
Thanks guys!

I migrated from an SM 846 so I could keep the AIO in my office, but it sounds like I'm now a bit limited on the variety of drives I can play around with... I wanted a good # for actual usage in the system, plus capacity for hot-swap testing, comparisons, etc...

I have a huge Intel workstation/server tower; I'm going to see if I have the option to put it in there. It wasn't intended for a v3 system, but you never know :) I'll research it and share info on that build/swap if I go that route.

For now it seems to be rock solid again after removing the 6x SAS SSD.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
Seems likely to me. Modern systems that aren't designed for loads and loads of high-performance drives are normally pretty wimpy on the other voltages.

This system has IPMI, right? What are the voltage sensors reporting? If it is this supply, it claims to provide 20A of 5V power. One thing to note is that the combined output for the 3.3V and 5V rails is limited to 100W. Without seeing a schematic of the power supply I can't say for sure, but that definitely suggests both of those voltages are produced by a single section in the power supply.
@Terry Kennedy

Correct. I checked voltage in vSphere under Configuration -> Health Status, and it was showing 5V, and USB at 5.02V.

Correct, it claims 22A, and the 750 and 850 claim 25A. An improvement, but not a huge jump... in fact, with the system failing to detect 1 drive while idling, and then crashing when booting up 'fresh' with all 4 additional SSDs in there, I don't think I could safely run the additional 4 SAS SSDs, plus the 2 more I had room for (in the Icy Dock), with what I have now, as each time I'd boot my file server VM it would not work/crash.

I also saw the limitation of 110W total on 5V shared with 3.3V.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
8x 3.5" HDD * 5W = 40W
4x 2.5" Intel * 3.8w = 15W
2x 2.5" Intel * 3w = 6W
2x 2.5" Intel * 2W = 4W
2x 2.5" HGST 12Gb/s SAS SSD *10W = 20W
4x 2.5" HGST 6Gb/s SAS SSD * 6W = 24W
= 109W at full write.

That excludes the 2x 2.5" NVMe, which can use up to 18W each but idle around 4W. Even before counting those, the 5V rail would already be pushed to its limit if everything was writing... likely why things were borderline even at idle (NVMe included).
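For anyone who wants to redo this arithmetic with their own drive mix, here's the same budget check as a minimal Python sketch. It takes the per-drive numbers above at face value as 5V-rail draw at full write (estimates, not measurements -- and 3.5" HDDs actually split their draw between 5V and 12V) and compares against the 110W shared 3.3V/5V limit mentioned earlier:

```python
# Rough 5V rail budget -- a minimal sketch reproducing the arithmetic
# above. Per-drive watts are taken at face value as 5V-rail draw at
# full write; they are estimates, not measurements.

RAIL_LIMIT_W = 110  # combined 3.3V/5V limit quoted for this PSU

drives = [  # (count, watts each, description)
    (8, 5.0,  '3.5" HDD'),
    (4, 3.8,  'Intel 730 240GB'),
    (2, 3.0,  'Intel S3700 100GB'),
    (2, 2.0,  'Intel S3500 80GB'),
    (2, 10.0, 'HGST 12Gb/s SAS SSD'),
    (4, 6.0,  'HGST 6Gb/s SAS SSD'),
]

total = 0.0
for count, watts, desc in drives:
    subtotal = count * watts
    total += subtotal
    print(f'{count}x {desc}: {subtotal:.1f}W')

headroom = RAIL_LIMIT_W - total
print(f'Total: {total:.1f}W of {RAIL_LIMIT_W}W shared 3.3V/5V budget '
      f'({headroom:.1f}W headroom)')
```

With these numbers it lands at 109.2W, i.e. under 1W of headroom before spin-up surges or anything else is even counted.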
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
Someone on Jonnyguru suggested using one or two of these:
http://www.mouser.com/ProductDetail/Murata-Power-Solutions/OKY2-T-16-D12P-C/?qs=JV7lzlMm3yJLl4pNO9AXdA==

And converting the PSU's 12V power to 5V.

I was thinking I could make a small cube project box with Molex (or SATA) input power and Molex (or SATA) output. Plug in power from the PSU, take the 12V leg and run it through the above module, output that on the 5V pins, and pass through 12V too if needed -- so the output 5V actually comes from the 12V rail.

Sounds like a lot of work/time to develop such a box, but it might actually be useful for a lot of people, and it would essentially be plug and play.
(The Jonnyguru member suggested installing one by cutting wires/modding the PSU, which I'd like to avoid.)

Another thought I had: since I'm not using the PCIe/GFX power connectors on my PSU at all, I could even have the input accept that type of plug.
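To sanity-check the "one or two" suggestion, here's a quick back-of-the-envelope as a Python sketch. The 16A module rating and 93% efficiency are my assumptions for illustration and would need checking against the Murata datasheet:

```python
# Back-of-the-envelope sizing for a 12V -> 5V buck converter box.
# Inputs are assumptions for illustration, not datasheet values.
import math

load_5v_w = 109.0       # estimated 5V load at full write (see earlier post)
module_out_a = 16.0     # assumed per-module output current rating
efficiency = 0.93       # assumed conversion efficiency

load_5v_a = load_5v_w / 5.0                     # amps drawn at 5V
modules = math.ceil(load_5v_a / module_out_a)   # modules needed
input_w = load_5v_w / efficiency                # power pulled off 12V
input_a = input_w / 12.0                        # amps drawn from 12V rail
heat_w = input_w - load_5v_w                    # dissipated as heat

print(f"5V side: {load_5v_a:.1f}A -> {modules} module(s)")
print(f"12V side: {input_a:.1f}A ({input_w:.0f}W)")
print(f"Heat to get rid of: {heat_w:.1f}W")
```

With those assumptions a ~109W 5V load is about 22A, so two modules, and roughly 8W of heat to vent from the project box.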

Anyone have any thoughts?
 

pricklypunter

Well-Known Member
Nov 10, 2015
1,709
517
113
Canada
It might be doable, but the overall cost of designing and building a stable, low-ripple, high-temperature secondary supply with nice tight regulation and load protection in the event of a failure is likely well above what a suitable commercial server-grade (or close to it) supply would set you back in the first place. The switching regulator you linked to would not be my first choice for something as critical as this. Also, for others to find it useful, once you get into putting it in an enclosure, if it is not universal in its design (think along the lines of "slides into a 5.25" bay"), you'll begin getting folks "modding" it to fit their particular chassis. When things go wrong, and they will if folks begin tinkering with it, it's their disks and data that go up in smoke. None of this is a criticism of your plan per se, but I do think it's a risky move :)
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
Not sure what the 1st part of your reply is in regards to... I wouldn't be using a secondary power supply, or designing anything for 'more/extra' 5V power from another source.

The idea I had was a rather simple one; maybe I'm missing something :)
Molex or SATA --> Converter Box --> Molex or SATA

What issue do you see with the item linked, used as above? The PSU still manages power for everything else; the converter does just that: takes in 12V and outputs 5V.

I don't have an opinion on what other people do with the idea, design, or concept :) Just that others using desktop PSUs with lots of disks may have run into the same issue I have :) and they could design/use something similar.
 

pricklypunter

Well-Known Member
Nov 10, 2015
1,709
517
113
Canada
The first part of my reply was to highlight that if a secondary supply or converter were designed for use in this manner, it would likely cost more than just using a suitable primary supply to begin with.

From a quick skim of the datasheet, the DC-DC buck converter you linked to is a point-of-load module, taking its input from a low-impedance source. It is not designed to be particularly forgiving of long cable runs or operation without suitable input filtering, and its load protection is also not really good enough, IMO. With suitable input filtering, short stout cables, etc., you would probably get away with it though :)

edit:

I forgot to mention that it is also non-isolated, so should it fail short input-to-output, it would not be at all friendly towards sensitive semiconductors that can't handle any overvoltage. Anyway, you get the idea...
 

cptbjorn

Member
Aug 16, 2013
100
19
18
I'm skeptical -- I don't think you are drawing anywhere near 100W on the 5V rail unless something is shorting or faulty. I think step 1 should be measuring (current, droop, ripple, dropouts, etc.) and going from there.
 

Evan

Well-Known Member
Jan 6, 2016
3,346
598
113
He has a lot of SSDs that are running off the 5V rail; I think he is on the right track.
I asked here about how to get power for a 3.5" HDD in a NUC, and a buck converter was a suggested method (I never got to do it, as I decided it was just not worth the effort).

You could get a picoPSU, feed it 12V in, and you'd have drive power output for some devices, if you want a simple way to test your theory.
 

lumdol

Member
Jun 9, 2016
30
2
8
39
I signed up here just to post a reply to this thread.

I have had a very similar issue involving the Icy Dock 6x units in a JBOD unit for my server; I had a half dozen of them.

Perhaps this might apply to your situation- either way it's worth entertaining.

My first assumption was the same as yours: power supply or cables.

Worth noting: smartmontools was reporting Ultra DMA CRC errors, which is generally a cable-related problem, I believe. I was also getting longer-than-normal spinups, which I believe can indicate power supply issues. Drives were falling off mysteriously, or sometimes with lots of fanfare.
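For anyone wanting to keep an eye out for the same symptom, here's a minimal sketch that polls that counter with smartctl. It assumes SATA drives exposing SMART attribute 199 (UDMA_CRC_Error_Count), and the device paths are examples only:

```python
# Minimal sketch: poll the UDMA CRC error counter (SMART attribute 199)
# across a set of drives. A raw value that keeps climbing usually points
# at cabling or the backplane rather than the media itself.
# Device paths are examples -- adjust for your system.
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc"]

for dev in DEVICES:
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "UDMA_CRC_Error_Count" in line:
            # the raw value is the last column of the attribute row
            print(f"{dev}: {line.split()[-1]} CRC errors")
```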

So, I doubled the size of the power supply to be absolutely sure (a mega-overkill enterprise unit) and bought pricey brand-new cables from ATTO -- and then the problem kept happening.


After driving myself absolutely insane for the better part of a month, researching the far corners of web forums, and going back and forth with various manufacturers who just blamed each other, I decided to simply replace every single possible element within the unit and test the outcome -- I just didn't have the time or psychic energy to continue troubleshooting piecemeal. So: the power supply, plus a new HBA, hard drives, a new 36-port expander, new cables internally and externally, even the JBOD's Supermicro power board. Literally every single component replaced except the chassis itself. After painstakingly testing each and every component, I came to the conclusion that it was an issue with the Icy Dock. I dropped in a Supermicro 8x 2.5" and presto -- everything worked without skipping a beat.

Icy Dock support seems to not want to acknowledge that this is the issue, but I am absolutely certain after the long process of elimination.

To be certain I wasn't overlooking something, I even swapped the Icy Dock back in a number of times to confirm, and the ghost issue continued to occur.

Main symptoms:

- Drives disappearing suddenly
- Drives not initializing
- Different drive slots appearing working/non-working after swapping cables or power cycling, seemingly without any rhyme or reason
- Weird intermittent SMART errors (above) that appeared power-related
- Disks seeming unusually "loud" and "clicky" compared to other enclosures

Can I offer an involved technical explanation as to why? Unfortunately not. But I would be very interested to know what forum members more experienced than I am think.

I'm assuming it has something to do with the power regulation in the Icy Dock units. Since this was a backup system for important personal documents, I wanted to be absolutely sure it wasn't a problem with any of the other hardware -- I really made sure.

I sent the 6 units back to Icy Dock and am awaiting a response. They did everything humanly possible to discourage me from doing so. They also assured me it could not be them, as this is the first instance they've ever heard of this issue -- which seems unlikely. I hope they are able to provide a more detailed explanation and not try to dodge taking responsibility. Perhaps it was a manufacturing hiccup with their current run. Also, if you look on Amazon/Newegg, both the Icy Dock 6x and the StarTech 12x 2.5" units have reviews outlining similar strange issues.

I'm just not sure what I'm going to do with any Icy Dock gear once they get around to sending me a replacement or providing a credit of some kind.

I wanted to chime in and provide my two cents, as this was so frustrating for me personally that I wanted to try and spare you some of the troubleshooting process, if indeed our problems are related.

Try the supermicro and let us know your results. Can't hurt.
 

pricklypunter

Well-Known Member
Nov 10, 2015
1,709
517
113
Canada
Welcome to the madhouse :)

Without having one in front of me to pull apart and test, I can't say what is going on inside the icy dock module, but poor power supply design would certainly explain some of the behaviour. If they ever come back to you with an honest explanation, I would be interested to hear it, as I'm sure would everyone else here :)
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
@lumdol @pricklypunter

Thank you for the info! Very detailed and specific.

I have a NIB Icy Dock from Amazon that I still need to install and try out with the drives.

I'm hoping next week will give me time to dig back into this :) I have a handful of things to catch up on, and then I'll get back to this fun home project :)
 

lumdol

Member
Jun 9, 2016
30
2
8
39
Thanks. I couldn't tell you whether the new Icy Dock will give you a different experience. I had 6, half of which came from different sources, and they exhibited similar issues.

My suggestion would be to remove the Icy Dock from the mix entirely and utilize something like the CSE-M14TQC mobile rack, or, if you have 2x 5.25" bays, the 8x 2.5" model.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
I have some Intel drive cages I can try too, to see if they make any difference.

I might be able to squeeze some into the tower; if not, it would have to be a little "DIY" mount to hold the cage on top of the tower... not such a big deal since it's a 'home in closet' type setup ;)