Dell C6400 ramps fans up to near full speed all the time


clcorbin

Member
Feb 15, 2014
So I decided I wanted to play around with HCI stuff in my homelab. I ended up getting a (used, of course!) Dell C6400 with 4x C6420 nodes with Gold 5120 CPUs. I've just started setting everything up and have a problem: the C6400 ramps the fans up to near full speed (17,000+ RPM) with only one node on and its CPUs sitting at 36-40°C. The inlet temperature is only 24°C. And this is with dual 2000W supplies installed and powered on (I've had older servers that would go to max fan speed if there was only one power supply).
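
If anyone wants to pull the same numbers, something like the following ipmitool calls should read the fan and temperature sensors through a node's iDRAC (the IP address and credentials here are just placeholders, not my real setup):

ipmitool -I lanplus -H 192.168.1.120 -U root -P calvin sdr type Fan
ipmitool -I lanplus -H 192.168.1.120 -U root -P calvin sdr type Temperature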

I did remove the heat sinks, clean them, reapply heat sink compound (Arctic Silver), and reseat everything. While replacing the compound DID bring the CPU temperatures down 5 to 10°C (the temps above are with the new compound) and did delay the fan ramp-up from about 20 seconds to a minute or so, it is still screaming like a banshee and is basically useless like this.

Does anyone have any experience with this server and any ideas on how to troubleshoot this issue?

Thanks for the help!

Clint
 

mrpasc

Well-Known Member
Jan 8, 2022
Munich, Germany
Did you use blanks for the unused slots?
Did you populate the single sled in the upper-right slot?
Check the sled's SEL log for entries like "Unable to control the fan speed because a sled mismatch or device incompatibility was detected."
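
Something like this should dump the SEL remotely if you can reach the sled's iDRAC over the network (IP address and credentials are placeholders):

ipmitool -I lanplus -H 192.168.1.121 -U root -P calvin sel elist
racadm -r 192.168.1.121 -u root -p calvin getsel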
 

clcorbin

Member
Feb 15, 2014
The first one I was playing with was in the node 1 slot (upper left). At first I only had the one node in, but I did try it with all four nodes installed (but only node 1 powered up).

Actually, that last sentence is only partially true. I had nodes 1, 2, and 3 fully installed, but node 4 was only slid into place, not seated, as the backplane connector on that node was damaged and wouldn't mate with the connector on the C6400. I'll try to reconfigure that sled slightly (I have a BOSS-S1 with a cable that replaces the HBA330 and its damaged cable) and see if that makes any difference at all.

I'll check the logs tomorrow (system is powered down for the night!)

And thanks for the input!

Clint
 

wifiholic

Member
Mar 27, 2018
That's in line with my experience; if one or more nodes aren't fully inserted, the fans will spin all the way up.
 

Koop

Well-Known Member
Jan 24, 2024
Sounds like an aggressive fan curve maybe? Any way to modify things with ipmitool? I know there are plenty of Dell models where you explicitly cannot modify the fan curve. Not too familiar with this thing though.

For example, the PowerEdge R540 is a server where you can't modify the fan curve at all, I think? I'm still learning.
 

mrpasc

Well-Known Member
Jan 8, 2022
Munich, Germany
Those Dell su****s changed it with Gen 15 and above. You can adjust the fans, even in iDRAC, but only with the most expensive iDRAC license (they added a new top tier, "Datacenter", for this). Up to version 3.30.30.30 of iDRAC 9 you can change fan behaviour the way this example script does, but with newer versions they stopped this, and from iDRAC 9 4.00.00.00 it only works with the Datacenter version.
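
For reference, the raw IPMI commands people normally use on those older iDRAC 9 versions look roughly like this (IP address and credentials are placeholders, and I can't promise the C6400 chassis fans honour them):

# switch the iDRAC to manual fan control (older iDRAC 9 firmware only)
ipmitool -I lanplus -H 192.168.1.121 -U root -P calvin raw 0x30 0x30 0x01 0x00
# set all fans to ~20% duty cycle (0x14 hex = 20 decimal)
ipmitool -I lanplus -H 192.168.1.121 -U root -P calvin raw 0x30 0x30 0x02 0xff 0x14
# hand control back to the automatic fan curve
ipmitool -I lanplus -H 192.168.1.121 -U root -P calvin raw 0x30 0x30 0x01 0x01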
 

clcorbin

Member
Feb 15, 2014
Sounds like an aggressive fan curve maybe? Any way to modify things with ipmitool? I know there are plenty of Dell models where you explicitly cannot modify the fan curve. Not too familiar with this thing though.
The C6xxx and XC6xxx series are definitely much less known than the "run of the mill" R series servers. I've looked everywhere in iDRAC on the nodes (no iDRAC for the chassis as far as I am aware) and haven't found anywhere to change the fan curve. What really bugs me is that node 1 spools up like crazy with the same CPU temps as, say, node 3, but when I am only running node 3 (setting things up...), it hardly spools up at all, even with higher CPU temps than node 1 had.

As for ipmitool, I've never used it. Any good documentation on it?

That's in line with my experience; if one or more nodes aren't fully inserted, the fans will spin all the way up.
I definitely had that issue to begin with. The nodes (shipped separately from the chassis) that ended up as nodes 2 and 4 both had damaged SATA connectors (on the right, looking at the front of the node). Those have both been replaced, and it does appear to have made a difference with some of the nodes. But node 1 is still a screamer.

Oh, and my apologies on the late reply. I was on a road trip all last week.
 

clcorbin

Member
Feb 15, 2014
Ahhhhh now that makes sense and explains EVERYTHING.

@clcorbin what version of idrac is on this? Can you share a service tag #?

Regarding ipmitool, Google will be your best bet:
https://www.reddit.com/r/homelab/comments/t9pa13
One of the first things I did (to try to solve the fan speed issue) was update all the firmware to the latest and greatest, so the nodes are all on the latest release for iDRAC 9. Node 1 is Dell service tag 91HSCS2.

I'll dig into ipmitool when I get home.

Thanks!
 

clcorbin

Member
Feb 15, 2014
So, I found a reply on Dell's support forum that basically said the fan speed issue was the factory-installed X710-DA2 network cards not reporting temperatures correctly, so iDRAC was taking steps. They recommended using a slightly older iDRAC firmware (the last 6.x release versus the newer 7.0.x) and expected an iDRAC firmware update to fix the problem in Sept 2024.

So now, if I can just get some optical transceivers that work with the X710 (not really impressed with it right now...), I'll be happy. Even after running the X710 unlock and verifying the cards were unlocked, they went from bitching about unapproved transceivers to bitching that they won't enable them because of high thermals. Specifically: "Rx/Tx is disabled on this device because the module does not meet thermal requirements."

Out of all the SFP modules I have, only ONE currently works with the X710s: a copper module that advertises itself as a Finisar FCLF8522P2BTL-E5 1000Base-T (but it links at 10Gb...). No idea why a copper module works while none of my (six or so different flavors of) optical modules will, even after unlocking the silly card. Makes me appreciate my Mellanox cards all the more.
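
For what it's worth, on Linux you can see what the X710 thinks is plugged in (vendor, part number and, on modules with DOM support, the temperature it is complaining about) with ethtool. The interface name below is just an example; yours will differ:

# dump the SFP+ module EEPROM on one of the X710 ports
ethtool -m enp59s0f0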
 

wifiholic

Member
Mar 27, 2018
So now, if I can just get some optical transceivers that work with the X710 (not really impressed with it right now...), I'll be happy. Even after running the X710 unlock and verifying the cards were unlocked, they went from bitching about unapproved transceivers to bitching that they won't enable them because of high thermals. Specifically: "Rx/Tx is disabled on this device because the module does not meet thermal requirements."
I'm sorry, that really sucks. All I have to offer is that I've had good luck with dual-port SFP28 Mellanox ConnectX-4 OCP 2.0 cards in a C6400.
 

clcorbin

Member
Feb 15, 2014
I'm sorry, that really sucks. All I have to offer is that I've had good luck with dual-port SFP28 Mellanox ConnectX-4 OCP 2.0 cards in a C6400.
I've got a ConnectX-3 dual-port 40Gb card in each node as well. The OM3 "cables" for them literally came in this afternoon. The plan was to play with hyperconverged stuff through Proxmox and Nutanix, using the 40Gb link on a direct private network for the HC "stuff" and the 10Gb ports for VM traffic and maintenance (I couldn't find a quad-port 10Gb card that would give me dual VM and dual maintenance ports...).
 

barichardson

New Member
Mar 31, 2022
C64xx sleds are strict about SFP transceivers in the mezz slot needing to report that they are rated for high temps. If not, the NIC will simply refuse to enable TX/RX.

For the Intel X710 dual port OCP mezz card in the service tag you listed these part numbers are compatible:
Dell SFP+ SR high temp optic part number: N8TDR
Intel SFP+ SR high temp optic part number: F8N24
 

clcorbin

Member
Feb 15, 2014
C64xx sleds are strict about SFP transceivers in the mezz slot needing to report that they are rated for high temps. If not, the NIC will simply refuse to enable TX/RX.

For the Intel X710 dual port OCP mezz card in the service tag you listed these part numbers are compatible:
Dell SFP+ SR high temp optic part number: N8TDR
Intel SFP+ SR high temp optic part number: F8N24
Thank you Sir! I'll see if I can't track a few down.
 

clcorbin

Member
Feb 15, 2014
C64xx sleds are strict about SFP transceivers in the mezz slot needing to report that they are rated for high temps. If not, the NIC will simply refuse to enable TX/RX.

For the Intel X710 dual port OCP mezz card in the service tag you listed these part numbers are compatible:
Dell SFP+ SR high temp optic part number: N8TDR
Intel SFP+ SR high temp optic part number: F8N24
I had eight of the Intel F8N24's delivered Monday (I had already ordered them). Needless to say, everyone is happy again.

I'm in a bit of an "interesting" position right now: I don't have sufficient power to run this thing along with my existing home lab "stuff" (R740 server, NetApp disk shelf, Brocade ICX6610 switch, etc.). My 1500 watt UPS starts complaining at 80% load, and it hits that as soon as the fourth node powers up. Given that everything is on a circuit shared with the hallway outlets and lights, and it is only 120V 15A, I'm at the limit.
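
Back-of-the-envelope, assuming a standard 120V 15A branch circuit and the usual 80% continuous-load rule:

120V x 15A = 1800W total, so roughly 1440W continuous
1500W UPS near 80% load = about 1200W

Add the hallway lights and outlets on the same breaker and there really isn't any headroom left.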

So, I have all the "stuff" coming in to install a dedicated 240V/120V feed for my server rack. That will give the rack about 2 1/2 times more power than it currently has, and I won't have to worry about what happens when I plug the steam cleaner into an outlet in the hallway (I try to remember to ALWAYS use the 20A bathroom circuit for that thing!).

Interestingly enough, each node looks like it is only pulling about 140 watts with the OS loaded (and mostly idle). Not as bad as I thought it would be. It will still bump my electric bill by about $30 if I decide to make this the core of my home network, depending on the solar system of course.

I still can't believe I am installing a 240V circuit for my computer play lab!
 

barichardson

New Member
Mar 31, 2022
If you go the route of splitting PSUs across two circuits to balance the load pay attention to the PSU redundancy settings. By default most C6xxx chassis are set to A/B grid redundancy. In this mode the chassis will put all load on the primary PSU and the secondary will only be used if the primary PSU loses power input.
 

clcorbin

Member
Feb 15, 2014
If you go the route of splitting PSUs across two circuits to balance the load pay attention to the PSU redundancy settings. By default most C6xxx chassis are set to A/B grid redundancy. In this mode the chassis will put all load on the primary PSU and the secondary will only be used if the primary PSU loses power input.
My C6400 has the 2000 watt supplies as well. I only plan on bringing in one 240V circuit to a dual receptacle. While I know that having both PSUs plugged into a single circuit does reduce reliability, well, I'm not running a hospital here. More than likely, anything that takes down the servers will either be the power going out (nothing I can do, as I don't have a whole-home UPS or a generator) or a power cable getting unplugged, either intentionally (without thinking things through) or accidentally (doing work behind the rack and pulling on something I shouldn't have...). For that second case, having redundant supplies does provide a bit of help, given I now have to screw up twice to bring it down hard.

I used to have electrical problems also. I was just about to install some additional circuits and had the electrician there, but at the last minute what I did instead was put in two separate banks for the UPS, connected to different circuits on my breaker panel. For example, one goes to the breaker for the garage and the other to the hallway light circuit. I also upgraded the PDU because I noticed my power would go out while my servers were booting. Lastly, if you are running dual power supplies, make sure you split the load across two separate banks so it doesn't overload your system. One of my servers has dual 2000 W power supplies, and that by itself is enough to trip my system during boot.
In my situation, it would ALMOST be as much work to tap into a separate 15A 120V circuit as to install a new run to the panel configured how I want (i.e., 240V plus 120V 20A circuits). It is kind of amusing. I came across a very nice, very new 3000 VA APC UPS at auction that went for next to nothing because it was 240V input/output. Almost no one is going to buy one of those second-hand because they don't have 240V power at their servers, and if they did, they are running a big enough operation to just buy new equipment with support. Now I wish I had bought the silly thing!
 

bob_dvb

Active Member
Sep 7, 2018
Not quite London
www.orbit.me.uk
Thanks for this thread.

I'm moving house soon and will be able to get a decent solar system with home battery. This has inspired me to try a proper Proxmox cluster (although I should probably try AHV because we use Nutanix at work).

I was curious about the C6400, probably with C6525 modules. It's a big investment, so knowing what the pitfalls are is really helpful. With the correct components, how loud would you say the C6400 is when it's settled down? I don't necessarily want a screaming server all the time.
 

clcorbin

Member
Feb 15, 2014
I'm moving house soon and will be able to get a decent solar system with home battery. This has inspired me to try a proper Proxmox cluster (although I should probably try AHV because we use Nutanix at work).
The nodes I have are technically XC6420s, so they originally came with Nutanix. I plan to test them on the current Nutanix as well. My brother used a Nutanix cluster at a hospital data center and was blown away by just how fast that thing could sling bits from one node to another.

I was curious about the C6400, probably with C6525 modules. It's a big investment, so knowing what the pitfalls are is really helpful. With the correct components, how loud would you say the C6400 is when it's settled down? I don't necessarily want a screaming server all the time.
At this point, I would have to say "it depends". If your nodes come with the Intel X710-DA2 mezzanine card, it could be a screamer until Dell releases the updated firmware they are promising in Sept of this year (per a post on Dell's support forum). Apparently, the X710 isn't reporting temperatures properly, so the C6400 is taking steps to make sure everything is actually cool.

If you go with C6525 nodes, then I would assume they come with a different mezzanine card that knows how to report temperatures, and they aren't too bad. But other than that, I couldn't say.

And take note: this IS data center gear. Even at the lowest fan speeds, you can hear them. My shorty server rack sits in the closet in my "hobby room". With the closet door closed (I have a fan in the closet exhausting the warm air to the attic), I can hear the server rack, but it isn't annoying and I can't really hear it outside of the hobby room.
 