Dell R330 Extraneous Fan Noise After 10G Card

csementuh
Member · Joined Oct 7, 2019 · Pittsburgh, PA
Hello!

I have a Dell R330 that has been running for maybe 8 months now, serving as a basic ESX host. It has always run very cool, and with the fans on low speed it is very bearable. About 2 months ago I put a Mellanox ConnectX-3 10G card into it. It ran fine for maybe a week, but now the fans have decided to kick themselves up a notch and are screaming pretty good at the medium-to-high level. They never seem to spin down. I can see the 10G card adding a little heat, but the temps reported by the iDRAC all seem fine. Does anyone know if there's a way to adjust the fan settings so they spin back down to something reasonable? Thanks!

acquacow
Well-Known Member · Joined Feb 15, 2017
Usually the OEM boxes freak out and spin up the fans when you add hardware that doesn't have an OEM firmware that talks to their IPMI stuff.

I'd grab a Dell firmware for your Mellanox card, cross-flash it, and see if it calms down.

csementuh
Member · Joined Oct 7, 2019 · Pittsburgh, PA
Ahh, interesting, that's a valid possibility! The Dell stuff seems to be pretty open and decent in general, but maybe, just maybe. I'll see if I can find a Dell-branded card at work to try as well. It seemed weird, though, that the server ran great for a week or so and then decided to go all loud.

csementuh
Member · Joined Oct 7, 2019 · Pittsburgh, PA
Acquacow, you knew it! That was the issue.

Apparently Dell servers have a feature called "3rd Party PCI Fan Response" that ramps up the fans when certain add-on cards aren't recognized. On the 13th-gen server I have, there's a way to disable it. Even with the fan offset set to low, it still runs the fans higher. My fans were sitting at 10K RPM before; after the fix they went right down to 3K RPM. My temps went up a few degrees C, but nothing I'm worried about, and the fans will still ramp up if load dictates it.

If you have the Dell iDRAC tools installed, you can do this remotely:

racadm (*connection info*) get System.ThermalSettings.ThirdPartyPCIFanResponse

It should show "enabled". Now disable it:

racadm (*connection info*) set System.ThermalSettings.ThirdPartyPCIFanResponse disabled
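If you want to script this check across several hosts, here is a minimal Python sketch. It builds the same get/set commands shown above using remote racadm's -r (host), -u (user) and -p (password) options, and it assumes the get output contains a line like ThirdPartyPCIFanResponse=Enabled. The helper names are hypothetical. The run parameter defaults to subprocess.run but can be swapped for a stub, so the logic can be tried without an iDRAC.

```python
import subprocess
from typing import Callable, List, Optional

# Attribute name taken from the racadm commands above.
ATTR = "System.ThermalSettings.ThirdPartyPCIFanResponse"


def racadm_cmd(host: str, user: str, password: str, *args: str) -> List[str]:
    """Build a remote racadm invocation using the -r/-u/-p remote options."""
    return ["racadm", "-r", host, "-u", user, "-p", password, *args]


def parse_value(output: str, key: str = "ThirdPartyPCIFanResponse") -> Optional[str]:
    """Return the value from a 'Key=Value' line in racadm get output, or None.

    Assumption: the attribute is printed as e.g. 'ThirdPartyPCIFanResponse=Enabled'.
    """
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(key + "="):
            return line.split("=", 1)[1].strip()
    return None


def ensure_disabled(host: str, user: str, password: str,
                    run: Callable = subprocess.run) -> bool:
    """Disable the setting if it currently reads 'enabled'; return True if a set was issued."""
    result = run(racadm_cmd(host, user, password, "get", ATTR),
                 capture_output=True, text=True, check=True)
    value = parse_value(result.stdout)
    if value is not None and value.lower() == "enabled":
        run(racadm_cmd(host, user, password, "set", ATTR, "disabled"), check=True)
        return True
    return False
```

Re-running it after a reboot is also a cheap way to see whether the setting stuck: it returns False when the value already reads disabled.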

Did the trick for me! Now I just need to see if it survives a reboot. If not, there are other ways to write the raw values to the iDRAC so they stick.

My Brocade 7250 is now the loudest thing in my rack. :(

acquacow
Well-Known Member · Joined Feb 15, 2017
More tribal knowledge from my Fusion-io days... we crammed lots of Mellanox and Fusion-io cards that Dell didn't recognize into their servers. HP is the same way, but HP released a firmware fix that resolved a lot of it.

Glad you got it solved =)

csementuh
Member · Joined Oct 7, 2019 · Pittsburgh, PA
Haha, I'm glad you suffered through it in the past and figured it out so you could share such good knowledge. Much appreciated! I've been lucky until now; most of the Dell stuff I've worked with has behaved well with mixed hardware. This one did not. Thanks again!

fohdeesha
Kaini Industries · Joined Nov 20, 2016 · fohdeesha.com
A little more background: iDRAC controls the fan speed by using the two configuration files below to take inventory of all the parts in a system; then, depending on the combination of parts, it assigns a matching profile. That profile has its own min/max fan speeds, etc.

From your R330 (codename Ratchet in the Dell FW):

ThermalConfig.txt
ThermalData.txt

A subsection of this inventory-taking looks at all the PCI devices. It's not that cards without Dell firmware fail to talk to IPMI or anything like that; iDRAC simply reads the PCI IDs of all the PCI cards and matches them against a list of known Dell parts. If a card has an ID that doesn't match (like your Mellanox card), it gets flagged as an unsupported PCI card, and the system falls back to the unsupported PCI configuration thermal profile, which, as you noticed, includes quite high fan speeds. The iDRAC toggle you found turns off this detection. Here's the list of all the known PCI IDs it looks for, in raw hex; if your card isn't in it, it gets flagged:

platcfgfld.txt
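To make the matching above concrete, here is a deliberately simplified Python model of it. This is a hypothetical sketch, not Dell's firmware logic: the profile names, the fan percentages, and the idea that iDRAC compares the full vendor/device/subsystem tuple are all assumptions, and the real data lives in files like platcfgfld.txt, whose format isn't reproduced here. The read_sysfs_ids helper reads the standard Linux sysfs files (vendor, device, subsystem_vendor, subsystem_device under /sys/bus/pci/devices), which is one way to get a card's IDs in hex from a Linux live environment.

```python
from pathlib import Path
from typing import Dict, Iterable, Set, Tuple

# (vendor, device, subsystem_vendor, subsystem_device) as ints.
# Assumption: which of these fields iDRAC actually compares is not confirmed.
PciId = Tuple[int, int, int, int]

# Hypothetical profile table: name -> (min fan %, max fan %). Placeholder numbers only.
PROFILES: Dict[str, Tuple[int, int]] = {
    "default": (10, 100),
    "unsupported_pci": (40, 100),
}

SYSFS_FIELDS = ("vendor", "device", "subsystem_vendor", "subsystem_device")


def read_sysfs_ids(root: str = "/sys/bus/pci/devices") -> Dict[str, PciId]:
    """Map each PCI address (e.g. '0000:01:00.0') to its IDs as read from Linux sysfs."""
    ids: Dict[str, PciId] = {}
    for dev in sorted(Path(root).iterdir()):
        # Each file holds a value like '0x15b3'; int(..., 16) accepts the 0x prefix.
        ids[dev.name] = tuple(int((dev / f).read_text().strip(), 16) for f in SYSFS_FIELDS)
    return ids


def unknown_cards(inventory: Iterable[PciId], known: Set[PciId]) -> Set[PciId]:
    """Return every ID in the inventory that is missing from the known-parts list."""
    return {pci_id for pci_id in inventory if pci_id not in known}


def pick_profile(inventory: Iterable[PciId], known: Set[PciId],
                 third_party_response: bool = True) -> str:
    """Fall back to the unsupported-PCI profile if any card is unknown and the toggle is on."""
    if third_party_response and unknown_cards(inventory, known):
        return "unsupported_pci"
    return "default"
```

With third_party_response=False, the model mirrors what the ThirdPartyPCIFanResponse toggle does: the unknown card is still present, but it no longer forces the fallback profile.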

Editing these files and compiling a new FW image to permanently turn stuff like this off, add card IDs, and alter the fan curves for certain profiles is actually why I ended up coming up with the Idracula exploit in the first place.

frogtech
Well-Known Member · Joined Jan 4, 2016
Any idea if recent iDRAC firmwares have up-to-date PCI ID inventories covering the entire collection of Dell parts? Also, any idea how complete the lists are with respect to that collection? I'm wondering how likely it is that you'd land in the unsupported thermal profile simply because an older platform doesn't have a PCI ID for a newer Dell P/N.