Create Custom Fan Speed Maps for SuperMicro X10 Servers via IPMI

Layla

Game Engine Developer
Jun 21, 2016
I got tired of the noise from my SuperMicro X10 server (4027GR-TRT), but didn't want to get into involved hardware fan swaps, which might still produce unsatisfactory results.

For reference, the "Optimal" fan setting decided to run half of the fans at 8000RPM at idle, and the other half at 5000RPM at idle. The four running at 8000RPM were killing me.

The existing 13000RPM fans are actually very quiet at 1800-2200RPM, and they cool the CPUs just fine at idle at that speed, so I decided to spend an hour or two writing a Perl script to control the fans with a custom temp->duty cycle table. This way, when things do need more cooling, the existing fans can still spin up and provide enough static pressure to do the job. My main concern was not the noise under load, but the unnecessary noise at idle, and this script allows me to fix that.
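To illustrate what a temp->duty cycle table means in practice, here is a minimal Python sketch. It is not the Perl script itself, and the table values and the names `CPU_MAP` and `duty_for_temp` are made up for the example. It assumes a simple step table: the duty cycle is the one attached to the highest threshold at or below the current temperature.

```python
from bisect import bisect_right

# Hypothetical (temperature in C, duty cycle in %) steps, sorted by temperature.
# These numbers are illustrative only; tune them for your own hardware.
CPU_MAP = [(0, 8), (45, 15), (55, 30), (65, 60), (75, 100)]

def duty_for_temp(temp_c, table):
    """Return the duty cycle for the highest threshold at or below temp_c."""
    thresholds = [threshold for threshold, _ in table]
    idx = bisect_right(thresholds, temp_c) - 1
    # Below the first threshold, fall back to the first (lowest) step.
    return table[max(idx, 0)][1]
```

A step table like this keeps the fans at one fixed speed across each temperature band, which avoids constant small speed changes; interpolating between steps would be an equally reasonable choice.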

Caveats: you would not want to use this in a production environment. This script needs a reliable device to run on (e.g. a Raspberry Pi), and it does introduce a new failure mode for your server. You probably don't want to run this script without working IPMI alert emails, so that if something does go wrong you are notified quickly and can take corrective action. Also, this script comes with absolutely no warranty and you use it at your own risk! :)

All that said, the script works for my purposes. I do run it on a reliable Raspberry Pi. You could theoretically run it on multiple Pis simultaneously for redundancy. It allows you to set your own temperature->duty cycle mappings for both CPU and GPU. In my case, the GPU temperatures are the gating factor, and the system ends up running at higher minimum fan speeds for two reasons:
1. The GPU temp sensors unfortunately only update once every ~3.5+ minutes!
2. The GPU needs a higher minimum fan speed to stay at a reasonable temperature at idle than the CPUs do.
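One way to picture how the GPU becomes the gating factor: if each sensor is looked up in its own table and the fans follow the largest demand, a higher GPU floor wins at idle. The sketch below assumes that "take the maximum" policy; the tables and the name `required_duty` are hypothetical, and the real script may combine readings differently.

```python
# Hypothetical step tables: (temperature in C, duty cycle in %), sorted ascending.
CPU_TABLE = [(0, 8), (60, 50), (75, 100)]
GPU_TABLE = [(0, 34), (75, 70), (85, 100)]

def lookup(temp_c, table):
    """Duty cycle of the highest threshold at or below temp_c."""
    duty = table[0][1]
    for threshold, value in table:
        if temp_c >= threshold:
            duty = value
    return duty

def required_duty(cpu_temps, gpu_temps):
    """Largest duty cycle demanded by any CPU or GPU reading."""
    demands = [lookup(t, CPU_TABLE) for t in cpu_temps]
    demands += [lookup(t, GPU_TABLE) for t in gpu_temps]
    # With no readings at all, fail safe to full speed.
    return max(demands, default=100)
```

Because the GPU sensors refresh so slowly, the GPU table's lowest step effectively has to be high enough to hold the temperature steady between readings.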

The script can be found here:
missmah/ipmi_tools

You modify the script to contain your IPMI username, password (highly insecure, since it's hardcoded in the script!), and IP address. Then set up the number of CPUs, GPUs, and fans you want the script to query.
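If the hardcoded password bothers you, one possible alternative (not what the script does) is to let `ipmitool` read the password from the `IPMI_PASSWORD` environment variable via its `-E` option. The Python sketch below only builds the argument list; the variable names `IPMI_HOST` and `IPMI_USER` and the function name are made up for this example.

```python
import os

def ipmi_base_command(env=os.environ):
    """Build the common ipmitool arguments without embedding the password."""
    host = env["IPMI_HOST"]  # hypothetical variable name for the BMC address
    user = env["IPMI_USER"]  # hypothetical variable name for the IPMI user
    if "IPMI_PASSWORD" not in env:
        # ipmitool -E reads the password from this variable itself.
        raise KeyError("IPMI_PASSWORD must be set when using ipmitool -E")
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E"]
```

The password then never appears in the script or on the command line, though it still lives in the environment of whatever launches the process.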

Note that the script originally assumed 4 banks of fans whose speeds it has to control. It wouldn't have been hard to automate this based on a constant, but I hadn't done it because every server I have has 4 banks.
Update: I felt bad publishing the script in that state, so I added support for a configurable number of fan banks, and the script now generates the hex for each bank in a loop :)
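For a rough idea of what "generating the hex for each bank in a loop" can look like, here is a Python sketch. The raw byte sequence `0x30 0x70 0x66 0x01 <bank> <duty>` is the one commonly reported by X10 owners for setting a fan zone's duty cycle; it is an assumption here, not taken from the script, so check it against the script and your board before sending anything. The function name is hypothetical, and the code only builds `ipmitool` arguments without running them.

```python
def bank_duty_commands(duty_percent, num_banks=4):
    """Build one ipmitool 'raw' argument list per fan bank."""
    # Clamp to 0-100 so a bad table entry cannot produce an out-of-range byte.
    duty = max(0, min(100, int(duty_percent)))
    commands = []
    for bank in range(num_banks):
        commands.append(
            # Assumed Supermicro X10 byte layout: ... <bank> <duty>
            ["raw", "0x30", "0x70", "0x66", "0x01", f"0x{bank:02x}", f"0x{duty:02x}"]
        )
    return commands
```

For example, a 34% duty cycle becomes the byte `0x22`, repeated once per bank with the bank index changing.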

NOTE: The script puts your server into "full fan speed mode", which means that you'll have to manually use IPMI to set it back to something like optimal if you want your server to go back to throttling its own fan speeds.
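Switching back can also be done with a raw command. The sketch below assumes the fan-mode bytes commonly reported for Supermicro X10 boards (`0x30 0x45 0x01 <mode>`, with 0 = standard, 1 = full, 2 = optimal, 4 = heavy I/O); these values are not from the post, so verify them for your board or simply change the mode in the IPMI web interface. The names are hypothetical and nothing is executed.

```python
# Assumed mode values for the Supermicro fan-mode raw command (verify first).
FAN_MODES = {"standard": 0x00, "full": 0x01, "optimal": 0x02, "heavy_io": 0x04}

def set_fan_mode_args(mode):
    """Build the ipmitool 'raw' arguments that select a BMC fan mode."""
    return ["raw", "0x30", "0x45", "0x01", f"0x{FAN_MODES[mode]:02x}"]
```

Appending `set_fan_mode_args("optimal")` to your usual `ipmitool` connection arguments would, under those assumptions, hand fan control back to the BMC.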

Anyway, I hope this script is useful to someone else. It can be a handy alternative to using naturally quieter fans, which may also have dramatically reduced cooling capacity under heavy load. It's also handy because you can tune the temp->duty cycle ramps yourself. And with some minimal changes, you could also choose other metrics to add ramps for (e.g. PCH temp, memory temps, etc., if those things are potentially undercooled in your system).

My system, at idle, is using a 34% fan duty cycle (to keep the GPU at 67°C and the CPUs at 39°C).
Not accounting for the GPU, it was happily keeping everything else cool at just an 8% duty cycle (whisper quiet!!!), but the GPU was overheating during its 3.5-minute measurement windows. I kept bumping the GPU duty cycles up until its temperature stayed fully stable between measurements at idle, and that ended up being a 34% duty cycle. In a hot room, CPU1 and CPU2 are both 39°C, PCH is 49°C, system temp is 37°C, VRMs are 30-40°C, memory is ~35°C, etc.

Okay, I've written a book already - here you go, and I hope someone finds this useful!

Layla

P.S. I'm attaching an image of what the script looks like while it's running. Realistically you'd want to use tmux so you can disconnect your shell while it keeps running. Note: the script uses about 2% CPU on an RPi while actively working each interval, and 0% while sleeping.