Hey Patrick, and anyone else looking to control the W83795G chipset.
Just wanted to say that I am enhancing your Python script to show more information and change more settings. I have no release yet, but currently the output looks like this:
Code:
Fan1
Mode : Smart Fan mode
Fan Output Value (FOV): 29.8%
Temperature to Fan mapping Relationships Register (TFMR): fan linked to
Smart Fan Control Table (SFIV)
21C 21C 21C 21C 21C 31C
29% 29% 29% 29% 29% 50%
Fan Output Nonstop Value (FONV): 6%
Fan Output Stop Time (FOST): never stop
Critical Temperature to Full Speed all fan (CTFS): 74
Hystersis of critical temperature: 0C
hystersis of operation temperature: 0C
Fan2
Mode : Smart Fan mode
Fan Output Value (FOV): 29.8%
Temperature to Fan mapping Relationships Register (TFMR): fan linked to
Smart Fan Control Table (SFIV)
21C 21C 21C 21C 21C 31C
29% 29% 29% 29% 29% 50%
Fan Output Nonstop Value (FONV): 6%
Fan Output Stop Time (FOST): never stop
Critical Temperature to Full Speed all fan (CTFS): 74
Hystersis of critical temperature: 0C
hystersis of operation temperature: 0C
Fan3
Mode : Manual mode
Fan Output Value (FOV): 30.2%
Fan4
Mode : Manual mode
Fan Output Value (FOV): 30.2%
Fan5
Mode : Manual mode
Fan Output Value (FOV): 30.2%
Fan6
Mode : Manual mode
Fan Output Value (FOV): 30.2%
Default Fan Speed at Power-on (DFSP): 25%
Fan output step up time: 0.1 sec
Fan output step down time: 0.1 sec
Basically, I did some research on how to set the fan speed reliably on this chipset for a Supermicro X8 motherboard (X8DTi-F), and there were not a lot of options. As I put a lot of time into that research and into looking for the best solution and tools, I would like to write it down somewhere. It should be useful to people other than me.
The various solutions are:
1. Changing the fan profile in the BIOS. This solution doesn't work. The BIOS should do its job and configure the chipset properly, but it doesn't. I have two computers with this chipset. The server allows setting the fan "duty", which basically corresponds to a fixed fan speed of 30%, 40%, 70% or 100% for low, performance and so on. The other motherboard is basically the same thing but with a lower starting duty. In both cases, the fans will not scale dynamically with the temperature; they stay at their lowest duty at all times. On the server board, 30% is too loud and overkill. On the desktop board it's usually not enough, which is not good when the CPU stays at 100% for long periods of time with inadequate cooling.
2. Changing the fans to quieter, less powerful fans. As this board has a BMC, this also requires using ipmitool to lower the fan RPM thresholds, or the BMC will kick all fans to maximum speed because it thinks they are failing. This solution does work, but it needs more investment, and probably case modding and time to search for compatible fans.
3. There are raw IPMI commands that can be used to change the fan duty on Supermicro X9 and X10 motherboards. These don't work with X8.
3. Linux fancontrol/pwmconfig, the Linux Perl script, the FreeNAS script that controls fans based on HDD and CPU temperature, or any other driver or userspace fan monitoring software: they all suffer from the same issue. As PigLover said above, and as documented here:
w83795 fan control not working
the BMC (baseboard management controller, the thing that allows controlling the computer over IPMI) polls the fan control chipset from time to time to get information about the fan speed. At the same time, to use any of those programs, the computer must load a driver that polls the same chip over the same interface. As two systems use the chip at the same time, there will be conflicts. When modifying the configuration of the driver, I can see ipmitool returning wrong data from the sensor. This conflict makes all of these programs unreliable, so they should not be used. If the Linux driver stops working completely because of the conflict, the computer loses its thermal solution and might overheat if the last fan speed was set low and the CPU now runs at 100%. The same could happen if the operating system stalls and the thermal functions handled by fancontrol stop working.
To prevent these issues, one would have to disable IPMI, which is too useful to give up in most cases, or simply skip this solution.
4. One could use the base of fancontrol, the OS driver, to set a fixed, low fan speed. Instead of having two systems polling the chipset regularly and conflicting, the BMC would retain control most of the time, and the user would set the fan speed once per boot, which should work reliably (and does according to statistics here).
The downside is that we lose the ability to increase and decrease the speed depending on temperature. If the goal is low noise and the computer gets stuck in a loop that uses 100% CPU, hardware could be damaged for lack of thermal control.
5. Based on this idea, one could use the Linux w83795 driver to control the speed. The driver pokes the hardware directly to get information and configure its fan mode.
The w83795 driver has an experimental fan mode. It will stay experimental forever, as it has been for the last 5 years. From what I saw in the code, it is incomplete and undocumented. It may or may not work depending on your BIOS configuration.
This brings us to Patrik Dufresne's solution, which is also the one I chose.
6. The w83795 chipset already contains functionality to adjust the fan speed based on CPU heat. A good way to fix the noise issue would be to reconfigure the chipset to a sane configuration. This would be done once, so it is less likely to conflict with the BMC querying the chip at the same time. This solution would be managed completely by the chipset at the hardware level, so it keeps protecting the system when the OS stalls. It's a hacky solution, but right now it is the best one that I have in mind.
At this point I read the w83795 specification available here
Hardware Monitor,Desktop,Server,Notebook, Networking,Storage,NCT7,W83 - Desktop & Server Series
to find out what it allows and how it can be configured. As Patrick points out, the chipset is capable of a lot of things, and the BIOSes are just too shitty to expose all of this functionality to the user. The Thermal Cruise mode lets the BIOS set a target temperature for the CPU, and the chip adjusts the fan speed to stay near that temperature; the Smart Fan mode lets it define multiple steps that form a fan curve. Too bad the BIOS sets the minimum step "too loud" and all steps to the same speed, disabling all speed control.
What if we could reconfigure it? Well, we can!
One could expose all the chipset parameters and make them configurable in the w83795 Linux driver. The driver already exposes a lot of sensor data like fan speed, temperature and so on. It allows setting the fan speed manually. An experimental flag allows changing the configuration of the automatic fan mode, but it's incomplete. This could be improved to expose more Thermal Cruise and Smart Fan settings, so they can be configured from user space with shell scripts, the same way one can switch a fan to manual mode and set a speed.
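For illustration, here is a minimal Python sketch of that "manual mode plus fixed speed" step through the generic hwmon sysfs files `pwmN` and `pwmN_enable` (duty stored as 0-255, `1` meaning manual control). The hwmon directory is an assumption and differs per machine, and the helper names are made up for this example; it is not code from the driver or from Patrick's script.

```python
import os

def percent_to_raw(percent):
    """Convert a duty cycle in percent to the 0-255 scale used by hwmon pwmN files."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return int(percent * 255 / 100 + 0.5)  # round half up

def raw_to_percent(raw):
    """Convert a 0-255 duty value back to percent, rounded to one decimal."""
    return round(raw * 100 / 255, 1)

def set_manual_pwm(hwmon_dir, channel, percent):
    """Switch one PWM channel to manual mode, then write a fixed duty.

    hwmon_dir is hypothetical, e.g. "/sys/class/hwmon/hwmon2";
    check which hwmonN directory belongs to the w83795 on your system.
    """
    with open(os.path.join(hwmon_dir, f"pwm{channel}_enable"), "w") as f:
        f.write("1")  # 1 = manual control in the hwmon sysfs convention
    with open(os.path.join(hwmon_dir, f"pwm{channel}"), "w") as f:
        f.write(str(percent_to_raw(percent)))
```

As a sanity check, `raw_to_percent(76)` gives 29.8 and `raw_to_percent(77)` gives 30.2, which would be consistent with the FOV values in the dump above coming from an 8-bit duty register.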
I tried to read the C code and understand it, but it was just too confusing. The code is mostly undocumented, the speed control functionality behind the experimental flag is undocumented, and it's full of macros, so it seems you need to dig through many layers of definitions to find the code that is actually executed; sysfs attributes are created and I can't find where they are set up. The best solution would be to improve the driver, but personally this seems too hard for me.
Patrick made a Python script. This script interacts with the same hardware registers as the driver, and is way easier to read and understand. It is incomplete because it was written for his needs, but it gave me a good base to continue hacking this chipset. By reading the manual and the register descriptions, I was able to extract the relevant configuration and sensor values with this program, like the Linux kernel does.
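To give an idea of what "interacting with the hardware registers" looks like, here is a rough sketch of a banked register read. It is not taken from Patrick's script. It assumes that the chip selects its register bank through the low 3 bits of register 0x00 and that registers are written as `bank << 8 | offset`; please verify both against the datasheet. The `bus` argument can be any object with `read_byte_data` / `write_byte_data` methods, such as `smbus2.SMBus`; the `FakeBus` class, the chip address 0x2F and the register number 0x201 are placeholders so the sketch runs without hardware.

```python
BANK_SELECT_REG = 0x00  # assumption: address of the bank-select register
BANK_MASK = 0x07        # assumption: bank number lives in the low 3 bits

def read_banked(bus, chip_addr, reg):
    """Read one byte from a register written as (bank << 8) | offset."""
    bank, offset = reg >> 8, reg & 0xFF
    current = bus.read_byte_data(chip_addr, BANK_SELECT_REG)
    if (current & BANK_MASK) != bank:
        # Switch bank while preserving the other bits of the register.
        new_value = (current & ~BANK_MASK & 0xFF) | bank
        bus.write_byte_data(chip_addr, BANK_SELECT_REG, new_value)
    return bus.read_byte_data(chip_addr, offset)

class FakeBus:
    """Stand-in for a real SMBus object, for trying the logic without hardware."""

    def __init__(self, banks):
        self.banks = banks    # {bank: {offset: value}}
        self.banksel = 0x80   # arbitrary initial value, bank 0 selected

    def read_byte_data(self, addr, reg):
        if reg == BANK_SELECT_REG:
            return self.banksel
        return self.banks[self.banksel & BANK_MASK][reg]

    def write_byte_data(self, addr, reg, value):
        if reg == BANK_SELECT_REG:
            self.banksel = value

bus = FakeBus({0: {0x01: 0x11}, 2: {0x01: 0xAB}})
value = read_banked(bus, 0x2F, 0x201)  # hypothetical address and register
```

Every read that changes bank also changes state inside the chip, so on a real board this should be done as rarely as possible.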
I plan to improve it so it can change the configuration of either the Smart Fan or the Thermal Cruise mode to better values.
This has been a fun ride, because I did not know how to interact with I2C devices, but I am learning.
The 2 boards that I have both have 2 configuration issues:
1. The first one is that both are in Smart Fan mode, but the steps are incorrectly configured. If the CPU is idle or at low power, it may be below the first step most of the time, but it will still be cooled at the speed defined for the first step. On my server, I cannot even get past the first step at 30% fan speed when maxing the CPU. The server also has too many high steps. A difference of 5 degrees between two steps changes the fan from 40% to 100%. A better configuration would be a very quiet setting at position 1 for idle, a less quiet one at position 2 for light use, and louder settings at positions 3 to 6.
2. The temperatures reported/calculated by the chipset are off when compared with the configured temperature points.
I have Intel CPUs, and this chip is configured by the BIOS to query the temperature using the PECI protocol. PECI never reports the CPU temperature in degrees; it reports the temperature relative to the CPU's critical temperature. Instead of reporting 27 Celsius, it will report in my case -42. The CPU's maximum temperature before thermal throttling starts is 79 Celsius (which corresponds to 0 PECI). Taken as exact degrees, 27 Celsius would be 27 - 79 = -52, yet what I read is -42 PECI units. I repeat, these are PECI units, not degrees, because at low temperatures they are not reliable and might even show as -127. The interface was designed this way to make it easier to tell what a safe temperature is. An absolute temperature like 60 degrees would be very hot for an old CPU, but newer CPUs handle such temperatures very well. Presenting the temperature this way lets fan control units work well across a broad variety of CPUs, as the CPU basically tells them how far it is from the maximum temperature it can handle.
However, chipsets like the w83795 do not support negative PECI values. To work around this limitation, they add a base value to the value reported by the CPU so that the result is always positive. In my case, the chipset was configured to add 100 to the CPU value, so 100 + (-42) = 58. The problem is that this value is then compared to the Celsius temperatures configured in the Thermal Cruise and Smart Fan tables. My CPU is at 27 Celsius, but the system drives the fans as if it were at 58 Celsius, which is totally wrong.
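The arithmetic above can be written down in a few lines. This is only a sketch of the offset calculation as described in this post, with made-up function names; the default of 100 is the base value found on my board, not a constant of the chip.

```python
def peci_to_chip(peci, offset=100):
    """Value the chip ends up comparing against its tables: PECI reading plus base."""
    value = peci + offset
    if value < 0:
        # The base value is too small for this reading; the result must stay positive.
        raise ValueError("offset too small: result would be negative")
    return value

def chip_to_peci(value, offset=100):
    """Inverse conversion: recover the PECI reading from the chip-side value."""
    return value - offset

reading = peci_to_chip(-42)  # -42 PECI with a base of 100 gives 58
```

With a base of 100, an unreliable reading of -127 cannot be represented at all, which is why the selected base has to cover the lowest value the CPU may report.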
I am not sure yet which solution I will take. Either I allow configuring the base value (100 in this example) to a better one, so that the result of the calculation is closer to the real temperature, and keep real Celsius temperatures in the fan configuration tables. The selected value must be high enough that the result never goes negative.
Or I could convert the temperatures shown and configured to and from the related PECI values, by running a program that maps each PECI value to a degree for a specific CPU.
Or I could say that these configurations are stupid and, instead of trying to show false numbers, simply enter temperatures as PECI values relative to the critical point, like the CPU reports them.
If done like this, it would look like:
Smart Fan Control Table (SFIV)
-40 -30 -20 -15 0 0
8% 15% 40% 80% 100% 100%
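To see how such a table would behave, here is a simplified model of a stepped lookup: the fan runs at the duty of the highest step whose temperature point has been reached, and at the first step's duty when the reading is below every point (which matches what I observe on my boards). This is an assumption about the behaviour, not the chip's documented algorithm; the real hardware also applies hysteresis and step up/down times, which are ignored here.

```python
from bisect import bisect_right

def smart_fan_duty(reading, temps, duties):
    """Return the duty (in percent) for a reading, using a stepped table.

    temps and duties are parallel lists; temps must be sorted ascending.
    Readings below the first point still get the first step's duty.
    """
    if not temps or len(temps) != len(duties):
        raise ValueError("temps and duties must be non-empty and the same length")
    if list(temps) != sorted(temps):
        raise ValueError("temps must be sorted in ascending order")
    index = bisect_right(temps, reading) - 1  # last point <= reading, or -1
    return duties[max(index, 0)]

# The table above, in PECI units relative to the critical point.
temps = [-40, -30, -20, -15, 0, 0]
duties = [8, 15, 40, 80, 100, 100]
idle = smart_fan_duty(-42, temps, duties)    # below the first point
loaded = smart_fan_duty(-15, temps, duties)  # fourth point reached
```

With this table, an idle CPU at -42 stays at 8%, and the fans only become loud once the CPU is within 15 units of its critical point.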
Currently I am leaning towards the third option. It is harder to understand and less user friendly, but it is more accurate, as it matches exactly what the real hardware does.
I hope this helps some people understand the different solutions, this one in particular, and how to hack the chip. I hope I have time to finish this software and be done with it soon enough. Reading the PECI documentation and the chipset datasheet, and programming software to hack the thermal control unit of my board because the BIOS sucks, just takes too much time.
