I have a Gigabyte server with 2 x AMD Milan 7763, 22 x Micron 9300 Pro 15.36TB (directly attached), Ubuntu 20.04 and Linux kernel 5.4, and I have been puzzled for some time by this strange behavior: under pure CPU load it goes up to 3.25 GHz on all cores, which is great. However, whenever I throw a large IO load at it using the fio tool, it throttles down to 400 MHz and stays there for as long as the test is running. The behavior is not fully reproducible: sometimes it is observed consistently, and sometimes, after a restart, it is not. I have also upgraded to kernel version 5.11 and observed the same behavior there, though it seemed to happen more rarely.
The OS is running with the following parameters: transparent_hugepage=never pcie_aspm=off nvme.io_poll=1 nvme.io_poll_delay=0 processor.max_cstate=1.
There is no sign of overheating: the CPUs stay at around 50 degrees, and during the event the power usage reported by IPMI is low. I have checked and can confirm that this is not a fluke in frequency reporting. The CPUs are indeed running at 400 MHz: during this time, threaded load tests take about 8 times longer, and the server power reported by IPMI stays at around 550 W instead of the ~1.1 kW that is typical for full load. The CPU governor is set to performance, but this has no effect. Any suggestions on what I can try?
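For anyone who wants to reproduce the check, here is a minimal sketch of how the reported frequency can be cross-checked against the observed slowdown. It assumes the usual `cpu MHz` field in `/proc/cpuinfo`; the function names and the 1000 MHz threshold are illustrative choices, not part of any tool mentioned above.

```python
import os
import re


def parse_core_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value for every logical CPU in /proc/cpuinfo text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(m) for m in matches]


def throttled_cores(mhz_values, threshold_mhz=1000.0):
    """Return indices of logical CPUs reporting less than threshold_mhz (assumed cutoff)."""
    return [i for i, mhz in enumerate(mhz_values) if mhz < threshold_mhz]


def expected_slowdown(normal_mhz, throttled_mhz):
    """Rough slowdown factor for a CPU-bound test if only the clock changes."""
    return normal_mhz / throttled_mhz


def main():
    path = "/proc/cpuinfo"
    if not os.path.exists(path):
        print("no /proc/cpuinfo on this system")
        return
    with open(path) as f:
        mhz = parse_core_mhz(f.read())
    if not mhz:
        print("no 'cpu MHz' lines found")
        return
    slow = throttled_cores(mhz)
    print(f"{len(mhz)} logical CPUs, min {min(mhz):.0f} MHz, max {max(mhz):.0f} MHz")
    print(f"{len(slow)} below threshold")


if __name__ == "__main__":
    main()
```

Running this in a loop while fio is active shows whether all cores drop together. The ratio 3250 / 400 is roughly 8.1, which matches the approximately 8x longer run time of the threaded load tests.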