Hello everyone,
not sure if this is the correct spot to post this, as I honestly am unsure what angle to properly approach this problem from:
I have an AMD EPYC 9654QS CPU (Model No. 100-000000894-04), identical to the CPU being used in this thread, where a fellow user was/is attempting to overclock the very same CPU.
I run this CPU on a T2SEEP motherboard, which seems to be a slightly modified clone of the Supermicro H13SSL-N.
My problem is that SOMETHING is limiting the total power draw of my CPU to 230W, no matter what I do.
I currently run Proxmox on the system but have also tried ESXi and even a straight bare-metal Win10 install, but the issue shows up in every single OS/Hypervisor, and no matter which tools I use, I end up with the same performance limiter, even when I write data directly to the relevant registers.
The weird thing is that the AMD e-smi-tool reports that the CPU itself is not locked, as can be seen in its output when the system is not under load:
Code:
====================== EPYC System Management Interface ======================
--------------------------------------
| CPU Family | 0x19 (25 ) |
| CPU Model | 0x11 (17 ) |
| NR_CPUS | 192 |
| NR_SOCKETS | 1 |
| THREADS PER CORE | 2 (SMT ON) |
--------------------------------------
-----------------------------------------------------
| Sensor Name | Socket 0 |
-----------------------------------------------------
| Energy (K Joules) | NA (Err: 1 ) |
| Power (Watts) | 122.919 |
| PowerLimit (Watts) | 360.000 |
| PowerLimitMax (Watts) | 400.000 |
| C0 Residency (%) | 0 |
| DDR Bandwidth | |
| DDR Max BW (GB/s) | 154 |
| DDR Utilized BW (GB/s) | 0 |
| DDR Utilized Percent(%) | 0 |
| Current Active Freq limit | |
| Freq limit (MHz) | 3700 |
| Freq limit source | Refer below[*0] |
| Socket frequency range | |
| Fmax (MHz) | 3700 |
| Fmin (MHz) | 400 |
-----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
Failed: to get CPU energies, Err[1]: Energy driver not present
-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
| CPU boostlimit in MHz: |
| cpu [ 0] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 16] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 32] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 48] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 64] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 80] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
| CPU core clock in MHz: |
| cpu [ 0] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 16] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 32] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 48] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 64] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 80] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
-----------------------------------------------------------------------------------------------------------------
*0 Frequency limit source names:
OPN Max
Err[1]: Energy driver not present
============================= End of EPYC SMI Log ============================
The relevant things to note from this output:
PowerLimit is set to 360W
PowerLimitMax is 400W
Current active frequency limit is 3700 MHz and the source of said limit is OPN Max
However, if I run stress --cpu 192, the output from the e-smi-tool changes:
Code:
====================== EPYC System Management Interface ======================
--------------------------------------
| CPU Family | 0x19 (25 ) |
| CPU Model | 0x11 (17 ) |
| NR_CPUS | 192 |
| NR_SOCKETS | 1 |
| THREADS PER CORE | 2 (SMT ON) |
--------------------------------------
-----------------------------------------------------
| Sensor Name | Socket 0 |
-----------------------------------------------------
| Energy (K Joules) | NA (Err: 1 ) |
| Power (Watts) | 226.282 |
| PowerLimit (Watts) | 360.000 |
| PowerLimitMax (Watts) | 400.000 |
| C0 Residency (%) | 100 |
| DDR Bandwidth | |
| DDR Max BW (GB/s) | 154 |
| DDR Utilized BW (GB/s) | 0 |
| DDR Utilized Percent(%) | 0 |
| Current Active Freq limit | |
| Freq limit (MHz) | 2276 |
| Freq limit source | Refer below[*0] |
| Socket frequency range | |
| Fmax (MHz) | 3700 |
| Fmin (MHz) | 400 |
-----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
Failed: to get CPU energies, Err[1]: Energy driver not present
-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
| CPU boostlimit in MHz: |
| cpu [ 0] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 16] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 32] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 48] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 64] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
| cpu [ 80] : 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 3500 |
-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
| CPU core clock in MHz: |
| cpu [ 0] : 2250 2250 2275 2275 2275 2275 2275 2275 2275 2275 2275 2250 2250 2250 2250 2250 |
| cpu [ 16] : 2275 2275 2275 2275 2275 2275 2275 2275 2275 2250 2250 2250 2250 2275 2275 2275 |
| cpu [ 32] : 2275 2275 2275 2275 2275 2275 2275 2250 2250 2250 2250 2275 2275 2275 2275 2275 |
| cpu [ 48] : 2250 2250 2250 2250 2250 2275 2275 2275 2275 2275 2275 2275 2275 2275 2275 2250 |
| cpu [ 64] : 2250 2250 2250 2250 2275 2275 2275 2275 2275 2275 2275 2275 2275 2250 2250 2250 |
| cpu [ 80] : 2250 2250 2275 2275 2275 2275 2275 2275 2275 2275 2275 2275 2250 2250 2250 2250 |
-----------------------------------------------------------------------------------------------------------------
*0 Frequency limit source names:
PPT Limit
Err[1]: Energy driver not present
============================= End of EPYC SMI Log ============================
Relevant changes:
The Frequency limit is now 2276 MHz and the source is cited as "PPT Limit"
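To put a number on the gap between the two dumps, here is a minimal Python sketch that pulls the socket-level fields out of pasted e-smi-tool output and computes the unused headroom below the configured PowerLimit. The function names parse_esmi and ppt_headroom are my own and not part of any AMD tool, and the regular expressions assume the table layout shown above.

```python
import re

# Patterns assume the "| name | value |" rows of the e-smi-tool table above.
PATTERNS = {
    "power_w": r"\|\s*Power \(Watts\)\s*\|\s*([\d.]+)",
    "limit_w": r"\|\s*PowerLimit \(Watts\)\s*\|\s*([\d.]+)",
    "limit_max_w": r"\|\s*PowerLimitMax \(Watts\)\s*\|\s*([\d.]+)",
    "freq_limit_mhz": r"\|\s*Freq limit \(MHz\)\s*\|\s*(\d+)",
}

def parse_esmi(text):
    """Extract a few socket-level fields from pasted e-smi-tool output."""
    fields = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            fields[key] = float(match.group(1))
    return fields

def ppt_headroom(fields):
    """Watts left between the measured power and the configured PowerLimit."""
    return fields["limit_w"] - fields["power_w"]

# Rows copied from the under-load output above.
loaded = """
| Power (Watts) | 226.282 |
| PowerLimit (Watts) | 360.000 |
| PowerLimitMax (Watts) | 400.000 |
| Freq limit (MHz) | 2276 |
"""

fields = parse_esmi(loaded)
print(f"unused headroom: {ppt_headroom(fields):.1f} W")  # unused headroom: 133.7 W
```

With the numbers from the loaded run, the CPU reports "PPT Limit" as the throttle source while sitting roughly 134 W below the PowerLimit it claims to have, which is exactly the contradiction I cannot explain.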
The motherboard I'm using is rated for 400W TDP and online benchmarks of the EXACT SAME CPU on the EXACT SAME motherboard such as this lead me to believe that it should be possible to achieve the full 400W TDP this CPU is designed for on my setup. During all of these tests, the CPU does not exceed 45°C in temperature, so thermal throttling is not the issue.
I have tried poking around in the BIOS settings, because it really seems to me like the motherboard is somehow limiting the CPU, but the available menus are extremely limited:



On my current setup, I have not been able to unlock any additional options or to find anything related to eTDP, TDP, PPT or similar.
I have determined the UEFI to be an AMI APTIO V UEFI running the GenoaPI-SP5 1.0.0.A AGESA.
I have also poked around a bit inside the UEFI files to see if there is perhaps a hidden option which I could somehow unlock or which is blocking me. I've tried using UEFITool, IFRExtractor and UEFIEditor, but as far as I can tell, all the options related to PPT, PBO and TDP are hidden, and their defaults are either set to "auto" or carry no real limits, especially none that would set the PPT to 230W.
Bear in mind though that I have never modified or worked on a BIOS/UEFI and am very much stumbling in the dark here, so it is entirely possible that I overlooked something really obvious.
The one thing I am unsure about is my PSU. I am feeding the system with a Supermicro PWS-2k04A-1R 2000W PSU, which DOES support PMBus 1.2, and which I have connected, but it is not being properly detected by the BMC, and is weirdly being mapped to PSU2 instead of PSU1:



However, the documentation of the motherboard is REALLY sparse, practically non-existent.
There is a port specifically labeled PMBUS on the motherboard, which I have confirmed the pinout of, and which is properly connected to the PSU. Beyond that, I do not know what I could reasonably do to address this problem, and from the BMC page it seems that the software limit is 4000W anyway, so I do not even know with certainty if fixing this issue is worth pursuing or if I am again barking up the wrong tree here. I will admit that it is suspicious as hell though, but I have not yet found a way to modify or verify whether something is going on here.
The board supports Redfish and IPMI, but from what I have managed to figure out so far, neither of these can be used to configure CPU parameters.
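Even if Redfish cannot configure the CPU, its standard Power resource can at least be read to see whether the BMC reports a cap of its own. The following is only a sketch in Python: it assumes the BMC exposes the DMTF Power schema (a PowerControl array whose entries carry PowerConsumedWatts and PowerLimit.LimitInWatts), which I have not confirmed for this board. The helper name bmc_power_caps is my own, and the sample payload is illustrative: the 4000 mirrors the software limit shown on my BMC page, the 310 is simply made up.

```python
import json

def bmc_power_caps(power_resource):
    """Return (name, consumed_w, limit_w) for each PowerControl entry.

    `power_resource` is the parsed JSON body of a Redfish Power resource.
    Missing fields come back as None instead of raising.
    """
    caps = []
    for ctrl in power_resource.get("PowerControl", []):
        limit = (ctrl.get("PowerLimit") or {}).get("LimitInWatts")
        caps.append((ctrl.get("Name"), ctrl.get("PowerConsumedWatts"), limit))
    return caps

# Illustrative payload only; NOT captured from the T2SEEP BMC.
sample = json.loads("""
{
  "PowerControl": [
    {"Name": "System Power Control",
     "PowerConsumedWatts": 310,
     "PowerLimit": {"LimitInWatts": 4000}}
  ]
}
""")

for name, consumed_w, limit_w in bmc_power_caps(sample):
    print(f"{name}: consuming {consumed_w} W, BMC limit {limit_w} W")
```

If the reported limit really is far above anything the system draws, the BMC cap is probably not what holds the CPU at 230W, but that conclusion would depend on the BMC actually populating these fields.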
I am a long-time PC enthusiast and know my way around computers, but servers are relatively new to me, so it bears repeating that it is entirely possible that I am overlooking something really obvious here.
I'm sure I have forgotten to mention a bunch of other things I have tried thus far, as I have been banging my head against this for quite a while now already, but I have reached the point where I have to admit that I am in way over my head and I do not know which of the many available paths I should possibly pursue next.
Any and all input about what to try and/or which lines of troubleshooting to pursue would be greatly appreciated.
Kind Regards,
Chiliben