Bottleneck on PSU?

Damo

Member
Sep 7, 2022
41
4
8
The server sits idle at about 160 W.

When I run the following inside a VPS (on ZFS), power goes up to 300 W:

Code:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=300G --readwrite=randrw --rwmixread=75
Then on the main node I run:

Code:
stress-ng --cpu 128 --io 2 --vm 1 --vm-bytes 470G --timeout 1h --metrics-brief
It never seems to go above 515 W, even once CPU/RAM/disk are running at or near 100%.

The original PSU on the Supermicro CSE-826 was only 500 W, so I upgraded to two PWS-920P-SQ units. Does this sound about right to you guys? I was expecting more, and the fact that it's so close to 500 W makes me think there may be a bottleneck somewhere. IPMI is showing no sensor errors.

Specs:
Supermicro H12DSI-N6 Motherboard
2x AMD EPYC 7542
512GB RAM (16x32G)
 

Damo

What does lm_sensors say? You could be running into PB2 limits.
Is this what you want me to run?

Code:
root@prox1:~/Tests# sensors
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +66.5°C
Tccd1:        +66.5°C
Tccd3:        +66.0°C
Tccd5:        +66.2°C
Tccd7:        +67.2°C

nvme-pci-6200
Adapter: PCI adapter
Composite:    +38.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +38.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +45.9°C  (low  = -273.1°C, high = +65261.8°C)

nvme-pci-a100
Adapter: PCI adapter
Composite:    +42.9°C  (low  = -273.1°C, high = +85.8°C)
                       (crit = +86.8°C)
Sensor 1:     +42.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +57.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 3:     +72.8°C  (low  = -273.1°C, high = +65261.8°C)

nvme-pci-2200
Adapter: PCI adapter
Composite:    +40.9°C  (low  = -273.1°C, high = +85.8°C)
                       (crit = +86.8°C)
Sensor 1:     +40.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +50.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 3:     +67.8°C  (low  = -273.1°C, high = +65261.8°C)

k10temp-pci-00cb
Adapter: PCI adapter
Tctl:         +57.9°C
Tccd1:        +58.0°C
Tccd3:        +57.5°C
Tccd5:        +56.0°C
Tccd7:        +54.8°C

nvme-pci-6100
Adapter: PCI adapter
Composite:    +40.9°C  (low  = -273.1°C, high = +82.8°C)
                       (crit = +84.8°C)
Sensor 1:     +40.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +56.9°C  (low  = -273.1°C, high = +65261.8°C)

nvme-pci-a200
Adapter: PCI adapter
Composite:    +39.9°C  (low  = -273.1°C, high = +85.8°C)
                       (crit = +86.8°C)
Sensor 1:     +39.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +54.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 3:     +69.8°C  (low  = -273.1°C, high = +65261.8°C)

nvme-pci-2100
Adapter: PCI adapter
Composite:    +35.9°C  (low  = -273.1°C, high = +85.8°C)
                       (crit = +86.8°C)
Sensor 1:     +35.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +45.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 3:     +60.9°C  (low  = -273.1°C, high = +65261.8°C)
 

Wasmachineman_NL

Dell Precisions FTW!
Aug 7, 2019
1,383
449
83
Temps look fine so it's not thermal throttling. I don't know of a way to check CPU power limits/usage under Linux so I can't help you with that.
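
On checking CPU power under Linux: one possible approach, sketched below in Python, is to read the RAPL energy counters that the kernel exposes under `/sys/class/powercap`. This is only a sketch under assumptions that nobody in this thread has confirmed: that the running kernel exposes these counters for the EPYC 7542 (the zone directories are named `intel-rapl:N` even on AMD systems where the driver supports them), and that the script runs as root, since recent kernels restrict reading `energy_uj`. The function names are made up for illustration.

```python
import time
from pathlib import Path

# ASSUMPTION: RAPL package counters are exposed through the powercap sysfs
# tree. On kernels without support for this CPU the directory has no zones.
POWERCAP_ROOT = Path("/sys/class/powercap")


def average_watts(start_uj, end_uj, seconds, max_range_uj):
    """Convert two energy counter readings (microjoules) into average watts."""
    delta = end_uj - start_uj
    if delta < 0:
        # The counter wrapped around between the two readings.
        delta += max_range_uj
    return delta / 1_000_000 / seconds


def package_zones(root=POWERCAP_ROOT):
    """Return the top-level zones (one per CPU package), e.g. intel-rapl:0."""
    # Sub-zones look like intel-rapl:0:0, so keep names with a single colon.
    return sorted(p for p in root.glob("intel-rapl:*") if p.name.count(":") == 1)


def sample_package_power(interval=1.0, root=POWERCAP_ROOT):
    """Sample every package zone over `interval` seconds; return watts per zone."""
    zones = package_zones(root)
    before = {z: int((z / "energy_uj").read_text()) for z in zones}
    time.sleep(interval)
    result = {}
    for z in zones:
        after = int((z / "energy_uj").read_text())
        max_range = int((z / "max_energy_range_uj").read_text())
        result[z.name] = average_watts(before[z], after, interval, max_range)
    return result
```

Calling `sample_package_power()` while the stress test is running would return one value per socket. If both values sit near the CPU's TDP, the processors are at their own power limit. An empty result means the kernel exposes no RAPL zones, in which case this approach does not apply.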
 

Damo

Based on the specs above, what wattage should I be looking at? I was expecting closer to 650 W.
 

i386

Well-Known Member
Mar 18, 2016
3,370
1,122
113
33
Germany
I'm not sure what OP is asking...
If the PSU were faulty, your system would probably crash the second you ran any stress test on the CPUs.
If you want to see max. power consumption, go to the BIOS, disable everything related to power saving, set everything to performance mode, and set the fans to run at 100% in IPMI while running something like Prime95.
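
The BIOS settings have an OS-side counterpart on Linux, the cpufreq scaling governor. As a rough sketch, the Python below counts which governor each logical CPU uses. It assumes a cpufreq driver is loaded, so that the `scaling_governor` files exist under `/sys/devices/system/cpu`; whether the governor affects peak draw on this particular board is not something the thread establishes. The helper names are illustrative.

```python
from collections import Counter
from pathlib import Path

# Standard cpufreq sysfs location; the per-CPU cpufreq directories are only
# present when a cpufreq driver is loaded.
CPU_ROOT = Path("/sys/devices/system/cpu")


def governor_counts(root=CPU_ROOT):
    """Count how many logical CPUs use each cpufreq scaling governor."""
    counts = Counter()
    for gov_file in root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        counts[gov_file.read_text().strip()] += 1
    return counts


def all_performance(counts):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(counts) and set(counts) == {"performance"}
```

On this system, `governor_counts()` would be expected to cover 128 logical CPUs. If `all_performance(governor_counts())` is false, some cores are using a governor other than `performance`.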
 

Wasmachineman_NL

OP runs Linux.
 

Damo

Okay, use prime 95 for linux :D
I've started it now, but it seems to use less power.

Running the same load as before with the fans set to Full Speed increased the draw to 541 W, so I guess my worries are sorted? What do you guys think?
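
As a sanity check on the numbers in this thread, here is a small Python sketch of the arithmetic. The measured figures are the ones posted above. The 225 W TDP for the EPYC 7542 and the 920 W rating of the PWS-920P-SQ are spec-sheet values, not measurements. It is an assumption that the two supplies run as a redundant pair, so that a single unit has to be able to carry the whole load.

```python
# Figures reported in this thread.
IDLE_W = 160            # server idle
LOAD_W = 515            # fio in the VPS plus stress-ng on the node
LOAD_FULL_FANS_W = 541  # same load with the fans at Full Speed

# Spec-sheet values (not measured here).
CPU_TDP_W = 225         # AMD EPYC 7542 default TDP
CPU_COUNT = 2
# ASSUMPTION: the two PSUs are redundant, so size against one 920 W unit.
PSU_RATED_W = 920


def load_delta(idle_w, load_w):
    """Extra power drawn under load compared with idle."""
    return load_w - idle_w


def psu_load_percent(draw_w, rated_w):
    """Draw expressed as a percentage of the PSU's rated output."""
    return 100.0 * draw_w / rated_w


print(f"Load minus idle:  {load_delta(IDLE_W, LOAD_W)} W")        # 355 W
print(f"Combined CPU TDP: {CPU_COUNT * CPU_TDP_W} W")             # 450 W
print(f"PSU load at peak: "
      f"{psu_load_percent(LOAD_FULL_FANS_W, PSU_RATED_W):.1f}%")  # 58.8%
```

Under these assumptions, the extra 355 W under load sits below the 450 W combined TDP, which is plausible because part of the CPU power is already included in the idle figure. A single 920 W unit would be at roughly 59% of its rating at the 541 W peak. That is consistent with the CPUs reaching their own power limits rather than the PSUs holding the system back, though it does not prove it.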