GPUs don't seem to be mentioned in the Gen 10 Plus Ultimate Customization Guide or anywhere else on the interwebs. Maybe the usual option is to go with a CPU that has an iGPU?
In any case, I thought I would try out a dedicated GPU. HPE does offer a branded version of a Radeon WX 2100, but it is old, slow, and very hard to find. I picked up a Radeon Pro W6400, which was the best-performing modern GPU I could find at 50W or under, to stay within the power budget and dimensional limitations. I crossed my fingers and hoped for the best. Surprise! It didn't work! Using the latest firmware, I get "Uncorrectable PCI Express Error Detected. Slot 1 (Segment 0x0, Bus 0x9, Device 0x0, Function 0x0). Uncorrectable Error Status: 0x100000" in iLO and in the console:
kernel:[ 31.913091] Uhhuh. NMI received for unknown reason 24 on CPU 0.
kernel:[ 31.913093] Do you have a strange power saving mode enabled?
kernel:[ 31.913094] Dazed and confused, but trying to continue
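For what it's worth, here is a small sketch for decoding that status value. It assumes (and this is only an assumption) that iLO is reporting the raw PCIe AER Uncorrectable Error Status register; the bit names follow the standard AER layout, and the function name is just something I made up for illustration. Under that assumption, 0x100000 is bit 20, which would be an Unsupported Request error.

```python
# Bit positions from the standard PCIe AER Uncorrectable Error Status register.
# Assumption: iLO prints the raw register value; this is not confirmed by HPE docs.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
}


def decode_uncorrectable_status(status):
    """Return the names of all bits set in a 32-bit AER status value."""
    names = []
    for bit in range(32):
        if status & (1 << bit):
            # Bits not in the table are reported as unknown rather than guessed.
            names.append(AER_UNCORRECTABLE_BITS.get(bit, f"unknown/reserved bit {bit}"))
    return names


if __name__ == "__main__":
    # Value taken from the iLO message above.
    print(decode_uncorrectable_status(0x100000))
```

If that reading is right, the error would be a request that some device on the path rejected as unsupported rather than a link-level fault, but I can't say which side is at fault from this alone.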
Further searching indicated that at least one person had fixed this by disabling C-states in the BIOS. I don't know if that's possible on this server, so I switched to "static high performance" power regulation. Same result, same errors.
Does anyone have any suggestions or is this just a quixotic endeavor on unsupported hardware in a locked-down system?