I've got a feeling that my Gen4 Xeon 8461V QS just got fried on the X13DEI. All of a sudden it started throwing critical errors every 15-20 minutes in Windows 11 (ending with a "clock watchdog timeout" BSOD) and after about 5 minutes in the BIOS (the BIOS just froze). I've disconnected every device I could (incl. sound, VGA, USB, etc.), reset the BIOS, reinstalled Windows, left only a single RAM module, and reseated the CPU and all PSU cables. Still the same.
I guess I'll have to ditch the Sapphire Rapids platform in the end and go with regular consumer-grade products.
TBH it's likely partially my fault: I was relocating the MCIO-to-SFF8654 cable/board with the NVMe drives, and the CPU fan cable stalled the CPU1 fan (it got stuck in the blades), so the CPU eventually overheated.
I would expect a thermal event to kick in automatic throttling and, in the worst case, an emergency shutdown, but it feels like these safety mechanisms might not be the strongest part of Sapphire Rapids for some reason.
I'll try slower drives, as the current one gets really hot for some strange reason too.
Update:
P.S. Most likely this is a false alarm. If I use a SATA drive, everything works just fine (it passes the native Intel Processor Diagnostics tests and CPU-Z benchmarks, HWiNFO64 looks fine, no crashes for hours, etc.). I therefore assume something is off with the NVMe drive configuration (I tried different drives, Micron and WD, in different slots). They get very hot very quickly, so probably the system is waiting for a response while the drive is throttling down and then reports a critical device timeout (Windows was installed on those drives), or something like that.
So I'll look for proper heatsinks/cooling for the NVMe controllers.
I've even plugged other devices into the M.2 slots, like a WiFi AX210 card via an M-key to A/E-key adapter and an M.2 USB3 adapter (originally bought for the two slow x2-lane slots that go through the PCH/C741). Those devices worked without causing any trouble.
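For anyone wanting to check the "drive overheats, throttles, system times out" guess against numbers, here is a minimal sketch. It assumes you already have a list of (seconds, temperature in °C) samples for the NVMe drive, exported from whatever monitoring tool you use, plus the times of the crashes. The function names, the 75 °C threshold and the 60-second slack are all hypothetical values picked for illustration, not anything taken from a drive datasheet.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]       # (seconds since start, temperature in °C)
Span = Tuple[float, float, float]  # (start time, end time, peak temperature)


def hot_spans(samples: List[Sample], threshold_c: float) -> List[Span]:
    """Group consecutive samples at or above threshold_c into spans."""
    spans: List[Span] = []
    start: Optional[float] = None
    end = 0.0
    peak = 0.0
    for t, temp in samples:
        if temp >= threshold_c:
            if start is None:
                # A new hot span begins at this sample.
                start, peak = t, temp
            else:
                peak = max(peak, temp)
            end = t
        elif start is not None:
            # Temperature dropped below the threshold: close the span.
            spans.append((start, end, peak))
            start = None
    if start is not None:
        # The log ended while the drive was still hot.
        spans.append((start, end, peak))
    return spans


def crash_in_hot_span(crash_time: float, spans: List[Span],
                      slack_s: float = 60.0) -> bool:
    """True if the crash happened during a hot span or within slack_s after it."""
    return any(s <= crash_time <= e + slack_s for s, e, _peak in spans)


# Made-up sample data, only to show the call pattern.
samples = [(0, 45), (60, 62), (120, 78), (180, 81), (240, 70), (300, 79)]
spans = hot_spans(samples, threshold_c=75.0)  # hypothetical threshold
print(spans)
print(crash_in_hot_span(200, spans))
```

If most crashes land inside or right after a hot span, that supports the throttling theory. If they don't, the cause is probably elsewhere.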
Apologies for the slight negativity towards SPR and for bringing this into the Xeon/ES area.