thought maybe there was something "wrong" with the platform
From memory...
I recently ran a Linux kernel compile benchmark on my trash single-CPU Cascade Lake 8259CL system and found that, on an up-to-date Arch Linux, you lose around 10% performance going from mitigations=off to letting the kernel turn everything back on as it sees fit. That number might be a lot worse with old toolchains, which do not emit the machine instructions meant to keep the cost of the mitigations down. In this regard, "a lot" is wrong with the CPU. But at the current price point, even a 400 HP car that only delivers 360 HP finds a buyer. Surprisingly, newer CPUs have other defects that do not apply to Cascade Lake.
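If you want to see what the kernel actually turned on before comparing against mitigations=off, sysfs has one file per known issue under /sys/devices/system/cpu/vulnerabilities. A minimal Python sketch; the function name read_mitigations is made up for illustration, and the output strings differ between kernel versions:

```python
from pathlib import Path

# Standard sysfs location on reasonably recent Linux kernels.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigations(vuln_dir=VULN_DIR):
    """Return {vulnerability name: kernel status string}.

    Returns an empty dict if the directory does not exist
    (old kernel, container without sysfs, non-Linux OS).
    """
    base = Path(vuln_dir)
    if not base.is_dir():
        return {}
    status = {}
    for entry in sorted(base.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


if __name__ == "__main__":
    for name, state in read_mitigations().items():
        print(f"{name:28s} {state}")
```

Run it once with default settings and once after booting with mitigations=off, and you can tell which entries changed between the two benchmark runs.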
The Lewisburg chipset is a space heater: easily 5-10 watts gone just with that. It might be the quad 10 Gbps X722 NICs in it sipping power for nothing even on quad 1 Gbps boards, or disabled power saving, or something else. It should be cooled separately, just like DDR4 DIMMs, SAS cards, or your Mellanox NIC.
An "empty" 3647 system will idle at around 50-60 watts with proper settings. That is a lot more than the 10-20 watts that people like Wolfgang of Wolfgang's YT channel get with low-power desktop CPUs or desktop Xeons.
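To get a rough idea of how much of the idle draw is the CPU package itself, you can sample the RAPL energy counter that the powercap driver exposes. This is only a sketch under assumptions: it assumes the intel_rapl driver is loaded and that intel-rapl:0 is the package domain, the helper names are made up, and on recent kernels energy_uj is usually only readable by root. It does not measure wall power, so the chipset, fans, and PSU losses are not included.

```python
import time
from pathlib import Path

# Assumed package-0 RAPL domain; check the "name" file in this directory.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def average_watts(e_start_uj, e_end_uj, seconds, max_range_uj):
    """Average power in watts from two energy readings in microjoules.

    Handles a single counter wrap-around by adding the counter range.
    """
    delta = e_end_uj - e_start_uj
    if delta < 0:
        delta += max_range_uj
    return delta / 1e6 / seconds


def sample_package_watts(domain=RAPL_DOMAIN, seconds=10.0):
    """Read the energy counter twice and return the average package power."""
    domain = Path(domain)
    max_range = int((domain / "max_energy_range_uj").read_text())
    start = int((domain / "energy_uj").read_text())
    time.sleep(seconds)
    end = int((domain / "energy_uj").read_text())
    return average_watts(start, end, seconds, max_range)


if __name__ == "__main__":
    print(f"package power: {sample_package_watts():.1f} W")
```

Comparing this number against a wall-plug meter gives a hint of how much is burned outside the CPU package.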
Linux kernel and rasdaemon instrumentation that tells you if something is wrong is very stable and nicely chatty compared to desktop CPUs. I like hardware with diagnostics so I can make an educated guess and be right the first time when swapping a component.
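As an example of that chattiness, the kernel's EDAC subsystem keeps per-memory-controller counters for corrected and uncorrected errors in sysfs, independent of whatever rasdaemon has logged. A minimal sketch, assuming an EDAC driver is loaded for the platform; the function name read_edac_counts is made up for illustration:

```python
from pathlib import Path

# Standard EDAC sysfs root; only populated when an EDAC driver is loaded.
EDAC_MC = Path("/sys/devices/system/edac/mc")


def read_edac_counts(edac_root=EDAC_MC):
    """Return {controller name: (corrected errors, uncorrected errors)}."""
    counts = {}
    for mc in sorted(Path(edac_root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected count that keeps climbing on one controller is the kind of hint that lets you narrow things down before pulling hardware.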
By now most functional bugs are known and worked around in software or microcode. That must have taken thousands of engineer hours while this chip was in use at hyperscalers and in supercomputing clusters.
I trust the Intel 14nm++ process to be more reliable than all their newer nodes.
A lot of PCIe lanes and slots. I always run out with desktop chips.
No stupid license keys to enable silicon accelerators. A lot more engineering and whole-system support, like in IBM System z, has to happen before such license schemes start to make sense for the customer. And who really uses these accelerators? General-purpose x64 code will also run on EPYC and is not locked in to some Intel Xeon generation. And if you really, really need something for massively parallel compute, go on eBay and pick up a GPU for 200 bucks.