Hi everyone, I'm looking to build the most cost-effective machine possible for CPU inference with large language models (LLMs), keeping the budget under $1000.
Since the cheapest DDR5 RDIMMs cost at least six times as much as DDR4-2667 RDIMMs, I have no choice but to go with a DDR4-based platform. I'm fully aware that Ice Lake-SP lacks AMX instructions, but 512GB of DDR5 is simply out of my budget.
From my understanding, Ice Lake-SP Xeons are the most capable Intel CPUs that still use DDR4. I found a few engineering sample (ES) models on eBay that fit my budget (I would need two for my setup):
Xeon 8352Y – $120
Xeon 8358 QVM8 – $120
Xeon 8360Y – $160
Xeon 8368 – $160
For those unfamiliar with CPU inference for LLMs: memory bandwidth, not compute, is the biggest bottleneck. 32 cores should be plenty, but since I'll also be using the machine for other purposes, including gaming, single-core performance may matter even more than core count when comparing these options. I would most likely disable hyper-threading (HT) as well.
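To put rough numbers on the bandwidth argument, here's the back-of-envelope estimate I've been using. It's only a sketch with assumed figures: the 0.6 efficiency factor and the ~40 GB quantized model size are my guesses, and dual-socket bandwidth won't scale perfectly in practice because of NUMA.

```python
# Back-of-envelope estimate of memory-bandwidth-bound decode speed.
# Assumption: generating one token streams every model weight from DRAM once.

def peak_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s for one socket."""
    return channels * mt_per_s * bus_bytes / 1000

def tokens_per_second(model_size_gb: float, bandwidth_gbs: float,
                      efficiency: float = 0.6) -> float:
    """Upper bound on decode speed; 0.6 is an assumed achievable fraction."""
    return bandwidth_gbs * efficiency / model_size_gb

per_socket = peak_bandwidth_gbs(channels=8, mt_per_s=2667)  # ~171 GB/s
# On paper a second socket doubles this, but NUMA makes that optimistic.
total = 2 * per_socket

# Hypothetical model: ~40 GB of quantized weights (an assumption).
print(f"peak per socket: {per_socket:.0f} GB/s")
print(f"estimated decode speed: {tokens_per_second(40.0, total):.1f} tok/s")
```

This is why I care more about populating all memory channels than about core count.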
My Questions:
1. Are these ES CPUs generally compatible with Xeon motherboards (specifically the Tyan S7120)?
2. Do these ES CPUs support dual-socket configurations? (This is critical.)
3. Do all of these CPUs have 8 memory channels?
4. Which one would you pick? (If they're functionally similar, I'd choose the cheapest.)
5. Do any of these CPUs have unique features that might be useful for my setup?
6. How do these compare to EPYC Milan CPUs with a similar core count in single-core and multi-threaded performance? Most benchmarks pit them against 64-core EPYC models, which isn't helpful for my case.