PCIe switch solution with non-Xeon CPU based platforms?


Frank173

Member
Feb 14, 2018
I have a predicament: I need a multi-GPU compute server, and I have found a number of candidates at Supermicro. I am looking for a 4U GPU server that can hold around 8-10 double-width GPU cards communicating via a single-root-complex PCIe switch solution.

At the same time I also need a fairly high-performance CPU. I am looking for a CPU, or pair of CPUs, that runs at least 3.5 GHz all-core and has around 10 cores each or more. I have not found anything like that in any of the Xeon lineups. I was extremely excited when I saw the workstation Xeon CPUs, but quickly found out that they require a different chipset.

Long story short, I wonder whether there are solutions where the motherboard and CPU could be consumer rather than enterprise parts, with one or more x16 PCIe links connecting to the GPUs via PLX switches. In essence that is what GPU compute servers do. I find nothing magical about it and wonder whether PCIe switches and PCIe slots can also be purchased separately. I don't mind ordering a custom enclosure, but my main question is whether PCIe switches and slots on a board can be bought on their own.

Of course I am also open to any other suggestions, thoughts, or criticism calling my idea stupid or crazy. As this is a homelab/workstation build, I would not even mind working with non-enterprise hardware such as non-ECC memory and the like. Ideally I would like an already-built solution and would not mind a dual-Xeon board, but I just have not found any CPUs yet that fit my needs.

Thoughts?
 

Patrick

Administrator
Staff member
Dec 21, 2010
When you run these, you end up using 96- or 97-lane PLX switches, which cost several hundred dollars each. PCIe routing is also a concern, since you need to ensure you have proper trace lengths. I would strongly advise using something that someone else has designed and tested.
 

Frank173

Member
Feb 14, 2018
I get that, and I am not planning to build from scratch. Would you know whether any Skylake-W based server solutions exist that can run 8-10 double-width GPU compute cards in a 4U chassis via PCIe switches? I have seen workstation solutions, but they only handle four cards at most. I see zero technical issues with implementing the same technology on single-CPU boards via PCIe switches that run one or two x16 links into the CPU. I just guess the demand is too low for that...

My issue is that I want a server that can handle my AI training and inference needs but also performs well on CPU-bound workloads; sub-3.5 GHz all-core frequencies just don't cut it for my requirements. Am I essentially looking at two different machines?

 

Frank173

Member
Feb 14, 2018
What the all-core boost will be remains to be seen; at least I have not seen specs for that metric yet. Above all, I have had very bad experiences with both the consumer and enterprise AMD processors for my specific use case, which is financial modeling. The algorithms iterate over millions of data points and involve mathematical and vector computations, although none of my code is explicitly optimized to benefit from AVX-512.

I compared performance between the Threadripper 1950X, several EPYC chips, and the i9-7900X. Having normalized the results for the different turbo frequencies, the AMD chips took almost 50% longer to run through my algorithm optimizations. That result was consistent across single-core and multi-core runs. I myself will not touch AMD chips again for this specific use case, though I do see the benefits of AMD chips in storage servers and perhaps other non-math-heavy applications.

Until today I could not really isolate the issue, but I came across an article stating that the new .NET Framework 4.6, on which I code many of the base applications I used for benchmarking, makes use of optimizations that can benefit from AVX-512. That is as close as I have come to a reasonable explanation for the performance differences I observed. I am not sure whether it is possible to instruct AVX-512-capable CPUs, via software or BIOS, not to use AVX-512; it would be interesting to see the performance deltas then, though I no longer have access to the EPYC and Threadripper platforms.
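For what it's worth, one way to sanity-check that side of the comparison is to look at which vector instruction sets each box actually exposes: the first-generation Threadripper/EPYC parts stop at AVX2, while the i9-7900X does support AVX-512. A minimal native probe along these lines (a sketch assuming GCC or Clang, which provide __builtin_cpu_supports) would show that; it does not toggle anything inside .NET, it only reports what the hardware offers.

// Rough ISA probe, assuming GCC or Clang (__builtin_cpu_supports).
// Reports what the CPU exposes; it does not change any .NET behavior.
#include <cstdio>

int main() {
    __builtin_cpu_init();  // initialize CPU feature detection
    printf("SSE2    : %s\n", __builtin_cpu_supports("sse2")    ? "yes" : "no");
    printf("AVX     : %s\n", __builtin_cpu_supports("avx")     ? "yes" : "no");
    printf("AVX2    : %s\n", __builtin_cpu_supports("avx2")    ? "yes" : "no");
    printf("AVX-512F: %s\n", __builtin_cpu_supports("avx512f") ? "yes" : "no");
    return 0;
}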

I really wish Supermicro and co. would try their hand at a Skylake-W based single-CPU GPU compute server. I never understood why they all use dual-CPU solutions for GPU compute servers, given that the most sensible way to have GPUs interact is via a single root complex (speaking only of PCIe cards here). My only guess is high memory requirements, which I also do not fully buy into as an active AI researcher (quant finance researcher, to be more precise). Most GPUs can only be loaded with 32 GB of data at most anyway, so no GPU compute server needs to offer more than perhaps 256 or 512 GB of memory. Data analytics and CPU-bound computations are an entirely different game.
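As an aside, the practical payoff of the single-root-complex layout is GPU peer-to-peer access. A small sketch like the one below (assuming the CUDA runtime API and an installed toolkit) prints the peer-access matrix, so on any candidate box you can see which GPU pairs can talk directly across the switches rather than bouncing through host memory.

// Minimal sketch using the CUDA runtime API: print which GPU pairs report
// direct peer-to-peer access. Under one root complex with PLX switches,
// most pairs should come back "yes".
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("GPUs found: %d\n", n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int ok = 0;
            // Can device i directly access device j's memory?
            cudaDeviceCanAccessPeer(&ok, i, j);
            printf("GPU %d -> GPU %d : %s\n", i, j, ok ? "yes" : "no");
        }
    }
    return 0;
}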

Right now yes.

Q1 2019, with the EPYC 7371 and one of the 8-GPU single-socket (UP) EPYC boxes, may change that.

More on the 7371 tomorrow.