SXM2 over PCIe


CyklonDX

Well-Known Member
Nov 8, 2022
Hi,
With many V100 SXM2 cards popping up on eBay for 400-700 USD, I believe it's time to share a potential solution for making those SXM2 cards work over PCIe in any suitable workstation/server. Supermicro's initial approach to SXM was a stand-alone carrier board, the AOM-SXMV.


(Do note there are multiple models; it's best to get the simplest one, as that reduces the number of OCuLink connections required.)

This board has 4 OCuLink x8 connectors (you only need 2 of them plugged in, 2x8). In Supermicro systems these connect over a normal PCIe riser (RSC-GN2-A68) using an OCuLink connector, plus a single PCIe x8 port that is wired directly to CPU lanes (like any dedicated PCIe slot).



The Supermicro motherboard itself has no NVLink chip or anything special that allows the AOM-SXMV to work, unlike many other systems. The AOM-SXMV has no manufacturer/motherboard lock either, so it can be connected to any system. (The rest of the connectors in the picture are standard 8-pin power cables.)

You would be looking for two of those (or a single 16i if you can find it).

(Do note that you don't really need it to be PCIe 4.0; the link will work just as well at PCIe 3.0 x8. The only con is lower throughput between your system and the SXM GPUs.)
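To put rough numbers on that throughput difference, here is a quick back-of-the-envelope sketch (theoretical per-direction bandwidth only, ignoring everything but the line encoding):

```python
# Rough per-direction PCIe bandwidth for the x8 uplink to the AOM-SXMV.
# Theoretical figures; real copy throughput over the OCuLink run will be lower.

def pcie_gbytes_per_s(gt_per_s: float, encoding: float, lanes: int) -> float:
    """Usable bandwidth in GB/s for one direction of a PCIe link."""
    return gt_per_s * encoding * lanes / 8  # bits -> bytes

gen3_x8 = pcie_gbytes_per_s(8.0, 128 / 130, 8)    # ~7.9 GB/s
gen4_x8 = pcie_gbytes_per_s(16.0, 128 / 130, 8)   # ~15.8 GB/s

print(f"PCIe 3.0 x8: ~{gen3_x8:.1f} GB/s per direction")
print(f"PCIe 4.0 x8: ~{gen4_x8:.1f} GB/s per direction")
```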

The last thing would be cooling. I would recommend building a small 1U cage for the AOM-SXMV PCB, with 2x 40mm 16k-24k RPM fans for each GPU.

Regards
 

bayleyw

Active Member
Jan 8, 2014
Does this... actually work?! The OCuLink ports on the AOM-SXMV are indeed routed to the PLX switches, but the other end goes into the bottom slot of the riser and is designed to interface with an InfiniBand card plugged into that slot. "Officially" each of the card edge connectors is a PCIe x16 routed to the pair of PLX switches on the carrier board that handle the GPUs.

The AOM-SXMV is actually a simple beast: each pair of GPUs is connected by x16 to a PLX switch, which connects a single x16 to the CPUs. The NVLinks are just passive traces on the carrier which connect the GPUs to each other; there are no routers.
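If anyone wants to confirm that from software once a pair of cards is running, the NVLink state is visible through NVML. A rough sketch using the nvidia-ml-py (pynvml) bindings, assuming the NVIDIA driver is already loaded:

```python
# Sketch: check NVLink peer connectivity between GPUs on the carrier.
# Assumes the NVIDIA driver is loaded and nvidia-ml-py (pynvml) is installed.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    print(f"GPU {i}: {pynvml.nvmlDeviceGetName(handle)}")
    # V100 exposes up to 6 NVLink links; unsupported indexes raise an NVML error.
    for link in range(6):
        try:
            state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
        except pynvml.NVMLError:
            continue
        if state == pynvml.NVML_FEATURE_ENABLED:
            peer = pynvml.nvmlDeviceGetNvLinkRemotePciInfo(handle, link)
            print(f"  NVLink {link}: up, remote PCI {peer.busId}")
        else:
            print(f"  NVLink {link}: down")
pynvml.nvmlShutdown()
```

`nvidia-smi topo -m` prints the same connectivity picture in a single command.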
 

CyklonDX

Well-Known Member
Nov 8, 2022
I haven't mounted the AOM-SXMV board in a PC or anything other than a Supermicro v4 server myself, but this is the brief result of a few years spent troubleshooting SYS-1029GQ-TVRT systems at work. (The support package expired...)

We ran multiple tests and were able to use just 2 OCuLinks (CPU1), connecting the single PCIe link not through the port by the CPU but through one on the CPU1 riser. (We were also able to attach the SXMV board to another Supermicro server of the same class that only had PCIe slots for PCIe GPUs, just to test whether the board or our port by the CPU had failed - it worked.) Thus my conclusion is that it's just a standard PCIe 3.0 x8 link.

(Our ports by the CPU on those systems kept dying, which forced us to get creative - we used a long riser cable instead.)
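For anyone repeating this, the negotiated link is easy to read back from the OS to confirm it really trained at Gen3 x8. A small sketch, assuming the nvidia-ml-py (pynvml) package is installed:

```python
# Sketch: read the negotiated PCIe generation/width for each visible GPU.
# Assumes the NVIDIA driver is loaded and nvidia-ml-py (pynvml) is installed.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    h = pynvml.nvmlDeviceGetHandleByIndex(i)
    cur_gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h)
    cur_width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(h)
    max_gen = pynvml.nvmlDeviceGetMaxPcieLinkGeneration(h)
    max_width = pynvml.nvmlDeviceGetMaxPcieLinkWidth(h)
    print(f"GPU {i}: running Gen{cur_gen} x{cur_width} "
          f"(device max Gen{max_gen} x{max_width})")
pynvml.nvmlShutdown()
```

The LnkSta lines in `lspci -vv` show the same thing for the PLX uplinks.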

I cannot 100% guarantee it will work for everyone (especially since power delivery itself will be a killer/dead end for many).
If my company ever decoms one of those boxes, I'll try it, but at the moment I'm a bit short on $$$ to buy the parts and do it myself.
(So I'm throwing the ball to other people.)
 

svperbia

New Member
Jan 14, 2023
Any more updates on these experiments? I'm thinking about picking up one of these boards to use with a couple of SXM2 V100s I obtained a while ago.
Would this work with any SXM2 GPU, and any motherboard with enough PCIe lanes to hook up 4 OCuLink 8i cables?
 

CyklonDX

Well-Known Member
Nov 8, 2022
Mainly because new MI cards are about to drop sometime this year, which will make the MI210 look like a quiet 'fart' in compute by comparison at the same power usage. (Mainly to get through the work faster and save some bucks on the power bill.)

Google's Stadia was running on V340, V420, V520 & V620 or similar cards - not MI2xx cards.
The MI2xx are compute-only cards without any media acceleration functions, so no games would actually run on them.
 

gsrcrxsi

Active Member
Dec 12, 2018
No, not all SXM2 GPUs. Only V100 SXM2 (2 cards only)
What do you mean by 2 cards only? That there are only 2 models of V100 (16 and 32GB) that are compatible, or that the 4x GPU board only works with 2x GPUs installed?
 

gsrcrxsi

Active Member
Dec 12, 2018
The driver situation for the MI210 cards still seems ambiguous to me. Can you elaborate on it? There is no driver download from AMD, and as far as I can tell it's not supported by ROCm either. Am I wrong about that?

I remember seeing a YouTube video not too long ago about someone trying to use an older MI25 card, but the lack of driver support prevented them from doing much of anything with the card. I'm worried it would be the same situation with the MI210.
 

CyklonDX

Well-Known Member
Nov 8, 2022
What do you mean by 2 cards only? That there are only 2 models of V100 (16 and 32GB) that are compatible, or that the 4x GPU board only works with 2x GPUs installed?
Yes, only 2 models of V100, the 16GB and the 32GB, are specifically marked as SXM2.

The driver situation for the MI210 cards still seems ambiguous to me. Can you elaborate on it? There is no driver download from AMD, and as far as I can tell it's not supported by ROCm either. Am I wrong about that?

I remember seeing a YouTube video not too long ago about someone trying to use an older MI25 card, but the lack of driver support prevented them from doing much of anything with the card. I'm worried it would be the same situation with the MI210.
There are drivers for VMware.
There's little support outside VMware ESXi 7-8.


Craft Computing tried the MI25 with KVM to get SR-IOV out of it.
That card's SR-IOV drivers only work on a specific version of VMware, and those drivers are available for VMware.
There are a few cards whose drivers are harder to get because they aren't listed, but they can be found under different (same-architecture) GPUs, since the drivers are common to that group. So it's a bit of a mess to navigate, especially if you are looking for the SR-IOV master (host) drivers and the guest drivers for something like the V320-V620 GPUs (those were the cards designed for SR-IOV guests; most MI cards are not - MI cards are meant only for compute).
 

gsrcrxsi

Active Member
Dec 12, 2018
Well, I mean: if you want to compute on bare metal without VMs, can you do it? If so, how?

If you can't, then the cards don't seem very useful, despite the strong DP specs.
 

CyklonDX

Well-Known Member
Nov 8, 2022
For the MI25? You can compute anytime... there are normal drivers available for Linux, and the common Pro driver on Windows.

If you got an MI210 or better you would want to use the SR-IOV drivers and run on VMware anyway, as the partitioning isn't as clean as with NV vGPU profiles.
But if I recall correctly there are common ROCm tools for MI cards on normal Linux as well, and I don't believe you need to install any separate driver once the ROCm stack is set up correctly. There's even a tool, rocm-smi (just like nvidia-smi).
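A quick sanity check on a box with the ROCm stack set up might look like this (just a sketch; it shells out to rocm-smi and rocminfo, which the ROCm packages put on PATH):

```python
# Sketch: quick sanity check that the ROCm stack sees an AMD GPU.
# Assumes the ROCm packages are installed so rocm-smi / rocminfo are on PATH.
import subprocess

# rocm-smi with no arguments prints per-GPU temperature, clocks and power,
# much like nvidia-smi does on the NVIDIA side.
subprocess.run(["rocm-smi"], check=True)

# rocminfo lists every HSA agent; the gfx* lines identify the GPU architecture.
out = subprocess.run(["rocminfo"], check=True, capture_output=True, text=True)
gfx_lines = [line.strip() for line in out.stdout.splitlines() if "gfx" in line]
print("\n".join(gfx_lines) or "No gfx agents found - check the amdgpu driver.")
```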


 

gsrcrxsi

Active Member
Dec 12, 2018
I'm only talking about the MI210 cards. I do see drivers available through the MI100, but not the 210.

If I had one of those cards personally I would not be doing any kind of virtualization or SR-IOV. I'd want the whole card for raw compute, used bare metal. That's why I'm asking about the drivers. If it's going to be a viable alternative to the V100 setup for me, it needs to have compute drivers available for bare-metal installation on a normal Linux OS.
 

CyklonDX

Well-Known Member
Nov 8, 2022
For the MI210, as stated, you do not need separate drivers for compute; they are included in ROCm.
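As a bare-metal smoke test, a ROCm build of PyTorch exposes the card through the usual torch.cuda API. A minimal sketch, assuming the ROCm wheel of PyTorch is installed:

```python
# Sketch: bare-metal compute smoke test on an AMD card via ROCm PyTorch.
# Assumes a ROCm build of PyTorch is installed; it reuses the torch.cuda API.
import torch

assert torch.cuda.is_available(), "ROCm device not visible to PyTorch"
print("Device:", torch.cuda.get_device_name(0))

# Small FP64 matrix multiply, since the MI210 interest here is FP64 throughput.
a = torch.randn(4096, 4096, dtype=torch.float64, device="cuda")
b = torch.randn(4096, 4096, dtype=torch.float64, device="cuda")
c = a @ b
torch.cuda.synchronize()
print("FP64 matmul OK, result norm:", c.norm().item())
```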
 

gsrcrxsi

Active Member
Dec 12, 2018
Cool, thanks for clarifying that. That's what I was getting at. I couldn't remember if CDNA2 was supported by ROCm (it wasn't the last time I played with it). At least that could get you working. Now to see when these finally start showing up.

Titan Vs are dropping down to the $700 range now, offer ~7 TF of FP64 performance, and don't need an additional cooling solution. The price will have to be right to justify the MI210, which will need some kind of cooling solution if used in a more normal setup without a server chassis.
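For reference, that ~7 TF figure falls straight out of the usual peak-FLOPS formula; quick math with approximate Titan V numbers (5120 CUDA cores, 1:2 FP64 rate, ~1.45 GHz boost):

```python
# Back-of-the-envelope peak FP64 for the Titan V (figures are approximate).
cuda_cores = 5120
fp64_ratio = 1 / 2          # Volta GV100: one FP64 unit per two FP32 units
boost_clock_ghz = 1.455     # approximate boost clock
ops_per_cycle = 2           # a fused multiply-add counts as two FLOPs

peak_fp64_tflops = cuda_cores * fp64_ratio * ops_per_cycle * boost_clock_ghz / 1000
print(f"Titan V peak FP64: ~{peak_fp64_tflops:.1f} TFLOPS")   # ~7.4 TFLOPS
```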
 

line

New Member
May 13, 2023
No, not all SXM2 GPUs. Only V100 SXM2 (2 cards only)


The PCIe lanes need to come directly from the CPU, not from the chipset/PLX.

Keep in mind MI210s are going to be coming to eBay within the next few months.
Only V100 SXM2? Can it only connect two V100s? What should be connected to the other SXM2 interface then?
I want to buy this GPU board to connect four V100s to my motherboard. I'm not sure whether it's feasible.
 

CyklonDX

Well-Known Member
Nov 8, 2022
If you have a motherboard with enough PCIe lanes you can do 4 cards.

This is the diagram.

(wiring diagram attached)

One person has already succeeded in getting this up (single side) on an Intel desktop motherboard, but so far hasn't posted his results on the forums.
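Rough lane math for the full 4-card hookup, assuming (which is my reading of the setup) that all four OCuLink x8 ports need to land on CPU lanes:

```python
# Rough host-lane budget for running all 4 SXM2 GPUs on the AOM-SXMV,
# assuming four OCuLink x8 cables back to CPU lanes (my reading of the diagram).
oculink_cables = 4
lanes_per_cable = 8
total_host_lanes = oculink_cables * lanes_per_cable           # 32 lanes
per_cable_gen3 = 8.0 * (128 / 130) * lanes_per_cable / 8      # ~7.9 GB/s per x8

print(f"Host lanes needed: {total_host_lanes}")
print(f"Per-cable bandwidth at PCIe 3.0: ~{per_cable_gen3:.1f} GB/s each direction")
```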
 

line

New Member
May 13, 2023
If you have a motherboard with enough PCIe lanes you can do 4 cards.
Can I mix a P100 and a V100 on this GPU board? For example, inserting one V100 and one P100 to get 32GB of VRAM using NVLink.