SXM2 over PCIe


gsrcrxsi

Active Member
Dec 12, 2018
367
121
43
What are the odds of getting a similar setup working with this guy? Would love some guidance on what to look for in these enterprise SXM2 standalone boards. Sounds like it goes straight to CPU PCIe x16 via OCuLink adapter card(s)/cables and the SXM2 GPU? How do I make sure it's not tied to the OEM like OP mentioned?

Thanks to anyone that can help me out and for making this thread in general!

Dell HRGXG
From what I’ve read previously, these boards will only work with the accompanying Dell system. Hardware locked in some way.

someone please correct me if I’m wrong. I’d love to be wrong.
 

bayleyw

Active Member
Jan 8, 2014
332
108
43
The Dell boards are Dell locked. Also, they are rather expensive, so you might as well just put the $400 down on a C4130 and have an officially supported system.
Once you put down $999 for the baseboard, $800 for the GPUs, and $700 for the waterblocks you're getting pretty close to a 4x PCIe V100 system. The original projects in this post come from when there were a bunch of AOM-SXMV available from China for about $200.
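
For reference, the DIY figures quoted above can be added up as a quick sanity check. This is only a sketch: it assumes each quoted price is a total for the whole build (not per GPU), and the variable names are made up for illustration.

```python
# Prices quoted in the post above, assumed to be totals for the whole build.
diy_parts = {
    "baseboard": 999,
    "gpus": 800,
    "waterblocks": 700,
}

# Sum the DIY parts to compare against the price of a complete system.
diy_total = sum(diy_parts.values())
print(f"DIY SXM2 build: ${diy_total}")
```

The C4130 figure is left out on purpose, since the post does not say whether the $400 includes GPUs.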
 

Leiko

New Member
Aug 15, 2021
14
0
1
It's still a thing, isn't it?
 

gsrcrxsi

Right now I have no case. It’s just sitting on a box.

but I did buy two custom enclosures from China. Made specifically for this board. Comes with fans and a 1400W PSU.
 

d450d112

New Member
Jun 30, 2024
6
0
1
Nice! I would love to see pictures once you complete it.
 

gsrcrxsi

I was browsing XianYu again last night and came across some 8x SXM2 boards. The listing didn't say what they were from, or really give any information about them. They had a green PCB, 4x PCIe x16 slots (grouped together on the front/left side), 8x SXM2 sockets, and 8x SlimSAS-8i or similar connectors, which I assume are the main PCIe outputs from the board. There were also lots of 8-pin power connectors on the back edge of the board, and 4x black heatsinks over the 4x PEX9797 bridges.

Some quick googling suggests these are almost identical to the Microsoft HGX-1 platform, circa 2018-ish. There's an STH article showing a board that looks identical: https://www.servethehome.com/microsoft-hgx-1-at-the-ai-hardware-summit/

Anyone know if these GPU boards are essentially unlocked and could be used on a normal computer with the appropriate PCIe adapter cables, like the AOM-SXMV can? The seller I messaged did not reply.
 

d450d112

I have just completed the setup of my server, yet it is failing to detect the GPUs.

When both PCIe connections are connected, only LEDs 2 and 6 display a solid yellow light. But when a single PCIe connection is used, all 8 LEDs turn solid yellow. Also, the heatsinks near the blue power connectors are warm to the touch, but the GPUs do not change temperature while powered on.

Any ideas on how to get this beast of a machine working?


EDIT: The GPUs weren't screwed in all of the way. Those YouTube videos about using the right screwdriver scared me :p
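
For anyone hitting the same "GPUs not detected" symptom, one rough first check is whether the host enumerates the cards on the PCIe bus at all, before looking at drivers. The sketch below assumes a Linux host with `lspci` installed; the function names are illustrative only, and it simply filters `lspci -nn` output for NVIDIA's PCI vendor ID (10de).

```python
import re
import subprocess

NVIDIA_VENDOR_ID = "10de"  # NVIDIA's PCI vendor ID

def nvidia_devices(lspci_text):
    """Return the `lspci -nn` lines whose [vendor:device] ID belongs to NVIDIA."""
    pattern = re.compile(r"\[" + NVIDIA_VENDOR_ID + r":[0-9a-f]{4}\]", re.IGNORECASE)
    return [line for line in lspci_text.splitlines() if pattern.search(line)]

def scan_host():
    """Run `lspci -nn` on this machine and return the NVIDIA lines (Linux only)."""
    result = subprocess.run(
        ["lspci", "-nn"], capture_output=True, text=True, check=True
    )
    return nvidia_devices(result.stdout)
```

Calling `scan_host()` and counting the returned lines gives the number of NVIDIA devices the host can see. If it is lower than the number of GPUs on the board, the problem is likely upstream of the driver (seating, cabling, power), as it turned out to be here.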
 

gsrcrxsi

Yeah, you don't need any crazy screwdriver IMO. For both setups I put together, I just used a regular screwdriver and slowly tightened all screws in order over several passes. Final tightness was just two-finger tight. Once it gets snug on all 8 screws you're pretty much golden.

So I guess you got the cables you were waiting for?
Did you end up putting it in a case?
 

d450d112

These SXM2s can be nerve-wracking in the beginning.

I ended up ordering the cables from a cheaper supplier. Although RCL eventually responded, I had already canceled my order by then. In the future, I'll make sure to contact the seller first to confirm they respond and ship within a reasonable timeframe.

I haven't gotten a case yet. I might just put everything in a server rack and call it good.

I also need to figure out the cooling. I ordered Dell PowerEdge C4140 heatsinks from eBay, but I had to cut them to fit.
 


phantasmagoria

New Member
Jul 25, 2024
3
0
1
Just came across this thread - I picked up a 4x SXM2 P100 Supermicro server on eBay recently under the impression I could swap out the P100s for V100s in the near future. It uses the AOM-SXM2 board rather than the AOM-SXMV board this thread talks about. Having just read this thread (and some relevant Supermicro docs), and since someone said something similar much earlier, am I correct in assuming I can't just take out the P100s and slot V100s in their place? What prevents me from doing that? Intuitively it's the same interface, so any card should work.

But if it is the case that AOM-SXM2 is incompatible with V100s, would the same AOM-SXM-BRG module allow me to grab a AOM-SXMV board and swap out the current AOM-SXM2 board? Hell, could I Frankenstein an enclosure above the server, hook up the AOM-SXMV into one of the open PCIe risers, and run 8x GPUs? (lol)

My uses will mostly be LLM training and a touch of inference, both at fp16/32 - what performance decrease over V100s can I expect, considering P100s lack tensor cores? Beginning to wonder whether it'd be worth the hassle as prices begin to drop - V100s are going for $185 now and this will only drop as more Blackwell products ship.

It's my first foray into used enterprise hardware for self-hosting. Thanks in advance for putting up with the questions. :)
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,219
420
83
am I correct in assuming I can't just take out the P100s and slot V100s in their place
That's correct.


would the same AOM-SXM-BRG module allow me to grab a AOM-SXMV board and swap out the current AOM-SXM2 board
Yes, you can swap out the AOM-SXM2 for the AOM-SXMV board without any issue. The bridges are the same.
I would only worry about the AOM-SXMV lining up with the bridges on that server. (By the looks of it, I'd say 99% they have the same layout.)
 

gsrcrxsi

What prevents me from doing that? Intuitively it's the same interface so any card should work.
Probably the board firmware, or some hardware incompatibility. I think the V100s have a different NVLink configuration, and the board can't be reconfigured between P100 and V100.

It would sure be nice if someone found a workaround (something like a resistor mod or similar?) to run V100s on the AOM-SXM2 boards, even if NVLink would be disabled. I primarily do HPC that only uses the GPUs individually, so I don't need it. And all the Chinese sellers that are clutching the SXMV boards are more than happy to sell you a cheap bare SXM2 board.
 

gsrcrxsi

I got my Chinese 3U case for the AOM-SXMV. I'm pretty happy with it overall. It pairs nicely with the 3U size heatsinks I have. The fans it comes with are 120x38mm 4500rpm fans, with a manual knob to adjust fan speed. At full speed the noise is very tolerable, and it keeps all GPUs at ~50C.

I did have to flip the fans around, since they had the intake on the PCIe side and the exhaust on the PSU AC-in side, which made no sense to me. AC in on the front is better than PCIe out on the front.

Ignore the PSU on top. The included PSU is 1400W, 240V only, and I didn't have a 240V outlet on my test bench. But it's in the server rack now, running fine on 240V with the included PSU. They also have an option with a dual/redundant 1600W PSU, but that case was like 2x the price.

Build quality seems as good as any other server case IMO. Everything fits the way it should, though there are no provisions for server rack rails, so it'll have to sit on a shelf. It is just the right width to fit in a normal 19" rack.

Edit: forgot the price. Including shipping, it was about $190 for the case with PSU.

 

d450d112


That looks great! After checking this out, I'm considering installing the AOM-SXMV board in my 2U server chassis since my heatsinks are lower profile.

By the way, what's your idle power consumption like? Mine ranges from 45-53W per GPU.
 

gsrcrxsi

I would recommend getting at least the 2U copper-colored heatsinks. The heatsinks you have now (pictured earlier) are 1U, but their airflow direction is transverse to the airflow in a normal chassis, because they are meant for a mounting position rotated 90 degrees relative to how the GPU sits on the AOM-SXMV. Unless you put the board in sideways, it will likely have bad airflow.

I'm not very concerned with idle power consumption since I run it 24/7 for the most part. But from what I remember, the board and GPUs were idling around 140-150W from the wall, depending on the fan speed I set. 40-50W per GPU at idle sounds normal-ish. It could be a little elevated if the GPUs aren't dropping down to the P8 state at idle.
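
To check the P-state point, something along these lines could list which GPUs are not sitting in P8 at idle. It is a rough sketch that assumes `nvidia-smi` is on the PATH; the helper names are invented for illustration, and whether P8 is the right idle state for a given card and driver is itself an assumption (hence the `idle_state` parameter).

```python
import csv
import io
import subprocess

# Standard nvidia-smi query fields: GPU index, performance state, power draw, temperature.
QUERY_FIELDS = "index,pstate,power.draw,temperature.gpu"

def parse_gpu_status(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) != 4:
            continue  # skip blank or malformed lines
        index, pstate, power, temp = (field.strip() for field in record)
        try:
            power_w = float(power)
        except ValueError:
            power_w = None  # e.g. "[N/A]" when power is not reported
        rows.append({"index": int(index), "pstate": pstate,
                     "power_w": power_w, "temp_c": int(temp)})
    return rows

def gpus_not_idling(rows, idle_state="P8"):
    """Return the indices of GPUs whose performance state is not the idle state."""
    return [row["index"] for row in rows if row["pstate"] != idle_state]

def read_gpu_status():
    """Query the local GPUs via nvidia-smi (requires the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_status(result.stdout)
```

Running `gpus_not_idling(read_gpu_status())` with no workload active would show which cards are staying in a higher state, which is one possible reason for idle draw in the 45-53W range.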
 