SXM5 (H100) over PCIE Nightmare


gsrcrxsi

Active Member
Dec 12, 2018
423
144
43
Generally speaking, it is rare for these GPU devices to encounter software or hardware incompatibility issues. When such issues are encountered and remain unresolved for a long time, it is advisable to suspect whether the problem lies with the GPU.
I'm starting to suspect the GPU also.

Was the GPU listed as working?
 

aosudh

Member
Jan 25, 2023
65
15
8
Was sold as "They have no way to treat it" lmao, should have known. Can't return it, I talked to eBay already. I unmounted it to look at the pins, and they look good. Do I need a VBIOS flash? I don't have an enterprise account.
Screenshot-20250304-231935-Gallery hosted at ImgBB

Sometimes the problem actually stems from a malfunctioning core rather than from the pins, the VBIOS, etc. As you know, when merchants claim that something cannot be tested, it sometimes means they ran into problems while testing it. If they sold it as a defective item, the price would not be high, so they use the phrase "cannot be tested" to sell a batch of defective products at a high price.
 

MildHotSauce

New Member
Mar 7, 2023
23
8
3
This could be a BAR size limit on your motherboard.

You can only modify the BAR size at POST. Can you try Windows?
Oh, and try removing the 3090 (there might be conflicts between architectures).

You may want to try disabling Resizable BAR (ReBAR).
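To see what BAR sizes the platform actually assigned on Linux, a minimal Python sketch is below. It assumes the standard sysfs `resource` file for a PCI device, where each line holds a start address, end address, and flags in hex. The helper names `parse_bar_sizes` and `read_bar_sizes` are hypothetical, and the device address you pass in is whatever `lspci` reports for the card.

```python
from pathlib import Path


def parse_bar_sizes(resource_text: str) -> dict[int, int]:
    """Map region index -> size in bytes for each assigned region.

    Expects the text of /sys/bus/pci/devices/<address>/resource, where every
    line is "<start> <end> <flags>" in hex. Unassigned regions are all zeros.
    """
    sizes: dict[int, int] = {}
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a start/end/flags triple
        start, end = int(fields[0], 16), int(fields[1], 16)
        if start == 0 and end == 0:
            continue  # region not assigned by the firmware or kernel
        sizes[index] = end - start + 1
    return sizes


def read_bar_sizes(pci_address: str) -> dict[int, int]:
    """Read and parse the resource file for one device, e.g. "0000:41:00.0"."""
    path = Path("/sys/bus/pci/devices") / pci_address / "resource"
    return parse_bar_sizes(path.read_text())


if __name__ == "__main__":
    import sys

    # Usage: python bar_sizes.py 0000:41:00.0  (address is an example only)
    for index, size in read_bar_sizes(sys.argv[1]).items():
        print(f"region {index}: {size / 2**20:.0f} MiB")
```

If a large region is missing or much smaller than expected, that would point toward the motherboard or firmware settings rather than the card, though this check alone cannot prove either.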
You know what? When I was working on those V100s, I tried using this 3090 and they would just not work together at all. I removed the 3090 and all of a sudden everything worked. I think I'm going to return this Threadripper for one with onboard graphics.
 

gsrcrxsi

Active Member
Dec 12, 2018
423
144
43
Was sold as "They have no way to treat it" lmao, should have known. Can't return it, I talked to eBay already. I unmounted it to look at the pins, and they look good. Do I need a VBIOS flash? I don't have an enterprise account.
Screenshot-20250304-231935-Gallery hosted at ImgBB

This is an earlier VBIOS. I think you will be subject to that issue. But the issue statement calls out sub revision 3. I’m not sure how you can find that value though.
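For anyone who wants to read the VBIOS version without a screenshot, here is a minimal Python sketch around `nvidia-smi --query-gpu`. It assumes the driver loads far enough for `nvidia-smi` to list the card, which may not be the case for a GPU in this state. The function names are hypothetical; the query fields (`name`, `pci.bus_id`, `vbios_version`, `driver_version`) are standard `nvidia-smi` ones. It does not expose the sub revision mentioned above.

```python
import subprocess

# Fields passed to nvidia-smi --query-gpu, in the order they come back.
QUERY_FIELDS = ["name", "pci.bus_id", "vbios_version", "driver_version"]


def parse_query_output(text: str) -> list[dict[str, str]]:
    """Turn 'csv,noheader' output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        if len(values) == len(QUERY_FIELDS):
            rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus() -> list[dict[str, str]]:
    """Run nvidia-smi and return the parsed rows (raises if the tool fails)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_query_output(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(f"{gpu['name']} @ {gpu['pci.bus_id']}: VBIOS {gpu['vbios_version']}")
```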
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,553
523
113
You know what? When I was working on those V100s, I tried using this 3090 and they would just not work together at all. I removed the 3090 and all of a sudden everything worked. I think I'm going to return this Threadripper for one with onboard graphics.
Great to hear - now you have something to try.
 

gsrcrxsi

Active Member
Dec 12, 2018
423
144
43
The hardest part will be tracking down the VBIOS ROM file. Once you have it, it should be pretty easy to flash with nvflash. I would ask the seller if he can help locate one, since he seems to be aware of the issues.
 

aosudh

Member
Jan 25, 2023
65
15
8
I contacted the seller, he said people had success when flashing the VBIOS, so I am looking into how I can do that.
HPE, Dell, or other manufacturers may list the latest VBIOS for their GPUs on their after-sales support pages (at least that was the case when I repaired an A100). You can try searching different OEM vendors' websites.
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,553
523
113
I found something that may help you:

Seems to have a newer VBIOS file. I could download it without an account. Not sure if it's only for HPE H100's, though. And it mentions the 8-GPU. It may somehow be for the entire assembly. Feel free to correct me, if I'm wrong in any of this.
It may be hard to apply, as it seems to rely on BMC Redfish support to apply the patch.
(I doubt it'll work without it.)

That's the same on the NVIDIA side of things.


Sadly, my enterprise NV app hub doesn't have an entitlement for H100 firmware.
(I looked up firmware and even skimmed all 64 pages of drivers, patches, etc. available for download. H100/H200 is not there.)


I think starting from sc X11, most systems came with a BMC and Redfish (the X10s do not). Redfish is the "web" API on the BMC.
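To check whether a given BMC exposes Redfish at all, and which firmware components it knows about, a rough Python sketch follows. It assumes the standard Redfish path `/redfish/v1/UpdateService/FirmwareInventory` and HTTP Basic authentication. The host, credentials, and function names are placeholders, and whether GPU firmware appears in the inventory depends on the vendor's implementation.

```python
import base64
import json
import urllib.request

# Standard Redfish location of the firmware inventory collection.
FIRMWARE_INVENTORY_PATH = "/redfish/v1/UpdateService/FirmwareInventory"


def member_paths(collection: dict) -> list[str]:
    """Extract the @odata.id link of every member in a Redfish collection."""
    return [
        member["@odata.id"]
        for member in collection.get("Members", [])
        if "@odata.id" in member
    ]


def redfish_get(host: str, path: str, user: str, password: str, context=None) -> dict:
    """GET one Redfish resource over HTTPS using Basic auth.

    `context` is an optional ssl.SSLContext; many BMCs ship self-signed
    certificates, so the caller may need to supply one that trusts the BMC.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10, context=context) as response:
        return json.load(response)


def list_firmware(host: str, user: str, password: str, context=None) -> list[str]:
    """Return the resource paths of all firmware inventory entries."""
    collection = redfish_get(host, FIRMWARE_INVENTORY_PATH, user, password, context)
    return member_paths(collection)
```

Calling `list_firmware("bmc.example", "admin", "secret")` (all placeholder values) against a BMC without Redfish would simply fail with an HTTP or connection error, which is itself a quick answer to whether the board supports it.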

I believe if you manage to get an X11 system you might be able to apply the firmware upgrade, but let me throw another roadblock at you...

You may need the whole tray for the firmware upgrade to work.
 

aosudh

Member
Jan 25, 2023
65
15
8
Why such an old 550 version? 550.54.15 is about a year old at this point. Not sure if it matters, but I would stick to the latest version from that branch to avoid edge cases like kernel issues.

And was this the open-source version or the proprietary version? I would stick to the proprietary one for all attempts.
550 driver works very well on the H100 and H20. It may lack support for new features, but it is more stable.
 

gsrcrxsi

Active Member
Dec 12, 2018
423
144
43
550 driver works very well on the H100 and H20. It may lack support for new features, but it is more stable.
Yes, the 550 branch is fine. That wasn't my question. I was asking why he picked such an old 550.54 driver that's a year old. I was saying to use a more recent 550 version, like 550.144.
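Comparing releases within a branch is easy to get wrong by eye or by plain string comparison, since "550.54" sorts after "550.144" as text. A small Python sketch of a numeric comparison is below; the function names are hypothetical and it only assumes dot-separated numeric version strings like the ones in this thread.

```python
def parse_driver_version(version: str) -> tuple[int, ...]:
    """Split a version such as "550.54.15" into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def is_older_in_branch(installed: str, candidate: str) -> bool:
    """Return True if `installed` is an older release than `candidate`.

    Both versions must share the same branch (first number); comparing
    across branches is rejected because "newer" means something different there.
    """
    installed_parts = parse_driver_version(installed)
    candidate_parts = parse_driver_version(candidate)
    if installed_parts[0] != candidate_parts[0]:
        raise ValueError("versions are from different driver branches")
    return installed_parts < candidate_parts


if __name__ == "__main__":
    # Versions taken from the posts above.
    print(is_older_in_branch("550.54.15", "550.144"))
```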
 

MildHotSauce

New Member
Mar 7, 2023
23
8
3
I'm having a hard time finding out who the original OEM was. It has to be one of these, and most of the links go to firmware. Some packages, but not all, have independent firmware files that maybe(?) could be flashed: OEM Firmware Downloads. But like I said, I still have no idea who the OEM is. Also, he said that his other customers pointed him to this: https://docs.nvidia.com/dgx/dgxh100-fw-update-guide/dgxh100-fw-update-guide.pdf and that is literally all the info I got from him. TechPowerUp also doesn't have the firmware, I checked. I also entered the serial here: EnterpriseSupport. No love.
 

MildHotSauce

New Member
Mar 7, 2023
23
8
3
If he gave me that as a reference to what others have done, this might be a DGX H100. I've tried dumping the ROM to get more clues, but it seems pretty locked down. I'm going to try using nvflashk in a live ISO and see what I can do. Does anyone have access to the latest DGX H100 ROM files?
 

gsrcrxsi

Active Member
Dec 12, 2018
423
144
43
I wish I could be more help. This is a bad situation to be in, with the card not working and no way to update the BIOS. If the seller isn't being much help, I would explore trying to return it. I saw this kind of setup with these custom PCIe boards and the H100s and thought it might be cool in the future when prices come down, but if it's a non-starter due to a BIOS issue that can't be solved without a 100k system to do the BIOS update on, then I don't know.