AOC-SHG3-4M2P (4x M.2 NVMe PLX Switch) errors - BIOS settings?


ARNiTECT

Member
Jan 14, 2020
I am wondering whether BIOS settings on my Supermicro X11-SCA-F motherboard (BIOS 1.2) are causing problems with my AOC-SHG3-4M2P (4x NVMe M.2 PLX switch). Perhaps power-saving settings?

I previously installed an AOC-SHG3-4M2P into PCIe slot 6 of my Supermicro X11-SCA-F.

I fitted 2x M.2 NVMe drives (Corsair 2TB MP510) and using ESXi 6.7U3, I passed the drives through to a VM (OmniOS/napp-it for ZFS).

The drives worked fine for a few days, even with big file transfers. But they developed problems when I pushed other resources on my system, such as concurrently running virtual desktop gaming on pass-through GPUs.

After about 15 minutes of pushing the other resources, the OmniOS/napp-it VM would report multiple errors on the drives; the error count would climb into the hundreds of thousands and the pool would slow to a halt until I removed the drives and rebuilt the pool. There were no excessive heat issues that I could see.
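For anyone reproducing this, the per-drive error counters OmniOS/napp-it reports come from ZFS itself and can be watched directly. A minimal sketch, assuming a hypothetical pool name "nvmepool" (your pool name will differ):

```shell
# Show per-vdev READ/WRITE/CKSUM error counters and any files
# with unrecoverable errors (-v is verbose output):
zpool status -v nvmepool

# After rebuilding or reseating the drives, reset the counters so
# you can tell whether errors are still accumulating under load:
zpool clear nvmepool
```

Watching `zpool status` while applying the GPU/VM load makes it easier to tell whether errors appear gradually or all at once when the load starts.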

I reproduced the issue with both drives in the AOC-SHG3-4M2P, then with each drive separately.

With the NVMe drives in a standard 'PCIe to single M.2' adaptor in the same PCIe slot 6, I have had no problems (over 8 weeks of use now).
I had also tested the RAM for over 24 hours without any reported problems.

After speaking with Supermicro, I tried supplying the adaptor with power directly from the PSU; there is a power socket on the adaptor, but no instruction for it in the manual. This - maybe coincidentally - let me push the system for about 6 hours without issue, but then the errors came back.

Supermicro advised me to return the AOC-SHG3-4M2P and I have since received a replacement, but not yet fitted it.

For other reasons, I also have a new X11-SCA-F motherboard, so the 2 have never met. Both previous and new motherboards work fine with the standard 'PCIe to single M.2' adaptor.

Before I just try the same tests again, I wanted to know if I should look into changing any BIOS settings?
The BIOS settings I have adjusted from the defaults are those that allow concurrent use of the GPUs, the onboard iGPU and iKVM, i.e.:
- Load Optimised Defaults
- Primary Display = PCI
- Primary PEG = Slot 4 (GPU)
- Primary PCI = Onboard
- Internal Graphics = Enabled
- Option ROM, Video = UEFI

CPU is a Xeon E-2278G, with 128GB memory and a 1000W PSU.

Would it matter whether I used PCIe slot 4 or slot 6 on this motherboard? My GPU (Quadro P4000) doesn't really fit in slot 6 because of an internal USB header.
 

sovteq

New Member
Feb 11, 2021
I am sorry, I cannot help you with your issue, but I have a question for you: does the AOC-SHG3-4M2P card work with 4 NVMe drives on your X11-SCA-F motherboard?
 

ARNiTECT

Member
Jan 14, 2020
Hi, unfortunately I haven't made progress on this.
I still have the new AOC-SHG3-4M2P boxed up and a few spare 2TB MP510s ready to go.
A number of X11-SCA-F BIOS versions have been released since, and I expect OmniOS also needs an update, so fingers crossed!
I recently put together another server to take over duties while I work on the main one. I hope to get on to this in the next few weeks.
 

ARNiTECT

Member
Jan 14, 2020
Update:

After 18 months of using a single Corsair MP510 NVMe drive in a standard 'PCIe to single M.2' adaptor without error, I finally decided to try my replacement AOC-SHG3-4M2P with my replacement X11-SCA-F motherboard.

I updated the motherboard BIOS to 1.6a and tried the AOC-SHG3-4M2P in slot 4, where no device was recognised, but it was OK in slot 6. I installed my existing NVMe drive (a single Corsair MP510 1920GB) onto the AOC-SHG3-4M2P and connected the 12V power; within half a day of use (3x VMs, sync-write enabled) there were over 100,000 drive errors and the pool had crashed as degraded.

After this, I updated OmniOS from r151032 to r151038z and napp-it from 20.06 to 21.06a5, and added a SLOG (a 20GB vmdk on an Optane 900p drive).
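Adding a SLOG to an existing pool is a single `zpool` command. A minimal sketch, assuming a hypothetical pool name "nvmepool" and a hypothetical OmniOS device name for the 20GB vmdk (the actual cXtYdZ name comes from `format` or `diskinfo`):

```shell
# Attach the vmdk-backed device as a dedicated log (SLOG) device,
# which absorbs sync writes before they are flushed to the pool:
zpool add nvmepool log c2t1d0

# Confirm the log vdev appears in the pool layout:
zpool status nvmepool
```

With sync-write enabled, the SLOG changes the write path to the NVMe drives considerably, which may be why behaviour differed after this change.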

It then lasted 9 days without error; however, I didn't really push the server in that time, and I was only using 1 of my 3 Corsair MP510 NVMe drives.
Eventually, there were many read errors.

It appears that there is an incompatibility somewhere.

I am now not passing the NVMe drives through; instead, I have created a full-drive vmdk datastore on each of the 3x NVMe drives in ESXi for OmniOS to use. I have these datastores set up in a RAIDZ1 pool, I replicated all my VMs onto the pool, and it's been running for over 12 hours without error.
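For reference, building the pool from the three vmdk-backed devices is one command once OmniOS sees them as disks. A minimal sketch with hypothetical pool and device names (substitute the real cXtYdZ names from `format`):

```shell
# Create a RAIDZ1 pool across the three vmdk-backed virtual disks;
# RAIDZ1 tolerates the loss of any one of the three devices:
zpool create nvmepool raidz1 c2t1d0 c2t2d0 c2t3d0
```

Note that with vmdk datastores, ESXi's storage stack sits between ZFS and the physical NVMe drives, so ZFS no longer has direct control of the device cache flushes it would have with passthrough; that layer may also be what is masking the errors.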
 

JoshDi

Active Member
Jun 13, 2019
I know that with these cards in Supermicro boards, only certain slots are supported (even if all slots support bifurcation). When I spoke to Supermicro about this, they said it's due to lane reversal and polarity.
 

veegee

New Member
Dec 9, 2019
I know that with these cards in Supermicro boards, only certain slots are supported (even if all slots support bifurcation). When I spoke to Supermicro about this, they said it's due to lane reversal and polarity.
Bifurcation shouldn't have anything to do with it, because this card has a PCIe switch chip. I've been having issues with this card specifically on an HP DL380 G9. It somehow works just fine on a DL380p G8, which is older, and it works fine on a consumer ASUS motherboard. However, on the DL380 G9, the LEDs on the card light up and then all of them shut off at a certain point in POST. It's almost as if the card is shutting itself off for some reason.