ESXi crash with AOC-SLG3-2E4 and Intel DC P3600 Passthrough

K D

Well-Known Member
Dec 24, 2016
1,431
309
83
30041
Specs
  • Motherboard : Supermicro X11SPL-F
  • CPU : Xeon Gold 6134
  • RAM : 64GB PC2400
  • AIC 1 - ConnectX-3 40Gb adapter
  • AIC 2 - Intel DC P3700 400GB
  • AIC 3 - AOC-SLG3-2E4
This is in a Supermicro 826 chassis with a BPN-SAS3826A-N4 backplane.
Two of the backplane's NVMe ports are connected to the AOC-SLG3-2E4 using Supermicro cables.
ESXi recognizes both P3600 drives.

The issue is that ESXi randomly PSODs (purple screen of death) once passthrough is enabled for the two P3600s.
I was able to install OmniOS and napp-it, but the moment I clicked on the Disks menu, it PSOD'd.
I also tried installing FreeNAS, and it PSOD'd during the phase of the installer where it detects disks.

Any help would be appreciated.

[Attached image: 1.jpg]
 

Rand__

Well-Known Member
Mar 6, 2014
4,577
911
113
I have one, but I use it for vSAN, so I have not tried passing it through. Never had an issue like this...
 

K D

I also tried swapping the SSDs for Intel 750 1.2TB drives, as well as using a different AOC-SLG3-2E4 card, with the same results.

I had enabled passthrough immediately after installing these. I'll disable passthrough and see how it fares.
 

nitrobass24

Moderator
Dec 26, 2010
1,083
127
63
TX
Make sure that ESXi is not using the P3600 as a local read cache, syslog storage, etc.

If it's in use by the host and you yank it into the VM, that could be problematic, especially if you are seeing this across multiple devices.

Also make sure you have the latest updates for ESXi 6.5.


 

K D

> vSphere release/ESXi version?
It is on ESXi 6.5.0d, build 5310538.

> Make sure that ESX is not using the P3600 as a local read cache, syslog storage, etc.
It is not being used for any other function. These are empty drives, and the 750s are brand-new drives that I plugged in here for the first time.

I just downloaded 6.5.0 U1 again from VMware and am going to wipe all drives and perform a clean install now.
 

K D

OK...
  1. Fresh install of ESXi 6.5.0 U1 (build 5969303).
  2. Downloaded and installed the latest version of napp-it.
  3. In OmniOS, ran pkg update (nothing to update). In napp-it, updated to v17.06free.
  4. Enabled passthrough for the two Intel 750 SSDs.
I can see them in napp-it, and there have been no PSODs in the past hour. However, I am unable to create a mirror of the drives in napp-it (see error below). I'm letting the system run overnight to verify that it hasn't PSOD'd by morning.


[Screenshot: napp-it error when creating the mirror, 2017-10-22]
 

vrod

Active Member
Jan 18, 2015
233
33
28
28
Can you pass through the AOC device as well? If so, I would do that. It might also be good to install the Intel NVMe driver for ESXi.
 

K D

I will install the Intel drivers and see if the behavior changes, but for passthrough it shouldn't matter. I also got an error when trying to install vSAN (I tried the option to bootstrap vSAN from the installer itself). I had to leave for an appointment and didn't have time to check what the error was. Will try it again tonight.
 

vrod

The question is whether the AOC is fully compatible with ESXi... I am not so sure, so I would still recommend passing that one through as well. At least test it and see if it makes a difference.
 

K D

I installed vCenter Server and tried creating a vSAN cluster, but got an error saying it couldn't create a partition. I took a screenshot, but it looks like I deleted it by mistake, so I can't post the error here. I will try to reproduce the error and post it.

Also, could the issue be that I am connecting the AOC-SLG3-2E4 to a BPN-SAS3826A-N4 backplane rather than directly to the SSDs?
 

whitey

Moderator
Jun 30, 2014
2,770
866
113
38
You, sir, may be in need of some 'partedUtil' magic from the ESXi CLI to clear those partitions. I thought newer vSAN releases had a pretty 'splodie' button for that these days, but maybe I am wrong.
 

Rand__

IIRC, vdq -qH will show you whether a disk has partitions, which you can then remove with partedUtil.
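For checking several disks at once, here is a minimal Python sketch that reads captured `vdq -qH` output and flags disks that vdq rejects because of existing partitions. It assumes the indented `DiskResult[n]` / `Name:` / `State:` / `Reason:` layout that ESXi 6.x typically prints; the sample text, the device names, and the "Has partitions" reason string are illustrative assumptions, not output from this thread.

```python
import re
from typing import Dict, List, Optional

# Hypothetical captured `vdq -qH` output. The layout follows what ESXi 6.x
# typically prints, but field names and wording may differ between releases.
SAMPLE = """\
DiskResults:
   DiskResult[0]:
      Name: naa.0000000000000001
      VSANUUID:
      State: Eligible for use by VSAN
      Reason: None
      IsSSD?: 1
   DiskResult[1]:
      Name: naa.0000000000000002
      VSANUUID:
      State: Ineligible for use by VSAN
      Reason: Has partitions
      IsSSD?: 1
"""


def parse_vdq(text: str) -> List[Dict[str, str]]:
    """Return one dict of "Key: value" fields per DiskResult[n] block."""
    disks: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None  # None until the first block starts
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"DiskResult\[\d+\]:", stripped):
            # Start a new block; the dict is filled in by the following lines.
            current = {}
            disks.append(current)
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return disks


def has_partitions(disk: Dict[str, str]) -> bool:
    """True if vdq rejected the disk and the reason mentions partitions."""
    return (disk.get("State", "").startswith("Ineligible")
            and "partition" in disk.get("Reason", "").lower())


if __name__ == "__main__":
    for disk in parse_vdq(SAMPLE):
        hint = "clear with partedUtil" if has_partitions(disk) else "ok"
        print(f"{disk['Name']}: {disk['State']} ({hint})")
```

Against the sample, only `naa.0000000000000002` is flagged; the script only reads text, so nothing on the host is touched.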
 

K D

vdq -qH shows no issues with the three Intel drives and lists them as eligible for use by vSAN.

I'll try this again on my second node, which has the exact same configuration. If it still gives issues, I'll try attaching these drives directly instead of through the backplane.


[Screenshot: PuTTY session showing the vdq -qH output, 2017-10-26]
 

vrod

If I were you, I would still check whether they have a GPT label. That can also cause issues.
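To follow up on the GPT-label check, here is a hedged Python sketch. It parses captured `partedUtil getptbl` output (first line the label type, second line the disk geometry, then one line per partition starting with its number) and prints, without running, the `partedUtil delete` and `partedUtil mklabel ... msdos` commands that would clear the disk. The output layout, the sample, and the device path are assumptions for illustration, and the thread does not confirm that relabeling fixes the vSAN partition error here.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical device path and captured `partedUtil getptbl` output for a
# 400GB drive that still carries a GPT label with one VMFS partition.
DEVICE = "/vmfs/devices/disks/naa.0000000000000002"
SAMPLE = """\
gpt
48641 255 63 781422768
1 2048 781422734 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
"""


@dataclass
class PartitionTable:
    label: str  # first line of getptbl output, e.g. "gpt", "msdos" or "unknown"
    partitions: List[int] = field(default_factory=list)  # partition numbers


def parse_getptbl(text: str) -> PartitionTable:
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    table = PartitionTable(label=lines[0])
    # lines[1] is the geometry (cylinders, heads, sectors, total sectors);
    # every following line starts with the partition number.
    for ln in lines[2:]:
        table.partitions.append(int(ln.split()[0]))
    return table


def cleanup_commands(device: str, table: PartitionTable) -> List[str]:
    """Build (but do not run) partedUtil commands that would wipe the disk."""
    cmds = [f'partedUtil delete "{device}" {n}' for n in table.partitions]
    if table.label == "gpt":
        # Replace the GPT label with an empty msdos one.
        cmds.append(f'partedUtil mklabel "{device}" msdos')
    return cmds


if __name__ == "__main__":
    table = parse_getptbl(SAMPLE)
    print(f"label={table.label} partitions={table.partitions}")
    for cmd in cleanup_commands(DEVICE, table):
        print(cmd)
```

Both printed commands are destructive, so verify the device path before running anything. `mklabel` alone already discards the old table; the explicit `delete` lines just make each partition that would be removed visible first.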