NVMe devices in OmniOS - ESXi Passthrough


BobTB

Member
Jul 19, 2019
Passing through drives (SSDs and, later, NVMe) to OmniOS under ESXi. I have been running this setup for 10+ years. I first noticed the problem with Intel Optane drives, and now more and more NVMe drives don't work: Micron U.3 7300 and 7450 both fail. For some strange reason, Samsung U.2 NVMe PM983 drives do work.

I don't know what the difference is between the working and non-working drives.

Does no one use NVMe drives in passthrough mode on OmniOS? This was a bug 5+ years ago; does anyone here know if I am missing some setting somewhere?


Interestingly, all of them work on Solaris 11.4, including Intel Optane U.2 drives.

The errors I see in the OmniOS logs (I tried many versions) mostly look like the below, after which the drives are "retired":

Code:
Jan 17 17:51:41 omnios nvme: [ID 369395 kern.warning] WARNING: nvme3: command 1/0 timeout, OPC = 6, CFS = 0
Jan 17 17:51:41 omnios nvme: [ID 369395 kern.warning] WARNING: nvme1: command 1/0 timeout, OPC = 6, CFS = 0
Jan 17 17:51:41 omnios nvme: [ID 369395 kern.warning] WARNING: nvme2: command 1/0 timeout, OPC = 6, CFS = 0
Jan 17 17:51:41 omnios nvme: [ID 369395 kern.warning] WARNING: nvme0: command 1/0 timeout, OPC = 6, CFS = 0
Jan 17 17:51:42 omnios nvme: [ID 369395 kern.warning] WARNING: nvme3: command 2/0 timeout, OPC = 8, CFS = 0
Jan 17 17:51:42 omnios nvme: [ID 988005 kern.warning] WARNING: nvme3: ABORT failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 596656 kern.warning] WARNING: nvme3: IDENTIFY failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 318795 kern.warning] WARNING: nvme3: failed to identify controller
Jan 17 17:51:42 omnios nvme: [ID 369395 kern.warning] WARNING: nvme1: command 2/0 timeout, OPC = 8, CFS = 0
Jan 17 17:51:42 omnios nvme: [ID 988005 kern.warning] WARNING: nvme1: ABORT failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 596656 kern.warning] WARNING: nvme1: IDENTIFY failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 318795 kern.warning] WARNING: nvme1: failed to identify controller
Jan 17 17:51:42 omnios nvme: [ID 369395 kern.warning] WARNING: nvme2: command 2/0 timeout, OPC = 8, CFS = 0
Jan 17 17:51:42 omnios nvme: [ID 988005 kern.warning] WARNING: nvme2: ABORT failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 596656 kern.warning] WARNING: nvme2: IDENTIFY failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 318795 kern.warning] WARNING: nvme2: failed to identify controller
Jan 17 17:51:42 omnios nvme: [ID 369395 kern.warning] WARNING: nvme0: command 2/0 timeout, OPC = 8, CFS = 0
Jan 17 17:51:42 omnios nvme: [ID 988005 kern.warning] WARNING: nvme0: ABORT failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 596656 kern.warning] WARNING: nvme0: IDENTIFY failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 318795 kern.warning] WARNING: nvme0: failed to identify controller
Jan 17 17:51:42 omnios genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@18/pci1344,4000@0 (nvme3) down
Jan 17 17:51:42 omnios genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@16/pci1344,4000@0 (nvme1) down
Jan 17 17:51:42 omnios genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@17/pci1344,4000@0 (nvme2) down
Jan 17 17:51:42 omnios genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@15/pci1344,4000@0 (nvme0) down
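Worth noting: OPC 6 is IDENTIFY and OPC 8 is ABORT, so these look like admin-command timeouts rather than failed I/O. One thing I still want to rule out (assuming the tunable still exists in current illumos; the name comes from older illumos nvme driver sources, so treat it as a guess) is raising the driver's admin command timeout via /etc/system:

Code:
* /etc/system: raise the illumos nvme driver's admin command timeout
* (seconds). Tunable name is an assumption from older driver sources;
* a reboot is required for it to take effect.
set nvme:nvme_admin_cmd_timeout = 60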
 

zachj

Active Member
Apr 17, 2019
Search this post for “oprom”


Not saying this is your problem but it’s my best guess.
 

BobTB

Member
Jul 19, 2019
I am on AMD EPYC 7004 and 9004 platforms, rack-mount servers.

I tried with

Code:
pciPassthru0.msiEnabled = "FALSE"

and with

Code:
8086 2700 d3d0 false

in /etc/vmware/passthru.map (with the IDs of course adjusted to the actual device). Neither helps.

It seems something in the combination of ESXi, OmniOS, and NVMe passthrough devices is completely broken.
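For anyone trying to reproduce this, my understanding of the /etc/vmware/passthru.map line format (an assumption based on the comments in the stock file, not official documentation) is vendor ID, device ID, reset method, and fptShareable, so the other reset methods can be cycled through as well:

Code:
# /etc/vmware/passthru.map
# columns: vendor-id device-id resetMethod fptShareable
# reset methods to try: flr, d3d0, link, bridge, default
# ffff below is a placeholder; substitute the real device ID
1344 ffff d3d0 default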
 

zachj

Active Member
Apr 17, 2019
Did you search the blog post for the word "oprom"?

Some devices don't work if you don't explicitly pass their firmware through to the VM, which you do by specifying the oprom parameter in the .vmx file.

Note that this is guest-OS specific: you can pass an AMD GPU through to Windows with no problem, but passing it through to a macOS VM requires the oprom parameter.

It's also hardware specific: NVIDIA GPUs don't exhibit the same problem.

Again, I'm not saying this is your problem. I'm just saying it fits the description.
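For reference, the .vmx entry I mean looks like this (key name from memory, so double-check it; pciPassthru0 assumes the drive is the first passthrough device on the VM):

Code:
pciPassthru0.opromEnabled = "TRUE"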
 

BobTB

Member
Jul 19, 2019
I did some more experiments: Ubuntu 22.04 and TrueNAS both work with all of these NVMe drives. So this really is a neglected illumos/OmniOS problem.
 

zachj

Active Member
Apr 17, 2019
I did some more experiments: Ubuntu 22.04 and TrueNAS both work with all of these NVMe drives. So this really is a neglected illumos/OmniOS problem.
Did you try the oprom fix?
 

BobTB

Member
Jul 19, 2019
I decided to go with Solaris until I manage to find a solution for OmniOS.

fio results below, run on Ubuntu 22.04 against an NFS datastore in ESXi, served by Solaris with a raidz pool of 4x passthrough Micron 7450 3.8TB NVMe drives: 283k IOPS. Good enough for me.


Code:
fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --bs=4k --direct=0 --size 3G --numjobs=2 --runtime=240 --group_reporting
randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.28
Starting 2 processes
Jobs: 2 (f=2): [w(2)][100.0%][w=1086MiB/s][w=278k IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=2): err= 0: pid=5396: Fri Jan 19 19:58:53 2024
  write: IOPS=283k, BW=1105MiB/s (1158MB/s)(6144MiB/5562msec); 0 zone resets
    slat (nsec): min=1120, max=67979k, avg=5708.82, stdev=241080.16
    clat (nsec): min=229, max=423703, avg=290.05, stdev=390.13
     lat (nsec): min=1440, max=67985k, avg=6058.32, stdev=241107.23
    clat percentiles (nsec):
     |  1.00th=[  241],  5.00th=[  251], 10.00th=[  251], 20.00th=[  251],
     | 30.00th=[  262], 40.00th=[  262], 50.00th=[  262], 60.00th=[  262],
     | 70.00th=[  270], 80.00th=[  282], 90.00th=[  342], 95.00th=[  510],
     | 99.00th=[  668], 99.50th=[  700], 99.90th=[ 1400], 99.95th=[ 2024],
     | 99.99th=[ 4960]
   bw (  MiB/s): min=  435, max= 3427, per=100.00%, avg=1249.39, stdev=474.27, samples=19
   iops        : min=111444, max=877390, avg=319843.29, stdev=121413.04, samples=19
  lat (nsec)   : 250=3.20%, 500=91.51%, 750=5.00%, 1000=0.14%
  lat (usec)   : 2=0.10%, 4=0.04%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%
  cpu          : usr=8.06%, sys=36.84%, ctx=493, majf=0, minf=23
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1572864,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1105MiB/s (1158MB/s), 1105MiB/s-1105MiB/s (1158MB/s-1158MB/s), io=6144MiB (6442MB), run=5562-5562msec

Disk stats (read/write):
    dm-0: ios=0/222161, merge=0/0, ticks=0/345240, in_queue=345240, util=80.96%, aggrios=0/223290, aggrmerge=0/59, aggrticks=0/365187, aggrin_queue=365187, aggrutil=81.73%
  nvme0n1: ios=0/223290, merge=0/59, ticks=0/365187, in_queue=365187, util=81.73%
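One caveat on the numbers above: the run used --direct=0 with only a 3G working set, so the client page cache is in play. A direct-I/O variant for comparison (assuming libaio is available on the client) would be:

Code:
fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --bs=4k --direct=1 --size=3G --numjobs=2 --runtime=240 --group_reporting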
 