NVMe devices in OmniOS - ESXi passthrough


BobTB

Member
Jul 19, 2019
I have been passing drives (SSDs and, later, NVMe) through to OmniOS on ESXi for 10+ years. The first time I noticed this problem was with Intel Optane drives, and now more and more NVMe drives don't work: Micron U.3 7300 and 7450 don't work, while for some strange reason Samsung U.2 NVMe PM983 drives do.

I don't actually know what the difference is between the working and non-working drives.

Does no one use NVMe drives in passthrough mode on OmniOS? This was already a bug 5+ years ago. Does anyone here know if I am missing some setting somewhere?


Interestingly, all of them work on Solaris 11.4, including Intel Optane U.2 drives.

The errors I see in the OmniOS logs (I tried a lot of versions) mostly look like the ones below, after which the drives are "retired":

Code:
Jan 17 17:51:41 omnios nvme: [ID 369395 kern.warning] WARNING: nvme3: command 1/0 timeout, OPC = 6, CFS = 0
Jan 17 17:51:41 omnios nvme: [ID 369395 kern.warning] WARNING: nvme1: command 1/0 timeout, OPC = 6, CFS = 0
Jan 17 17:51:41 omnios nvme: [ID 369395 kern.warning] WARNING: nvme2: command 1/0 timeout, OPC = 6, CFS = 0
Jan 17 17:51:41 omnios nvme: [ID 369395 kern.warning] WARNING: nvme0: command 1/0 timeout, OPC = 6, CFS = 0
Jan 17 17:51:42 omnios nvme: [ID 369395 kern.warning] WARNING: nvme3: command 2/0 timeout, OPC = 8, CFS = 0
Jan 17 17:51:42 omnios nvme: [ID 988005 kern.warning] WARNING: nvme3: ABORT failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 596656 kern.warning] WARNING: nvme3: IDENTIFY failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 318795 kern.warning] WARNING: nvme3: failed to identify controller
Jan 17 17:51:42 omnios nvme: [ID 369395 kern.warning] WARNING: nvme1: command 2/0 timeout, OPC = 8, CFS = 0
Jan 17 17:51:42 omnios nvme: [ID 988005 kern.warning] WARNING: nvme1: ABORT failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 596656 kern.warning] WARNING: nvme1: IDENTIFY failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 318795 kern.warning] WARNING: nvme1: failed to identify controller
Jan 17 17:51:42 omnios nvme: [ID 369395 kern.warning] WARNING: nvme2: command 2/0 timeout, OPC = 8, CFS = 0
Jan 17 17:51:42 omnios nvme: [ID 988005 kern.warning] WARNING: nvme2: ABORT failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 596656 kern.warning] WARNING: nvme2: IDENTIFY failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 318795 kern.warning] WARNING: nvme2: failed to identify controller
Jan 17 17:51:42 omnios nvme: [ID 369395 kern.warning] WARNING: nvme0: command 2/0 timeout, OPC = 8, CFS = 0
Jan 17 17:51:42 omnios nvme: [ID 988005 kern.warning] WARNING: nvme0: ABORT failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 596656 kern.warning] WARNING: nvme0: IDENTIFY failed with sct = 0, sc = 0
Jan 17 17:51:42 omnios nvme: [ID 318795 kern.warning] WARNING: nvme0: failed to identify controller
Jan 17 17:51:42 omnios genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@18/pci1344,4000@0 (nvme3) down
Jan 17 17:51:42 omnios genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@16/pci1344,4000@0 (nvme1) down
Jan 17 17:51:42 omnios genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@17/pci1344,4000@0 (nvme2) down
Jan 17 17:51:42 omnios genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@15/pci1344,4000@0 (nvme0) down
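
For reference, a minimal sketch of how the retired state can be inspected on the OmniOS side (assuming the standard illumos nvmeadm and fmadm tools; nvme0-3 are the controllers from the log above):

Code:
# list the NVMe controllers/namespaces the driver still sees
nvmeadm list
# show why FMA retired the devices
fmadm faulty
# after a device is fixed or replaced, it can be un-retired with
# fmadm repaired <fmri-from-fmadm-faulty>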
 

zachj

Active Member
Apr 17, 2019
Search this post for “oprom”


Not saying this is your problem but it’s my best guess.
 

BobTB

Member
Jul 19, 2019
I am on AMD EPYC 7004 and 9004 platforms, rack-mount servers.

I tried with

pciPassthru0.msiEnabled = "FALSE"

and with

8086 2700 d3d0 false

(in /etc/vmware/passthru.map), with the IDs of course based on the correct device ID. Neither helps.
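
For clarity, the passthru.map format is vendor ID, device ID, reset method, fptShareable. Here is the Intel entry from above plus a hypothetical Micron entry; the Micron device ID below is a placeholder, not the real one:

Code:
# /etc/vmware/passthru.map
# vendor-id  device-id  resetMethod  fptShareable
8086  2700  d3d0  false
# hypothetical Micron entry -- 1344 is the vendor ID seen in the device paths in
# the log; replace xxxx with the real device ID ("esxcli hardware pci list" on the host)
1344  xxxx  d3d0  false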

It seems something in the combination of ESXi, OmniOS, and NVMe passthrough devices is completely broken.
 

zachj

Active Member
Apr 17, 2019
Did you search the blog post for the word "oprom"?

Some devices don't work if you don't explicitly pass their firmware through to the VM, which you do by specifying the oprom parameter in the .vmx file.

Please note this is guest-OS specific: you can pass an AMD GPU through to Windows with no problem, but passing it through to a macOS VM requires the oprom parameter.

It's also hardware specific; NVIDIA GPUs don't exhibit the same problem.

Again, I'm not saying this is your problem. I'm just saying it fits the description.
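
Something along these lines in the .vmx (a rough sketch; the exact "opromEnabled" key name is an assumption on my part, so verify it against your ESXi build):

Code:
# hypothetical .vmx line for the first passed-through device;
# the key name is an assumption -- check your ESXi version before relying on it
pciPassthru0.opromEnabled = "TRUE"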
 

BobTB

Member
Jul 19, 2019
I did some more experiments: Ubuntu 22.04 and TrueNAS both work with all of these NVMe drives. So this really is a neglected illumos / OmniOS problem.
 

zachj

Active Member
Apr 17, 2019
I did some more experiments: Ubuntu 22.04 and TrueNAS both work with all of these NVMe drives. So this really is a neglected illumos / OmniOS problem.
Did you try the oprom fix?
 

BobTB

Member
Jul 19, 2019
Decided to go with Solaris until I manage to find a solution for OmniOS.

fio results, run on Ubuntu 22.04 against an NFS datastore in ESXi served by the Solaris VM from a raidz pool of 4x passed-through Micron 7450 3.8 TB NVMe drives: 283k IOPS. Good enough for me.


Code:
fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --bs=4k --direct=0 --size 3G --numjobs=2 --runtime=240 --group_reporting
randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.28
Starting 2 processes
Jobs: 2 (f=2): [w(2)][100.0%][w=1086MiB/s][w=278k IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=2): err= 0: pid=5396: Fri Jan 19 19:58:53 2024
  write: IOPS=283k, BW=1105MiB/s (1158MB/s)(6144MiB/5562msec); 0 zone resets
    slat (nsec): min=1120, max=67979k, avg=5708.82, stdev=241080.16
    clat (nsec): min=229, max=423703, avg=290.05, stdev=390.13
     lat (nsec): min=1440, max=67985k, avg=6058.32, stdev=241107.23
    clat percentiles (nsec):
     |  1.00th=[  241],  5.00th=[  251], 10.00th=[  251], 20.00th=[  251],
     | 30.00th=[  262], 40.00th=[  262], 50.00th=[  262], 60.00th=[  262],
     | 70.00th=[  270], 80.00th=[  282], 90.00th=[  342], 95.00th=[  510],
     | 99.00th=[  668], 99.50th=[  700], 99.90th=[ 1400], 99.95th=[ 2024],
     | 99.99th=[ 4960]
   bw (  MiB/s): min=  435, max= 3427, per=100.00%, avg=1249.39, stdev=474.27, samples=19
   iops        : min=111444, max=877390, avg=319843.29, stdev=121413.04, samples=19
  lat (nsec)   : 250=3.20%, 500=91.51%, 750=5.00%, 1000=0.14%
  lat (usec)   : 2=0.10%, 4=0.04%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%
  cpu          : usr=8.06%, sys=36.84%, ctx=493, majf=0, minf=23
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1572864,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1105MiB/s (1158MB/s), 1105MiB/s-1105MiB/s (1158MB/s-1158MB/s), io=6144MiB (6442MB), run=5562-5562msec

Disk stats (read/write):
    dm-0: ios=0/222161, merge=0/0, ticks=0/345240, in_queue=345240, util=80.96%, aggrios=0/223290, aggrmerge=0/59, aggrticks=0/365187, aggrin_queue=365187, aggrutil=81.73%
  nvme0n1: ios=0/223290, merge=0/59, ticks=0/365187, in_queue=365187, util=81.73%
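
For reference, a minimal sketch of how the Solaris side is laid out (the disk and dataset names here are placeholders, not my actual ones):

Code:
# raidz pool over the four passed-through Micron 7450s (disk names are placeholders)
zpool create tank raidz c1t1d0 c2t1d0 c3t1d0 c4t1d0
# dataset exported over NFS, then added in ESXi as an NFS datastore
zfs create tank/vmstore
zfs set sharenfs=on tank/vmstore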
 

chune

Member
Oct 28, 2013
Did OmniOS ever address this issue? What's the best OmniOS-supported NVMe SSD currently? PM983s?
 

dragonme

Active Member
Apr 12, 2016
I did some more experiments: Ubuntu 22.04 and TrueNAS both work with all of these NVMe drives. So this really is a neglected illumos / OmniOS problem.
I have found, after running OpenSolaris / OmniOS with napp-it for several years, that they really concentrate on stability and uptime.

Even their 'bloody' track is probably more stable than some 'production ready' newer versions of TrueNAS, for example.

But that comes at the expense of glacial advancement in features and drivers.

I think with OpenZFS now centered on Linux as the head branch for development, if you want features, modern hardware performance, the latest ZFS features, etc., you need to look to a ZFS based on Linux.

Proxmox is an option for those looking to hyperconverge, but their Debian build on an Ubuntu kernel sometimes jacks them up, and the environment is far from a standard Linux distro, so stability and compatibility sometimes suffer.

Another option is pure Ubuntu with, say, Incus for VMs/LXC, plus a front end like Cockpit for those who like a GUI to manage ZFS.

The list goes on, but you get the gist.

I am experimenting with an ESXi AIO with TrueNAS SCALE 25.04 as hyperconverged storage. It's at the bleeding edge, as TrueNAS now leverages ZFS 2.3 and includes a GUI Incus interface as well as its own app store.

While it's far heavier, boots slower, and lacks kernel SMB (for now, until they get a clue and use ksmbd), it has been stable for several weeks. Still, OmniOS, even with the hobbled and not very actively developed napp-it GUI, is probably more stable, if you don't need modern hardware support in the kernel.

Personally, I think Canonical is on a rampage; they are very active developers and supporters of ZFS, LXC, etc., with very user-friendly terms for community / free use. Going forward, a serious look at them and Ubuntu, especially for home lab use, should be a focus for many.