nvme namespace suggestions please ...


dragonme

Active Member
I picked up two Samsung/Dell PM1725a 1.6TB PCIe NVMe cards, latest firmware... eBay of course... 70 bucks a pop.

Well, they came today, and each had almost identical wear: ~1.6PB of data read/written and 48,000+ hours of power-on time. Not exactly spring chickens, but health still looked good from what I can make of the SMART output from the NVMe tools.

The use case is an ESXi server with one VM (napp-it) that has an LSI 3008 HBA passed through to it to manage the large 5x8TB data pool.
On my previous servers I had this VM feeding AIO NFS datastores back to ESXi for VM storage, but in this case I think I will keep the VMs natively on a namespace.

so ..

First: ZFS/ESXi, what is the best LBA format for the namespaces? It's currently at 512; I am thinking 4K would make more sense since the pools get set up with ashift=12?
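For what it's worth, here is a minimal sketch of how the LBA format and ashift interact, assuming the namespace ends up visible to a Linux guest with nvme-cli and ZFS; the device paths and the pool name tank are placeholders:

Code:
   # Sketch only -- /dev/nvme0, /dev/nvme0n1 and 'tank' are placeholders
   # List the LBA formats the namespace supports (lbads:12 = 4096-byte sectors)
   nvme id-ns /dev/nvme0n1 | grep -i lbaf

   # Reformat namespace 1 to the 4K LBA format (destroys all data on the namespace)
   nvme format /dev/nvme0 --namespace-id=1 --lbaf=2

   # Regardless of LBA size, force 4K alignment when the pool is created
   zpool create -o ashift=12 tank /dev/nvme0n1

With ashift=12, ZFS never issues I/O smaller than 4K, so even on a 512-byte LBA format the drive should only see 4K-aligned writes.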

Second: I want to partition a couple of namespaces for a SLOG and L2ARC. The machine will have 128GB of RAM, and honestly I think the regular ARC in RAM might be good enough. My current server is running just fine with no SLOG or L2ARC on the data pool, but the VMs are on a pool of two striped S3500 SSDs.
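If you do end up carving out namespaces for a SLOG and L2ARC later, adding them to an existing pool is a one-liner each; a sketch, assuming the pool is called tank and the extra namespaces show up as the device names below (placeholders):

Code:
   # Dedicated SLOG (log vdev) -- only benefits sync writes
   zpool add tank log /dev/nvme0n2

   # L2ARC (cache vdev) -- only benefits reads once the working set outgrows RAM
   zpool add tank cache /dev/nvme0n3

   # Both can be removed again without harming the pool
   zpool remove tank /dev/nvme0n3

Both vdev types are optional and removable, so there is no harm in starting without them and only adding them if the pool actually shows a sync-write or read-cache bottleneck.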

One machine will have a 10GbE NIC, but honestly the network is gigabit.

I don't think I will be running any active VM workloads on the pool requiring sync writes like NFS, but there will likely be times when NFS serves out data to other machines, and of course there will be SMB shares, like to a VM for the Plex back-end media. Again, right now that pool on a 5-wide stripe of spinning iron seems to keep up.

I don't think ESXi can create NVMe namespaces natively... so what, should I pass the card through to Ubuntu to manage setting up the namespaces, then detach it?
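If the card does get passed through to an Ubuntu VM for the carve-up, the namespace juggling with nvme-cli looks roughly like this; a sketch where the device names, sizes, and controller ID are placeholders to be checked against the id-ctrl output:

Code:
   # Capacity and controller ID for the attach step
   nvme id-ctrl /dev/nvme0 | grep -iE "tnvmcap|unvmcap|cntlid"

   # Delete the existing namespace (destroys all data on it)
   nvme delete-ns /dev/nvme0 --namespace-id=1

   # Create a new namespace; --nsze/--ncap are in LBAs (here ~100GiB at 512B), --flbas picks the LBA format
   nvme create-ns /dev/nvme0 --nsze=209715200 --ncap=209715200 --flbas=0

   # Attach the new namespace to the controller (controller ID is a placeholder from id-ctrl), then rescan
   nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=1
   nvme ns-rescan /dev/nvme0
   nvme list

After that the card can be detached from the Ubuntu VM and handed back to ESXi, which should then see each attached namespace as its own device.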

Each namespace should be treated like a LUN in ESXi, so I was figuring I would pass through a namespace to napp-it to attach to the pool, rather than passing through the whole PCIe device, since I want to keep a namespace or two for native VMFS storage.

Anywho... school me on this, it's all new to me. I go back far enough that storage was on punch cards, so be gentle.


haha


thanks in advance ..
 

gea

Well-Known Member
The easiest method would be using the NVMe under ESXi as a local disk with VMFS.
Then create virtual disks on it for your VMs.
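A sketch of creating such a virtual disk from the ESXi shell, in case anyone prefers the CLI to the GUI; the datastore path and file name here are made up:

Code:
   # 10GB eager-zeroed virtual disk on the NVMe-backed datastore (paths are placeholders)
   vmkfstools -c 10G -d eagerzeroedthick /vmfs/volumes/nvme-ds1/napp-it/slog.vmdk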
 

dragonme

Active Member
The easiest method would be using the NVMe under ESXi as a local disk with VMFS.
Then create virtual disks on it for your VMs.
Yes, that was the plan. Instead of the AIO concept of napp-it feeding ZFS-pooled VMs back to ESXi over NFS, I was going to keep the VMs on a namespace natively in ESXi.

but

I would still like to pass some of the NVMe, either by namespace or as an RDM disk, for napp-it to use as a SLOG for the spinning-drive pool it will still control, which is largely media. I guess in the end that pool won't have much sync write now that it no longer hosts live VMs, at least not at the moment, although that could change I guess.
 

dragonme

Active Member
another oddity I came across..

I formatted the namespace with Format ID 2, which is 4K, but ESXi could not see it, even though the drive identifies it as the best-performance format.

Formatted it with the standard 512-byte option (Format ID 0), and it shows up fine as a device?

Code:
   LBA Format Support:
         Format ID: 0
         LBAData Size: 512
         Metadata Size: 0
         Relative Performance: Better performance
      
         Format ID: 1
         LBAData Size: 512
         Metadata Size: 8
         Relative Performance: Degraded performance
      
         Format ID: 2
         LBAData Size: 4096
         Metadata Size: 0
         Relative Performance: Best performance
      
         Format ID: 3
         LBAData Size: 4096
         Metadata Size: 8
         Relative Performance: Good performance
 
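If a namespace needs to go back to 512-byte sectors so ESXi can see it, nvme-cli from a Linux box (or VM) can flip it back; a sketch with placeholder device names, and note that the format wipes the namespace:

Code:
   # Revert namespace 1 to Format ID 0 (512-byte LBAs, no metadata) -- destroys data
   nvme format /dev/nvme0 --namespace-id=1 --lbaf=0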

gea

Well-Known Member
Do not overcomplicate things.
Just use the NVMe as is under ESXi as a local datastore.

In a VM create a virtual disk on the NVMe for ZIL (10GB min).
As you have two NVMe drives, create two local datastores to allow mirrors, e.g. for a special vdev mirror.
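Inside the storage VM, one virtual disk from each NVMe-backed datastore could then be joined as a mirrored support vdev; a sketch with OmniOS-style placeholder disk names and the pool name tank assumed:

Code:
   # Mirrored SLOG, one vdisk per NVMe datastore
   zpool add tank log mirror c2t0d0 c2t1d0

   # Or a mirrored special vdev for metadata / small blocks
   zpool add tank special mirror c2t2d0 c2t3d0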
 

dragonme

Active Member
Do not overcomplicate things.
Just use the NVMe as is under ESXi as a local datastore.

In a VM create a virtual disk on the NVMe for ZIL (10GB min).
As you have two NVMe drives, create two local datastores to allow mirrors, e.g. for a special vdev mirror.
I would like to take advantage of the latest ZFS direct write (Direct IO) to NVMe and remove a layer of abstraction, so it would be best to have napp-it see the PCIe device, or at least an NVMe device, right?

What I have played with is this:

Create the VMs natively on the NVMe under ESXi
Create 10GB namespaces with esxcli
Create a folder to hold the RDM disk definitions
Create an RDM disk backed by that 10GB namespace (RDM sketch at the end of this post)

Back in the ESXi GUI:
Add a paravirtual NVMe controller to the VM
Add the existing RDM disk(s) to the VM's NVMe controller

Now the device appears as an NVMe block device, and since it's an RDM, the VM should have native command-set control, no?
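For reference, the RDM step in that list would look something like this from the ESXi shell; a sketch, where the device identifier and datastore paths are placeholders, and -z creates a physical-compatibility RDM so the guest talks to the namespace more or less directly:

Code:
   # Find the NVMe namespace device identifier (substitute it below)
   ls /vmfs/devices/disks/ | grep -i nvme

   # Create a physical-compatibility RDM pointer file backed by that namespace
   mkdir -p /vmfs/volumes/nvme-ds1/rdm
   vmkfstools -z /vmfs/devices/disks/t10.NVMe____<device_id_from_ls> /vmfs/volumes/nvme-ds1/rdm/pm1725a-ns2.vmdk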
 

gea

Well-Known Member
Using the ESXi virtual NVMe controller for virtual disks on NVMe makes sense, but ZFS Direct IO is different. This is a new feature in OpenZFS 2.3 that bypasses ARC caching on fast NVMe pools. It is not yet included in Illumos/OmniOS. You would need a Linux or Windows storage VM with OpenZFS 2.3 to use it (napp-it cs can manage both).
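For reference, in OpenZFS 2.3 Direct IO is controlled per dataset via the direct property; a quick sketch assuming a Linux storage VM and a dataset named tank/vms (placeholder):

Code:
   # 'standard' (default) honors O_DIRECT from applications; 'always' forces direct IO
   zfs set direct=always tank/vms
   zfs get direct tank/vms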
 

dragonme

Active Member
Using the ESXi virtual NVMe controller for virtual disks on NVMe makes sense, but ZFS Direct IO is different. This is a new feature in OpenZFS 2.3 that bypasses ARC caching on fast NVMe pools. It is not yet included in Illumos/OmniOS. You would need a Linux or Windows storage VM with OpenZFS 2.3 to use it (napp-it cs can manage both).
But I am sure it's a commit or two away...

Anyway, if the vdev is attached as a SLOG, ZFS shouldn't have write-amplification issues on that LUN anyway. What we are avoiding with NVMe direct writes is ZIL write amplification, which is unnecessary on solid-state storage.
 

Dev_Mgr

Active Member
I know for SAS and SATA, modern ESXi supports 4Kn, but via some Googling, it looks like 4Kn NVMe isn't supported (yet).

Some links:
- https://www.reddit.com/r/vmware/comments/17xygcm
- https://knowledge.broadcom.com/external/article/327012/faq-support-statement-for-512e-and-4k-na.html (it says it supports 4Kn drives, but for NVMe it specifically says it supports 512e only)
- https://blog.westerndigital.com/formatting-4k-drives-for-vmware-vsphere/ (the "workaround" is to reformat to 512 instead of 4Kn)
 

dragonme

Active Member
I know for SAS and SATA, modern ESXi supports 4Kn, but via some Googling, it looks like 4Kn NVMe isn't supported (yet).

Some links:
- https://www.reddit.com/r/vmware/comments/17xygcm - FAQ: Support statement for 512e and 4K Native drives for VMware vSphere and vSAN (it says it supports 4Kn drives, but for NVMe, it specifically says it supports 512e (only))
- Formatting 4Kn NVMe Drives for VMware vSphere - Western Digital Blog the "workaround" is to reformat to 512 (instead of 4Kn)
Yep, that is what I found. I also found an article on how to provision NVMe namespaces for ESXi using esxcli...

Hard to believe that 'best in class' heavy-iron virtualization heavyweight VMware is so far behind the times that ESXi can't recognize 4K-native drives, NVMe or not. If the drive does not have 512 emulation, you're hosed...

Proxmox is looking better every day, and with Broadcom in charge things will only get worse. Like most projects, the real innovation and development get done outside the company by the dedicated fan base, so Proxmox should continue to advance at a fast pace given the nerd space that drives Linux innovation. As long as Proxmox stays open and community-driven, the community will feed it well; a narrative VMware lost a long time ago, when they almost let Flings die...