LGA3647 ESXi build to host my Oracle Apps/Databases

BennyT

Successfully installed the four 4TB SATA SSDs (Crucial MX500) and the SAS 9300-8i today. It was simple plug and play.

The X11DPI-NT motherboard BIOS has a setting for each PCIe slot to be either "LEGACY" or "UEFI". I used the default of LEGACY for the slot I put the HBA into.

I'm not certain, but I think UEFI is only needed for a PCIe slot if I wanted to pass the HBA device through to a VM. I don't need passthrough, so I left it as LEGACY.

The added storage was much needed, and with SSDs the Oracle RMAN database duplications across my VMs are so much faster. I'd love to switch to all-flash storage eventually.
When the NVMe SSDs and the PCIe x16 M.2 card get here, that'll be fun.
 

BennyT

Successfully installed and configured our PCIe M.2 card in Slot #6, with CPU2 IOU1 set to x4x4x4x4 bifurcation. See my attached diagram.

I may experiment with moving it to Slot #2 (IOU1) on CPU1, but since I've already allocated the two M.2 drives to datastores, I'm unsure whether moving the card to a different PCIe slot will affect that in ESXi.
2021-11-29_10-23-20.png
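My understanding is that ESXi ties a VMFS datastore to the backing device's identifier (the naa/t10 ID) rather than to the PCIe slot, so moving the card shouldn't break the datastores, but I'd like to record the mapping before and after the move. Just a sketch of what I'd run from the ESXi shell:

# show which device (naa/t10 ID) backs each VMFS datastore
esxcli storage vmfs extent list

# list the display names of the storage devices ESXi currently sees
esxcli storage core device list | grep -i 'Display Name'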
 

BennyT

Hello, I'm back, having found some weird problems in my ESXi system.


Sometimes I get good performance from the SATA SSDs in ESXi guest VMs:
2021-11-30_19-14-30.png


and sometimes it's not good. I wonder if writing high volumes (the 4GB write test in the screenshot above) caused things to bog down on subsequent tests:
2021-11-30_19-16-53.png

I'm not doing anything fancy. It's a single SATA SSD connected to the LSI 9300 (legacy PCIe in the BIOS), with no ESXi passthrough to guest VMs.

The SSD makes up a single datastore by itself; no other storage devices are in that datastore.

The Linux VM BRTAD18 shown in the above screenshot has been allocated a single virtual disk:

/dev/sda (with three partitions sda1, sda2, sda3)

If I reboot the Linux VM and re-run my write speed tests, I get acceptable performance again:
2021-11-30_19-28-23.png

It's really weird. I first noticed this not while running speed tests but during an Oracle cloning process. I was seeing write speeds top out at 30 to 40 MBps in Linux iotop and wondered if that was normal; I was expecting better. That's when I began running these speed tests, to see if I had an ESXi driver problem or something weird like that.
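For reference, the write test I'm running inside the guest is just dd, roughly along these lines (the path and sizes are only examples; conv=fdatasync is there so the Linux page cache doesn't hide the real device speed):

# ~4 GB sequential write, flushed to disk before dd reports the rate
dd if=/dev/zero of=/home/oracle/ddtest.tmp bs=1M count=4096 conv=fdatasync
rm /home/oracle/ddtest.tmp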


I haven't built anything on my NVMe SSDs yet, but it would be a major bummer if I just spent money on new SSD storage and it tops out at 40 MBps most of the time.

Any advice or diagnostic ideas from the ESXi experts? Does ESXi 6.7 act unpredictably with SATA SSDs? That would be bad, and I can't believe it's normal. There must be a fix or an explanation.

Do I need to change something in the Supermicro BIOS, or maybe something in the LSI card's BIOS?

Here is what my boot screen looks like when it sees the LSI card as I reboot the ESXi host server (not sure it shows anything helpful to you):
2021-11-24_12-56-44.png

Maybe this is where I need to consider learning how to pass the SSD devices through to a virtual NAS instead of using direct-attached storage in ESXi? I don't really know what I'm talking about; I've never built a SAN or NAS.
 

BennyT

It really might help if you'd provide details on the SSD(s) being used;)
Crucial MX500 SATA 4TB. The product number is in that last LSI BIOS screenshot. The Crucial SSD firmware is M3CR044 (the latest version, I believe), shown as 044 in that screenshot.

I'm going to try some tests from the ESXi command line (SSH into the ESXi shell) and run a similar dd test to see if I get better results, trying to narrow down the location of the problem by testing outside a VM.
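Something like this is what I have in mind from the ESXi shell, writing straight into the datastore's VMFS volume (the datastore name is just a placeholder, and ESXi's dd is the stripped-down busybox version, so options are limited):

cd /vmfs/volumes/MY_SSD_DATASTORE    # placeholder datastore name
time dd if=/dev/zero of=ddtest.tmp bs=1M count=4096
rm ddtest.tmp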

*Edit: I may need to take a closer look at how I configured the storage controllers for the virtual disks when I created the VM (there are a few different VMware SCSI controller types to choose from).

Also, I had created this VM by cloning an earlier Linux VM. I will create an all-new VM to avoid assuming the VM I cloned was good. I will also set the disk to thick eager zeroed, in case the issue comes from provisioning space into the virtual disk.

It's about 5:20 am now, and this is when I think best, coffee in hand, lol.
 

uldise

What about TRIM on these SSDs? As far as I know (well, I haven't used ESXi for a while), ESXi doesn't support TRIM the way you see it in Linux, or has this changed in the latest versions?
 

BennyT

What about TRIM on these SSDs? As far as I know (well, I haven't used ESXi for a while), ESXi doesn't support TRIM the way you see it in Linux, or has this changed in the latest versions?
TRIM? I don't think its absence is causing my problems right now, but it might be. I don't think I've accumulated enough garbage on the SSD to affect performance like this, and the SSD does perform well, at 300-400 MBps on writes, sometimes. Reads are even faster, above 500 MBps, and never a problem. It seems only the writes suffer sometimes, falling below 40 MBps and staying there until I reboot the VM, which leads me to believe it's a problem with how I configured the VM.

The Crucial SSD's firmware has automatic garbage collection, but only when the SSD is idle. What does the firmware consider "idle"? There is still activity happening in an OS even when no users are logged in. Crucial puts that garbage collection in the firmware for operating systems that don't support TRIM. I don't know whether it is actually kicking in, though.

Inside a Linux VM, if I run "lsblk -D" or "lsblk --discard", it shows the DISC-GRAN and DISC-MAX values. If they come back as 0B for a device, the Linux VM doesn't think that device supports TRIM.
2021-12-01_10-00-58.png
So even though the storage device is marked as Flash inside ESXi, the guest OS doesn't recognize that the device supports TRIM.

But there is UNMAP (for SCSI devices) and TRIM (for SATA) enabled by the "Space Reclamation" feature in vCenter 6.7 U1 and vSAN 6.7 U1.

From VMware:

VMware vSAN 6.7 U1 introduces automated space reclamation support with TRIM and SCSI UNMAP. SCSI UNMAP and the ATA TRIM command enable the guest OS or file system to notify the back-end storage that a block is no longer in use and may be reclaimed.

I can see the Space Reclamation setting in vCenter under Datastore --> Configure --> General:
2021-12-01_10-10-38.png
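The same setting can apparently also be checked per datastore from the ESXi shell, something along these lines (the datastore label is a placeholder; this applies to VMFS6 volumes):

# show the automatic space reclamation (unmap) settings for a datastore
esxcli storage vmfs reclaim config get -l MY_SSD_DATASTORE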

I'm still experimenting. My next step is to create a small Linux VM, try different virtual SCSI controllers, and provision the disk as thick eager zeroed. I'm wondering if that was my problem: without thick eager provisioning, ESXi was probably having to zero the disk space on demand before allowing the writes to land, which seems like it would be very bad for performance.
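If I don't want to click through the wizard every time, I believe the virtual disk can also be created eager-zeroed from the ESXi shell with vmkfstools, roughly like this (size, datastore, and VM folder names are placeholders):

# create a 100 GB eager-zeroed thick virtual disk on the SSD datastore
vmkfstools -c 100G -d eagerzeroedthick /vmfs/volumes/MY_SSD_DATASTORE/TESTVM/TESTVM_1.vmdk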
 

BennyT

I believe the problem of slow write speeds to the SSD came from "thick lazy zeroed" and thin-provisioned disks. ESXi was trying to grow/zero the virtual disk on demand, which basically kills SSD write performance. And once that growing/zeroing began, the SSD writes would still not improve above 40 MBps, even after the expansion completed, until I rebooted the VM. For example, if I ran a write speed test producing a 5 or 6 GB file (64k block size, count=10k) on a lazy-zeroed virtual disk, it would cause ESXi to grow the virtual disk, and that destroyed performance. After that, even small write tests with a 4k block size and count=1k would stay under 50 MBps.

Provisioning a new Linux VM with a "thick eager zeroed" disk fixed the problem. It doesn't have to grow/expand. I was getting 450-500 MBps write speeds, no problem. I tested numerous times with varying block sizes and counts; even writing a 7+ GB sample data file, I had no issues.
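For the VMs already sitting on lazy-zeroed disks, my understanding is that vmkfstools can zero them out in place instead of rebuilding them (with the VM powered off), something like this (the path is a placeholder):

# convert an existing lazy-zeroed thick disk to eager-zeroed thick
vmkfstools -k /vmfs/volumes/MY_SSD_DATASTORE/BRTAD18/BRTAD18.vmdk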

I'm so thankful!
2021-12-01_14-07-32.png



Next up is to test the NVMe drives. Those should really fly. I think I'll need to build the VMs using the VMware NVMe controller instead of the VMware Paravirtual SCSI controller.
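Once those VMs are built, I figure I can confirm from inside the guest which virtual controller the disk actually landed on, something like:

# on the VMware NVMe controller the disk shows up as nvme0n1;
# on the Paravirtual SCSI controller it shows up as a plain sdX device
lsblk -o NAME,TYPE,TRAN,SIZE
lspci | grep -iE 'non-volatile|scsi'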

Thank you
 