CPU performance vs disk access (OmniOS)


ehfortin

Member
Nov 1, 2015
Hi,

I'm still working on getting high-performance storage with OmniOS running on ESXi 6, and this morning I saw what seems to be a problem. I created an SMB shared folder and needed to copy a few hundred GB of data to it over a 1 Gbps network. The network is used at 70-80% and OmniOS reports less than 5% CPU usage, yet vCenter reports nearly 100% CPU usage, all attributed to that VM (I only have two VMs running and the other one is idle), and everything is really slow. So I figured vCenter was correctly reporting that the CPU is overloaded.

However, while doing some research, I read that I must use esxtop to really understand what is happening. Based on the ESXTOP article, I was expecting to see %RDY above 10. Instead, it is between 0.01% and 0.73%, so that's not it. %SYS is running at about 6.5% to 7.5%, which indicates a lot of I/O, but nothing to worry about. The CPU load averages are all below 0.5. The averages for PCPU USED and PCPU UTIL are at 25%, while the average CORE UTIL is at 46%. I'm including a screenshot of it so you can get a better picture.

As I'm not fluent in reading ESXTOP, does anybody see something that is particularly wrong? Can we understand why the CPU runs at 90% continuously as seen in vCenter, even though prstat and esxtop don't seem to agree? As it is very slow, something is definitely going on.
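To make the %RDY threshold above concrete, here is a small Python sketch for normalizing CPU ready figures. It relies on two rules of thumb that are assumptions, not something stated in the post: that the 10% %RDY guideline applies per vCPU (esxtop's VM-level line sums %RDY across the VM's worlds), and that vCenter's real-time charts report CPU Ready as a summation in milliseconds over 20-second samples. The function names and the 2-vCPU count are hypothetical.

```python
# Hypothetical helpers for interpreting CPU ready figures.
# Assumptions: the 10% guideline applies per vCPU, and vCenter's
# real-time charts use a 20-second sample interval.

REALTIME_INTERVAL_S = 20     # assumed vCenter real-time sample interval (seconds)
READY_THRESHOLD_PCT = 10.0   # rule-of-thumb %RDY threshold per vCPU


def ready_pct_from_summation(ready_ms: float, interval_s: float = REALTIME_INTERVAL_S) -> float:
    """Convert a vCenter 'CPU Ready' summation (ms) into a percentage of the interval."""
    return ready_ms / (interval_s * 1000.0) * 100.0


def ready_pct_per_vcpu(group_rdy_pct: float, num_vcpus: int) -> float:
    """Split a VM-level esxtop %RDY (summed over its worlds) evenly across vCPUs."""
    if num_vcpus <= 0:
        raise ValueError("num_vcpus must be positive")
    return group_rdy_pct / num_vcpus


def is_cpu_contended(per_vcpu_rdy_pct: float, threshold: float = READY_THRESHOLD_PCT) -> bool:
    """True when the per-vCPU ready time exceeds the rule-of-thumb threshold."""
    return per_vcpu_rdy_pct > threshold


if __name__ == "__main__":
    # Highest %RDY from the post; the 2-vCPU count is an assumption.
    worst = ready_pct_per_vcpu(0.73, num_vcpus=2)
    print(f"{worst:.3f}% ready per vCPU, contended: {is_cpu_contended(worst)}")
    # 2000 ms of ready time within one 20 s real-time sample.
    print(f"{ready_pct_from_summation(2000):.1f}% ready")
```

With the highest value from the post (0.73%), the per-vCPU figure stays far below the threshold, which is consistent with ruling out CPU scheduling contention.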
[Attached screenshots: esxtop.PNG, napp-it.PNG]

Thank you.
 

ehfortin

Member
Nov 1, 2015
As a follow-up, I've read a lot of information, and so far the conclusion is that ESXTOP is not showing anything that would indicate a problem. There is some load, yes, but not an overload. So I'm still trying to understand how I can have a slow system with a loaded CPU, as reported by vCenter, when I'm only copying files over SMB to ZFS on a 1 Gbps LAN. As I wrote, OmniOS isn't reporting any CPU load either. So it seems the VM itself is creating CPU load, probably while it handles the I/O, but what, and why? That's what I'm trying to figure out. Right now, it kills the system whenever I do any long disk operation, like a backup or a VM copy.
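For scale, here is a quick back-of-envelope calculation of the copy described in the first post. It assumes 300 GB for "a few hundred GB" (decimal units) and ignores SMB/TCP overhead, so the rates are rough approximations of the write load the OmniOS VM is absorbing.

```python
# Back-of-envelope numbers for an SMB copy over a 1 Gbps link.
# Assumptions: 300 GB (decimal) for "a few hundred GB", the 70-80%
# link utilisation reported in the first post, protocol overhead ignored.

LINK_BPS = 1_000_000_000  # 1 Gbps


def throughput_mb_s(utilisation: float, link_bps: int = LINK_BPS) -> float:
    """Approximate data rate in MB/s (10^6 bytes/s) at a given fraction of the link."""
    return link_bps * utilisation / 8 / 1_000_000


def copy_minutes(size_gb: float, rate_mb_s: float) -> float:
    """Minutes needed to move size_gb (10^9 bytes) at rate_mb_s."""
    return size_gb * 1000 / rate_mb_s / 60


if __name__ == "__main__":
    for u in (0.70, 0.80):
        rate = throughput_mb_s(u)
        print(f"{u:.0%}: {rate:.1f} MB/s, 300 GB in {copy_minutes(300, rate):.0f} min")
```

At 70-80% of the link, the guest is receiving roughly 87.5-100 MB/s of writes, which gives a sense of scale when comparing against the CPU figures from vCenter, prstat and esxtop.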

Any idea?
 

Emulsifide

Active Member
Dec 1, 2014
Can you provide some details on your I/O layout? Are you passing a hard drive controller of some sort through to OmniOS, or are you using VMDKs on your datastore as virtual hard drives?
 

ehfortin

Member
Nov 1, 2015
56
5
8
53
The HBA (LSI 1068) is passed through to the VM. That said, I had a major issue with the CPU being loaded at 100% on another OmniOS VM running on a different server with a dedicated HBA. That one uses an SSD raidz pool. I first hit the problem while doing a lot of IOPS on it, but I also saw it when nothing was happening on the disks (my guess is that it was a ZFS scrub or something like that), and it caused the NFS datastore to become inactive. It corrupted some running VMs. I put everything back online and it happened again a few hours later. After this happened a few times, I decided to move my VMs to another NFS datastore running on an appliance and... it happened again on the source datastore (the ZFS one). I got really frustrated at that point, so I restored everything from backup onto my QNAP, and I'll try something else.

EDITED: Modified the last sentences because it wasn't clearly explained that the problem is on the ZFS side, not on the new NFS datastore that resides on my QNAP.
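To test the guess that a scrub coincided with the CPU spikes, one option is to check the pool's scan status while the problem is happening. This is a minimal Python sketch, assuming Python 3.7+ is available on the OmniOS VM and that `zpool status` prints the usual `scan: scrub in progress` wording; the pool name `ssdpool` and both function names are hypothetical.

```python
import subprocess


def scrub_in_progress(status_text: str) -> bool:
    """Return True if a 'scan:' line in zpool status output reports a running scrub.

    Assumes the usual wording, e.g. '  scan: scrub in progress since ...'.
    """
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("scan:") and "scrub in progress" in stripped:
            return True
    return False


def check_pool(pool: str) -> bool:
    """Run 'zpool status <pool>' (requires the zpool CLI) and look for a scrub."""
    result = subprocess.run(
        ["zpool", "status", pool],
        capture_output=True,
        text=True,
        check=True,
    )
    return scrub_in_progress(result.stdout)


if __name__ == "__main__":
    # "ssdpool" is a hypothetical pool name; replace it with the real one.
    print(check_pool("ssdpool"))
```

Logging the result of `check_pool` every minute or so alongside the vCenter CPU graph would show whether a scrub was active when the load appeared.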
 