ESXi 6.7 shell command line very laggy response


BennyT

Active Member
Dec 1, 2018
166
46
28
I'm wondering if the boot disk of my ESXi 6.7 host in my homelab is about to die. I'm new to VMware ESXi, having just built this system in January, so I'm trying to figure out how to diagnose my issue. Just typing at the ESXi command line hits 10-15 second delays roughly every 10 seconds or so. The boot disk is an old circa-2013 Intel enterprise 80GB SSD I purchased used.

I don't believe it is a network issue, as I currently have just the one virtual switch, used by the VMkernel port and all my VMs, connected to two physical network adapters in the host, which in turn connect to a simple unmanaged switch. I had an internal vSwitch for fast network connections between my guest VMs, but I've since removed it in case it was causing the problem. I don't think my issue is network related, but I could be wrong.

I ran df -h and I think the 4GB partition is my boot partition. Looks like it has plenty of space there, with only 1% used, so I don't believe I'm low on space on the boot partition.
2019-06-25_11-04-23.png
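For reference, this is roughly what I ran from the ESXi shell (output is in the screenshot above; volume names and sizes will differ per install):

    df -h                            # busybox df; the small vfat partitions are the bootbanks
    esxcli storage filesystem list   # same volumes with names, UUIDs and free space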

Here is the SMART info on the boot disk. These numbers look strange because they mostly say "100". The media wearout indicator has me worried too.
2019-06-25_10-55-51.png
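The screenshot is from the host client, but the same SMART data can be pulled from the shell, something like this (the device identifier below is just a truncated placeholder; list yours first):

    esxcli storage core device list                                 # find the boot SSD's device identifier
    esxcli storage core device smart get -d t10.ATA_____INTEL_...   # SMART attributes for that device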


The guest VMs themselves seem to be performing fine for the most part, but with some high disk kernel command latency and queue command latency from time to time during heavy I/O (they are on HDDs, which I will be replacing with Samsung 883 DCT SSDs in the future). I plan to eventually replace this Intel boot SSD with a Samsung 883 DCT 240GB too, but I may do that sooner rather than later.
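For those latency numbers I'm going by esxtop, roughly like this (the thresholds in the comments are just common rules of thumb I've seen, not official limits):

    esxtop   # then press 'u' for the per-device disk view, or 'd' for the adapter view
    # DAVG/cmd = device latency, KAVG/cmd = kernel latency, QAVG/cmd = queue latency
    # rule of thumb: sustained DAVG above ~20-25 ms, or KAVG/QAVG above ~2 ms, is worth investigating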

I'm considering installing the latest ESXi 6.7 onto a USB thumb drive to see if that improves things. This would be a short-term solution until I can replace the boot disk with a new SSD.

Please let me know what you think about my diagnosis. Do you think my boot disk is dying? Does the ESXi command line get very slow/laggy when the boot disk is about to fail?

Thanks,

Benny
 
Last edited:

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
I'm not sure about the boot disk, but I know my ESXi command line runs faster than 10-15 sec per command, so for sure something is off.
I was also under the impression that ESXi loads into memory once started and only uses the boot disk for loading/configuration files? Maybe I'm wrong. I run all my ESXi installs off 32/64GB USB drives and have never had any issues.

Oh, the other thing: how is the memory on the server? Is it full or over-provisioned?
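If you want to check from the shell, something like this should show it (esxtop's memory view is the quick check, if I remember right):

    esxcli hardware memory get   # total physical memory in the host
    esxtop                       # press 'm'; watch the free-memory state (high/soft/hard/low) and any SWAP activity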
 

nephri

Active Member
Sep 23, 2015
541
106
43
45
Paris, France
Yes, from what I remember, ESXi doesn't rely much on boot disk I/O.
That's what allows installing it on a USB drive without too much issue.

That's not the same with, for example, Proxmox (which I really like). Running it from a USB drive is much more tricky.
 

BennyT

Active Member
Dec 1, 2018
166
46
28
CPU and memory seem to be okay for the VMs currently powered on. The vCenter Client is running fine, and so is the ESXi GUI, no lag there, even when pulling up logs. Slowness appears only at the ESXi command line: if I SSH into the ESXi host and simply try to cd into a folder, I often have to wait about 10 seconds before it even registers my keystrokes. I'm on the LAN locally where the host is, no complex network setup. I can SSH into any of my guest VMs on the ESXi host with no problem, but SSH into the ESXi command line is crazy slow.


I have a couple of choices. I can continue and ignore the command line lag (I don't often go into the ESXi CL); if the boot disk dies, I'll replace it with a USB stick or a new SSD.

Or I can experiment by installing ESXi onto a USB stick to see if that improves the ESXi CL response. I'm inclined to simply ignore it, because I'm busy doing development on the VMs. If the ESXi boot disk dies in the meantime, I'll address it then.

Let me know if you have any other ideas or diagnostic suggestions. Thanks!

vCenter:
2019-06-25_11-28-18.png

ESXi:
2019-06-25_11-47-11.png
 

BennyT

Active Member
Dec 1, 2018
166
46
28
I'm beginning to think it really is network related. If I SSH into the ESXi host command line from a guest VM or from another Linux server on the network, there is no lag at all once I'm at the ESXi command line; I can type commands without any response lag or keypress delay. If I SSH into the ESXi command line directly from my laptop (which is on the same LAN via wifi), I get the laggy delay as I type.

Really weird. I'll look closer at the network adapters on my laptop. Maybe I'll try Ethernet instead of wifi to see if that improves things. That would point to my wifi adapter being the culprit.
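To narrow it down I'll probably compare something like this from the laptop over wifi vs. from a wired machine (the host IP below is a placeholder for my own):

    ping -c 20 192.168.1.50      # raw RTT and packet loss to the host
    ssh -vvv root@192.168.1.50   # verbose output shows where the session stalls (DNS, key exchange, or just typing echo)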
 

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
BennyT said:
    I'm beginning to think it really is network related. [...] That would point to my wifi adapter being the culprit.
Good find. That might be what's causing the issue.
 

BennyT

Active Member
Dec 1, 2018
166
46
28
My problem was caused by trying to enable NIC teaming on the physical adapters. This was causing serious lag when accessing the ESXi command line via SSH from my laptop, where traffic goes laptop -> router -> TP-Link unmanaged switch -> ESXi host NICs.

I don't have a managed switch; I have an unmanaged TP-Link 8-port gigabit switch (TL-SG108). Yet I had two of the NICs in my ESXi host connected to that switch, and those NICs were attempting to act as a team. From what I've read since, teaming policies like "Route based on IP hash" expect the switch to support link aggregation (EtherChannel/802.3ad), which an unmanaged switch doesn't.

2019-06-26_9-28-14.png

I changed this to use the 2nd physical NIC as a failover NIC instead of teaming. That fixed my network issue and the lag:
2019-06-26_9-29-28.png

Any traffic going through my router, such as my SSH-over-wifi traffic when connecting to the ESXi host from the laptop, was apparently confused by what amounted to one IP address reachable through two switch ports. However, any network activity that didn't go through the router (traffic between computers connected directly to the switch, or between guests within the ESXi host) was fine. Changing back to "Failover" mode on the physical NICs fixed the issue even for traffic coming through the router.

So now my standard virtual switch looks like below, with management and VM traffic going through just one physical NIC. Previously, the yellow line from the Management Network and the VMs went to both physical NICs, and then to the unmanaged switch. Now the 2nd NIC acts only as failover:
2019-06-26_9-52-23.png
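I made the change in the host client UI, but for reference the same thing can be done from the ESXi shell, something like this (vSwitch0/vmnic0/vmnic1 are just the names in my setup):

    esxcli network vswitch standard policy failover get -v vSwitch0   # show current active/standby uplinks
    esxcli network vswitch standard policy failover set -v vSwitch0 --active-uplinks vmnic0 --standby-uplinks vmnic1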
 
Last edited: