Slow power on of VMs...

Discussion in 'VMware, VirtualBox, Citrix' started by JimPhreak, Jun 5, 2015.

  1. JimPhreak

    JimPhreak Active Member

    Joined:
    Oct 10, 2013
    Messages:
    553
    Likes Received:
    55
    I just built a new ESXi 5.5 server and I'm noticing a weird issue with regard to when I manually power on a guest VM.

    Every time I manually power on a VM the process takes exactly 45 seconds to complete. I'm referring to the time the VMware task itself takes to complete not how long it takes the guest OS to startup. After that 45 seconds I then see the OS booting normally in the console.

    However when I power down a VM the task completes immediate. Since it takes 45 seconds for the task to complete every time I'm assuming it's a setting somewhere but I've never seen this one before on my previous ESXi server builds.
     
    #1
  2. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    1,062
    Likes Received:
    353
    How much memory does the guest have and what sort of storage are you powering up on? Are the discs busy when you hit the power button? I suspect you're seeing a delay as the hypervisor initialises the .vswp files...
     
    #2
  3. JimPhreak

    JimPhreak Active Member

    Joined:
    Oct 10, 2013
    Messages:
    553
    Likes Received:
    55
    The guest have 8GB, 4GB and 4GB of RAM dedicated to them. VMs are stored on a Samsung SM951 M.2 512GB SSD.

    I just find it strange that the delay is EXACTLY 45 seconds every time.
     
    #3
  4. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    1,062
    Likes Received:
    353
    If you look in the vmware log file it should give you some indication of what it's doing or waiting on for when you hit the power button...

    If you power on the VM from the command line do you still have the same delay?
     
    #4
  5. JimPhreak

    JimPhreak Active Member

    Joined:
    Oct 10, 2013
    Messages:
    553
    Likes Received:
    55
    I'm not familiar with VMware's command line. Here is what the log file shows.

    VMware log.txt - Google Drive

    This is from the moment I started the power on task and you'll see I left two spaces the moment the task completed.
     
    #5
    Last edited: Jun 5, 2015
  6. JimPhreak

    JimPhreak Active Member

    Joined:
    Oct 10, 2013
    Messages:
    553
    Likes Received:
    55
    Tried moving a VM from the SM951 to a datastore on one of my Intel 730's to see if it was something to do with the M.2 drive. Unfortunately it does the exact same thing regardless of which drive the guest VM is on.
     
    #6
  7. Entz

    Entz Active Member

    Joined:
    Apr 25, 2013
    Messages:
    269
    Likes Received:
    62
    Are you doing any replication or is the VM pointing to a host CDdrive that may have a non-bootable disk etc.
     
    #7
  8. JimPhreak

    JimPhreak Active Member

    Joined:
    Oct 10, 2013
    Messages:
    553
    Likes Received:
    55
    No replication. No host CD drives. They are all set to client device under CD/DVD drive.
     
    #8
  9. whitey

    whitey Moderator

    Joined:
    Jun 30, 2014
    Messages:
    2,759
    Likes Received:
    855
    Something definitely wrong, my VM's take 5-7 sec to fully boot on a ZFS hybrid pool via 10G NFS. Never had issues like this, sounds disk I/O related, with 3 Vm's on a decent SSD you would not think it though...odd.

    Check esxtop and look at your 'd' (adpaters) vmhba## if HBA/RAID card/FC/iSCSI/FCoE) and 'u' (disks) esxtop pages/outputs.

    In the 'u' devices page check naa.obscureWWNnumber for direct attached/DAS disks, {NFS} datastore-name for NFS) and check DAVG/KAVG/GAVG for NFS datastores latencies (NFS will only yield guest average/GAVG). If spikes/peaks above 15 ms you've got issues. Also check logs as stated (specifically vmkernel.log).

    Something is amiss.
     
    #9
    Last edited: Jun 6, 2015
  10. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    1,062
    Likes Received:
    353
    For log files, there's the vmware.log files stored within the VMFS datastore that contains data pertinent to that sole VM, plus you'll also want to check the multitude of logs in /var/log/ (a bevvy of terminal windows and tail -f is your friend); just make sure that your system time is in sync with your workstations via NTP or somesuch first so that they're easier to correlate with the events you're seeing. Can't see files on google drive et al whilst I'm at work sadly.

    If your host is resource constrained you'll often see a delay booting VMs since the hypervisor has to evacuate guest memory from running VMs into the .vswp files, then initialise the .vswp files for the VM you're booting. When you say they have XGB "dedicated" to them, do you mean that they all have a reservation set in the resources tab? Or do they have the unlimited option button ticked? Does this host have enough RAM to give them all 100% of their allocated memory plus overheads and the service console?

    But strange that it's always exactly 45s which, as Entz suggests, sounds like the VMs are waiting on an I/O request to timeout before completing. Are these VMs you've brought over from another host or were they clean builds on this tin?
     
    #10
  11. JimPhreak

    JimPhreak Active Member

    Joined:
    Oct 10, 2013
    Messages:
    553
    Likes Received:
    55
    Hi Effra,

    Thanks for the reply. To answer some of your questions:

    1) In regard to how much RAM is dedicated to my VMs, theya re all set as such:

    [​IMG]

    So yes, the Unlimited box is checked.

    2) I currently have 4 active VMs that have the following RAM allocations (4GB, 4GB, 8GB, 4GB). I also have 2 inactive VMs that have 4GB allocated to them but whether these are on or off, nothing changes with regard to the slow power on of any VM. My host currently has 32GB of RAM in it.

    3) 2 of the 4 active VMs were brought over from another host (restore through Veeam backup) but the other 2 are fresh installs on this host.


    Is there anything in particular I should be looking for in these logs that might stand out? I've never really had to review VMware logs before so I'm not even really sure what I'm looking at/for.

    Also for some strange reason the time in my logs has been off ever since I set NTP. My host shows the correct time when I go Time Configuration and it matches my workstations but int he logs the time shows being 4 hours ahead.
     
    #11
    Last edited: Jun 8, 2015
  12. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    1,062
    Likes Received:
    353
    1) Think you've actually included the screenie from the CPU tab there rather than memory :) But if you haven't been tweaking settings, out-of-the-box VMware will leave those set at "unlimited" with no reservation. The reservation size affects the amount of space given to the .vswp files when you're running in a resource-constrained scenario.

    Sounds like you shouldn't be running into any resource problems or SCSI stuff dangling off into nowhere especially if both the new and old VMs have the same problem.

    Looking at vmware logs is a chore at the best of times (I swear they design them to be as "you should buy a support contract"-readable as possible, hence why I'd recommend getting on the shell of the host via SSH, clearing out the logs so that they're all as empty as possible and see if you can see what it's doing. whitey's comment about esxtop is valid as well - be handy to see if there's a period of disc activity or a CPU bottleneck or similar before the power-on event happens.
     
    #12
  13. JimPhreak

    JimPhreak Active Member

    Joined:
    Oct 10, 2013
    Messages:
    553
    Likes Received:
    55
    You're right about the screen shot, I just fixed that in my previous post.

    As for the logs, I'll go ahead and clear them and see what I can find.
     
    #13
  14. JimPhreak

    JimPhreak Active Member

    Joined:
    Oct 10, 2013
    Messages:
    553
    Likes Received:
    55
    Ok finally have some time to troubleshoot this issue. I've been working on getting my storage VM going as preclearing 8TB drives takes a LONG time so I wanted to get that going.

    With regard to esxtop, what should I be looking for? I've shut down all my non-essential VMs so the only ones running are pfSense, Media Server, and Storage Server. When I click to startup a VM, are there some indicators I should be looking for specifically in esxtop?

    I've never really had to troubleshoot ESXi as I'm sure you can tell.
     
    #14
  15. TuxDude

    TuxDude Well-Known Member

    Joined:
    Sep 17, 2011
    Messages:
    615
    Likes Received:
    336
    I usually look mostly at the latency numbers when using esxtop to dig into IO issues. I like to look at storage from the disk device perspective (hit 'u' while in esxtop), and make sure the latency fields are enabled (hit 'f' to get the field list, and 'i' to enable the * next to 'Overall Latency Stats', or 'j' or 'k' if you specifically want read or write latencies). Also usually turn off any fields I'm not interested in otherwise the display quickly gets too wide and I can't see the right-most fields, and hit 's', '2', '<enter>' to increase the refresh rate to 2 seconds (as fast as it can go). You will get 4 fields for latency, DAVG, KAVG, GAVG, and QAVG, which correspond to Device latency, Kernel, Guest, and Queue. Of those, Device is how much latency there was going over the wire to the storage, and coming back (whether that is local SATA/SAS, or remote iSCSI, FC, etc.), Kernel is the time spend in the hypervisor kernel, and Queue is time spent in the IO queue - add all those together and you get the Guest latency, which is what the guest OS sees for total latency.

    I usually like to keep all my latencies under 10 ms, occasional short spikes into the 20s are usually ok (often associated with a large batch of IOs from somewhere that gets the queues going), but if it stays above 20 for more than a second or two or is doing it often then the chance of users complaining about slowness is pretty high - with latency under 10 we never hear a complaint.

    If you need to figure out which VM is causing the high IO, then switch over to the 'disk VM' perspective ('v' key).
     
    #15
  16. JimPhreak

    JimPhreak Active Member

    Joined:
    Oct 10, 2013
    Messages:
    553
    Likes Received:
    55
    I'm pretty convinced at this point it's not a disk I/O issue. Did exactly as you asked and during the 45 seconds from when I try to turn on a VM and the task completes (tried this 5 times), the highest value I got on the Samsung Sm951 (VM Datastore) was 1.94 for DAVG which was probably from one of my active VMs and that was for a split second. It almost seems like the disk isn't even being accessed for those 45 seconds while the power on task is stuck at "In Progress."

    I think I'm going to have to start diving into the logs. FUN!
     
    #16
  17. TuxDude

    TuxDude Well-Known Member

    Joined:
    Sep 17, 2011
    Messages:
    615
    Likes Received:
    336
    Logs sound like the next logical place to check (I'd check CPU usage while still in esxtop but really doubt its the problem). I've found for issues like this the best log to look in is also the one that's hardest to get at (or maybe there's an easier way that I haven't found yet) - from a shell session on the host (local or SSH) cd into /vmfs/volumes/<datastore-name>/<vm-name>/ and there you will find the log files for that specific VM, the current (or most-recent if the VM is off) is named vmware.log. Note that <vm-name> in that path doesn't get renamed if you have renamed the VM (until you storage-vmotion it), so its possible it won't match. Also, if you are using shared storage with multiple hosts that the current log file for a running VM will be locked by the host running it - your shell session must be on the same host if you want to read that file.
     
    #17
  18. JimPhreak

    JimPhreak Active Member

    Joined:
    Oct 10, 2013
    Messages:
    553
    Likes Received:
    55
    Can't I just access those logs directly through the datastore browser in vSphere Client?
     
    #18
  19. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    1,062
    Likes Received:
    353
    Technically you can, but the datastore browser doesn't behave like a file shared log file might do. Nothing to stop you downloading it after the fact and reading it at your leisure but I generally find it helps to view these things in realtime.

    SSHing into the host and opening the log file with something like `tail -f /vmfs/volumes/somevol/somevm/vmware.log` and you'll get an almost-realtime log output into your terminal window (I think it refreshes every 1s by default) and this can be a big help in spotting issues as they occur. VMware logs are hideous enough that examining them after the fact is a great way to convince yourself that you need to drown that pack of ibuprofen in cheap gin.
     
    #19
  20. whitey

    whitey Moderator

    Joined:
    Jun 30, 2014
    Messages:
    2,759
    Likes Received:
    855
    I can't tell you how many times this doc has methodically and logically resolved issues for me.

    http://communities.vmware.com/servl...952/vsphere41-performance-troubleshooting.pdf

    Hope that link works, if not just google 'vSphere performance Troubleshooting Guide' and focus in on the flow chart (starting on page 15) for whatever resource you feel or find evidence of being constrained. I know the guide is a bit old (focusing on vSphere 4.1) but i could not find an updated one and the troubleshooting methodology is spot on still.

    These may be useful as well...kiss your day goodbye. Good stuff w/in, I GUARANTEE it!

    http://pubs.vmware.com/vsphere-51/t...er-server-51-monitoring-performance-guide.pdf

    https://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.1.pdf

    VMware KB: Troubleshooting ESX/ESXi virtual machine performance issues
     
    #20
    Last edited: Jun 10, 2015
Similar Threads: Slow power
Forum Title Date
VMware, VirtualBox, Citrix ESX Slow NVMe with intel driver Jun 15, 2018
VMware, VirtualBox, Citrix Very slow esxi 6 or 6.5 response with X9SRH motherboard Mar 21, 2018
VMware, VirtualBox, Citrix Painfully slow Storage vMotion Migrations? Nov 15, 2017
VMware, VirtualBox, Citrix ESXI Backup with VEEAM concurrent tasks fast, single tasks slow Aug 16, 2017
VMware, VirtualBox, Citrix Esxi 6.5, x10sdv-4c-7tp4f, Samsung843 - slow performance? Mar 7, 2017

Share This Page