Performance of KVM/QEMU disks on ZFS volumes

Bronek

New Member
Jun 23, 2015
7
1
3
49
Apologies if this question does not belong in this forum. I am running a small number of Windows 10 guests on libvirt-2.5 + qemu-2.7 + linux-4.8 with ZFS 0.6.5.8. The guests' disks are set up on ZFS zvols; for example, disk C: of guest "lublin" ...

Code:
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/zvol/zdata/vdis/lublin'/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
. . .
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver queues='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
... is mapped to:

Code:
root@gdansk /etc/modprobe.d # zfs get all zdata/vdis/lublin
NAME               PROPERTY              VALUE                  SOURCE
zdata/vdis/lublin  type                  volume                 -
zdata/vdis/lublin  creation              Thu Jul 30 20:35 2015  -
zdata/vdis/lublin  used                  1.14T                  -
zdata/vdis/lublin  available             2.28T                  -
zdata/vdis/lublin  referenced            136G                   -
zdata/vdis/lublin  compressratio         1.00x                  -
zdata/vdis/lublin  reservation           none                   default
zdata/vdis/lublin  volsize               160G                   local
zdata/vdis/lublin  volblocksize          8K                     -
zdata/vdis/lublin  checksum              on                     default
zdata/vdis/lublin  compression           off                    inherited from zdata/vdis
zdata/vdis/lublin  readonly              off                    default
zdata/vdis/lublin  copies                1                      default
zdata/vdis/lublin  refreservation        165G                   local
zdata/vdis/lublin  primarycache          all                    default
zdata/vdis/lublin  secondarycache        all                    default
zdata/vdis/lublin  usedbysnapshots       883G                   -
zdata/vdis/lublin  usedbydataset         136G                   -
zdata/vdis/lublin  usedbychildren        0                      -
zdata/vdis/lublin  usedbyrefreservation  150G                   -
zdata/vdis/lublin  logbias               latency                default
zdata/vdis/lublin  dedup                 off                    default
zdata/vdis/lublin  mlslabel              none                   default
zdata/vdis/lublin  sync                  standard               default
zdata/vdis/lublin  refcompressratio      1.00x                  -
zdata/vdis/lublin  written               14.7G                  -
zdata/vdis/lublin  logicalused           990G                   -
zdata/vdis/lublin  logicalreferenced     134G                   -
zdata/vdis/lublin  snapshot_limit        none                   default
zdata/vdis/lublin  snapshot_count        none                   default
zdata/vdis/lublin  snapdev               hidden                 default
zdata/vdis/lublin  context               none                   default
zdata/vdis/lublin  fscontext             none                   default
zdata/vdis/lublin  defcontext            none                   default
zdata/vdis/lublin  rootcontext           none                   default
zdata/vdis/lublin  redundant_metadata    all                    default
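For what it's worth, the volblocksize=8K above is worth a second look: volblocksize is fixed at creation time, and a small value can cause write amplification (and poor space efficiency on RAIDZ) for VM disk workloads. A sketch of recreating the volume with a larger block size follows; the name lublin-new and the 64k value are hypothetical, and larger blocks trade read-modify-write overhead (NTFS uses 4k clusters by default) for sequential throughput:

```shell
# Hypothetical sketch: volblocksize cannot be changed in place, so create a
# new sparse (-s) zvol with a larger block size and copy the guest disk over.
zfs create -s -V 160G -o volblocksize=64k zdata/vdis/lublin-new
# With the guest shut down, copy the raw disk contents across:
dd if=/dev/zvol/zdata/vdis/lublin of=/dev/zvol/zdata/vdis/lublin-new bs=1M
```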
The pool is also set up with a dedicated 4GB SLOG device on an NVMe drive and a 250GB L2ARC on the same NVMe DC3700; there is also 16GB of RAM for the ARC to use (of 128GB total available):

Code:
root@gdansk /etc/modprobe.d # cat zfs.conf
# Enforce max ZFS ARC size to 16GB = 16*1024*1024*1024 = 17179869184
options zfs zfs_arc_max=17179869184
# Enforce synchronous scsi scan, to prevent zfs driver loading before disks are available
options scsi_mod scan=sync
Despite all this, the disk performance from within the guest is very often unsatisfactory. By this I mean IO speed reported by Task Manager rarely exceeding 3MB/s, sometimes dipping below 1MB/s. This happens when doing IO-intensive work on C:, like updating software (e.g. when Adobe Creative Cloud is upgrading Photoshop). During these times some Windows processes often become unresponsive for short periods. At the same time the host is fine and responsive.

Any hints how to improve that? Or at least a tried-and-tested setup to use when a ZVOL is a QEMU guest disk? Or is such a setup generally not recommended, and should I move pronto to QCOW2?
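In the meantime, a couple of host-side observations taken while a guest is slow might narrow it down. A sketch (commands from memory; adjust the pool name as needed):

```shell
# Watch per-vdev throughput and operations while the guest is slow; in
# particular, check whether the dedicated SLOG device is absorbing the
# sync writes or whether they hit the main vdevs.
zpool iostat -v zdata 1

# ARC size versus the 16GB cap, plus hit/miss counters, from the kstats.
awk '/^(size|c_max|hits|misses) /{print $1, $3}' /proc/spl/kstat/zfs/arcstats
```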
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,060
500
113
New York City
www.glaver.org
Apologies if this question does not belong in this forum. I am running a small number of Windows 10 guests on libvirt-2.5 + qemu-2.7 + linux-4.8 with ZFS 0.6.5.8. The guests' disks are set up on ZFS zvols; for example, disk C: of guest "lublin" ...
You might want to try the Linux forum here. This site is pretty much FreeBSD (where ZFS is natively included). For example, I don't know what version 0.6.5.8 of ZFS is.
Despite all this, the disk performance from within the guest is very often unsatisfactory. By this I mean IO speed reported by Task Manager rarely exceeding 3MB/s, sometimes dipping below 1MB/s. This happens when doing IO-intensive work on C:, like updating software (e.g. when Adobe Creative Cloud is upgrading Photoshop). During these times some Windows processes often become unresponsive for short periods. At the same time the host is fine and responsive.
That's something that needs to be investigated at every layer: host OS, virtualization, and guest OS. It's a lot easier to track down these things when the host and guest OS are the same "brand", since bug reports can't be closed with "oh, it's your other operating system... over there" type responses.

ZFS, even on slow disks and CPUs, should be able to achieve at least 100MB/sec. Here's a graph from 6+ years ago showing 500MB/sec writes sustained for extended periods (hours):

 

Bronek

Does the Windows host have the latest virtio drivers?
Ah, that's a good question. No, the Windows guests are using old version 0.112. I will upgrade to 0.126 and report back. BTW, I cannot find an option to move this thread to a more appropriate group; is this something only a moderator can do? Or should I simply start a new thread in the Linux group, or look more closely again?
 

Bronek

That's something that needs to be investigated at all of the host OS layer, virtualization layer, and guest OS layer. It's a lot easier to track down these things when the host and guest OS are the same "brand", since bug reports can't be closed with "oh, it's your other operating system... over there" type things.
I do have some guests from the same distribution (Arch) and the same kernel version as the host, with guest disks set up the same way as for the Windows 10 guests, i.e. a block device mapped to a ZVOL. But I have not tried benchmarking IO on those; this is something I need to look into. Thanks for the help!
 

Markus

Member
Oct 25, 2015
78
19
8
What about the zpool settings?

I accidentally activated deduplication on my pool and don't have enough RAM... Once the dedup table no longer fits in memory, my speed is very low...

Regards
Markus
 

Bronek

I am certain I do not have deduplication enabled. Also, the bad guest performance happens even when the host has at least 20GB of memory free. Actually, I could safely increase zfs_arc_max from 16GB (the host has 128GB, of which usually only half is used by guests). Is there some way to calculate how much ARC I need for a given pool size and/or L2ARC size?
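One rough rule of thumb (hedged: the exact per-record header cost varies between ZFS versions) is that every block cached in L2ARC consumes some ARC RAM for its header, so a large L2ARC in front of a pool with 8K records eats into a small ARC quickly. A back-of-envelope sketch, assuming ~180 bytes of header per record:

```shell
# Back-of-envelope: ARC RAM consumed by L2ARC headers.
# Assumptions: 250 GiB L2ARC fully populated with 8 KiB (volblocksize)
# records, ~180 bytes of in-ARC header per record (version-dependent).
l2arc_bytes=$((250 * 1024 * 1024 * 1024))
record_bytes=8192
header_bytes=180
records=$((l2arc_bytes / record_bytes))
overhead=$((records * header_bytes))
echo "records=$records overhead_bytes=$overhead"
# Prints: records=32768000 overhead_bytes=5898240000
```

With these assumed numbers, roughly 5.9GB of the 16GB ARC would go to L2ARC headers alone, which by itself argues for raising zfs_arc_max on a 128GB host.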
 
Last edited:

Bronek

I have an idea what this might be (although I am not 100% sure). Performance jumped when I switched on the option "Turn off Windows write-cache buffer flushing" (in disk policies, Windows 10 guest). That is, I disabled flushing of the "disk" buffer, where the disk is actually a ZVOL on ZFS. As far as I understand, the buffer flushing is synchronous (i.e. it blocks writes in the guest) and it kicks in every second. I guess this is mapped (via virtio-scsi and qemu) to fsync, which in turn triggers a synchronous flush of the ZIL. When there are plenty of writes in Windows, this translates into even more writes in ZFS, and the result is that under heavy IO load, disk performance in the guest eventually drops to an embarrassing level. Disabling the "disk" buffer flushing means that either 1) ZIL flushing reverts to its as-designed behaviour, which IIRC is every 5s, or 2) it is still called by the Windows guest but no longer blocks writes in the guest. Either way, this single switch let me regain good performance in the guest.
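For reference, a host-side equivalent of that Windows checkbox would be something like the following sketch. It carries the same crash-safety trade-off: with sync requests ignored, up to a few seconds of acknowledged writes can be lost on power failure, so it is mostly useful as an experiment to confirm the diagnosis:

```shell
# Make ZFS ignore sync/flush requests on this zvol only; guest flushes then
# no longer force an immediate ZIL commit.
# WARNING: trades crash safety for speed, like the Windows setting.
zfs set sync=disabled zdata/vdis/lublin

# Revert to honouring flushes:
zfs set sync=standard zdata/vdis/lublin
```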
 
  • Like
Reactions: gigatexal