Question for Gigabyte MZ72-HB0 and EPYC 7003 users

Bronek

New Member
Jun 23, 2015
24
1
3
51
I've managed to cut interrupt latency significantly by adjusting CPU pinning relative to the host's CPU isolation. It's not the same as with VAPIC, but close (judging by how rarely sound artifacts occur).
 

mirrormax

Active Member
Apr 10, 2020
204
80
28
Can you show your XML? Try locking the VM to one NUMA node, both CPU and RAM, and use static huge pages.
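As a rough sketch of that suggestion: the page size and node number below are placeholders picked for illustration, not values from your machine, and the huge pages have to be reserved on the host beforehand (static, not transparent).

```xml
<!-- Back guest RAM with pre-allocated huge pages -->
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
</memoryBacking>
<!-- Allocate guest memory only from one host NUMA node (node 0 is a placeholder) -->
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>
```

The vCPUs would then be pinned with vcpupin entries to cores of that same node.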
 

Bronek

Sure. This one is for a Windows guest. You need to compare it against the lstopo output for this socket (see graphics). I'm passing through three devices (PCI 24 and 26 are the USB card, while PCI 01 is an NVIDIA GPU), and they are attached to two different dies of the EPYC CPU, so I decided to take 4 (out of 6) cores from each die and assign them to the guest. These are cores 14-17 on die P#2 and cores 18-21 on the next die, plus their matching SMT (aka hyperthreading) siblings. Also, in my case only 4 out of 8 memory channels are populated, which means only two out of 4 dies on this socket have memory. This is why numatune uses nodeset 2 (meaning: use the memory on P#2). I am using the remaining cores of these dies for the other duties of the virtual machine: emulatorpin and iothread.

The kernel parameters are console=tty0 console=ttyS0,115200N8 udev.children-max=32 edac_core.edac_mc_panic_on_ue=1 video=efifb:off iommu=pt amd_iommu=on add_efi_memmap nohz_full=12-47,60-95 rcu_nocbs=12-47,60-95 isolcpus=12-47,60-95 systemd.cpu_affinity=0-11,48-59 nvme.poll_queues=4 initrd=\amd-ucode.img initrd=\initramfs-linux-lts.img

Code:
<domain type='kvm'>
  <name>legnica-vfio1</name>
  <uuid>a62835ce-a87d-4701-9d8e-9e01f0416462</uuid>
  <title>legnica vfio1</title>
  <memory unit='KiB'>67108864</memory>
  <currentMemory unit='KiB'>67108864</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>16</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='14'/>
    <vcpupin vcpu='1' cpuset='62'/>
    <vcpupin vcpu='2' cpuset='15'/>
    <vcpupin vcpu='3' cpuset='63'/>
    <vcpupin vcpu='4' cpuset='16'/>
    <vcpupin vcpu='5' cpuset='64'/>
    <vcpupin vcpu='6' cpuset='17'/>
    <vcpupin vcpu='7' cpuset='65'/>
    <vcpupin vcpu='8' cpuset='18'/>
    <vcpupin vcpu='9' cpuset='66'/>
    <vcpupin vcpu='10' cpuset='19'/>
    <vcpupin vcpu='11' cpuset='67'/>
    <vcpupin vcpu='12' cpuset='20'/>
    <vcpupin vcpu='13' cpuset='68'/>
    <vcpupin vcpu='14' cpuset='21'/>
    <vcpupin vcpu='15' cpuset='69'/>
    <emulatorpin cpuset='12-13,22-23,61,70-71'/>
    <iothreadpin iothread='1' cpuset='60'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='2'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-5.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/ovmf/x64/OVMF_CODE.fd</loader>
    <nvram template='/usr/share/ovmf/x64/OVMF_VARS.fd'>/var/lib/libvirt/qemu/nvram/legnica_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='4096'/>
    </hyperv>
  </features>
  <cpu mode='host-model' check='none'>
    <topology sockets='1' dies='2' cores='4' threads='2'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup' track='wall'/>
    <timer name='pit' tickpolicy='discard'/>
    <timer name='hpet' present='no'/>
    <timer name='tsc' present='yes' mode='native'/>
    <timer name='kvmclock' present='yes'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='directsync' io='native'/>
      <source dev='/dev/zvol/zdata/vdis/legnica'/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='directsync' io='native'/>
      <source dev='/dev/disk/by-partuuid/5e605c2b-1bcc-41d0-b4b5-0d1df237dd62'/>
      <target dev='sdb' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='1' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='directsync' io='native'/>
      <source dev='/dev/zvol/zdata/vdis/legnica_steam'/>
      <target dev='sdc' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='2' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='directsync' io='native'/>
      <source dev='/dev/zvol/zdata/vdis/legnica_profiles'/>
      <target dev='sdd' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='3' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/data/isos/virtio-win-0.1.171.iso'/>
      <target dev='sde' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x17'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver queues='16' iothread='1'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:01:34:2e:f3'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <rom bar='off'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/legnica.agent'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' port='5912' autoport='no' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <audio id='1' type='spice'/>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x24' slot='0x00' function='0x0'/>
      </source>
      <rom bar='off'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x26' slot='0x00' function='0x0'/>
      </source>
      <rom bar='off'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <rom bar='off'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <rom bar='off'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x2'/>
      </source>
      <rom bar='off'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x2'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
      </source>
      <rom bar='off'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x3'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>
 

Attachments: lstopo output for this socket (image not included)

Bronek

As for cache='directsync', I am changing it to cache='none' for consistency, since someone else would probably start with "none". In my case it probably makes no difference, because my storage is a ZFS volume (aka "zvol"), which is always cached because ZFS forces caching. I think "none" and "directsync" work the same for zvols.
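For reference, the change is a single attribute on each disk's driver element. Below is a sketch for the first disk from the XML above, with everything else unchanged. Both modes open the device with O_DIRECT, which io='native' needs; as far as I understand, the difference is that "directsync" also disables the guest-visible write cache, so every write is flushed before it completes.

```xml
<disk type='block' device='disk'>
  <!-- cache='none': bypass the host page cache (O_DIRECT); guest write cache stays enabled -->
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/zvol/zdata/vdis/legnica'/>
  <target dev='sda' bus='scsi'/>
  <boot order='1'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
```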
 

mirrormax

I found that using only CPU and RAM from one node worked best, namely the node with the GPU. The USB card being on a different node didn't seem to be an issue. I also had issues where numad was overriding numactl until I disabled it. The goal was to get as close to bare-metal performance as possible, with everything in the VM running off one NUMA node for the lowest possible latency. I'm not at home, so I can't show my exact setup and XML. Your XML looks solid otherwise, though I think there might be some Hyper-V enlightenments missing?
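As a rough illustration only, and not the exact setup mentioned above, a single-node variant could pin a smaller guest entirely to die P#2, reusing the CPU numbers from the XML earlier in the thread. It assumes cores 12-17 and their SMT siblings 60-65 all belong to host node 2, and that this is the node to prefer; both should be checked against lstopo.

```xml
<vcpu placement='static'>8</vcpu>
<iothreads>1</iothreads>
<cputune>
  <!-- 4 cores of die P#2, each followed by its SMT sibling -->
  <vcpupin vcpu='0' cpuset='14'/>
  <vcpupin vcpu='1' cpuset='62'/>
  <vcpupin vcpu='2' cpuset='15'/>
  <vcpupin vcpu='3' cpuset='63'/>
  <vcpupin vcpu='4' cpuset='16'/>
  <vcpupin vcpu='5' cpuset='64'/>
  <vcpupin vcpu='6' cpuset='17'/>
  <vcpupin vcpu='7' cpuset='65'/>
  <!-- the two remaining cores of the same die take the emulator and I/O threads -->
  <emulatorpin cpuset='12-13,61'/>
  <iothreadpin iothread='1' cpuset='60'/>
</cputune>
<numatune>
  <!-- guest memory comes only from the same node as the pinned cores -->
  <memory mode='strict' nodeset='2'/>
</numatune>
<cpu mode='host-model' check='none'>
  <!-- guest topology matches the pinning: one die, 4 cores, 2 threads each -->
  <topology sockets='1' dies='1' cores='4' threads='2'/>
</cpu>
```

The cost is halving the guest to 8 vCPUs, since one die only offers 6 cores here.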
 

Bronek

Yeah, I might be missing some Hyper-V enlightenments. I use almost identical XML for my Linux machines and they do not benefit from those, so I never bothered.
 

Bronek

FWIW, I just tested various enlightenments on Windows 10 and settled on:

Code:
  <features>
    <acpi/>
    <apic/>
    <pae/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='4096'/>
      <vpindex state='on'/>
      <runtime state='on'/>
      <synic state='on'/>
      <tlbflush state='on'/>
      <ipi state='on'/>
    </hyperv>
  </features>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup' track='wall'/>
    <timer name='pit' tickpolicy='discard'/>
    <timer name='hpet' present='no'/>
    <timer name='tsc' present='yes' mode='native'/>
    <timer name='kvmclock' present='yes'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
Interestingly, I found that on my machine enabling stimer significantly increases DPC latency, which is why it is not in the list above.
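To make that choice explicit in the XML, stimer can be listed as disabled rather than just omitted. This is only a sketch: with mode='custom' I would expect unlisted enlightenments to stay off anyway, so the extra line mainly documents the decision.

```xml
<hyperv mode='custom'>
  <relaxed state='on'/>
  <vapic state='on'/>
  <spinlocks state='on' retries='4096'/>
  <vpindex state='on'/>
  <runtime state='on'/>
  <synic state='on'/>
  <!-- synthetic timers left off: they raised DPC latency on this machine -->
  <stimer state='off'/>
  <tlbflush state='on'/>
  <ipi state='on'/>
</hyperv>
```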