High IO Wait on Proxmox


Emile

New Member
Mar 26, 2020
I have a server:

CPU: 2x Intel Xeon E5-2689 @ 2.6 GHz
Mobo: Supermicro X9DRi-LN4F+
RAM: 72 GB ECC @ 1333 MHz
HD: NVMe 970 EVO 500 GB, 2x 1 TB WD, 2x 500 GB SSD
GPU: AMD RX 570
OS: Proxmox

Every time I copy a big file from one drive to another, iowait climbs to 80% and all VMs become unusable until the transfer completes. My NVMe tops out at 1.5 GB/s and 50k IOPS. I tried switching the filesystem between EXT4 and ZFS and the result was the same. Until now I didn't bother digging further, as I assumed it was a motherboard issue. Today I passed the NVMe through to a Windows VM directly and ran some tests. To my surprise, I'm getting 3.5 GB/s and 130k IOPS, which is what a 970 EVO should do.

What could cause this? Is there a kernel module or parameter that could be slowing my drives down? The issue affects my SATA SSDs too: they get low IOPS and low throughput no matter which filesystem I use.
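For reference, these are the host-side settings I know of that can hold block IO back; posting the commands in case someone spots something off. The device names (nvme0n1, sda) are just examples, so adjust them to whatever lsblk shows on your box.

# IO scheduler per drive ([none] is the usual choice for NVMe)
cat /sys/block/nvme0n1/queue/scheduler
cat /sys/block/sda/queue/scheduler

# CPU frequency governor; a conservative governor can hold back small-block IOPS
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Kernel boot parameters, to see if anything IO- or iommu-related is set
cat /proc/cmdline

# PCIe link of the NVMe; LnkSta should report x4 at 8GT/s for a 970 EVO
lspci -vv | grep -i -A 40 "non-volatile"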

Here's an fio result from my NVMe 970 EVO, run on the host:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=212MiB/s,w=69.7MiB/s][r=54.2k,w=17.8k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=25798: Fri Mar 27 13:34:59 2020
read: IOPS=55.9k, BW=219MiB/s (229MB/s)(3070MiB/14049msec)
bw ( KiB/s): min=215208, max=233712, per=100.00%, avg=223800.29, stdev=4840.57, samples=28
iops : min=53802, max=58428, avg=55950.07, stdev=1210.14, samples=28
write: IOPS=18.7k, BW=73.0MiB/s (76.6MB/s)(1026MiB/14049msec); 0 zone resets
bw ( KiB/s): min=71000, max=78968, per=100.00%, avg=74800.00, stdev=1948.95, samples=28
iops : min=17750, max=19742, avg=18700.00, stdev=487.24, samples=28
cpu : usr=19.05%, sys=79.33%, ctx=23463, majf=0, minf=8
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: bw=219MiB/s (229MB/s), 219MiB/s-219MiB/s (229MB/s-229MB/s), io=3070MiB (3219MB), run=14049-14049msec
WRITE: bw=73.0MiB/s (76.6MB/s), 73.0MiB/s-73.0MiB/s (76.6MB/s-76.6MB/s), io=1026MiB (1076MB), run=14049-14049msec

Disk stats (read/write):
dm-0: ios=777977/260103, merge=0/0, ticks=35016/12100, in_queue=47104, util=99.29%, aggrios=785923/262711, aggrmerge=4/88, aggrticks=38838/13620, aggrin_queue=45424, aggrutil=98.86%
sda: ios=785923/262711, merge=4/88, ticks=38838/13620, in_queue=45424, util=98.86%
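One more data point worth collecting: the Disk stats block above lists dm-0 and sda, so the test file may have been sitting on an LVM volume rather than on the NVMe itself. A read-only fio run straight against the raw device (the device name below is a guess; confirm with lsblk before running) would show whether the 970 EVO is actually slow on the host or whether the loss is in the filesystem/LVM layer:

# read-only random 4k test against the raw NVMe namespace (no writes issued)
fio --name=nvme-raw --filename=/dev/nvme0n1 --readonly --direct=1 --ioengine=libaio --rw=randread --bs=4k --iodepth=64 --numjobs=4 --runtime=30 --time_based --group_reporting

The --readonly flag makes fio refuse any writes, so nothing on the drive gets touched.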