Proxmox High IO Delay - Low Fsync. What's wrong?

JC Connell

Member
Apr 17, 2016
I recently built a dual E5-2670 Proxmox host after discovering how affordable they had become. I purchased many of the parts based on suggestions from users in the great deals forum. This is really my first foray into virtualization, so I am still learning how to address some issues.

The host has:
-Intel S2600cp
-2x E5-2670
-128GB RAM
-2x 256GB SSD in a mirrored ZFS vdev (Crucial MX100; Samsung pull from a Lenovo T510, unsure of model), connected to the onboard SATA ports for VM storage
-Intel RMS25KB080 cross-flashed with LSI 9205-8i IT firmware
-- 2x 2TB + 2x 6TB in mirrored ZFS vdevs for data, attached to the Intel card

Last night, I noticed IO delay in the 7-10% range and some of my VMs slowed down considerably. I run about 8 containers and 2 VMs, all using the defaults or suggested best practices by Proxmox.

I ran pveperf on the two pools and got the numbers below:

Code:
pveperf VM vdev:
CPU BOGOMIPS:      166025.28
REGEX/SECOND:      1528845 
HD SIZE:           185.95 GB (r0ssd240gb)
FSYNCS/SECOND:     159.21
DNS EXT:           102.47 ms
DNS INT:           76.59 ms

pveperf data vdev
REGEX/SECOND:      1490789
HD SIZE:           6825.08 GB (vol1)
FSYNCS/SECOND:     84.23
Are these numbers normal? Can they be improved? What should I look into to improve performance?
 

rubylaser

Active Member
Jan 4, 2013
Michigan, USA
JC Connell said:
Are these numbers normal? Can they be improved? What should I look into to improve performance?
Hello! You have a nice motherboard, CPU, and RAM setup for Proxmox (I have the same setup at home for my fileserver). To answer your question: yes, those fsyncs/second numbers are low. The only ways to improve them are to forgo ZFS and use a hardware RAID controller with cache plus ext4 (not my preferred method with Proxmox), or to add a lower-latency SLOG (ZIL) device to your pools. I usually use 200GB Intel S3700 SSDs at home, but these have gotten very expensive even used. Another option is the Intel S3610 200GB for around $160 each on eBay.

Here is a nice page showing how different SSDs perform as a journal device (the same test indicates what works well as a ZIL/SLOG). Obviously, enterprise NVMe drives are much better suited for this, but they are also much more expensive.

Ceph: how to test if your SSD is suitable as a journal device? | Sébastien Han

Here's more data showing the performance gap between Crucial MX100 SSDs and the Intel DC S3500/S3700.
SSD ZFS ZIL SLOG Benchmarks - Intel DC S3700, Intel DC S3500, Seagate 600 Pro, Crucial MX100 Comparison | b3n.org
 

ttabbal

Active Member
Mar 10, 2016
If you want to test whether an SLOG will help, set sync=disabled on the pools. There are downsides to leaving it that way long term, so know what they are before you do. If you get good performance with sync off, then an SLOG is worth considering. If not, you need to look elsewhere, because a log device only helps with synchronous writes.
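A minimal sketch of that test (the pool name is taken from the OP's pveperf output; adjust to your own pool and mountpoint):

```shell
# Record the current setting before changing anything
zfs get sync r0ssd240gb

# Disable sync writes for the test only.
# WARNING: with sync=disabled, acknowledged sync writes can be lost on power failure.
zfs set sync=disabled r0ssd240gb

# Re-run the benchmark against the pool's mountpoint
pveperf /r0ssd240gb

# Restore the default behavior (honor sync requests) when done
zfs set sync=standard r0ssd240gb
```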
 

JC Connell

Member
Apr 17, 2016
Thank you for the responses everyone!

@ttabbal, I disabled sync and ran pveperf again. This time I got an fsync value of 21274.15. That's an improvement of more than 130x!


Should I put an Intel S3700 in as an SLOG? It would be awesome to purchase two of them and add a mirrored log vdev, but I could purchase one much sooner than two.
 

ttabbal

Active Member
Mar 10, 2016
More of a difference than I expected! If you want to turn sync back on, then yes, add the SLOG device. You likely don't need a big one; 8 or 16GB is probably more than enough. Just create a small partition and use that, which leaves the SSD loads of empty space to work with when writing (effectively over-provisioning it). As I understand it, the S3700 is a great choice.
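A rough sketch of that partition-and-attach step (the device path and pool name are assumptions; verify yours with lsblk or zpool status before running anything):

```shell
# Create a single 16GB partition on the new SSD (assumed here to be /dev/sdX),
# leaving the rest of the drive unallocated as over-provisioned spare area
sgdisk --new=1:0:+16G --typecode=1:bf01 /dev/sdX

# Attach that partition to the pool as a separate log device (SLOG);
# using the /dev/disk/by-id path keeps the name stable across reboots
zpool add r0ssd240gb log /dev/disk/by-id/ata-INTEL_SSDSC2BA200G3-part1

# Confirm the log vdev shows up under the pool
zpool status r0ssd240gb
```

If a second drive is added later, a mirrored log can be created instead with `zpool add <pool> log mirror <dev1> <dev2>`.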

You can also run with sync off. That means some data may be lost in a power outage, etc. It could in theory break the pool, but I don't think so: there is a lot of redundant metadata, and I have never had a problem with mine, which is about 6 years old. I seem to remember reading a ZFS developer saying that the in-flight data would be lost but the filesystem itself would be fine. There could be some application-level breakage, but that's the price you pay for write caching. For database storage, and probably VMs, I would run with sync enabled. For media storage, it's not really needed. It's your data, so you have to decide.
 

rubylaser

Active Member
Jan 4, 2013
Michigan, USA
You definitely want to have sync enabled with VMs running on the pool or you risk corrupted, non-working virtual machines if something goes sideways.
 

dswartz

Active Member
Jul 14, 2011
That depends. If you are willing to lose, say, an hour's worth of data on a VM, and you do hourly snapshots of the dataset or zvol, this is workable...
 

rubylaser

Active Member
Jan 4, 2013
Michigan, USA
That depends. If you are willing to lose, say, an hour's worth of data on a VM, and you do hourly snapshots of the dataset or zvol, this is workable...
True, but I don't know of many home users who take hourly snapshots of all of their VMs. Especially if any of them are content acquisition boxes (Usenet, torrent, etc.), as those snapshots will eat up space in a hurry.
 

ttabbal

Active Member
Mar 10, 2016
An hour? I don't think that the data sits around that long in RAM. I believe ZFS defaults to 10 second commits.

For snapshots, keep any VM storage on its own dataset and snapshot that. No need to snapshot media storage.
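A minimal sketch of that layout, using the data pool name from the OP's pveperf output (the dataset name is an assumption):

```shell
# Give VM disks their own dataset, so snapshots never sweep in media files
zfs create vol1/vms

# Snapshot only the VM dataset; snapshots are cheap until data diverges
zfs snapshot vol1/vms@before-upgrade

# List snapshots under that dataset, and destroy one when no longer needed
zfs list -t snapshot -r vol1/vms
zfs destroy vol1/vms@before-upgrade
```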
 

dswartz

Active Member
Jul 14, 2011
True. On the other hand, there are any number of scripts that take snapshots on a configurable schedule and keep a reasonable number of old ones (auto-deleting those that are "too old"). Pay your money and take your choice :) Yeah, the S3700 is excellent. OP, you don't really need to mirror these. If the SLOG dies, ZFS will just fall back to the on-pool ZIL (it will get slow again). Nothing bad *should* happen in this event...
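As one sketch of such a script (the dataset name and retention count are assumptions; tools like zfs-auto-snapshot or sanoid do the same thing with more polish), an hourly cron job might look like:

```shell
#!/bin/sh
# /etc/cron.hourly/zfs-snap: snapshot one dataset, keep only the newest $KEEP
DATASET="vol1/vms"   # assumed VM dataset; change to taste
KEEP=24              # retain one day of hourly snapshots

# Take a timestamped snapshot with a recognizable prefix
zfs snapshot "${DATASET}@auto-$(date +%Y%m%d-%H%M)"

# List this dataset's snapshots newest-first, skip the first $KEEP,
# and destroy the rest (only those matching the auto- prefix)
zfs list -H -t snapshot -o name -S creation -r "$DATASET" \
  | grep "^${DATASET}@auto-" \
  | tail -n +$((KEEP + 1)) \
  | xargs -r -n1 zfs destroy
```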
 

JC Connell

Member
Apr 17, 2016
I saw a very significant improvement! IO delay is no longer a concern for me. I'm currently running both Toshibas in a RAID 0 configuration, but the difference in FSYNCS/SECOND between RAID 0 and RAID 1 was very small (< 200).

Currently at 2196.62 for FSYNCS/SECOND on the VM vdev.