Slow NFS

45ygfxs · Jan 19, 2018

Running out of ideas if anyone has any suggestions it would be greatly appreciated.

Have an Ubuntu 16.04 VM running on Proxmox with an LSI SAS card passing through to JBOD. Disks are pooled via Mergerfs and shared via NFS and Samba with other VM's on this and another server. Locally seeing 500-600MB/s write speeds and all is well. Samba/CIFS shares on other VM's see ~300MB/s. Unfortunately, not able to get much beyond ~110MB/s on NFS.

Servers are physically connected via 1Gb through a switch along with a second 10Gb direct link (on different subnets). Validated all interfaces are using a standard 1500 MTU. iPerf tests between physical boxes to/from this NAS VM show speeds within 20% expectation of those peak rates for overhead. VM's on the same server are much faster.

Testing using dd:

Code:

dd if=/dev/zero of=/media/pool/downloads/test bs=1M count=1024

Fairly common NFS export:

Code:

/media/pool                     *(rw,no_root_squash,insecure,no_subtree_check,fsid=101)

..and corresponding mount Options on the client side:

Code:

192.168.0.44:/media/pool               /media/temp             nfs             rw,intr,hard,async,retrans=2,noatime,rsize=8192,wsize=8192,vers=3,timeo=600       0 0

Tried playing with rsize/wsize:

rsize=8192,wsize=8192
- 43MB/s
rsize=32768,wsize=32768
- 64.6MB/s
rsize=65536,wsize=65536
- 84.8MB/s
rsize=1048576,wsize=1048576
- 107-120 MB/s
rsize=131072,wsize=131072
- 97.7MB/s
rsize=524288,wsize=524288
- 101 - 104 MB/s
rsize=4194304,wsize=4194304
- 111 - 110 MB/s

Also tried playing with the MergerFS settings:

Code:

#defaults,direct_io,allow_other,fsname=mergerfsPool,category.create=epmfs        0 0
#defaults,direct_io,func.getattr=newest,allow_other,minfreespace=50G,fsname=mergerfs,category.create=epmfs,intr,readdir_ino,noforget     0 0
#defaults,direct_io,func.getattr=newest,allow_other,minfreespace=50G,fsname=mergerfs,category.create=epmfs      0 0
#defaults,direct_io,func.getattr=newest,allow_other,use_ino,minfreespace=50G,fsname=mergerfs,category.create=epmfs  0 0
#defaults,direct_io,allow_other,minfreespace=20G,fsname=MergerFS,category.create=ff     0 0
#defaults,allow_other,use_ino,func.getattr=newest,category.create=epmfs,moveonenospc=true,minfreespace=50G,fsname=mergerfsPool,dropcacheonclose=true    0 0
#defaults,allow_other,minfreespace=20G,fsname=mergerfsPool,category.create=epmfs,intr,readdir_ino,noforget      0 0

Same symptoms on Ubuntu default kernel/tuning settings. But updated to the following with no change:

Code:

net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_synack_retries = 3
net.ipv4.tcp_syn_retries = 3
net.ipv4.tcp_rfc1337 = 1
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.conf.all.log_martians = 1
sysctl: cannot stat /proc/sys/net/ipv4/inet_peer_gc_mintime: No such file or directory
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_dsack = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 0
sysctl: cannot stat /proc/sys/net/ipv4/tcp_tw_recycle: No such file or directory
net.ipv4.tcp_max_syn_backlog = 20000
net.ipv4.tcp_max_orphans = 9297
net.ipv4.tcp_orphan_retries = 1
net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_max_tw_buckets = 743424
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 2500
net.core.somaxconn = 65000
vm.swappiness = 0
vm.dirty_background_ratio = 5
vm.dirty_ratio = 15
vm.min_free_kbytes = 59503
fs.file-max = 371712
fs.suid_dumpable = 2
kernel.printk = 4 4 1 7
kernel.core_uses_pid = 1
kernel.sysrq = 0
kernel.msgmax = 65536
kernel.msgmnb = 65536
kernel.shmmax = 5483866521
kernel.shmall = 1487594

45ygfxs · Jan 19, 2018

Additional info that may help, but can provide more.

NFS Stat:

Code:

Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
1039273    0          0          0          0

Server nfs v3:
null         getattr      setattr      lookup       access       readlink
57        0% 246288   23% 22        0% 275625   26% 220450   21% 0         0%
read         write        create       mkdir        symlink      mknod
7019      0% 249856   24% 4         0% 0         0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
2         0% 0         0% 0         0% 0         0% 0         0% 27545     2%
fsstat       fsinfo       pathconf     commit
1         0% 60        0% 25        0% 12290     1%

Haven't had any retransmits or anything but did go ahead and bump RPCNFSDCOUNT=32.

Mount stats, maybe inexperience but don't see anything off here either:

Code:

Stats for 192.168.0.44:/media/pool mounted on /media/temp:
  NFS mount options: rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.44,mountvers=3,mountport=13199,mountproto=udp,local_lock=none
  NFS server capabilities: caps=0x3fcf,wtmult=4096,dtsize=4096,bsize=0,namlen=255
  NFS security flavor: 1  pseudoflavor: 0

NFS byte counts:
  applications read 0 bytes via read(2)
  applications wrote 1073741824 bytes via write(2)
  applications read 0 bytes via O_DIRECT read(2)
  applications wrote 0 bytes via O_DIRECT write(2)
  client read 0 bytes via NFS READ
  client wrote 1073741824 bytes via NFS WRITE

RPC statistics:
  1039 RPC requests sent, 1039 RPC replies received (0 XIDs not found)
  average backlog queue length: 0

WRITE:
        1024 ops (98%)
        avg bytes sent per op: 1048732  avg bytes received per op: 136
        backlog wait: 3774.995117       RTT: 321.236328         total execute time: 4096.318359 (milliseconds)
GETATTR:
        3 ops (0%)      1 retrans (33%)         0 major timeouts
        avg bytes sent per op: 100      avg bytes received per op: 112
        backlog wait: 0.000000  RTT: 1.333333   total execute time: 1.666667 (milliseconds)
ACCESS:
        3 ops (0%)
        avg bytes sent per op: 136      avg bytes received per op: 120
        backlog wait: 0.000000  RTT: 0.333333   total execute time: 0.333333 (milliseconds)
LOOKUP:
        2 ops (0%)
        avg bytes sent per op: 142      avg bytes received per op: 228
        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)
FSINFO:
        2 ops (0%)
        avg bytes sent per op: 88       avg bytes received per op: 80
        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)
SETATTR:
        1 ops (0%)
        avg bytes sent per op: 172      avg bytes received per op: 144
        backlog wait: 0.000000  RTT: 148.000000         total execute time: 148.000000 (milliseconds)
PATHCONF:
        1 ops (0%)
        avg bytes sent per op: 88       avg bytes received per op: 56
        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)
COMMIT:
        1 ops (0%)
        avg bytes sent per op: 148      avg bytes received per op: 128
        backlog wait: 0.000000  RTT: 1126.000000        total execute time: 1126.000000 (milliseconds)

Network interfaces on client side are clean:

Code:

ens18: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.113  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::6c19:6dff:febe:922  prefixlen 64  scopeid 0x20<link>
        ether 6e:19:6d:be:09:22  txqueuelen 1000  (Ethernet)
        RX packets 57284554  bytes 119844632108 (119.8 GB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 38357955  bytes 13045905315 (13.0 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens19: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.113  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::6cd6:3dff:fe25:8b6  prefixlen 64  scopeid 0x20<link>
        ether 6e:d6:3d:25:08:b6  txqueuelen 1000  (Ethernet)
        RX packets 622752493  bytes 435446918715 (435.4 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 545637417  bytes 500085849311 (500.0 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

On the server side, some drops on the 1Gb interface (about .1%), but not enough and most testing is being done on the 10Gb interface which is 100% clean:

Code:

ens18     Link encap:Ethernet  HWaddr e6:f6:f7:9f:8b:8e
          inet addr:10.0.0.44  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::e4f6:f7ff:fe9f:8b8e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:164685400 errors:0 dropped:183246 overruns:0 frame:0
          TX packets:161269563 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:79754109793 (79.7 GB)  TX bytes:90467958399 (90.4 GB)

ens19     Link encap:Ethernet  HWaddr 46:7e:bf:28:46:42
          inet addr:192.168.0.44  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::447e:bfff:fe28:4642/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:637949592 errors:0 dropped:0 overruns:0 frame:0
          TX packets:571179595 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:589124907966 (589.1 GB)  TX bytes:874243075824 (874.2 GB)

mrkrad · Jan 19, 2018

try adding async in /etc/exports options!

45ygfxs · Jan 19, 2018

Adding async on the exports and now seeing between 87.9 and 89.1.

45ygfxs · Jan 23, 2018

Stewing on this more. Did a release upgrade to a newer version of Ubuntu to get NFS and related packages upgraded as well with no change. Then tried running NFS against a single disk directly, speeds reached +500MB/s. Remounted just that single disk in a MergerFS Pool and back down to ~100MB/s. So that seems to rule out NFS in of itself.

So to recap. Speeds locally against both a single disk and MergerFS are pretty close to 500MB/s. Speeds via NFS are near that limit against a single disk direct but drop to 100MB/s against a MergerFS pool regardless of disks included or mount options tried. In addition to the options above did attempt just direct_io and allow_other to simplify but saw no change in performance. Removing direct_io does drop it down to ~20MB/s too with local tests down to about ~100MB/s.

Did more Google research and came across a similar thread which noted a high backlog wait on NFS over MergerFS. Saw the same on my output above with the write queue = backlog wait: 3774.995117

In the other thread the user reduced vCPU's from 4 to 1. Did the same and speeds improved. Now over 200MB/s and near Samba speeds. Even 2 vCPU's had the same trouble as 4. Did try setting threads=1 and threads=4 on the MergerFS mount but made no difference when multiple vCPU's were available.

Also missed, using compiled version of MergerFS:
2.23.1-12-gb48054f-dirty

Just leaving a single core on the VM for the time being. If a fix ever comes up will be sure to update.

gigatexal · Jan 23, 2018

wow that's messed up -- nice find though

Search

Slow NFS

45ygfxs

New Member

45ygfxs

New Member

mrkrad

Well-Known Member

45ygfxs

New Member

45ygfxs

New Member

gigatexal

I'm here to learn