How best to determine what to upgrade for better FreeNAS performance?

SPCRich

Active Member
Mar 16, 2017
Crappy subject, but I can't really think of a better way to word it, so forgive me. I'm wondering how to go about determining the bottlenecks in my system, and then using that to decide what to upgrade. First, my current FreeNAS box:

Supermicro SC846 24-bay chassis w/ 920W-SQ PSUs & SAS3/SATA backplane (6Gbps)
Supermicro X10SDV-4C-TP4F board (4C/8T Xeon-D processor)

48GB DDR4 ECC RAM (it's expensive... this was already a lot)
LSI (Broadcom) SAS 9207-8i HBA

18x WD RED 8TB drives in RAIDZ2 configs (3x 6-drive RAIDZ2)
2x Intel DC S3510 120GB drives mirrored for boot
FreeNAS 11.1-U5


So, first things first. Using iperf, I can get ~9.4Gbps over the 10G interface, but when using NFS or iSCSI I max out at around 2.2Gbps (reading off SSDs into the pool, or from the pool onto SSDs). During this test, I see CPU usage is ~80% across all threads in FreeNAS. These are sequential transfers, using either dd or copying a large file (~80GB). This tells me I probably need to upgrade the Xeon-D to something better, fine. But how would I go about measuring that? Does clock speed matter more, or cores? (I'm mostly using either iSCSI or NFS storage.)
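As a way to measure it, one thing I can do while a transfer is running is watch per-thread CPU with plain top (I'm assuming the stock FreeBSD top flags are available from the FreeNAS shell):

Code:
# -S: include kernel/system threads, -H: one line per thread,
# -P: per-CPU usage, -I: hide idle threads
top -SHIP
If the aggregate sits around 30-80% but a single nfsd or iSCSI (ctl) thread is pinned near 100% of one core, single-thread clock speed is the limiter rather than core count.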

Secondly, in day-to-day operations (this is my home lab), I routinely do two things. Copying inside the pool (one large ~80TB pool) using rsync shows speeds around 30-40MB/s. For 18 drives across 3 vdevs, I'd expect a lot faster. The second is using the pool as iSCSI storage for Kubernetes pods. Performance here is acceptable (I'm definitely not maxing out 10Gbps), but I notice that I have a lot of ARC misses. Would adding something like an Optane 900p help either of these?
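As a rough check on the ARC question, I can compare the hit/miss counters against the ARC size (these are the stock FreeBSD ZFS sysctl names, assuming FreeNAS hasn't renamed them):

Code:
# ARC hit/miss counters and current/maximum ARC size
sysctl kstat.zfs.misc.arcstats.hits \
       kstat.zfs.misc.arcstats.misses \
       kstat.zfs.misc.arcstats.size \
       kstat.zfs.misc.arcstats.c_max
A high miss rate on a random-access working set larger than RAM is the case where an L2ARC device like the 900p can help; big sequential copies generally won't benefit much from it.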

My main goal is as low power as possible, followed by decent performance. When I build the next iteration of my FreeNAS system, I'm looking at either one of the newer Xeon-D platforms with more cores, or a single Xeon E5-1630/1650 or something, unless someone says something else is more than adequate for 10Gbps sequential R/W. When I look at processor/board specs, I'm not sure what my limiting factor is (in the intra-pool copying and NFS/iSCSI), so I'm not entirely sure what would be a decent upgrade. I definitely don't want/need dual processors, but either an E3, a Xeon-D, or something else?


TL;DR: given drives capable of multi-gigabit transfers, which component(s) would offer the best upgrade path with the least power? Do I need a Xeon E5-16xx processor? Is it GHz related? Core related? Or something else? Can I get by with a higher core count Xeon-D, or do I need to go full-on Xeon E3 or E5?

Thanks!
 

K D

Well-Known Member
Dec 24, 2016
Subscribing to the thread. With a similar setup I have been able to do multiple parallel file copies totaling ~800MB/s sustained, but I'm unable to reach more than ~200MB/s with a single stream. Interested to see if I can improve the performance.
 

kapone

Well-Known Member
May 23, 2015
You have to follow a process of elimination. For example:

1. Is the CPU enough? Create RAM disks on both ends to eliminate the disk subsystem and test (see the sketch below).
2. If #1 works, test your disk subsystem locally first. Can it support the speeds you're looking for?
3. If #2 works, try each protocol one by one, with a RAM disk still on one end, to eliminate the other end.

and so on. There may be optimizations for one protocol or the other that make it faster.

My point being, it could be any number of things.
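As a minimal sketch of step 1 (sizes and mount points below are just examples, not anything specific to your setup):

Code:
# FreeBSD/FreeNAS side: 8GB memory-backed filesystem
mdmfs -s 8g md /mnt/ramdisk

# Linux client side: 8GB tmpfs
mount -t tmpfs -o size=8g tmpfs /mnt/ramdisk

# Quick local sequential write into the RAM disk, before involving the network
dd if=/dev/zero of=/mnt/ramdisk/test.dat bs=1M count=4096
Then export the FreeNAS one over NFS (or as a file-backed iSCSI extent), repeat the same dd from the client, and any drop from the local numbers is the protocol/network path rather than the disks.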

As a benchmark, with an E3-1270v2 (3.5GHz, 4C/8T) on each end, I can push ~9.5Gbps over NFS at ~75% CPU usage. This is with Mellanox CX3 NICs. I have never used the AM4 10GbE NICs that are on your motherboard, so I can't comment on the performance of the NICs themselves.
 

SPCRich

Active Member
Mar 16, 2017
kapone said: You have to follow a process of elimination... (quoted above)

Thanks! My two K8s nodes both have AOC-STGN-I2S 10GbE adapters (Supermicro 2x SFP+) in them, so I can easily test #1. 75% seems really high, which makes me think GHz matters more for NFS, since the D-1518 is only a 2.2GHz chip. Let me try this tonight, thanks!
 

SPCRich

Active Member
Mar 16, 2017
Tested both with dd, using if=/dev/zero and a real file on the Xeon-D box, writing to a ramdrive on one of the k8s nodes; both had almost exactly the same performance.

Writing to ramdrive:

Code:
dd if=/dev/zero of=/tmp/mnt10g/test.dat bs=1M count=10000
1.55Gbps write @ 30-60% usage/thread

Code:
dd if=/mnt/media/file.mkv of=/tmp/mnt10g/test.dat bs=1M count=10000
1.57Gbps write @ 30-60% usage/thread

Reading from ramdrive over NFS to /dev/null:

Code:
dd if=/tmp/mnt10g/test.dat of=/dev/null bs=1M
3.04Gbps @ 30-60% usage/thread

Code:
dd if=/tmp/mnt10g/test.dat of=/dev/null
3.05Gbps @ 30-60% usage/thread

Given that reading from a real file on the Xeon-D box and reading from /dev/zero give almost exactly the same performance, this doesn't appear to be a filesystem issue per se. But it also doesn't appear to be the CPU, since it's not hitting 100%, unless it's simply a GHz (single-thread) thing...
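One more thing I can try, to separate a per-stream limit from an aggregate limit: run a few dd streams in parallel over the same NFS mount and see whether the total scales (file names below are just examples):

Code:
# Four parallel sequential writes over the same NFS mount, then wait for all
for i in 1 2 3 4; do
  dd if=/dev/zero of=/tmp/mnt10g/test$i.dat bs=1M count=4096 &
done
wait
If the aggregate lands well above a single stream (like the ~800MB/s across parallel copies vs ~200MB/s single-stream reported above), the limit is the per-stream path (one thread per connection, TCP window, latency) rather than the disks or the NIC.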
 

manxam

Active Member
Jul 25, 2015
This is the sole reason I'm still using OmniOS. From what I've been able to determine, iSCSI, NFS, and even SMB are effectively single-threaded (per connection/stream) within Linux and BSD and therefore need GHz, not cores, to make them fly.
Within OpenSolaris these are multi-threaded and built into the kernel, and therefore MUCH MUCH faster while using far less CPU.

I suspect your best bet is to find a processor with faster single-threaded speed, even at a lower core count, in order to achieve more consistent transfers.
@gea may be able to shed a little more light on this.
 

BackupProphet

Well-Known Member
Jul 2, 2014
FreeBSD 12 is coming with pNFS, which will hopefully improve performance: https://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt

Another thing that helps with NFS performance on sequential transfers is mounting with a higher read and/or write size, for example:
-o rsize=65536,wsize=65536
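For instance, a full mount line on a Linux client might look like this (server name and paths are just placeholders):

Code:
mount -t nfs -o rsize=65536,wsize=65536,tcp freenas:/mnt/tank/share /mnt/nfs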

You could also post your nfsd settings. Not sure how different the settings are between FreeBSD and FreeNAS, but mine look like this (default settings):
Code:
$ sudo sysctl vfs.nfsd
vfs.nfsd.fha.fhe_stats: No file handle entries.
vfs.nfsd.fha.max_reqs_per_nfsd: 0
vfs.nfsd.fha.max_nfsds_per_fh: 8
vfs.nfsd.fha.bin_shift: 22
vfs.nfsd.fha.enable: 1
vfs.nfsd.request_space_throttle_count: 0
vfs.nfsd.request_space_throttled: 0
vfs.nfsd.request_space_low: 1045524480
vfs.nfsd.request_space_high: 1568286720
vfs.nfsd.request_space_used_highest: 620544
vfs.nfsd.request_space_used: 0
vfs.nfsd.groups: 1
vfs.nfsd.threads: 64
vfs.nfsd.maxthreads: 64
vfs.nfsd.minthreads: 64
vfs.nfsd.cachetcp: 1
vfs.nfsd.tcpcachetimeo: 43200
vfs.nfsd.udphighwater: 500
vfs.nfsd.tcphighwater: 0
vfs.nfsd.enable_stringtouid: 0
vfs.nfsd.debuglevel: 0
vfs.nfsd.enable_locallocks: 0
vfs.nfsd.issue_delegations: 0
vfs.nfsd.commit_miss: 0
vfs.nfsd.commit_blks: 0
vfs.nfsd.mirrormnt: 1
vfs.nfsd.async: 0
vfs.nfsd.server_max_nfsvers: 4
vfs.nfsd.server_min_nfsvers: 2
vfs.nfsd.nfs_privport: 0
vfs.nfsd.allowreadforwriteopen: 1
vfs.nfsd.writedelegifpos: 0
vfs.nfsd.v4statelimit: 500000
vfs.nfsd.sessionhashsize: 20
vfs.nfsd.fhhashsize: 20
vfs.nfsd.clienthashsize: 20
vfs.nfsd.statehashsize: 10
vfs.nfsd.enable_nogroupcheck: 1
vfs.nfsd.enable_nobodycheck: 1
vfs.nfsd.enable_checkutf8: 1
 