OmniOS/napp-it: Improving write performance


scobar

Member
Nov 24, 2013
Ok, I've been doing some tweaking, using the benchmark > dd tool along the way to validate that the changes have improved performance.

Yesterday I was at about 150 MB/s on write tests. I added an SSD for log and another for cache, and I am now pulling 300-350+ MB/s, but it stays in that range for larger file tests. In the past, when I had more RAM and more CPU, I was able to consistently hit 800+ MB/s.

As it sits, I would like to squeeze more out of this setup and narrow down whether the issues are due to disk space, CPU, memory, or configuration. Below are today's results, three runs of each test.

ZFS perf
Blocksize 2M
Count 100
Wait 40
Size of testfile 204.8MB
Write 204.8 MB in 0.2s = 1024.00 MB/s Write
Read 204.8 MB in 0.1s = 2048.00 MB/s Read
Write 204.8 MB in 0.1s = 2048.00 MB/s Write
Read 204.8 MB in 0.1s = 2048.00 MB/s Read
Write 204.8 MB in 0.1s = 2048.00 MB/s Write
Read 204.8 MB in 0.1s = 2048.00 MB/s Read
ZFS perf
Blocksize 2M
Count 1000
Wait 40
Size of testfile 2.048GB
Write 2.048 GB in 2.3s = 890.43 MB/s Write
Read 2.048 GB in 1.2s = 1706.67 MB/s Read
Write 2.048 GB in 2.4s = 853.33 MB/s Write
Read 2.048 GB in 1.1s = 1861.82 MB/s Read
Write 2.048 GB in 2.4s = 853.33 MB/s Write
Read 2.048 GB in 1.2s = 1706.67 MB/s Read
ZFS perf
Blocksize 2M
Count 6250
Wait 40
Size of testfile 12.8GB
Write 12.8 GB in 38.7s = 330.75 MB/s Write
Read 12.8 GB in 8.9s = 1438.20 MB/s Read
Write 12.8 GB in 38.3s = 334.20 MB/s Write
Read 12.8 GB in 9s = 1422.22 MB/s Read
Write 12.8 GB in 39s = 328.21 MB/s Write
Read 12.8 GB in 9.3s = 1376.34 MB/s Read
ZFS perf
Blocksize 2M
Count 10000
Wait 40
Size of testfile 20.48GB
Write 20.48 GB in 65.4s = 313.15 MB/s Write
Read 20.48 GB in 42.2s = 485.31 MB/s Read
Write 20.48 GB in 65.2s = 314.11 MB/s Write
Read 20.48 GB in 42.5s = 481.88 MB/s Read
Write 20.48 GB in 64s = 320.00 MB/s Write
Read 20.48 GB in 54.5s = 375.78 MB/s Read

It seems that after the 2GB write test, performance falls flat on its face. As the file size increases, the read performance goes down as well. The read performance isn't a huge concern, but the write performance is.

System specs:
Intel S2400GP
2x Intel E5-2405L
24GB RAM
ESXi 5.5 with Update 2
OmniOS appliance with 18GB allocated; passthrough won't let me use more.
PCI passthrough for an LSI SAS3008

Currently there are 4 RAIDZ1 vdevs with 6 drives each. The disks are Toshiba DT01ACA3 and sit in external shelves. Pool capacity is at 79%.

What I am trying to determine is the cause of the performance limit:
Is it due to pool cap?
Not enough ram?
Not enough CPU?
Configuration in question?

I'd appreciate any input. I do not see any data errors.
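
For anyone wanting to cross-check these numbers outside napp-it: the dd bench is essentially a sequential dd against the pool, so a rough manual equivalent (the pool mountpoint /tank and the file name are only examples) would be:

# write ~20.5GB in 2,048,000-byte blocks, matching the 2M blocksize / 10000 count run above
dd if=/dev/zero of=/tank/dd.tst bs=2048000 count=10000
# read the same file back
dd if=/tank/dd.tst of=/dev/null bs=2048000
# clean up the test file
rm /tank/dd.tst

Keep in mind that /dev/zero data compresses to almost nothing, so on a dataset with compression enabled the write numbers would mostly reflect RAM and CPU rather than the disks.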
 

Hank C

Active Member
Jun 16, 2014
I bet it's not enough RAM. Your RAM is 18GB and your test file is 20GB, so it's not being served from RAM but from disk.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Seems to me once you run out of RAM you're slowing down.
 

Entz

Active Member
Apr 25, 2013
Canada Eh?
Same; that would be my first guess as well. Everything is getting cached until it runs out of RAM and then starts hitting the disks (or rather waiting for them). Though for 24 disks I would expect a bit more than 300 MB/s.
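
If you want to confirm how much the ARC is doing, the kstat counters on OmniOS show it directly; a quick sketch (the statistic names are the standard arcstats ones):

# current ARC size, target and maximum, in bytes
kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max
# hit/miss counters indicate how much of the read load is served from RAM
kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses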
 

scobar

Member
Nov 24, 2013
I do have more RAM in another box I can swap in as a test, though it's a pain to pull everything out. Are there other things to look at or test to further determine whether it's the RAM or something else?
 

gea

Well-Known Member
Dec 31, 2010
DE
About write performance:

1.
If you set sync=always, every write must first be logged to the Slog device before the regular cached write to the pool, which means the Slog/ZIL is the limiting factor for writes. Compare results with sync=disabled (see the command sketch below).

2.
In a concurrent read/write situation, you can improve writes by reducing reads from disks. You need more RAM to cache data and reduce reads.

3.
Try another pool layout.

Rule with raid-z:
sequential read/write performance scales with the number of data disks
iops scale with the number of vdevs

Rule with multi-mirror (raid-1, raid-10, raid-100, ...):
sequential write and write iops scale with the number of vdevs
sequential read and read iops scale with 2x the number of vdevs

To improve write performance (assuming that iops are mostly the limiting factor),
you need more vdevs - compare with a multi-mirror pool.
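
A minimal sketch of the sync comparison from point 1, assuming the filesystem under test is called tank/data (substitute your own pool/filesystem name):

# check the current setting
zfs get sync tank/data
# run the dd bench with sync writes disabled (writes only go through the RAM write cache)
zfs set sync=disabled tank/data
# then force every write through the Slog/ZIL and compare
zfs set sync=always tank/data
# return to the default when done (sync only when the client requests it)
zfs set sync=standard tank/data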
 

scobar

Member
Nov 24, 2013
Here are the same tests with Sync Disabled:
Sync Disabled
ZFS perf
Blocksize 2M
Count 100
Wait 40
Size of testfile 204.8MB
Write 204.8 MB in 0.1s = 2048.00 MB/s Write
Read 204.8 MB in 0.1s = 2048.00 MB/s Read
Write 204.8 MB in 0.1s = 2048.00 MB/s Write
Read 204.8 MB in 0.1s = 2048.00 MB/s Read
Write 204.8 MB in 0.1s = 2048.00 MB/s Write
Read 204.8 MB in 0.1s = 2048.00 MB/s Read
ZFS perf
Blocksize 2M
Count 1000
Wait 40
Size of testfile 2.048GB
Write 2.048 GB in 2.4s = 853.33 MB/s Write
Read 2.048 GB in 1.3s = 1575.38 MB/s Read
Write 2.048 GB in 2.3s = 890.43 MB/s Write
Read 2.048 GB in 1.2s = 1706.67 MB/s Read
Write 2.048 GB in 2.2s = 930.91 MB/s Write
Read 2.048 GB in 1.2s = 1706.67 MB/s Read
ZFS perf
Blocksize 2M
Count 6250
Wait 40
Size of testfile 12.8GB
Write 12.8 GB in 41s = 312.20 MB/s Write
Read 12.8 GB in 7.7s = 1662.34 MB/s Read
Write 12.8 GB in 41.1s = 311.44 MB/s Write
Read 12.8 GB in 7.3s = 1753.42 MB/s Read
Write 12.8 GB in 39.9s = 320.80 MB/s Write
Read 12.8 GB in 7.7s = 1662.34 MB/s Read
ZFS perf
Blocksize 2M
Count 10000
Wait 40
Size of testfile 20.48GB
Write 20.48 GB in 66.2s = 309.37 MB/s Write
Read 20.48 GB in 33.5s = 611.34 MB/s Read
Write 20.48 GB in 66.1s = 309.83 MB/s Write
Read 20.48 GB in 43.9s = 466.51 MB/s Read
Write 20.48 GB in 67.7s = 302.51 MB/s Write
Read 20.48 GB in 33.1s = 618.73 MB/s Read

I'll see about swapping the RAM between the two boxes for further testing.
 

scobar

Member
Nov 24, 2013
Ok, I pounded in more RAM, bringing the host from 24GB to 72GB and the guest from 18GB to 64GB. The 2GB-file benchmark shows a very sharp improvement. With the 12GB test file, performance improves, but not as sharply.

ZFS perf
Blocksize 2M
Count 1000
Wait 40
Size of testfile 2.048GB
Write 2.048 GB in 1.3s = 1575.38 MB/s Write
Read 2.048 GB in 1.2s = 1706.67 MB/s Read
Write 2.048 GB in 1.4s = 1462.86 MB/s Write
Read 2.048 GB in 1.2s = 1706.67 MB/s Read
Write 2.048 GB in 1.4s = 1462.86 MB/s Write
Read 2.048 GB in 1.2s = 1706.67 MB/s Read
ZFS perf
Blocksize 2M
Count 6250
Wait 40
Size of testfile 12.8GB
Write 12.8 GB in 32.5s = 393.85 MB/s Write
Read 12.8 GB in 7.2s = 1777.78 MB/s Read
Write 12.8 GB in 63.8s = 200.63 MB/s Write
Read 12.8 GB in 7.3s = 1753.42 MB/s Read
Write 12.8 GB in 32.1s = 398.75 MB/s Write
Read 12.8 GB in 7.2s = 1777.78 MB/s Read
Write 12.8 GB in 31.2s = 410.26 MB/s Write (ran again after the 2.048GB test file)
Read 12.8 GB in 7.3s = 1753.42 MB/s Read
Write 12.8 GB in 31.1s = 411.58 MB/s Write
Read 12.8 GB in 7.5s = 1706.67 MB/s Read
Write 12.8 GB in 31.9s = 401.25 MB/s Write
Read 12.8 GB in 7.5s = 1706.67 MB/s Read
 

Entz

Active Member
Apr 25, 2013
Canada Eh?
Reads are scaling with RAM as expected. Now that you have lots of RAM and writes are still not scaling, you are likely hitting the ZFS write throttle because your disks cannot absorb the data any faster.

Try running "zpool iostat -v 1" and see what the system is doing.

If I understand how this all works, ZFS flushes transaction groups every 5s and ensures it only takes in as much data as can be flushed in that period. That means groups of 2GB or so fit in memory, and anything beyond that gets throttled waiting on the disks. That number may be tunable, but of course that carries more risk.

Will leave that for Gea :)
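
For reference, the throttle described above is controlled by ZFS kernel tunables on illumos/OmniOS (zfs_txg_timeout and zfs_dirty_data_max on current OpenZFS-based builds); a hedged sketch of inspecting them and making a persistent change, where the 4GB value is only an example and a larger dirty-data limit means more unwritten data at risk on a crash:

# read the current values from the running kernel (decimal output)
echo zfs_txg_timeout/D | mdb -k
echo zfs_dirty_data_max/E | mdb -k
# to change them persistently, add lines like these to /etc/system and reboot
set zfs:zfs_txg_timeout = 5
set zfs:zfs_dirty_data_max = 4294967296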
 

scobar

Member
Nov 24, 2013
For now I swapped the RAM back, as the machine I borrowed it from needs to be running.
 

gea

Well-Known Member
Dec 31, 2010
DE
As a result, it seems that with the current system and pool layout, your write performance is limited to 300-400 MB/s. You can improve read performance with more RAM, or write performance with another pool layout.

But in general, tuning is not a replacement for an overall concept based on your use case and what you need. If your network is 1 Gb/s, your pool is already 3-4x faster than the link. If you use SMB on a 10 Gb/s network, your pool performance is at the limit of what SMB can deliver without tunings like jumbo frames.

If you want to use the storage as an ESXi NFS/iSCSI datastore, your sequential performance is more than enough but your iops are too low, which means you should use an SSD-only pool (in former times you would have built a raid-1000... pool, but a single SSD offers several thousand iops while a spinning disk offers a few hundred at best).
It should be clear that in such a case you need SSDs if you want performance. A common solution is a smaller high-performance SSD pool plus a larger but slower pool for general use like a filer or backup.
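
To illustrate the layouts described above: a striped-mirror pool trades capacity for iops, and a small SSD mirror can serve as the fast datastore pool. The pool and device names below are placeholders:

# raid-10 style pool: four mirror vdevs, so write iops scale with 4 vdevs
zpool create tank2 mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
  mirror c1t4d0 c1t5d0 mirror c1t6d0 c1t7d0
# or a small SSD-only mirror dedicated to the ESXi NFS/iSCSI datastore
zpool create fastpool mirror c2t0d0 c2t1d0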
 

scobar

Member
Nov 24, 2013
From one machine to another I have a 10GbE link. Performance on that swings from 200-400 MB/s on reads and 150-250 MB/s on writes, which somewhat explains how I got to wanting to improve performance in the first place. From workstations to the ZFS box, I don't have any real performance gripes.

So if I am understanding correctly, my options are:
  1. Revamp the pool/vdev config (nowhere to temporarily place the data)
  2. Add more RAM (which puts us at the 400 MB/s max)
  3. Create a pool from SSDs to get both throughput and IOPS if I am going to do iSCSI/NFS datastores
  4. Do nothing and deal with it.
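
Since the 10GbE numbers are below what the pool delivers locally, the jumbo-frame suggestion above is a cheap thing to try first. A sketch for OmniOS, assuming the 10GbE NIC shows up as ixgbe0 (the client and any switch in between must use the same MTU, and the IP interface may need to be unplumbed before the MTU can be changed):

# show the current MTU of all datalinks
dladm show-linkprop -p mtu
# set a 9000-byte MTU on the (assumed) 10GbE interface
dladm set-linkprop -p mtu=9000 ixgbe0
# verify the new value
dladm show-linkprop -p mtu ixgbe0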