SSD performance - issues again


gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
There is no physical barrier. This is more an example of the 80:20 rule, which means you only need to spend 20% of the total effort to reach 80% of the maximum. If you want to reach the maximum, you need to spend the remaining 80% of the effort on the missing 20%.

I see it like this:
The reason is the ZFS filesystem with its goal of ultimate data security, which means more data and more data processing due to checksums. Copy-on-write increases the amount of data that must be written, since all writes happen ZFS-blockwise and there are no in-file data updates; this also adds more fragmentation. And because data in a ZFS raid is spread quite evenly over the whole pool, there is hardly ever a purely sequential data stream, so disk iops becomes the limiting factor. Without these security-related features a filesystem can be faster.
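As a rough illustration of the properties involved (the pool/dataset names here are hypothetical), you can inspect the checksum/recordsize settings and the copy-on-write fragmentation effect from the CLI:

  # data-security related dataset properties mentioned above
  zfs get checksum,recordsize,compression,sync tank/data

  # how fragmented the pool's free space already is (a CoW side effect)
  zpool get fragmentation,capacity tank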

And even with a near best-of-class DC P3700 you are limited by traditional Flash, which means an SSD block must be erased before a page with a ZFS datablock can be written, plus the need for trim and garbage collection. At 3 GB/s and more you are also in regions where you must care about internal performance limits regarding RAM, CPU and PCIe bandwidth, so this requires fine tuning in a lot of places. This is probably the reason why a genuine Solaris is faster than Open-ZFS.
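As a side note on the trim/garbage-collection point: on newer OpenZFS releases (not necessarily the OmniOS build discussed in this thread) trim can be driven from the pool level. A minimal sketch with a hypothetical pool name:

  # let ZFS issue continuous trim to the SSDs
  zpool set autotrim=on tank

  # or run a manual trim and watch its progress
  zpool trim tank
  zpool status -t tank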

The only way out is a technological jump like Optane, which is not limited by all the Flash restrictions as it can read/write any cell directly, similar to RAM. Optane can give up to 500k iops and ultra-low latency down to 10us, without any degradation over time and without the need for trim or garbage collection to keep performance high. This is 3-5x better than a P3700 and the reason why it can double the result of a P3700 pool.

In the end you must also accept that a benchmark is a synthetic test, designed to check performance in a way that limits the effects of RAM and caching - and caching is exactly what makes ZFS fast despite the higher security approach. On real workloads more than 80% of all random reads are delivered from RAM, which makes pool performance less relevant, and all small random writes go to RAM as well (besides sync writes). Sync write performance is probably the only benchmark value that relates correctly to real-world performance. Other benchmarks are more a hint that performance is as expected, or help to decide whether a tuning or modification is worthwhile.
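A minimal sketch of such a sync-write focused benchmark with fio (directory, size and runtime are just example values, assuming fio is available on the filer or a client):

  # QD1 4k sync writes - closest to what a VM/NFS sync workload sees
  fio --name=syncwrite --directory=/tank/data --size=4g \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 \
      --ioengine=psync --fsync=1 --runtime=60 --time_based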
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
I totally agree - sync write is the only thing that I really look at, as this is the one that impacts VM write speeds... that's why it's frustrating to see that both 3 and 7 way mirrors only reach 360 MB/s;)

So what are big vendors using for high speed sync writes? Or what did they use before Optane? What write speeds do they get?
 

i386

Well-Known Member
Mar 18, 2016
4,221
1,540
113
34
Germany
There are special/custom devices like Flashtec NVRAM that are used in high end storage appliances.
These devices are DRAM based (+ NAND for backup in case of power loss) and can do 200k+ 4k random io @ qd1/1 thread ._.
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
Hm, sounds interesting, albeit most likely not in my price range;)

Single Pair of P3700's
upload_2018-5-7_10-32-1.png
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
Optane can be used as a data disk and as an Slog performance booster for slower pools.
Beside Optane there are other fast devices for sync logging, partly based on DRAM or fast Flash.

Most vendors show you only sequential values, latency and iops, not the low sync write performance.
If you want to read about sync write on other high-security filers with a CoW filesystem (similar to ZFS),
you may look or google for comments on NetApp, a leading storage vendor, e.g.

"netapp nfs sync write performance"

The only hint in the specs is latency + 4k random write iops, as there is a relation between them and sync write performance
(and latency does not scale with the number of disks/vdevs, unlike iops and sequential performance)

See the NetApp all-Flash arrays:
Flash Array – Reduce SSD Latency with All-Flash EF-Series | NetApp
Their latency is between 100 us and 800 us with up to 1M iops.

If you compare:
The latency of a single DC S3700 is around 50us with 36k iops
A P3500, for comparison, is at 35k iops
The latency of a single DC P3700 is 20-30us with 175k iops

Optane 900P/4800X latency is around 10us with 500k iops
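To relate those latency numbers to sync write throughput, a rough back-of-the-envelope for strictly serialized (QD1) 4k sync writes, ignoring all other overhead:

  max sync iops ≈ 1 / latency
  50us -> ~20k iops  x 4k ≈  80 MB/s
  20us -> ~50k iops  x 4k ≈ 200 MB/s
  10us -> ~100k iops x 4k ≈ 400 MB/s

Real numbers differ because the ZIL batches writes, but it shows why latency, not sequential bandwidth, dominates sync write performance.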
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
Very interesting, thanks.

So basically there are three options currently for a high speed VM filer...

1. Stay with ZFS and go Optane
2. Stay with ZFS but move to async (see the sketch below)
3. Leave ZFS
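For completeness, option 2 is a one-liner per dataset (pool/dataset name hypothetical) - with the usual caveat that acknowledged writes from the last few seconds can be lost on a crash or power loss:

  # stop honouring sync requests for the VM dataset
  zfs set sync=disabled tank/vmstore

  # revert to normal behaviour (honour sync requests from NFS/ESXi)
  zfs set sync=standard tank/vmstore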
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
Leaving ZFS while keeping sync write / Slog-like protection mostly means using a hardware raid + cache + BBU/flash protection.
I have never seen comparable sync write values with this, so I doubt it will help.

Using filesystems with a lower security level, without CoW and checksums, is an option, but who wants that.
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
Well, 'want' is relative.
I *want* good performance with ZFS but that seems to be difficult;)
The question is how I can get that - throwing a ton of SSDs at it doesn't help, as we see. I could sell all of them off and buy 2 960GB 905P's (if I make enough money), double/triple the write speed and quarter the available capacity, but that doesn't really sound too great...

What about putting a 900p (or a pair) as Slog in front of the SSD array? You said that usually wouldn't be needed with SSDs, but Optane is way ahead on sync write, is it not?
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
So I finally was able to run an Optane bench (on a pair of new 900P 480s).
Quite nice for a single mirror.
upload_2018-5-7_21-0-58.png

Wonder whether that scales up...
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
I have made benchmarks on OmniOS and Solaris with 4 x 900P, see http://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf

About DC P3700 + Optane Slog:
Your sync write performance (300-400 MB/s) is good enough for 10G sync write. If you want more, an Optane Slog can make sense, even with a DC P3700 pool. The boost is not as big as with a slower pool, but I would expect an improvement.
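For reference, adding the Optane as a (mirrored) Slog to an existing pool is a single command; the pool and device names below are hypothetical, check yours with format/diskinfo first:

  # add a mirrored pair of Optane devices as Slog to the existing pool
  zpool add tank log mirror c3t0d0 c4t0d0

  # verify that the log vdev shows up
  zpool status tank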
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
Yes, that's really a good reference document, thanks for that:)

And I was aiming at 40/100G - not really expecting to fill those up, but I have those cards and would like to utilize them if possible;)
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
That's not so easy in ZFS, I believe. Although I never tried;)

Here are some results with 7 mirrors and a 900p as slog...

7 drives
nappitI_7.PNG
7 drives + Optane as slog
nappitI_7_slog.PNG
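If you want to confirm during such a run that the sync writes actually land on the Optane (pool name hypothetical), the per-vdev statistics are the easiest check:

  # per-vdev throughput/iops once per second; the log device should carry
  # nearly all of the write bandwidth while the sync benchmark runs
  zpool iostat -v tank 1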
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
At least around 30% faster.

By the way:
Raid-0 in ZFS is ultra easy.
In menu Pools: select two or more disks and vdev type=basic, or
create a pool from one basic disk as vdev and add more basic disks as vdevs.
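The same thing from the CLI, with hypothetical disk names - a pool of plain (basic) vdevs is simply created and extended without any raid keyword:

  # stripe (raid-0) over two disks
  zpool create fastpool c5t0d0 c6t0d0

  # later: grow the stripe by adding another basic vdev
  zpool add fastpool c7t0d0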
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
Ah, never thought to use a basic vdev:p thanks:)

And yes, with an Optane slog the SSD pool is usable. Still horrible performance for the amount of hardware involved, but at least something;)
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
So I created a new VM (no slog yet) and ran the benchmark on it:

nappitJ_virtual_9.PNG


Then I created an NFS share, moved a Win VM to it and...

upload_2018-5-14_15-25-12.png


Not what I was hoping for ;)
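For reference, the NFS share step on OmniOS boils down to a dataset property (dataset name hypothetical; napp-it essentially sets this for you):

  # share the VM dataset over NFS, optionally restricted to the ESXi subnet
  zfs set sharenfs=on tank/vmstore
  # zfs set sharenfs=rw=@192.168.1.0/24,root=@192.168.1.0/24 tank/vmstore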
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
Can you add details about

Server
- vnic type (slow e1000 or fast vmxnet3)
- vmxnet3 settings/buffers
- tcp buffers
- NFS buffers and servers

Windows client
- vnic type
- tcp settings, especially interrupt throttling (should be set to off)

Background:
defaults are optimised for a 1G network and to limit RAM/CPU use
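On the server side, a couple of the knobs listed above can be checked and raised from the OmniOS shell; the values below are just examples of the kind used in 10G tuning guides, not recommendations:

  # current TCP buffer sizes and their maximum
  ipadm show-prop -p send_buf,recv_buf,max_buf tcp
  ipadm set-prop -p max_buf=4194304 tcp
  ipadm set-prop -p send_buf=1048576 tcp
  ipadm set-prop -p recv_buf=1048576 tcp

  # number of NFS server threads
  sharectl get nfs
  sharectl set -p servers=512 nfs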
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
Ok, I reset the napp-it box to be sure it's ok and updated to the latest and greatest (OmniOS/napp-it).

Using ESXi 6.5U1, vmxnet3, 2658v3 with 4 cores, 48GB RAM, no tuning on napp-it.
The VM is located on the NFS share and is writing to its local disk, i.e. the NIC is not in play; same CPU, 2 cores, 4GB RAM


I will read the tuning guide:)
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
So I added the slog again...

upload_2018-5-14_22-32-12.png

Not helping ...

upload_2018-5-14_22-26-46.png


Still haven't checked the tuning guide though;)