Optane NVMe for Slog / pool disks, or All-in-One via vdisk on OmniOS


gea

Well-Known Member
Dec 31, 2010
DE
I have run some tests with the new Intel Optane 900P.

Content:
1. The new Intel Optane 900P
2. Disk based pool + Optane Slog
3. SSD based pool + Optane Slog
4. NVMe Flash based pool + Optane Slog
5. Pool built from Optane 900P without Slog

6. What about a ZeusRAM
7. All-In-One Optane vdisk as an Slog
8. Intel Optane vdisk for a pool

http://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf
 

i386

Well-Known Member
Mar 18, 2016
Germany
(I cannot say why sync=always is even slightly faster)
I can imagine that it has something to do with the different queue depths:

sync=always: low queue depth & queue not optimized by ZFS
Optane controller checks the queue -> low queue depth, no need to run a fancy algorithm -> write data

sync=disabled: high queue depth & queue optimized by ZFS
Optane controller checks the queue -> high queue depth -> runs a fancy algorithm to optimize the queue -> write data
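A minimal way to watch this yourself (a sketch only, with a hypothetical dataset tank/bench, not gea's benchmark script) is to flip the sync property between two identical runs and keep zpool iostat open in a second shell:

Code:
# sketch; dataset tank/bench is a placeholder
zfs set sync=always tank/bench       # every write goes through the ZIL/Slog
dd if=/dev/zero of=/tank/bench/t1 bs=8k count=100000

zfs set sync=disabled tank/bench     # writes are only collected in RAM until the next txg
dd if=/dev/zero of=/tank/bench/t2 bs=8k count=100000

# in a second shell: watch per-vdev ops and bandwidth while the tests run
zpool iostat -v tank 1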
 

gea

Well-Known Member
Dec 31, 2010
DE
Yes, indeed.
In the past, writes with sync=disabled were always (much) faster than writes with sync=always. There must be something in ZFS that allows sync writes with a device like an Intel Optane to be faster than writes with the "unsecure" setting sync=disabled.
 

gea

Well-Known Member
Dec 31, 2010
DE
I have a P3600 and a DC S3700 SATA around.
I will include them when I redo my tests with different hardware and a different set of benchmarks.
 
  • Like
Reactions: J-san

Rand__

Well-Known Member
Mar 6, 2014
Any timeframe for these tests (when and how long)? I could lend you a P3700/400 for testing if you want...
 

gea

Well-Known Member
Dec 31, 2010
DE
I have modified my benchmark script in menu Pools > Benchmark (uploaded to the current napp-it) to make benchmarking faster. The new script runs two tests for small random writes (a single-process write loop that writes 8k datablocks and a filebench random test) and two tests for sequential data (dd and a filebench test). Each of the four tests is done automatically with sync=always and sync=disabled.
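The rough idea of the test loop looks like this (a sketch only, with a hypothetical dataset tank/bench; the real napp-it script additionally runs the filebench workloads):

Code:
#!/bin/sh
# sketch of the test loop, not the actual napp-it script
# dataset tank/bench is a placeholder
for mode in always disabled; do
    zfs set sync=$mode tank/bench
    echo "--- sync=$mode: 8k datablock write loop ---"
    dd if=/dev/zero of=/tank/bench/small.dat bs=8k count=50000
    echo "--- sync=$mode: sequential dd ---"
    dd if=/dev/zero of=/tank/bench/seq.dat bs=1024k count=4000
    rm /tank/bench/small.dat /tank/bench/seq.dat
done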

Currently I have tested a pool of 4 HGST HE8 (4 basic vdevs), a pool of 4 Intel DC S3510 (4 basic vdevs), a pool of 2 NVMe Intel P750 (2 basic vdevs), and a pool of one or two Optane 900P in a midrange server. Each pool was tested without Slog, and with a ZeusRAM, a P3600 and an Optane 900P as Slog.

Additionally I compared AiO performance to barebone performance (currently without the Optane, as I had problems with it under ESXi).

6. Conclusion for a midrange server and sync write

Code:
Disk Pool     8k sync/unsync /s   random sync/unsync /s   seq sync/unsync /s   dd sync/unsync /s
no slog       264K / 1.8M         1.8M / 72M              260M / 957M          260M / 957M
ZeusRAM       992K / 1.9M         40M / 75M               256M / 1023M         285M / 983M
P3600 slog    1M / 1.8M           65M / 71M               341M / 1023M         386M / 1.1G
Optane slog   1M / 1.8M           68M / 73M               512M / 1023M         854M / 957M

Conclusion: A disk based pool always needs a good Slog for fast random sync and sequential writes.
Only the 900P also improves sequential sync writes to a value similar to unsync writes, so sync=always
is a suggested setting there - even for a filer.
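For reference, adding the 900P as Slog and forcing sync on the whole pool is only a few commands (a sketch; pool name tank and device name c5t0d0 are placeholders, check yours with format):

Code:
# sketch with placeholder names
zpool add tank log c5t0d0      # add the Optane 900P as Slog device
zfs set sync=always tank       # force all writes through the Slog
zpool status tank              # verify the log vdev shows up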


Code:
SSD Pool      8k sync/unsync /s   random sync/unsync /s   seq sync/unsync /s   dd sync/unsync /s
no slog       1M / 1.9M           13.8M / 55.8M           341M / 1023M         416M / 896M
ZeusRAM       1.1M / 1.8M         43M / 56M               256M / 1023M         290M / 818M
P3600 slog    1.0M / 1.9M         66.6M / 55.4M           341M / 1023M         358M / 818M
Optane slog   1M / 1.8M           51M / 53M               512M / 1023M         775M / 824M

An SSD-only pool benefits from an Slog, but only if the Slog is substantially faster than the combined
SSD pool performance. Only the Optane really helps, as it keeps small random writes high and nearly doubles
the dd sync value.


Code:
NVMe Pool     8k sync/unsync /s   random sync/unsync /s   seq sync/unsync /s   dd sync/unsync /s
no slog       1.3M / 1.9M         51.4M / 139M            1023M / 1023M        944M / 946M
Optane slog   1M / 1.8M           60M / 153M              511M / 1023M         1.1G / 1.6G

The NVMe pool of 2 NVMe basic vdevs is faster than the SSD pool of 4 SSDs, even when the latter has an Optane as Slog. The additional Optane as Slog also boosts the unsync values, which makes me believe that ZFS works differently when an Slog is available.


Code:
Optane Pool   8k sync/unsync /s   random sync/unsync /s   seq sync/unsync /s   dd sync/unsync /s
one 900P      1.1M / 1.9M         50M / 154M              511M / 1023M         801M / 1.4G
two 900P      1.1M / 1.9M         54M / 137M              512M / 1023M         1.2G / 1.7G

A pool of a single Optane 900P is faster than the raid-0 of two NVMe vdevs.
As the performance improvement with two Optanes is not as high as expected, I suppose the low-end/midrange server is the limiting factor. I will add a test with a faster server and more Optanes later.
http://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf
 
  • Like
Reactions: Rand__ and T_Minus

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
I look forward to testing on pools with more than 4x SSD :)
 

gea

Well-Known Member
Dec 31, 2010
DE
4 SSDs in a Raid-0 (4 vdevs) correspond to 8 SSDs in a Raid-10 config if you want redundancy with the same write performance.
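As an illustration (device names are placeholders), the two layouts would look like this:

Code:
# Raid-0: 4 basic vdevs, fast but without redundancy
zpool create tank c1t0d0 c1t1d0 c1t2d0 c1t3d0

# Raid-10: 4 mirror vdevs from 8 SSDs - same number of vdevs to stripe writes over
zpool create tank mirror c1t0d0 c1t4d0 mirror c1t1d0 c1t5d0 \
                  mirror c1t2d0 c1t6d0 mirror c1t3d0 c1t7d0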

This system is a low power system with a 35W 4-core Xeon CPU.
For higher throughput I need a faster system (while 1.2 GByte/s sequential sync performance from 2 Optanes is, hmm, not bad... "not bad", as the Swabian would say).

I expect to get a new X11SPH-nCTPF mainboard with a Xeon Silver and two more Optanes in the first week of December. The overall throughput of 4 Optanes should be more or less like 16 SSDs if there is no limiting hardware. I hope to get up to around 5 GB/s sequentially, with sync values at 3-4 GB/s, if the current performance scales well.
 
  • Like
Reactions: Stux

gea

Well-Known Member
Dec 31, 2010
DE
Thanks for the offer of the P3700.
But I suppose it is only slightly faster than the P3600 and far below the Optane 900P.
 

Rand__

Well-Known Member
Mar 6, 2014
Item 7. All-In-One Optane vdisk as an Slog is gone from the pdf? Or am I just blind?;)
 
  • Like
Reactions: gea

gea

Well-Known Member
Dec 31, 2010
DE
I expect the difference between P3600 and P3700 as Slog is similar to the difference between two average benchmark runs. For a really statistically relevant value you would have to run every benchmark several times, which needs a lot of time. My values are just single runs. But I do not expect it really changes the general conclusion: Optane is a game-changing technology, similar to the step from disk to SSD.
 

Rand__

Well-Known Member
Mar 6, 2014
Well, I think the 3700 was no slouch and might be a bit faster than a 3600, but of course not even close to Optane. I agree it's another dimension - not in raw speed but in latency.
That (together with the VMware issues) makes your idea of virtualizing the Slog device intriguing ;) Regular drives probably would have been too slow for it, but Optane might pull it off. Considerable loss of speed iirc from the last pdf, but still faster than other drives... and way more versatile, of course.
 

gea

Well-Known Member
Dec 31, 2010
DE
Item 7. All-In-One Optane vdisk as an Slog is gone from the pdf? Or am I just blind?;)
Yesterday my Optane worked under ESXi - today it does not, not even after a disk initialize that worked yesterday, so I skipped this in the current benchmarks. Yesterday the Optane improved write values when used as a vdisk, but not by the same amount as in a barebone setup. I hope to get it working again tomorrow (current state is a "cannot write config to disk" error in ESXi).

btw
My first tests were on a faster machine, but that one died during the tests, very sad. But an Optane without an unwanted disk or controller cache makes an ESXi vdisk as Slog an option - especially as you can use the Optane for other VMs or an L2Arc, since concurrent reads should not affect concurrent sync writes like they do with Flash based NVMes.

Optane is the first NVMe where an ESXi vdisk does not come with a bad taste for high performance / high data security demands.
 

gea

Well-Known Member
Dec 31, 2010
DE
Ok, I fixed the partition problem in ESXi and added the AiO benchmarks.
I deleted and recreated the GPT partition with gparted and was then able to use the Optane under ESXi again.

Conclusion for AiO (ESXi with a virtualized ZFS appliance)

Barebone Setup

Code:
Disk Pool     8k sync/unsync /s   random sync/unsync /s   seq sync/unsync /s   dd sync/unsync /s
no slog       264K / 1.8M         1.8M / 72M              260M / 957M          260M / 957M
Optane slog   1M / 1.8M           68M / 73M               512M / 1023M         854M / 957M

SSD Pool      8k sync/unsync /s   random sync/unsync /s   seq sync/unsync /s   dd sync/unsync /s
no slog       1M / 1.9M           13.8M / 55.8M           341M / 1023M         416M / 896M
Optane slog   1M / 1.8M           51M / 53M               512M / 1023M         775M / 824M

Optane Pool   8k sync/unsync /s   random sync/unsync /s   seq sync/unsync /s   dd sync/unsync /s
one 900P      1.1M / 1.9M         50M / 154M              511M / 1023M         801M / 1.4G

vs AiO setup with Slog or Optane as ESXi vdisk

Code:
Disk Pool     8k sync/unsync /s   random sync/unsync /s   seq sync/unsync /s   dd sync/unsync /s
no slog       520K / 1.9M         1.6M / 65.8M            41.8M / 1024M        283M / 939M
Optane slog   1.6M / 1.9M         39.4M / 68.4M           512M / 1023M         849M / 961M

SSD Pool      8k sync/unsync /s   random sync/unsync /s   seq sync/unsync /s   dd sync/unsync /s
no slog       1.5M / 1.9M         16M / 50.2M             341M / 1023M         423M / 806M
Optane slog   1.6M / 1.9M         38.2M / 50.2M           512M / 1023M         731M / 806M

Optane Pool   8k sync/unsync /s   random sync/unsync /s   seq sync/unsync /s   dd sync/unsync /s
one 900P      1.6M / 1.9M         32M / 75M               511M / 1023M         711M / 1.1G

Conclusion for Intel Optane and AiO

Despite the lower RAM and CPU performance of a virtualized SAN, pool performance is nearly the same as the barebone values, even when using an Slog as a vdisk. Even the last test, with a basic ZFS pool on a vdisk, shows that this pool is only minimally slower than in the barebone setup. Sync write performance with an Slog from a vdisk on Optane is > 700 MB/s on the disk pool and the SSD pool. The Optane-only pool from a vdisk is in these benches quite similar to the disk or SSD pool and seems limited by the server performance.

What are the consequences for a "best of" AiO setup?

Basically the most important thing is:
You want an Optane as Slog and probably as L2ARC, due to the read ahead caching option on L2Arc.
For the pool itself, use disks or SSDs; due to the Optane's higher iops, performance with an Optane Slog is superior.
Pass-through of the Optane is not needed: not for performance reasons, and as there is no cache involved,
most probably not for security reasons either. A pool from Optanes as datadisks would require a much faster machine -
probably not a use case for AiO.


My suggested AiO setup

- use a USB stick to boot ESXi
- create a local datastore on an Intel Optane 900P and place the napp-it storage VM onto it
- use an LSI HBA or Sata in pass-through mode for your datadisks

- add a 20 G vdisk on the Optane datastore to the napp-it storage VM and use it as Slog for your datapool
- add a vdisk for L2ARC (around 5 x and no more than 10 x the size of RAM) - see the sketch below
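Inside the storage VM the last two steps are then just (a sketch; pool name tank and the vdisk device names are placeholders):

Code:
# sketch inside the napp-it storage VM, device names are placeholders
zpool add tank log c2t1d0       # the 20 G vdisk on the Optane datastore as Slog
zpool add tank cache c2t2d0     # the L2ARC vdisk
zfs set sync=always tank        # so all writes use the Slog

# L2ARC sizing example: with 32 G RAM in the storage VM
# the L2ARC vdisk should be around 5 x 32 G = 160 G, no more than 10 x 32 G = 320 G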
 

Rand__

Well-Known Member
Mar 6, 2014
So what remains is the question of whether Optane now has PLP or not (they seem to have removed this statement from Intel ARK) - or if it is inherent to 3D XPoint based drives after all...
 
  • Like
Reactions: Stux

gea

Well-Known Member
Dec 31, 2010
DE
Until the beginning of this week, Intel listed the 4800X and 900P as power-loss protected
and the smaller cache modules as not.

I have just re-checked the
Intel® Product Specification Advanced Search

They have indeed removed the 900P from the list of drives with PLP!
(and lowered the price; the 900P was at 600 Euro last week, now at 380 Euro)

The result is (unless Intel lowers the price or offers a PLP model with less capacity):
Intel P4800X (U.2 with 375G): around 1500 Euro

Maybe I should ask a lawyer about my current 900Ps that were advertised with PLP,
or maybe they do have PLP and Intel removed the claim for newer models to ask for more money.