Optane NVMe for Slog / Pooldisks or All-in-One via vdisk on OmniOS

gea

Well-Known Member
Dec 31, 2010
2,485
837
113
DE
Intel is not known for making mistakes in the datasheet of a new product, so what's going on?

- the 900P has PLP, but Intel removed it as an advertised feature
to push 4800X sales (we saw this earlier with other non-DC SSDs)

- PLP on the 900P was not reliable enough to guarantee that a committed write is on disk after a sudden power loss

- Intel plans a 900P Pro/server edition:
a 900P as advertised earlier but at a higher price; the current 900P is for gamers only

- it really was a mistake; sorry, we sacked the person responsible

In general, as an Optane needs no cache,
PLP seems to me mainly a firmware/software feature rather than a hardware feature.
 

gea
I wonder about the real consequences.

For a critical production setup it is clear that the advantages of Optane are huge. The other good Slog devices are expensive too (I paid more for a ZeusRAM than I would now for a 4800X), and I would not suggest a Slog device without the guarantee that committed writes are on stable storage.

For many setups you may weigh the risk against the price of a solution. Without a Slog the whole write cache is at risk, say 4 GB of data.

The Slog only has to guarantee the last commit, so the worst case is one lost committed data block. That is bad if it affects a critical financial transaction or a piece of metadata that results in filesystem corruption. But in general the real risk is low, much lower than the risk of a corrupted filesystem or a corrupted database without sync.

If I go back a few years, this risk was accepted with the first reasonable Slog (the Intel SLC X25-E), which came without PLP, as it was better than anything else at that time.

So maybe for many setups the Optane cache versions (16/32 GB) or the faster 900P remain a good ZFS and Slog option, while I hope for a cheaper alternative to the 4800X in the future.
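For a setup like this, attaching the Optane as a Slog and forcing sync semantics takes only a few commands. A minimal sketch, assuming a pool named "tank" and a device name c2t1d0 (both placeholders for your own layout):

```shell
# Attach the Optane NVMe device as a dedicated log (Slog) to the pool
zpool add tank log c2t1d0

# Force sync semantics on a critical dataset so every committed write
# goes through the Slog before it is acknowledged to the application
zfs set sync=always tank/db

# Check that the log vdev is attached and healthy
zpool status tank
```

Removing the log device later with `zpool remove tank c2t1d0` is non-destructive; the pool simply falls back to the in-pool ZIL.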
 

_alex

Active Member
Jan 28, 2016
874
94
28
Bavaria / Germany


so, what should the caps protect in the event of a power loss, when the ack back to the OS is only issued after in-flight writes are persisted to the Optane media, without DRAM?
 

gea
isn't all 3D XPoint / Optane 'PLP by design', as there is no cache?
Writing data consists of many steps in software. On a sudden power loss a system should complete the whole current transaction for a committed write and behave properly to ensure filesystem consistency; with Optane this is mainly a question of firmware quality. Intel guarantees this for some models like the 4800X, but not for others like the cheaper Optanes.

One may debate whether the firmware is the same, but for some setups you must rely on a vendor's guarantee for a feature, no matter whether the difference is only marketing.
 

_alex
well, Intel states (and I guess still does) that the controller of the 900P is the same as in the 4800X, and I really wonder what PLP would even look like without a DRAM cache. This could also explain why there are no caps on the 4800X ...
 

_alex
but for some setups you must rely on a vendor's guarantee for a feature, no matter whether the difference is only marketing.
This is an argument.
But is a feature that is no longer necessary (or existing) because of a technology change, and that is therefore only a sentence on a spec sheet, still worth considering?

I have no problem with people paying a massive premium for an empty, obsolete sentence backed by zero difference in operational behavior, if they need to.

I really wonder when/if there will be an official statement about the need for, or existence of, PLP as a 'feature' with Optane.
 

gea
You are right: if the PLP guarantee of the 4800X is only a marketing gag by Intel, then you can safely ignore the PLP=no of the cheaper Optanes.

You say there is no cache that needs protecting. That seems true, but shortly before a cache flush from ZFS the Slog can hold, say, 4 GB of data. After a crash this data must be written to the pool on the next reboot. This is why you need the Slog.

Now remember, you want a Slog to guarantee the validity of a database or a filesystem on ZFS against incomplete atomic writes. Why do you expect that the same problem would not affect the data and filesystem on the Slog itself? What if it is corrupted by a crash?
 

_alex
If data on the Slog becomes corrupt, would a (cap- or marketing-backed) PLP be able to prevent this in any way?
There is always a small window in time that can't be covered, be it only a single CPU cycle.

The only way to be 100% safe is maybe to verify the data that has been written to the Slog (by ZFS/the filesystem) before the ack goes back to the app. And that is certainly no performance boost.
 

_alex
oh, and would ZFS write corrupted data from the Slog at all, or just fall back to the last txg before the power loss when it sees that the checksums on the Slog are not OK?

imho, giving up the corrupted txg and forcing a manual rollback to the last consistent state/sane txg commit would be much safer than writing a corrupt txg from the Slog.
 

gea
In the end, only an Intel developer can answer what the differences are between the Optanes with and without PLP.

Until then: ZFS only guarantees that the filesystem is not corrupt, while a Slog must guarantee the validity of the write cache and the last committed transactions. If you treat the Slog only like a regular ZFS filesystem, you cannot guarantee the last transactions.
 

_alex
yes, this is somewhat esoteric :D
Not sure if there is any difference, as ZFS only sees the ack for the writes to the Slog.

If these land on a persistent layer and not in a RAM-based cache, I have a hard time seeing what could be different.

Either the data is written and survives the reboot, or it is not written. In the latter case no ack should be given.

So as long as a drive acks honestly there should never be an issue.

If data gets corrupted after an honest ack, something went seriously wrong, with a good portion of bad luck that imho is not related to the presence of PLP.

In this case a rollback with the loss of a single txg is maybe the only option.
 

gea
Maybe this is similar to the ECC functionality that Intel artificially limits to server chipsets. From a manufacturing point of view the chips cost the same, with the result that many cheap NAS boxes now come without ECC even when they offer 64 GB of RAM.

I expect the same with PLP: it only affects firmware quality, and no additional hardware is required (unlike hardware RAID with its BBU or flash backup unit). The feature split exists simply to maximise profits.
 

_alex
yes, but in this case it would mean the firmware on the cheaper drives intentionally messes things up or acks too early. Can't imagine this, but who knows.
imho this is expensive FUD, smoothed over by an additional feature listed on ARK that is no longer present for the 900P. But I might be wrong, and only Intel knows what is really going on.
 

Stux

Member
May 29, 2017
30
10
8
42
Maybe completed sync writes are fine but in-flight async writes are not ;)

After all, a sync write should *only* be acknowledged once it is committed.
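That acknowledgement rule is easy to see from the application side: an async write returns once the data is in the OS cache, while a sync write only returns once the data is reported to be on stable storage. A rough sketch with GNU dd (oflag=dsync and status=none are GNU extensions; on OmniOS the GNU tools may be installed as gdd), writing to a throwaway temp file rather than a real Slog-backed dataset:

```shell
f=$(mktemp)

# Async: dd returns as soon as the blocks are in the OS write cache
dd if=/dev/zero of="$f" bs=4k count=100 status=none

# Sync: each 4k block is only acknowledged after it is reported to be
# on stable storage; on a ZFS dataset this path exercises the ZIL/Slog
dd if=/dev/zero of="$f" bs=4k count=100 oflag=dsync status=none

rm -f "$f"
```

On a real pool you would time both runs against a dataset with and without the Slog attached to see the latency difference.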
 

J-san

Member
Nov 27, 2014
67
42
18
40
Vancouver, BC
I have a P3600 and a DC S3700 SATA around.
I will include them when I redo my tests with different hardware and a different set of benchmarks.
Would love to see a 200 GB or 400 GB Intel DC S3700 SATA drive in there as a Slog for comparison!

Keep up the good work!
 

NYCone

Member
Jun 23, 2017
35
8
8
57
My suggested AiO setup

- use a USB stick to boot ESXi
- create a local datastore on an Intel Optane 900P and place the napp-it storage VM onto it
- use an LSI HBA or Sata in pass-through mode for your data disks

- add a 20 GB vdisk on the Optane datastore to the napp-it storage VM and use it as Slog for your data pool
- add a vdisk for L2ARC (around 5x and no more than 10x the size of RAM)
Gea,

As you suggested, I've switched to a Solaris 11.4 AiO. I'm a bit of a newbie to ESXi and Solaris; do you have a how-to on what parameters you used for the vdisk Slog and the L2ARC? Is the exact provisioning important in ESXi?

Thanks
 

gea
Just create a vdisk (on the Optane datastore) for the Slog with size = 20 GB and a second vdisk for the L2ARC with a size between 5x and 10x the RAM that you have assigned to Solaris.

You can use the default virtual SCSI controller. On newer ESXi you can try the virtual NVMe controller (it may have lower latency).
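The steps above can be sketched as a few commands. The datastore name "optane900p", the VM folder, the 100 GB L2ARC size, and the Solaris device names c2t1d0/c2t2d0 are all assumptions; substitute your own:

```shell
# On the ESXi host: create the two vdisks on the Optane datastore.
# Eager-zeroed thick for the Slog keeps latency predictable; the
# L2ARC vdisk can stay thin since it holds only reconstructable data.
vmkfstools -c 20G -d eagerzeroedthick /vmfs/volumes/optane900p/napp-it/slog.vmdk
vmkfstools -c 100G -d thin /vmfs/volumes/optane900p/napp-it/l2arc.vmdk

# After attaching both vdisks to the storage VM, inside Solaris/OmniOS:
zpool add tank log c2t1d0      # the 20 GB vdisk as Slog
zpool add tank cache c2t2d0    # the 100 GB vdisk as L2ARC
```

`zpool status tank` afterwards should list the log and cache vdevs alongside the data vdevs.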