Intel Optane P4800x vs P900

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,292
1,756
113
CA
I SO wish I understood what was being said here.

Were those specs about the 900 and 905p ... and maybe in RAID or something
- where it said 100GB specs ...and then
- 200GB specs ..?
or is that saying that the performance of a larger capacity drive just doubles the stats!??

Very confusing.

What IS the difference between the 905p / p4800x / p4801x ...?

Were people talking about using them in a raid of 4 drives!? Geesh.

How does it work..?

Let's say I'm writing 420GB of data ... and I have a 375GB Optane + 48GB of RAM ...

Does that mean that no matter HOW slow the spinning drives are ... I could write 423GB (obviously a little less bc of overhead and formatted capacity, MiB vs MB, etc) ... but between the DRAM + Optane, they could literally hold 100% of the data being written to buffer the performance..? Is that literally how it works..?

Is that how you choose the size of an Optane sort of..? And then I'd need a

• ZIL
• L2arc


Thanks!
I suggest reading the forum threads regarding ZFS, SLOG\ZIL, L2ARC.
Read the Intel Spec sheet on the Optane 900p, 905p and 4800x\4801x and then re-read this thread.

It's a LOT to try to take in hardware specifications and understanding ZFS all at once :)

The forum (and some main site posts) cover these topics very well in my opinion, and since they're a common topic they're discussed in different ways in different threads.

We use Optane for high-performance storage and as a SLOG device as well.
 

svtkobra7

Active Member
Jan 2, 2017
362
83
28
Read the Intel Spec sheet on the Optane 900p, 905p and 4800x\4801x and then re-read this thread.
It's a LOT to try to take in hardware specifications and understanding ZFS all at once :)
AMEN. One bit I would like to chime in with ...
  • For me at least, there has to be some understanding of product line taxonomy, before you can get anything out of Intel Ark ...
  • So ... I've tried to succinctly summarize the four delineations I suggest below (w/ links to Intel Ark) ...
  • Avoiding memory (out of scope) and DC DP, I think the following is a somewhat tidy, practical starting point.
Yay for my 905p's & P4801x's coming tomorrow! If I finally determine if I can bifurcate it will be a great day ;) I have missed isdct.

###

What is the 900P? Intel® Optane™ SSD 900P Series Product Specifications
  • Enthusiast Optane, version 1. U.2 and HHHL. Max size = 480 GB.
What is the 905P? Intel® Optane™ SSD 905P Series Product Specifications
  • Enthusiast Optane, version 2. M.2 form factor added. Larger than v1.
What is the P4800X? Intel® Optane™ SSD DC P4800X Series Product Specifications
  • DataCenter Optane, version 1. U.2 and HHHL. Min size = 375 GB.
What is the P4801X? Intel® Optane™ SSD DC P4801X Series Product Specifications
  • DataCenter Optane, version 2. M.2 form factor added. Smaller than v1.
We can cover the difference between my Enthusiast and DataCenter delineations in Chapter 2. ;)

Here is one article, that helped me at least: ZFS Caching
 

svtkobra7

I'll grab a few more of your Q's since now I can't stop looking forward to USPS coming tomorrow with more Optane. ;)

Is that how you choose the size of an Optane sort of..? And then I'd need a

- where it said 100GB specs ...and then
- 200GB specs ..?
or is that saying that the performance of a larger capacity drive just doubles the stats!??
  • See my last post (for you) and the "DataCenter Optane, version 2. New M.2. Smaller than v1."
  • Both are P4801x. And much more reasonably priced than a larger P4800x which may be outside of a given system budget.
  • I think it is fair to say that of all Optane variants (removing the more recent release from the equation and the fact that there aren't as many in the wild), in an equilibrium state: The P4801x 100GB and P4801x 200GB will be used for the use case of a SLOG more so than any other variant.
  • The larger drive's write specs are 2x the smaller one's, but that doesn't translate to a pool being able to write twice as fast.

Were people talking about using them in a raid of 4 drives!? Geesh.
• ZIL
• L2arc
  • ZIL = ZFS Intent Log.
    • SLOG = Separate Intent LOG.
      • Where you have a pool of 12 HDDs AND a separate disk for ZIL, you now have a SLOG.
        • In its most common use case, Optane would be used as a SLOG.
  • Initially it may make sense to consider ZIL to be simply a write cache.
  • L2ARC
    • It isn't clear to me if you are looking to broaden your knowledge base or build a system. Arguably you do one before the other and in that certain order, but to my point, most ZFS filers (but not all) don't need an L2ARC, with the rule of thumb being to max out your RAM before you think about adding an L2ARC.
      • Example: I have 200GB of RAM in each of my filers and in my use case don't think I see a material, or any, benefit from adding an L2ARC.
        • Also, it would have to be so large the SSDs would be better used elsewhere.
          • You could use Optane as an L2ARC, but that would also be a horrendous waste of Optane.
  • Initially it may make sense to consider L2ARC to be simply a read cache.
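
As a rough illustration of the SLOG and L2ARC bullets above, the sketch below only builds (it does not run) the `zpool add` commands that would attach a mirrored log vdev and a cache device to an existing pool. The pool name `tank` and the `/dev/nvme*` paths are hypothetical placeholders, and the helper names are made up for this example.

```python
from typing import List


def slog_add_command(pool: str, devices: List[str], mirror: bool = True) -> List[str]:
    """Build the argv for attaching a separate log (SLOG) vdev to a pool."""
    if not devices:
        raise ValueError("at least one log device is required")
    if mirror and len(devices) < 2:
        raise ValueError("a mirrored log needs at least two devices")
    # A mirrored log vdev is written as: log mirror <dev> <dev> ...
    vdev = ["mirror", *devices] if mirror else list(devices)
    return ["zpool", "add", pool, "log", *vdev]


def l2arc_add_command(pool: str, devices: List[str]) -> List[str]:
    """Build the argv for attaching cache (L2ARC) devices to a pool."""
    if not devices:
        raise ValueError("at least one cache device is required")
    # Cache devices are listed individually; they are not mirrored.
    return ["zpool", "add", pool, "cache", *devices]


if __name__ == "__main__":
    # Hypothetical pool and device names, printed for inspection only.
    print(" ".join(slog_add_command("tank", ["/dev/nvme0n1", "/dev/nvme1n1"])))
    print(" ".join(l2arc_add_command("tank", ["/dev/nvme2n1"])))
```

Printing the commands instead of executing them keeps the example safe to run on any machine; check device names carefully before running anything like this against a real pool.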
 

T_Minus

- P4801x are not all M.2; I have a 2.5"/U.2 one sitting here

- Optane in pools is still faster than NVMe + Optane SLOG. We've tested this in various scenarios, and even over the internet the Optane pool tests faster. In real-world "can we feel it" tests outside of database usage, the answer has been no. We still use a pure Optane pool for high-performance setups, but you need the RAM and CPUs, and you're still going to run out of CPU first unless you're running a high-performance 4-CPU system with very fast networking, or a 2-CPU system with fast networking, as a high-perf storage server.
 

svtkobra7

- P4801x are not all m.2 I have a 2.5"\u.2 sitting here
  • I didn't say they were. I said "New M.2" ... Thanks for making note that it wasn't clear. I know what I meant, but that doesn't mean you knew what I meant! Edited to "M.2 form factor added."
  • What I was trying to convey without a copy/paste of Intel Ark is that:
  • (1) P4800X = U.2 and HHHL & (2) P4801X = New M.2 = As in U.2 existed in v1, M.2 added to line up in v2.
Optane in Pools are still faster than NVME+Optane SLOG
  • Are you sure about that? ;)
  • A ["Optane in Pools"] must equal B ["NVME+Optane SLOG"]; where an Optane SLOG is technically "Optane in [a] Pool" And Optane = NVMe.
  • Furthermore: a mirrored pair of P4801x 100 GB ["Optane in Pools"] > [insert fast NVMe pool here] + P4800x SLOG ["NVME+Optane SLOG"] ?
 

T_Minus

Yes.
I have a single system with a dozen Optane drives in a mirrored configuration; in addition it has PM1725 3.2TB drives, mirrored, plus an Optane SLOG.
In other systems we have 12x P3500 + Optane SLOG, and another with 8x P3700 + Optane SLOG. The pure Optane pool always tests faster when we benchmark the file system or run our software in real-world usage. Is it that much faster? Technically no, but that's more a limitation of the current ZFS implementation. However, as I said, it's not that noticeable unless you're hammering it with a read/write database; then the Optane really shines, and continues and continues... but even for normal-to-heavy usage, a pool of a half dozen NVMe drives is nothing to sneeze at ;)

I'll be adding 8-12 more P3500 2TB to that system and can test again but there's 0 chance latency will decrease :D


IMO, if you want a beyond-badass high-IOPS system with capacity, 4x or more of the Samsung PM1725 3.2TB (or equivalent) with a mirrored Optane SLOG is the way to go. Keep in mind most systems with high-capacity NVMe are not running all 12 or 24 NVMe drives at their top performance due to backplane/HBA limitations. With more systems now offering x16, we'll see NVMe performance increase from that alone on older drives where you're using 12+ NVMe in a single system.
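
To put a number on the backplane/HBA point above, here is a minimal sketch of how a shared uplink caps per-drive throughput when every drive is busy at once. The topology (x4 drives sharing one uplink) and the PCIe 3.0 generation are assumptions made for illustration; the post does not say how these systems are actually wired.

```python
# Approximate usable bandwidth of one PCIe 3.0 lane in GB/s
# (8 GT/s with 128b/130b encoding, before protocol overhead).
PCIE3_GB_PER_S_PER_LANE = 0.985


def per_drive_ceiling_gb_s(num_drives: int, lanes_per_drive: int,
                           uplink_lanes: int,
                           gb_per_s_per_lane: float = PCIE3_GB_PER_S_PER_LANE) -> float:
    """Throughput each drive can reach when all drives are active at once.

    Each drive is limited by its own link and by its equal share of the
    uplink that all drives sit behind (hypothetical topology).
    """
    drive_link = lanes_per_drive * gb_per_s_per_lane
    uplink_share = uplink_lanes * gb_per_s_per_lane / num_drives
    return min(drive_link, uplink_share)


if __name__ == "__main__":
    for uplink in (8, 16):
        ceiling = per_drive_ceiling_gb_s(num_drives=12, lanes_per_drive=4,
                                         uplink_lanes=uplink)
        print(f"12 x4 drives behind an x{uplink} uplink: "
              f"{ceiling:.2f} GB/s per drive when all are busy")
```

Under these assumptions, doubling the uplink from x8 to x16 doubles the per-drive ceiling without touching the drives, which is the effect the post describes.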
 

TrumanHW

Member
Sep 16, 2018
141
21
18

Do you by chance have benchmarks for those..?

When you say 12x optanes mirrored ... I don't get it: wouldn't that give you the capacity of only 1x for the 'cost' of 12x ..?
What capacity are the mirrored optanes..?

mirrored PM1725 3.2TB ... those get up to 7GB/s alone, yes..? How many PM1725 do you have mirrored..?

Can you give benchmarks and real world performance numbers for:
• 12x Optanes + 2x PM1725 + (which model) Optane ZIL
• 12x P3500 + (which model) Optane ZIL
• 8x P3700 + (which model) Optane ZIL
...? (Did I get those correct) ..?

Also, have you considered using Epyc processors for more lanes..? Or is the problem not enough physical slots for the drives (when you say they're limited by x16 or the backplane) ...?

What about NVMe over Fiber is it..? Is that what you're using for networking..?
 

T_Minus

Sorry, that wasn't clear. A pool of mirrors. I have 4x of the Samsung; I'm not sure what they are off the top of my head, but the IOPS per drive was high, almost a million, but at very high QD too. A fast drive though. I don't have any benchmarks anymore; I ran through them and have done misc real-world tests as we test/reconfigure, but 'fast enough' doesn't start to describe how fast they are LOL.

Everything is Intel so I don't want to mix and match at this point, maybe in the future.
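
The difference between "12 drives in one mirror" and "a pool of mirrors" can be sketched with a bit of arithmetic. The 480 GB drive size and the two-way mirror width below are assumptions for illustration only; the post does not state either.

```python
def mirror_pool_capacity_gb(drive_gb: float, num_drives: int,
                            mirror_width: int = 2) -> float:
    """Usable capacity of a pool striped across equal-sized mirror vdevs.

    Each mirror vdev contributes the capacity of one drive, and the pool
    capacity is the sum over all vdevs (ignoring formatting overhead).
    """
    if mirror_width < 2:
        raise ValueError("a mirror needs at least two drives")
    if num_drives % mirror_width != 0:
        raise ValueError("drive count must be a multiple of the mirror width")
    vdev_count = num_drives // mirror_width
    return vdev_count * drive_gb


if __name__ == "__main__":
    # Hypothetical 480 GB drives.
    print(mirror_pool_capacity_gb(480, 12, mirror_width=2))   # six 2-way mirrors
    print(mirror_pool_capacity_gb(480, 12, mirror_width=12))  # one 12-way mirror
```

With these assumed numbers, six two-way mirrors give 6x the capacity of one drive, whereas a single 12-way mirror would give only 1x, which is the case the question above was worried about.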
 

TrumanHW

Am I correct in understanding that ... all other things being equal, if a device requires high QD to get the same numbers another can do on QD1 ... the device maintaining performance at QD1 is 'more impressive' ..?
 

vladimir.mijatovic

New Member
Jan 12, 2019
23
11
3
Am I correct in understanding that ... all other things being equal, if a device requires high QD to get the same numbers another can do on QD1 ... the device maintaining performance at QD1 is 'more impressive' ..?
I'd even say that this is true in an exponential way.

Remember Kill Bill 2 training scene
Uma: "I can (punch through the wood) but not that close"
Master: "Then you can't!"
:)

Like "Optane can lift 100kg with one arm while others need all four limbs in sync to do so and still do a bit worse"
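
One way to make the queue-depth point concrete is Little's law: sustained IOPS is roughly the number of requests in flight divided by the average latency per request. The latencies below are hypothetical round numbers chosen for the arithmetic, not measurements of any particular drive.

```python
import math


def iops_at(queue_depth: int, latency_us: float) -> float:
    """Little's law: throughput = requests in flight / latency per request."""
    return queue_depth * 1_000_000 / latency_us


def queue_depth_needed(target_iops: float, latency_us: float) -> int:
    """Smallest queue depth that reaches target_iops at the given latency."""
    return math.ceil(target_iops * latency_us / 1_000_000)


if __name__ == "__main__":
    # Hypothetical devices: one answers in 10 us, the other in 100 us.
    print(iops_at(1, 10))                    # 100000.0 IOPS at QD1
    print(iops_at(1, 100))                   # 10000.0 IOPS at QD1
    print(queue_depth_needed(100_000, 100))  # 10 requests in flight needed
```

A single-threaded workload (such as one PHP request waiting on its database) mostly runs at low queue depth, so it only ever sees the QD1 figure. That is why the same headline IOPS number is worth more when it is reached at QD1.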

We have a "performance oriented webhosting" for our own CMS platform and a couple of Wordpress/WooCommerce clients.
But as those clients do not need, or cannot afford to pay for, 16-vCPU hosting, we give them 1-4 vCPUs in LXC containers that are all-in-one (webserver, PHP, db, storage). This is quite a normal scenario. So an Intel Optane could get to 140k+ IOPS for even a small 2-vCPU client, while with other NVMe drives it would be closer to 20-40k. There is some true value in this. For them, being small, Optane is often overkill. Do they notice and like their site being fast? Of course. Can we claim we provide real value? Definitely.

I recently did a benchmark with Wordpress plugin "WordPress Performance Benchmark (WP Bench)".
Optane 900p 480GB (on a server under light load) got to 1,000-1,200 queries/sec with E5 Xeons, while a Samsung 970 EVO+ 1TB got only to 300 q/sec with an i9-9900K (the fastest thing next to its family members for PHP). I think that benchmark is single-thread oriented (common for PHP workloads), but you could see the difference immediately, even though I'd had high hopes for the 970 EVO+ being pretty much an enterprise drive in a consumer disguise.
 

Stephan

Active Member
Apr 21, 2017
412
238
43
Germany
From reading all the comments here, I have the impression people do not know how to properly size a SLOG to begin with. A SLOG is NOT a write cache; it is a sudden-power-loss reliability cache. ZFS will write data to your pool every 5 seconds, so what you need at most is the capacity to hold the data of those 5 seconds. I went with 4 GB, which allows for about 820 MByte/s of sustained sync writes coming into the system. Adequate for 10 Gbps. Anything over 10-20 GByte for a SLOG is imho ridiculous and just wrong. As wrong as using an unreliable non-PLP 900p for SLOG.
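
The sizing rule in the paragraph above (hold roughly one 5-second flush interval of incoming sync writes) can be sketched as simple arithmetic. The function names are made up for this example, and the single-interval model is a simplification of how ZFS actually batches writes.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def slog_bytes_needed(ingest_bytes_per_s: float, flush_interval_s: float = 5.0) -> float:
    """Bytes of log space needed to hold one flush interval of sync writes."""
    return ingest_bytes_per_s * flush_interval_s


def sustained_bytes_per_s(slog_bytes: float, flush_interval_s: float = 5.0) -> float:
    """Sync-write rate that a log of the given size can absorb per interval."""
    return slog_bytes / flush_interval_s


if __name__ == "__main__":
    # A 4 GiB log spread over a 5 s interval, in MiB/s.
    print(f"{sustained_bytes_per_s(4 * GIB) / MIB:.1f} MiB/s")
    # One interval at the theoretical 10 Gbit/s line rate (1.25e9 bytes/s), in GB.
    print(f"{slog_bytes_needed(10e9 / 8) / 1e9:.2f} GB")
```

A 4 GiB log works out to about 819 MiB/s, in line with the 820 MByte/s figure in the post. At the full theoretical 10 Gbit/s line rate the same rule gives 6.25 GB, still well under the 10-20 GByte ceiling mentioned above.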

Side note: Cheaper and imho at least equally reliable/speedy option for SLOG with PLP (instead of 900p): Intel P3700 400GB, 7.3 PBW without overprovisioning, which would also be possible, 75k IOPS write which is 2 orders of magnitude more than what HDDs can do.


aka HPE MO0400KEFHN. I got the latter for cheap from ebay, nvme tool allowed firmware upgrade to latest HPE version, no HPE tool involved.

Using Optane raw as RAID1 for databases is another matter, but dare I say: invest in more ECC RAM to keep relevant SQL data in RAM instead of giving the database more IOPS to hammer on? Patrick ;-)
 

T_Minus

I think we're getting off track from the original purpose of the thread, but on the flip side...

1- You're assuming people are only optimizing for network performance, not local.
2- You're assuming people care to use/split the rest of the SLOG drive. I know I don't; the capacity is what the drive is, and that's it.
3- You're assuming Optane needs PLP or it will lose data. It does not, and Patrick and others who know more than I do about what's inside have stated this as well. I also believe they originally had this as a spec and then removed it so as not to affect their enterprise market.
4- You're assuming people only use a SLOG with HDDs. Not true: SSD and NVMe pools need a SLOG device too, and Optane is a clear winner here.
5- Stating that a P3700 is similar to an Optane is very far from accurate; it's nothing like the low latency and performance you get from an Optane. P3700 pools benefit from an Optane SLOG.
6- Sometimes you can't add more RAM (E3 generations), and sometimes you don't want to spend time configuring the database or your app to be more effective with RAM, and an Optane for the DB is a more than sufficient boost in performance. There are other instances, but these are the most common I see.

I think it's worth mentioning again that the P3700, SN200, PM1725, P4600 and newer-gen NVMe drives and Optane are all very high performance. No matter which way you slice it, they'll work in various configurations for basic home/lab and SMB workloads without much thought. Getting into the enterprise space, splitting hairs between top-end NVMe and Optane is a bit more important :D
 

Stephan

Thanks T_Minus, you are a true Optane fan. ;-)

I can't find anything verifiable regarding Optane never ever needing PLP, would you care to share some link to a statement of this? Thanks. :)
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,395
506
113
I can't find anything verifiable regarding Optane never ever needing PLP, would you care to share some link to a statement of this? Thanks. :)
NAND is block/page-based and also uses mapping tables kept in DRAM for good performance; when you want to update a byte or a whole block you have to do a read-modify-write on the entire block/page in order to update it. If the power were to go in the middle of a RMW, you might lose the whole block. PLP gives you enough breathing room to make sure that doesn't happen, as well as ensuring that the mapping tables are saved correctly and any in-flight writes are committed to disc too.

Optane/3D-Xpoint is byte-addressable in the same way that RAM is - any byte can be read/written in situ, and really this is what allows 3D-Xpoint to be so insanely fast on small random reads and writes since there's never any RMW penalty. As such, within the chip itself there's never anything in flight since everything is changed when it needs to be changed.
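
The read-modify-write penalty described in the two paragraphs above can be shown with a toy model. This is only a sketch of the idea in the post, not of how any real SSD controller works; the 4096-byte page size and both function names are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def rmw_update(media: bytearray, offset: int, value: int) -> int:
    """Change one byte on page-based media; return the bytes physically written."""
    start = (offset // PAGE_SIZE) * PAGE_SIZE
    page = bytearray(media[start:start + PAGE_SIZE])  # read the whole page
    page[offset - start] = value                       # modify one byte in the copy
    media[start:start + PAGE_SIZE] = page              # write the whole page back
    return len(page)


def in_place_update(media: bytearray, offset: int, value: int) -> int:
    """Change one byte on byte-addressable media; return the bytes physically written."""
    media[offset] = value
    return 1


if __name__ == "__main__":
    paged = bytearray(2 * PAGE_SIZE)
    direct = bytearray(2 * PAGE_SIZE)
    print(rmw_update(paged, 5000, 7))        # 4096 bytes rewritten for a 1-byte change
    print(in_place_update(direct, 5000, 7))  # 1 byte written
```

In the page-based model, the window between reading the page and writing it back is where a power cut can cost the whole page, which is the exposure PLP is meant to cover. The in-place model has no such window for data outside the byte being changed.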

There's the question of whether the drives themselves include any cache in between the PCIe interface and the 3DXP chips themselves but TTBOMK none of the optane drives have one of these* so there's no need to keep PLP around for that either.

* IIRC the Micron 3DXP drive does have a RAM cache but I've not seen one reviewed in the wild yet. Almost all NAND SSDs contain one.
 