Can a 1 DWPD SSD become a 3 DWPD just with overprovisioning?

Hubi · Oct 8, 2024

Well to be honest, it will be impossible to state such things for ALL ssd controllers in the world. At least for the second part (use any cells) but partitions are definitely no cell boundries.

ca3y6 said:
if you overprovision, it seems the TBW capacity doesn't change, the only thing that changes is the DWPD

pimposh said:
.eg. 3.2TB vs 3.84TB

Reducing a 3.84TB 1DWPD drive to 3.2TB will only increase DWPD to 1.2 in terms of simple capacity change.

3.84TB 1DWPD 5y = 7008TB TBW
3.2TB 1.2DWPD 5y = 7008TB TBW

3.2TB 3DWPD 5y = 17520TB TBW

remaining 1.8 DWPD increase comes from reduced WA when providing more OP while sticking to JESD218 enterprise workload. As mentioned, this is not an exact number for all SSD models.

hmw · Oct 8, 2024

An increase in spare area available for wear leveling has an outsized influence on sustainable DWPD. It's not just the % reduction in capacity and subsequent decrease in TBW

Anecdotal data from reading tons of SSD data sheets

- 0.3. ~ 0.6 DWPD Consumer level SSDs usually have 1-7% spare area
- 1 DWPD Enterprise SSDs usually have 15% spare area
- 3 DWPD Enterprise SSDs usually have 28% spare area
- 5 DWPD Write Intensive SSDs have 50% ~ 60% of storage dedicated to wear leveling

pimposh · Oct 8, 2024

By looking into some random datasheet it looks like chopping usable write capacity in half... makes it write intensive capable (guessing there might be different controller/firmware tweaks)

hmw · Oct 8, 2024

What makes a huge difference is TRIM/DISCARD. I have a 4TB Micron MX500 SSD with OP set to 50%. It runs in a Unifi CloudKey with a USB to SATA interface, the CloudKey does zero TRIM/DISCARD commands. The CloudKey NVR writes in a circular buffer so it's almost all sequential writes. Not 4K random but still < 1MB sequential writes.

I've measured the WAF over 4 years and its ~ 10 but the drive lifetime remaining is > 78%.

I also have a Unifi NVR - this has the Annapurna SoC with native SATA interfaces and does do TRIM/DISCARD. I have 7.68TB Micron 5100 SSDs set to 6.4TB and the WAF is 1.15 ~ 1.2. Drive lifetime remaining is 100%

There's no difference in reserving capacity by means of a hidden partition or by using the firmware to reduce the LBA and capacity. The reason for using the firmware is of course, depending on firmware - it will recognize a specific amount of cells set aside specifically for wear leveling. The other reason is that if you're using the drives in places that don't give you the opportunity to format. For example you put the drives in a NAS appliance or a NVR - it will format the drive the way it wants - if you create some nice NTFS or RAW hidden partitions, they will be destroyed and the partition table overwritten. In this case, the only way to ensure proper OP is to use the firmware capacity management

I see Amazon Prime Day deals for Crucial P3 and P3 Plus and all the stupid pseudo tech sites talk about how this is a great deal. Please avoid this drive - it should be renamed Crucial Pathetic3 (Plus)

Whaaat · Oct 9, 2024

Hubi · Oct 9, 2024

Not sure about the old PM1725b, but with newer models you actually you do the opposite, you go from capacity optimized to endurance optimized.

And it is not done by flashing a new firmware, basically you limit the max. space available for namespaces.

According to several kioxia "battle cards" this option even exists for PM9A3 and all PM17xx successors.

Prophes0r · Oct 15, 2024

There seems to be a problem with perception here.
There are multiple things going on at the same time.

Let's use an analogy.

You have a tub of ice-cream.
You can get 1000 scoops out before the tub is empty.
If you use a slightly smaller tool, you can get 1200 scoops from the tub.
But these new scoops are going to be smaller, because the tub still has the same amount of ice-cream.
So, if you account for the smaller scoop size on your menu, you can get more servings, they will just be smaller.

If you take a 4TB drive rated for 4PB of writes and use overprovisioning to make it appear to be a 2TB drive, you will still get 4PB of writes.
This will give you "double" the endurance when comparing to other 2TB drives. It doesn't do much when comparing to 4TB drives.

Having a 2TB drive will force you to treat it differently though, so it might last longer(time), even if it doesn't last longer(writes).

There is another very minor benefit.
A footnote. Because the above assumes you are writing every single cell until they are ALL broken.
That isn't how it actually works.

Imagine a bucket full of water with a tiny hole in it that is blocked when the bucket is on the ground.
Every time you pick the bucket up and move it around some water will drip out of the hole.
There is a line on the bucket. If the water level ever get's below that line, the bucket breaks.
The farther down the bucket the line is, the longer the bucket will go before it breaks.

For drives, this is like the provisioned size.
If you provision it to use 100% of the available flash, it will break as soon as a single cell goes bad.
By using a smaller provisioned size, you very slightly improve the number of writes until it says you need to stop.

Let's say you have a hypothetical drive with 100 cells. Each cell will break after 100 writes.
After 9900 writes, every cell is 9/10th used.
If you drive is configured as having 80 units, it will break after 20 more writes. 9920 writes total.
If, however, it is configured as a 60 unit drive, it will break after 9940 writes.
The increase is honestly too small to matter, but it is TECHNICALLY there.

Also, something something more efficient garbage collection with more spare cells.

Hubi · Oct 16, 2024

This is only half the truth.
Enterprise Workloads as definied by jedec are 100% random and preconditioned.

Every free cell helps the drive to get those random writes in free cells and reorganize in the background instead of actually having to shovel around blocks and pages.
This will reduce write amplification a lot and therefore you get a lot more host writes from the same cell write count.

pimposh · Nov 1, 2024

Couple of interesting links:

Endurance of NVMe, SAS, and SATA SSDs | SNIA | Experts on Data

www.snia.org

What is Write Amplification (WAF)

Understanding WAF: The Key to Mastering SSD Performance As I mentioned in [this post], WAF is the most crucial topic to understand to master how SSDs work, and get the most out of your drives. Write amplification (henceforth just WAF) can happen at different levels of the software stack. For

ssdcentral.net

Now who's measuring WAF in homelab hands up!

SSD Useful Life Calculator - WintelGuy.com

SSD SSD Useful Life Calculator

wintelguy.com

DWPD, TBW, GB/day Calculator - WintelGuy.com

DWPD, TBW, PBW, GB/day calculator / converter

wintelguy.com

------
Not directly related to topic but kinda neat!

GitHub - devizer/w3top-bin: W3-Top isn't grafana, htop, atop, iotop or gnome-system-monitor. It's all together with web interface and built-in benchmarks. Here is a build tools for w3top, an http-based monitoring and bench-marking tool based on KernelManagementLab

W3-Top isn't grafana, htop, atop, iotop or gnome-system-monitor. It's all together with web interface and built-in benchmarks. Here is a build tools for w3top, an http-based monitoring and ...

github.com

Ouch

W3Top - web based metrics monitoring tool and built-in disk benchmark

How much a hardware fits your workload? The W3Top app is the answer. Works on any linux on x64, arm64, and arm32 platforms

3.85.91.86

nexox · Nov 1, 2024

pimposh said:
Now who's measuring WAF in homelab hands up!

Yes, but a different definition of that acronym...

In slightly more seriousness I don't have a consistent enough workload to make it useful to measure write amplification, since the idea is you can project future media writes based on your estimated host writes, and I can't estimate that or confidently assume that any given day's writes won't lead to a different amplification factor. I just kinda keep an eye on media writes and leave it at that, but I also avoid TLC and QLC, with SLC and MLC it's difficult to hit truly absurd amplification levels.

Hubi · Nov 2, 2024

pimposh said:
Now who's measuring WAF in homelab hands up!

I don't care about disk writes at all. My homelab has probably 0.01 DWPD.

Prophes0r · Nov 4, 2024

pimposh said:
Now who's measuring WAF in homelab hands up!

I don't measure it.
I just don't use the wrong settings so it doesn't really happen.

If you are doing millions of 4k writes, don't use a 4MB block size.

And peek at your S.M.A.R.T. status a few times after a new setup has been running for a while.
If you notice your Optane drive is using 4% of it's endurance over a month...it's time to figure out what you did wrong.
(I had this exact issue. Luckily they were just some $2 16GB drives and I was only playing around to see if L2ARC was useful. It almost never is. Like zero chance your HomeLab needs it. 99% chance an enterprise setup doesn't need it either. It is a trap anyway.)

Apachez · Jan 9, 2025

When it comes to DWPD its mainly a warranty thing (similar to MTBF) which doesnt mean that the drive will survive for that long but in theory it should surpass that.

To increase DWPD there are a combo of things:

- Overprovisioning (should rather be called underprovisioning). That if your 1TB drive is configured to act as a 500GB drive this would increase DWPD (daily write per day) by 2x but not the TBW (terabyte written) who will remain the same.

- Type of flash (SLC can survive more writes per cell than TLC or QLC - but also to rewrite a single QLC cell will actually rewrite multiple cells).

- DRAM cache in front of the flash (either directly on the storagedevice or in combo with hostbased caching). Multiple writes to the same LBA can be aggregated in the DRAM (on the storage) and after some time dumped to the physical flash. This way this single LBA will only have a single write on the flash while the application (such as a database or so) think it have written to the same LBA 10 times.

- Static cache in front of the flash such as pSLC and similar. Compared to DRAM cache whats written to the pSLC will survive a powerloss. Performance will also probably go down once you pass the size of the pSLC (depending on how smart the controller is in case it will move pages thats not been written to for a while to the TLC/QLC flash). A common design is to have something like 250GB pSLC and 750GB regular TLC or QLC flash and tada you have yourself a 1TB drive with extended lifetime compared to one which is 1TB QLC.

- Controller and firmware being used. A controller who spend more time to find a cell to write to (with least amount of writes) will have longer lifetime than a controller/firmware who just dumps it anywhere. In the later case its more likely that you will pass the 1000-5000 writes/cell a TLC or QLC based drive will have sooner. Also such controller will of course have lower performance (higher latency to find the least used cell). Can be dealt with by clocking the controller higher that is you might have the same controller on two devices but the one who spends more time to find least written cell might be higher clocked (or have more enabled cores etc) to not affect the latency between the two models.

Personally I try to select 3 DWPD drives. Again it doesnt necessary mean much but I believe if the vendors lie they will lie equally much so a 0.3 DWPD from the same vendor should in theory have shorter lifespan (in terms of TBW) than a 3 DWPD drive of the same size.

Prophes0r · Jan 10, 2025

Apachez said:
Overprovisioning (should rather be called underprovisioning)

No, Overprovisioning is the correct idea.
You Provision something by supplying it with necessary materials.
If you provide it with more than it needs to do it's job, you Over-provisioned it.

Since we can't change the 'materials' these drives have, we reduce it's job.
The drive is now provisioned for more than it's job, and is therefor overprovisioned.

Apachez said:
DRAM cache in front of the flash

Every single U.2 drive is going to have this. It isn't even going to be listed in specs unless the marketing team is desperate.

Apachez said:
Static cache in front of the flash such as pSLC

Not in an enterprise drive.
They could, in theory, implement it, but it would go against design rules.
You spec out the drive in worst case conditions, and guarantee it operates that way by default.
Having it reconfigure itself as it fills up, like a consumer drive does, would lead to wildly inconsistent performance as the drive ages, which is a HUGE no-no.
Flash based storage will end up slowing down a little when it get's really full due to garbage collection and data-shuffling not having much room to shift things around, but it needs to be reduced as much as possible in an enterprise drive.

Apachez · Jan 11, 2025

I dont agree regarding the term "overprovisioning".

If you overprovision something you have a 800GB drive and try to provision it with 1200GB of data - which of course wont work (that is raw data, not cheating with compression).

Similar to how ISP's can have 1Gbps customers on a 48 int switch but the uplink is just 1x10G or 2x10G (20G). If all customers tries to use their 1Gbps connection at the same time the uplink would need to be 48G (initially packet buffers will deal with this to queue up outgoing packets over the uplink but they are not unlimited). That is the ISP 10G (or 20G) is overprovisioned with 48G.

What you do with SSD/NVMe's is that you underprovision them (IMHO). That is you limit the namespace and/or partition to only use lets say 400GB on a 800GB drive in order to prolong its lifetime (will of course not help if we look at TBW).

Also there are more interface types than U.2 drives when we speak about SSD and NVMe's today. But even so having a DRAM with some added latency can aggregate writes to the same LBA so whats then actually being written to the *LC will be a single write instead of lets say 10 writes which will be one way to limit amount of writes needed on the write senstivive *LC flashmemory. But in the DRAM case its more of a nice sideeffect since the main purpose is to speed things up (be able to receive a burst of x GB since the *LC flashmemory is slower than DRAM).

Prophes0r · Jan 11, 2025

Apachez said:
I dont agree regarding the term "overprovisioning".

You are still misunderstanding.
"Provisioning" is not a term unique to computers.

"Provisions" are supplies.
Like...you are going on an expedition, or building a house.

If I'm taking a group of 4 people on a 5 day trip, I need to get 5 days of supplies for 5 people to be properly provisioned. (25 = 25)
If I get 5 days of supplies for 6 people, I am overprovisioned. (25 < 30)
If I instead leave 1 person behind after I got the original supplies, I am also overprovisioned. (20 < 25)

This is what we are doing with these drives.

NAND flash drives already have spare flash.
They HAVE to have spare, because cells will go bad and there is background stuff going on with them.
These extra cells are the "provisions".

The drive ratings are based on having these extra cells.

If a drive is 100% provisioned when it has 10% extra cells, then adjusting the storage/extra cells so you have 15% extra, means you are overprovisioning the drive. You are also reducing the usable storage, but that isn't the point.
The point is that you are changing the configuration to one where the drive has more provisions than it is specified for, hence overprovisioning.

Apachez · Jan 11, 2025

Yes but overprovisioned means that you have overused available resources.

You have food available for a day trip for 5 people but you have 10 people with you on this trip. You have overprovisioned by 200% (or well technically the overprovision will be 100% since a 0% overprovision means spot on target). Meaning the available resources will deplete sooner or that half of your customers wont get a snack on that trip.

But what you do with SSD/NVMe's and (as I would like to call it) underprovisioning is that you will not use 100% of its resource but lets say 50%.

You will only partition 50% of available space so that the "DWPD" (daily write per day) will be based on 400GB storage rather than 800GB storage. This way you have doubled the DWPD rating of that particular drive. That is for normal use (as a filestorage) you have prolonged this drive lifetime. However you have not done so if we instead look at TBW (terabyte written).

Which is what the vendors do - they underprovision the drive so what they sell as lets say a 800GB drive is actually lets say 2400GB (as raw capacity) to get a high "DWPD" rating. Which is why a drive with higher DWPD is also more expensive even if its the same vendor and series. But doing this underprovisioning is not the only way to expand how many writes the drive can survive.

If the same 800GB drive would be sold as 800GB raw capacity you would have <1 DWPD.

If they (IMHO) would overprovision the drive it would be like those faked 1TB USB drives from China who are actually a 128GB drive or smaller as raw capacity (it looks like a 1TB drive but if you write more than 64-128GB to it the drive will fail).

I know that the marketing people like to skew this terminology so it means something different. Same thing with "broadband" which is in reality a terminology from doing radio (over the air communication) and signaiing refering to frequency gaps (the opposite is for example "narrow band" etc). But most people today ("incorrectly") think of internet speed when you mention "broadband" and not a frequency gap.

Prophes0r · Jan 11, 2025

Apachez said:
Yes but overprovisioned means that you have overused available resources.

No...
You still aren't listening.

Provisioning "the activity of obtaining the equipment and resources you need for a particular activity."
Over- Superior, Excessive, Exceptional, Extreme.
Overprovisioning means providing more than the required resources.

The "resource" is the extra space, NOT the space required for normal storage.

Overprovisioning a drive means you are giving it more overhead.
You do this by reducing the storage portion, which increases the extra portion.

It doesn't matter if you feel this is the wrong way to put it.
It is the literal definition of the component words/terms, so it makes perfect sense when you combine those terms.

It means exactly what it says on the tin.
It is not a "marketing term".
It is an engineering term.
It is technical language, which is language that strives to mean exactly one thing, with little to no room for interpretation.

pimposh · Jan 11, 2025

Apachez · Jan 12, 2025

As you posted yourself, overprovisioned means excessive. Partitioning lets say 400GB out of a 800GB is NOT excessive - its rather the other way around.

Trying to partition 1200GB on a 800GB drive would be a bit excessive...

Ill just leave this here

Duty Calls

xkcd.com

Can a 1 DWPD SSD become a 3 DWPD just with overprovisioning?

New Member

Well-Known Member

hardware pimp

Well-Known Member

Active Member

New Member

Active Member

New Member

hardware pimp

Well-Known Member

New Member

Active Member

Member

Active Member

Member

Active Member

Member

Active Member

hardware pimp

Member