Can a 1 DWPD SSD become a 3 DWPD just with overprovisioning?


ca3y6

Member
Apr 3, 2021
I am looking at the specs of the Micron 9300 SSD, which Micron sells in two versions: the Pro, starting at 3.84TB and rated for 1 DWPD, and the Max, starting at 3.2TB and rated for 3 DWPD. The specs seem otherwise identical.

My question is: is there more to the higher endurance than the overprovisioning? I.e., if I were to create only a 3.2TB partition on a 3.84TB Pro drive (converted to the base-1024 equivalent), could I expect a 3 DWPD rating, or is there more to the higher endurance within a class of SSD available in both versions?
 

nabsltd

Well-Known Member
Jan 26, 2022
Drives with more endurance will generally have different firmware (or different firmware settings), even if the total amount of flash is the same. The flash may also be slightly different due to binning, even if it is the same model from the same manufacturer. Reducing the partition size won't really change the endurance. Reducing the size of the namespace will often signal the controller/firmware to act more like a factory-overprovisioned drive, so this could help endurance.

Remember that wear-leveling causes every flash cell to be written once if you write 100% of the total flash capacity (including any spare). So it doesn't matter if the partition is 100%, 80%, or just 10% of the disk: once you write 100% of the total flash, you have finished "one true drive write", which is all that really counts. Yes, if you drop the partition size to 10% of the total flash, you would be able to turn a 1 DWPD disk into a 10 DWPD disk. But you would still have written the same amount of total data.
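A minimal sketch of that arithmetic in Python (made-up numbers; it assumes ideal wear leveling and ignores write amplification, so real drives will differ):

```
# Toy version of the "one true drive write" arithmetic above.
# Assumes ideal wear leveling and zero write amplification.

def effective_dwpd(rated_dwpd, total_flash_tb, partition_tb):
    # The drive can absorb rated_dwpd * total_flash_tb of host writes per
    # day in total, so DWPD measured against a smaller partition scales
    # inversely with the partition size.
    return rated_dwpd * total_flash_tb / partition_tb

for fraction in (1.0, 0.8, 0.1):
    partition = 3.84 * fraction
    print(f"{partition:5.2f} TB partition -> "
          f"{effective_dwpd(1.0, 3.84, partition):4.1f} DWPD")
# -> 3.84 TB: 1.0 DWPD, 3.07 TB: 1.2 DWPD, 0.38 TB: 10.0 DWPD
```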
 
  • Like
Reactions: ca3y6 and nexox

ca3y6

Member
Apr 3, 2021
Thanks for your answer. A few follow-up questions, if I may.

> Reducing the partition size won't really change the endurance. Reducing the size of the namespace will often signal the controller/firmware to act more like a factory-overprovisioned drive, so this could help endurance

How come? If some of the drive is not partitioned, then surely the drive will never get any writes to the corresponding logical block addresses, so the drive has to know that they are free (assuming the drive is either new or was trimmed before the partitions were created). Why would those free blocks be treated differently from factory-overprovisioned blocks?

Does the drive have any way to tell the state of a NAND cell other than blindly counting the number of writes to it? I was assuming that a 3.2TB drive had a rating of 3 DWPD vs 1 DWPD for 3.84TB just because the manufacturer must have used a modelled distribution of NAND cells going bad, so 1 DWPD corresponds to the 96th percentile of NAND going bad (3.84/4) and 3 DWPD corresponds to the 80th percentile (3.2/4). But for my logic to make any sense, the drive must allow itself to write more than a fixed number of times to a cell and only retire a cell when it actually goes bad.
 

EasyRhino

Well-Known Member
Aug 6, 2019
I suspect it would increase the endurance, but I'm not sure we can measure by how much.

But bear in mind, the DWPD figure is a warranty figure. Most drives will actually last much longer, but some drives will fail earlier.

Also-also bear in mind that more than 1TB of writes per day for five years is a ton of endurance and hardly any workloads are likely to stress that.


Let's say you had a 1TB drive rated at 1DWPD.

If you only partitioned and filled it to 500GB, then you would easily stretch the endurance to 2DWPD of your formatted capacity.

But actually, it would probably increase beyond that, because an SSD controller can use the overprovisioned space to reduce the problems of wear levelling and write amplification, especially as an SSD fills up. Additional overprovisioning helps this process out even more.

So would you get 3DWPD of your formatted 500GB capacity? Probably, maybe even higher. I'm not smart enough to calculate it.

But also, the drive manufacturer only warrants the DWPD against the original retail capacity.
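A quick sanity check of that 500GB example in Python (a sketch only; it assumes a 5-year warranty term, which is typical for this class, and ignores the extra WAF benefit described above):

```
# DWPD is just TBW spread over capacity and warranty days.
WARRANTY_DAYS = 5 * 365                  # assumed 5-year warranty term

tbw = 1.0 * 1.0 * WARRANTY_DAYS          # 1 DWPD * 1TB drive -> 1825 TB

formatted_tb = 0.5                       # only 500GB partitioned/filled
dwpd_of_formatted = tbw / (formatted_tb * WARRANTY_DAYS)
print(f"TBW = {tbw:.0f} TB -> {dwpd_of_formatted:.1f} DWPD of the 500GB")
# -> TBW = 1825 TB -> 2.0 DWPD. Any WAF reduction from the extra free
#    space comes on top of this.
```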
 
  • Like
Reactions: Whaaat and ca3y6

ca3y6

Member
Apr 3, 2021
Thanks. Yeah, I am not planning to get close to the DWPD rating, but I'm curious whether leaving some of the disk unpartitioned does any good for an enterprise drive that is already overprovisioned. I don't know if my percentile logic is correct, but if it were, then by sacrificing 10% of the SSD I could perhaps increase the DWPD by 50% or more.

Talking about SSDs failing, has anyone seen failure rate stats for SSDs that stayed clear of their DWPD limits and weren't killed by an accident (e.g. an electrical event)? I am currently assuming that in good conditions they should live 15y+ like any other electronics, but that's only an intuition.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
> Thanks. Yeah, I am not planning to get close to the DWPD rating, but I'm curious whether leaving some of the disk unpartitioned does any good for an enterprise drive that is already overprovisioned. I don't know if my percentile logic is correct, but if it were, then by sacrificing 10% of the SSD I could perhaps increase the DWPD by 50% or more.

> Talking about SSDs failing, has anyone seen failure rate stats for SSDs that stayed clear of their DWPD limits and weren't killed by an accident (e.g. an electrical event)? I am currently assuming that in good conditions they should live 15y+ like any other electronics, but that's only an intuition.
This is not talked about nearly as much as it used to be, due to drive sizes, etc...

But if you go back 3-6 years you'll see over-provisioning was super common for getting more performance out of cheaper drives, both consumer and enterprise. Benchmarks back in the day would even compare a normal format against an over-provisioned one to show the gains - mostly in steady-state performance under sustained usage - even on enterprise drives.

It will affect endurance, but you'll not know exactly how, because you don't know exactly how the firmware works, and I imagine (guessing here) it matters what the flash is... QLC, TLC, eMLC, MLC, etc...
 
  • Like
Reactions: ca3y6 and nexox

azev

Well-Known Member
Jan 18, 2013
Well, with large drives such as 3.2TB or higher I wouldn't worry about endurance at all, especially if you are using this for home use, or even for home lab use. I've had drives that were well within their endurance range essentially become a brick; not sure what the cause was, but they just won't be detected anymore. Yes, it's rare, but it can happen randomly. I just look at a drive's endurance by calculating its total write maximum over the warranty period. Over time I've averaged my SSD usage on my desktop to about 100-200TB per year, which is like a month or two of rated writes for a 1 DWPD 3.84TB drive. I guess the moral of the story is that most normal users will rarely ever hit the endurance limit of a drive, and it will probably fail for other reasons.
 

hmw

Well-Known Member
Apr 29, 2019
If it's Micron, you can actually use Flex Capacity and/or overprovisioning in the Micron client and go from 3.84TB to 3.2TB. There was a time when Micron used to encourage just that. I've used 7.68TB Micron 5300 SATA SSDs overprovisioned to 6.4TB in my NVR and they've been fine, with little write amplification. I've used a 3.84TB SSD OP'ed to 3TB in a crappy NVR where there was zero TRIM/DISCARD happening, and even though the WAF was quite bad, the drive itself had lost only 10% of its lifetime over 4 years of pathological write patterns.

So to answer your question - can you create a 3 DWPD drive, either by using a blank partition or Micron's own utility? Sure you can. Overprovision accordingly, and measure lifetime remaining + host writes so you can prove to yourself that the OP is working.
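For the "measure it" part, a hypothetical monitoring sketch (assumes Linux with nvme-cli installed; the snake_case field names match the nvme-cli smart-log output versions I've seen, and the device path is an example - adjust both to your system):

```
import re
import subprocess

def smart_log(dev="/dev/nvme0"):
    # Parse `nvme smart-log` human-readable output into {field: int}.
    out = subprocess.run(["nvme", "smart-log", dev],
                         capture_output=True, text=True, check=True).stdout
    fields = {}
    for line in out.splitlines():
        m = re.match(r"(\w+)\s*:\s*([\d,]+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2).replace(",", ""))
    return fields

log = smart_log()
# NVMe reports data units as thousands of 512-byte blocks.
written_tb = log["data_units_written"] * 512_000 / 1e12
print(f"host writes: {written_tb:.2f} TB, wear used: {log['percentage_used']}%")
# Sample this periodically: if percentage_used climbs more slowly per TB
# of host writes after OP'ing, the OP is doing its job.
```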

Will Micron entertain any warranty claims over and above what the drive came with? Absolutely not.

Just a note - the 3 and 5 DWPD drives have specially tuned concurrent garbage collection cycles that don't just kick in during heavy writes. Not so with the 1 DWPD drives. Micron has never answered whether the drive firmware looks at the free space and decides to adapt the GC strategy accordingly.
 
  • Like
Reactions: Whaaat

pimposh

hardware pimp
Nov 19, 2022
I think there is some misunderstanding here.
Physical flash capacity vs used capacity vs firmware.

Using half the capacity (say 3.84TB out of a 7.68TB drive) does not automatically make the drive more enduring, since the unused second half is _not_ used for anything else unless the firmware is smart enough - or told (e.g. Micron drives) - to use it for something like GC/temporary page collection.

One brand, after OP is set by the vendor tool, reports a lower capacity, so the remaining space can be used by the firmware's internal endurance-proofing functions, while other brands do not have this functionality or vendor tools.

Proof: use my favourite trash drive, the Crucial P3. The 1TB version is enough to prove it. Create a 250GB partition and write it many times in a loop. You will notice the usual QLC slowdown to 80MB/s.
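A minimal sketch of that loop-write experiment (the target path is hypothetical - point it at a file on the 250GB test partition; the fsync per pass keeps the page cache from hiding the drive's real throughput):

```
import os
import time

TARGET = "/mnt/test250/fill.bin"    # hypothetical mount of the test partition
CHUNK = bytes(64 * 1024 * 1024)     # 64 MiB of zeros per write call
FILE_BYTES = 200 * 1024**3          # ~200 GiB written per pass

with open(TARGET, "wb") as f:
    for pass_no in range(10):       # rewrite the same region in a loop
        f.seek(0)
        start = time.monotonic()
        for _ in range(FILE_BYTES // len(CHUNK)):
            f.write(CHUNK)
        os.fsync(f.fileno())        # flush to the drive before stopping the clock
        rate = FILE_BYTES / (time.monotonic() - start) / 1e6
        print(f"pass {pass_no}: {rate:.0f} MB/s")  # watch for the QLC cliff
```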

If what was said before were true - that merely leaving a fraction of the total flash capacity unused acts as OP - then the firmware should be using that spare flash to avoid such terrible behaviour. But it is not.

Or it is just me who does not understand how it should work.
 

ca3y6

Member
Apr 3, 2021
But isn't the main point to use this free capacity to manage wear levelling? I was under the impression that even old retail drives used free capacity for wear levelling (hence the deteriorating performance as the drive gets full).

And I think the question really comes down to whether wear levelling works in a sort of linear way, as described by nabsltd above - i.e. there are 4TB of NAND in that disk that can each be written n times, so the maximum number of writes is n*4TB no matter how you partition the disk - or whether it is a more statistical approach, as per my uninformed intuition, where a drive has a distribution of NAND cells, some better, some weaker, and the manufacturer assumes a certain distribution, so overprovisioning by x% protects you against the x-th percentile of the weakest cells (with the stronger cells being able to support a lot more writes), and therefore you get a non-linear DWPD rating as a function of the % overprovisioned.
 

pimposh

hardware pimp
Nov 19, 2022
Please elaborate on the below. Sorry, I am not very Paint-proficient ;-)
Case A, B, C.

The question is whether, as a rule of thumb, one can be sure that the grey areas in all cases/firmwares/drives are going to be used for anything.

[attached diagram: babassd.jpg - cases A, B, C]
 

ca3y6

Member
Apr 3, 2021
66
24
8
Well let me do my own bad paint work to illustrate my hypothesis. And to be clear, those charts are made up, not to scale, and I am not saying it works that way, I am asking whether it works that way.

So for a given drive, I assume there is a mix of better and weaker NAND cells. If I create a histogram where I sort them by endurance (i.e. how many times each of those cells can be written), you would get a distribution like this (probably steeper, I have no idea): stronger cells on the left, weaker cells on the right, adding up to a total of 4TB (the native capacity of the drive):
[attached chart: ssd q1.PNG - hypothetical cell-endurance histogram, no OP]
So with no overprovisioning, the disk sold as a 4TB retail drive would have a DWPD rating equal to its weakest cell, which in the illustration above would be 0.5 DWPD. It will have stronger cells, but if the drive is full, all cells have already been written to 0.5 DWPD, and you try to do one more write, the weakest cell will fail and the SSD will die.

Now with some factory over provisioning:
[attached chart: ssd q2.PNG - the same histogram with factory OP]

The disk only exposes 3.84TB of storage, so not only do you distribute 3.84TB of writes across a slightly larger pool of cells (4TB), but also, if some of the weakest cells die, you can just retire them and still have enough stronger cells left to sustain 1 DWPD. How much more DWPD you get depends on the distribution of the cells' endurance.

And same logic if you do even more over provisioning:
[attached chart: ssd q3.PNG - the same histogram with more OP]
You can sustain a lot more DWPD because you allow a lot more of the weaker cells to die and still be able to do a full 3.2TB write.

So assuming it works the way above (and I don't even know if that is the case), what I was wondering is whether, by taking a 3.84TB drive and only partitioning 3.2TB, I am giving the drive the same capacity to retire weaker cells once they start to fail as if the drive had been factory-limited to 3.2TB.
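To make the hypothesis concrete, here is a toy Monte Carlo of that histogram model - purely an illustration of the question being asked, not how real firmware is documented to behave. The endurance distribution, the 5-year term, and perfect wear leveling are all assumptions:

```
import random

random.seed(1)
N_CELLS = 100_000                   # abstract cells; total flash = 4 TB
CELL_TB = 4.0 / N_CELLS
WARRANTY_DAYS = 5 * 365             # assumed warranty term

# Hypothetical per-cell endurance (P/E-cycle budget), as in the histogram.
endurance = sorted(random.gauss(3000, 400) for _ in range(N_CELLS))

for exposed_tb in (4.0, 3.84, 3.2):
    spare_cells = int(N_CELLS * (1 - exposed_tb / 4.0))
    # Perfect wear leveling: cells die strictly in endurance order, and the
    # drive dies once the spare pool is exhausted, i.e. when the
    # (spare_cells + 1)-th weakest cell wears out.
    lifetime_cycles = endurance[spare_cells]
    total_host_tb = lifetime_cycles * N_CELLS * CELL_TB  # rough: ignores dead cells
    dwpd = total_host_tb / (exposed_tb * WARRANTY_DAYS)
    print(f"exposed {exposed_tb:4.2f} TB -> {dwpd:.2f} DWPD (toy model)")
```

On made-up numbers like these, the gain from extra OP is indeed non-linear, because each extra spare cell buys protection deeper into the weak tail of the distribution.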
 

TRACKER

Active Member
Jan 14, 2019
There is a professor (Onur Mutlu) who has made great lectures on how NAND flash works, and he explains in detail how things work.
I would highly recommend watching his lectures (at least the flash-related ones ;))
E.g. this one:
Then I believe a lot of your questions will be answered, especially the wear-leveling ones.

P.S. NAND flash is a very complex subject, and a lot of the algorithms are proprietary.
 

hmw

Well-Known Member
Apr 29, 2019
> Proof: use my favourite trash drive, the Crucial P3. The 1TB version is enough to prove it. Create a 250GB partition and write it many times in a loop. You will notice the usual QLC slowdown to 80MB/s.
You have the right idea but the wrong drive and technology. You're conflating the P3's pSLC cache with the slowdown in steady-state, write-heavy workloads on enterprise drives.

Here's an old but excellent high level write up on this: https://www.micron.com/content/dam/...rief/ssd-flex-capacity-feature-tech-brief.pdf

To answer your question - the P3 is a shite drive with shite(r) firmware. It is a hideous product that doesn't expose half the things NVMe drives are supposed to expose. The P3 comes with a reserved pSLC cache whose size is static: a 1TB P3 has 264GB of pSLC and a 2TB P3 has 550GB of pSLC. Taking a 2TB P3 and reserving half the area in a hidden partition won't change the pSLC area; you can still write 550GB before the drive crawls to 100 MB/sec. What it WILL do is ensure there are more NAND cells that can be used for wear leveling, and hence the drive will last longer.

In a normal enterprise, non-QLC drive, having more NAND cells for wear leveling also means the garbage collector can do its work better, and you don't get write blips where the write speed plummets off a cliff while the GC kicks in. This is very different from the pSLC slowdown.

In fact that is the entire difference between creating a hidden partition for over-provisioning and using 'Flex Capacity' for doing the same.

Samsung, WD and Micron will create hidden partitions for over-provisioning - especially for their consumer drives. In this case, you simply have a larger pool of NAND cells you can spread wear leveling over.

When using FlexCapacity-like mechanisms (Kioxia, Micron), the firmware reduces the max LBA (making the drive size smaller) and is now aware of a certain section of storage it can use for wear leveling - the GC algorithms can adjust appropriately, and you WILL see an increase in IOPS during write-heavy loads.

The 3 and 5 DWPD drives aren't just guaranteed at that DWPD level - their performance is guaranteed at that level too.

More recently, manufacturers have started attaching R/W percentages and block sizes to their TBW and DWPD figures. Solidigm now says the D5-P5336 is guaranteed for 0.56 DWPD at a 50/50 mix of R/W with a 16K block size. Micron will tell you that its 30TB 6500 ION is guaranteed for 1 DWPD at 100% sequential writes with a 128k block size - but only for 0.3 DWPD when doing 100% 4k random writes. You can then work backwards and say that if you wanted 1 DWPD at 100% 4k writes, you could take a 30TB 6500 ION and use FlexCapacity to reduce the LBA to 1/4th of the original value ...
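That "work backwards" arithmetic, as a sketch (it assumes the drive's total daily write budget at a given workload stays fixed when the exposed LBA range shrinks; the 30TB and 0.3 DWPD figures are the ones quoted above):

```
rated_tb = 30.0            # 6500 ION nominal capacity (from the post)
dwpd_4k_random = 0.3       # rated DWPD at 100% 4k random writes

target_dwpd = 1.0
# Spread the same daily 4k-random write budget over less exposed capacity.
needed_tb = dwpd_4k_random * rated_tb / target_dwpd
print(f"expose <= {needed_tb:.1f} TB of {rated_tb:.0f} TB "
      f"({needed_tb / rated_tb:.0%}) for {target_dwpd:.0f} DWPD at 4k random")
# -> 9.0 TB (30%). Cutting to 1/4th, as suggested above, leaves margin:
#    0.3 / 0.25 = 1.2 DWPD.
```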
 

hmw

Well-Known Member
Apr 29, 2019
> Well let me do my own bad paint work to illustrate my hypothesis.
You're right - reducing the LBA or creating hidden partitions will give you an increase in DWPD. The problem is that it is difficult to quantify exactly by how much, because manufacturers can use different NAND and different firmware for higher-spec'ed drives to create artificial product differentiation.

What I am saying is that the DWPD will definitely go up - but by how much is something you can only measure during drive operation, assuming you can calculate WAF and track lifetime remaining/spare cell area. Unless the manufacturer gives an exact calculation.

Manufacturers are known to give out specific details, different firmware, and even expose inner workings - but only to large customers (think hyperscalers).
 
  • Like
Reactions: ca3y6 and pimposh

pimposh

hardware pimp
Nov 19, 2022
I will ask the question the other way round.

Setting OP by reducing LBAs through the vendor's tool is quite clear, and I assume in these scenarios the firmware is aware of the full/chopped capacity and uses the remnants for wear-leveling mechanisms (my pretty drawing, case B), so it does impact longevity. Clear.

But what remains unclear is whether, as a rule of thumb, every drive firmware does the same for unused space in case A (I guess not), and more importantly why it ever should, e.g. given that the partition could be extended afterwards.
 

ca3y6

Member
Apr 3, 2021
> You're right - reducing the LBA or creating hidden partitions will give you an increase in DWPD. The problem is that it is difficult to quantify exactly by how much, because manufacturers can use different NAND and different firmware for higher-spec'ed drives to create artificial product differentiation.
Actually, the Micron paper you helpfully shared seems to suggest my understanding was wrong: if you overprovision, the TBW doesn't change; the only thing that changes is the DWPD, because it is a ratio whose denominator decreases with the smaller disk size. So it looks like my idea about a distribution of NAND cells of different quality is simply not how it works.
 

pimposh

hardware pimp
Nov 19, 2022
Now, since pages have P/E-cycle limits and can't be overwritten in place, every SSD write is already a bit CoW-like internally. In light of this, some CoW filesystems on SSDs are nice write amplifiers by design.
 
  • Like
Reactions: nexox

Hubi

New Member
Aug 28, 2015
> Reducing the partition size won't really change the endurance. Reducing the size of the namespace will often signal the controller/firmware to act more like a factory-overprovisioned drive, so this could help endurance.
I don't believe that to be correct.
The SSD controller does not care about partitions. It will just use any cell that is available.
In my opinion, it does not matter how you spare out the capacity; the important part is that there are always enough free NAND cells to write new data, so write amplification can be kept to a minimum.
Remember, enterprise endurance is rated at 100% random writes.
Every free NAND cell helps to keep write amplification low and increase endurance.
Also, one drive write at reduced capacity is worth less TBW, which will increase DWPD.
Can't prove it, just my 2 cents.

To answer the initial question:
Can a 1 DWPD SSD become a 3 DWPD just with overprovisioning?

Several manufacturers provide this feature for some SSD series, especially Micron and Samsung. So yes, more OP will increase DWPD. Not sure if 7% OP = 1 DWPD and 28% OP = 3 DWPD is written in stone, but DWPD will increase.
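A back-of-the-envelope sketch of why those two OP levels land in different endurance classes, using a commonly cited first-order WAF approximation for sustained uniform random writes (a modelling assumption - real firmware behaves differently):

```
def waf(op):
    # Commonly cited approximation for sustained uniform random writes:
    # WAF ~ (1 + OP) / (2 * OP), with OP = (physical - logical) / logical.
    return (1 + op) / (2 * op)

for op in (0.07, 0.28):
    print(f"OP {op:.0%}: modelled WAF ~ {waf(op):.1f}")
# -> OP 7%: ~7.6   OP 28%: ~2.3
# Lower WAF means fewer NAND writes per host write, so the same P/E budget
# yields roughly 3x the host TBW - in line with the 1 vs 3 DWPD split,
# before any vendor margin or firmware tuning.
```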
 

pimposh

hardware pimp
Nov 19, 2022
> The SSD controller does not care about partitions. It will just use any cell that is available.
I have no idea if this is correct or not, but to put it up for further discussion: if artificial product segmentation isn't the case, why is it then that write-intensive drives are often shrunk compared to the mixed-workload ones, e.g. 3.2TB vs 3.84TB, 1.6TB vs 1.92TB, and so on?