Intel 750 PCIe (nvme) Wear Endurance falling insanely fast (on RMA'd drive too)??

james23

Active Member
Nov 18, 2014
427
93
28
49
Hellooo!

I have an Intel 750 PCIe 400 GB NVMe SSD as the main boot drive in my desktop PC, running Windows 10 64-bit (Win 10 for about the past 5 months; for the 2 months before that I was running Win 7 64-bit). I quickly noticed that the 750's SMART wear endurance stat had fallen to 93 after about 3-5 months of ownership, and after 5-7 months it was already down to 85. From the SMART stats I could tell that I had written around 3-4 TB of data to this drive. Note: I upgraded the drive's firmware to the latest version a few weeks after the update was released (back in April 2016, and that FW is still the most current as of now). Also note that I had over-provisioned this SSD by about 50 GB of its 371 GB raw space. Since I was using Intel's SSD Toolbox to monitor the SMART stats every few days, I began using its "CSV data export" for each point dropped in my wear endurance level, so I would have a kind of log. I also keep very detailed notes, such as bandwidth test results for the drive and other items related to this new personal PC build with the 750 as its boot drive (so I can clearly see that the bandwidth tests over time look normal and as fast as they should be, even as the wear endurance was falling).

So I initially contacted Intel support and they created a case. After about two months they decided to do a cross-ship RMA (they had to charge and then refund my card $800, though, and I had to call about the refund more than a week after they received my RMA). I finally got the new RMA'ed 750 in, imaged my old 750 to it (that went through with no issues), and started using this "new" 750 as my boot drive. I then noticed that the wear endurance on this new 750 was falling seemingly faster than on the first one (i.e. by the 10th day of it being online I was already down to 96/97). So I began doing a Toolbox CSV export for every single point drop in the wear endurance. It's clear that I'm nowhere near the spec'd write endurance for this drive (which is about 127 TB; I know that's still pretty low for an SSD, but Intel is usually conservative on those numbers IMO).

I've seen a few other users on the Intel forums who have had similar issues, but none seemingly as extreme as what I've seen. In one case on the Intel forums, a user had some kind of wild process that had written 100 TB of data to his 750 over the course of three or four months, which caused his wear endurance to drop 10 or 15 points (which somewhat makes sense). Another user reported that his wear endurance was falling quickly, but an Intel rep blamed it on him using incompatible hardware while using the 750 as a boot drive. That didn't make much sense to me, since the SMART data the drive records (the number of host writes and NAND writes) should be agnostic to your hardware.

I do have quite a bit of experience with both consumer and enterprise SSDs (mainly Intel and HGST drives), as I often build enterprise-class servers for my IT customers. That said, I have never seen the wear indicator on any SSD fall anywhere near as quickly as on my 2 x 750s, with so few writes recorded by SMART.

Does anyone have any input on this? Is it possible that my OS is doing a certain type of low-MB-count write that really hurts the write endurance but doesn't rack up a high number of MB/GB written on the SMART counters? (I'm posting screenshots of my write endurance along with the corresponding host/NAND GB writes, plus some full CSV exports from the Intel SSD Toolbox taken at various write endurance counts.)

One of the Intel reps I've been working with seems to be suggesting that my OS is possibly doing a lot of really small writes, which could cause the wear endurance to fall much faster than the drive's spec'd endurance level would suggest.

At this rate, now that I've gone through 2 x 750 PCIe drives, it seems that the only solution would be to get an enterprise-class P3600 or P3700 PCIe to use as my boot drive (because even the enterprise-level P3500 PCIe only has about 10 times the write endurance of this consumer 750, according to the specs).

Thanks for any help / input, and sorry for the long, wordy post (but this has been a really long, annoying issue).

BTW, I do have a Windows gadget that keeps track of process-level disk writes, and it always shows Chrome and Firefox as the processes with the most writes to the 750 boot drive, and not in amounts that are anything crazy. For those wondering, the main tasks I use this desktop PC for are web browsing (A LOT of web browsing, as in I often have 50+ Chrome windows + tabs open at a time), a lot of SSH / RDP / VNC sessions, and some other remote network tools for work. Very little to NO gaming, and very little to NO Photoshop / Adobe Premiere work.

tks

From my 2nd "new" 750 PCIE:

97.PNG

95.PNG


CSV export AT 93 (see attached .zip file below):



CSV export AT 96 (see attached .zip file below):


From my 1st "RMA'd" 750 PCIE:
78.PNG

76.PNG
 

Attachments


keybored

Active Member
May 28, 2016
286
66
28
A couple of things look odd to me:
1. Your write amplification numbers look insane. Your original drive has a WA of 19 and the new one has a WA of almost 37 (divide NAND bytes written by host bytes written).
2. Intel Toolbox reports very odd numbers for host and NAND writes. For example, your RMAed drive shows 19,503 host bytes written in attribute F5, but Crystal Disk Info shows over 600GB written.

So we need to confirm whether Intel's SSD Toolbox is actually working properly. If it is, and those are not "bytes written" but some other traffic measurement instead, then this could be a firmware bug in the Intel 750s. There is actually a bug in Intel's 530 series which results in very high WA numbers due to the drive's background activity. Here's a thread on that: Intel SSD 530 NAND Write Problem
If those WA numbers are correct, there could be something similar going on here. The issue with Intel 530 series drives manifests only when the drive is allowed to go into sleep/wake-up cycles. A few folks "solved" the issue by creating a script that periodically reads some random files off the drive, thus not letting it fall asleep.
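A minimal sketch of that kind of keep-awake script, in Python. It is not the script from the linked thread: the function names (`read_random_file`, `keep_awake`), the 30-second interval and the 1 MiB chunk size are all assumptions picked for illustration. Reads that are served from the OS file cache may never reach the drive, so treat this as a starting point, not a verified workaround.

```python
import os
import random
import time


def read_random_file(root, chunk_size=1024 * 1024):
    """Read one randomly chosen file under `root`; return (path, bytes_read)."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            candidates.append(os.path.join(dirpath, name))
    if not candidates:
        return None, 0
    path = random.choice(candidates)
    total = 0
    try:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError:
        # Locked or unreadable files are skipped; the next pass picks another.
        return path, 0
    return path, total


def keep_awake(root, interval_seconds=30, passes=None):
    """Read a random file every `interval_seconds`.

    Loops forever when `passes` is None (hypothetical default);
    returns the number of passes performed otherwise.
    """
    done = 0
    while passes is None or done < passes:
        read_random_file(root)
        done += 1
        if passes is None or done < passes:
            time.sleep(interval_seconds)
    return done
```

It would be started with something like `keep_awake("D:\\some\\folder")`, where the folder is a placeholder for any directory on the affected drive.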
 

james23

Very interesting replies! (thanks to both of you, this is exactly why i posted here!)

I have noticed that FF was writing quite a bit in terms of GB, but nothing near the 120 TB the drive is "rated" for. Also, I rarely use FF, but I do use Chrome nearly 24/7. I was planning on symlinking my Chrome %appdata% folder to a RAID 1 volume I have (which would put all of Chrome's reads and writes on a different disk, i.e. NOT the Intel 750). It still makes me sad that I have to do anything like this, as the entire point of having the Intel 750 as my boot drive was so that my apps and OS could take advantage of its awesome speed (which won't be happening if I symlink Chrome's data to a different drive :( ).

I will be back to my daily machine in a few days (out of town on biz currently) and will look into the validity of Intel Toolbox's SMART numbers, and will also look at running SSD Life to gather some more data.

I do have this bit of info to add:

From an intel REP via the intel support forums (from this thread: Intel 750 400GB "estimated life remaining" dropping quickly):
"In order to get the value of Host Bytes Written it is very important to know the Host sectors written divided by 65536 (1 count = 32 MiB). This means the value you have in the Host Bytes Written needs to be multiply by 32, same happens with the Nand Bytes Written, after you get the result then you just divide by 1024 so you can get the GB value and once again divide by 1024 to get TB."
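Read literally, that reply says each raw count stands for 65,536 sectors, which is 32 MiB if a sector is 512 bytes (the sector size is an assumption here; the quote does not state it). A small Python sketch of the conversion, applied to the raw host value of 19,503 mentioned earlier in the thread. The helper names are made up for illustration.

```python
SECTORS_PER_COUNT = 65536   # from the quoted reply
BYTES_PER_SECTOR = 512      # assumption: the quote does not give the sector size
MIB = 1024 * 1024

# 65536 sectors * 512 bytes = 32 MiB per raw count
MIB_PER_COUNT = SECTORS_PER_COUNT * BYTES_PER_SECTOR // MIB


def raw_count_to_mib(raw_count):
    """Convert a raw SMART count to MiB under the 32 MiB-per-count reading."""
    return raw_count * MIB_PER_COUNT


def mib_to_gib(mib):
    return mib / 1024


def mib_to_tib(mib):
    return mib / 1024 / 1024


host_raw = 19_503                      # raw host value quoted earlier in the thread
host_mib = raw_count_to_mib(host_raw)  # 624,096 MiB
print(f"{host_mib} MiB = {mib_to_gib(host_mib):.1f} GiB = {mib_to_tib(host_mib):.3f} TiB")
```

Under that reading the host writes come to roughly 609.5 GiB, which is in the same range as the "over 600GB" that Crystal Disk Info reported. That is suggestive of which interpretation is right, not proof.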

In terms of NAND Bytes Written, I asked my Intel support rep (in my current support case) how to do this calculation from the Intel SSD Toolbox numbers for my current 750 (the one I received as an RMA from Intel about 2 weeks ago). His answer:
"Regarding your question, the operation will be 1178194MB / 1024 / 1024 = 1.1236TB"


So it would appear that NAND Bytes Written is in fact provided by the Intel 750's SMART data in MB, not bytes.

thanks
 

keybored

Unless I'm misreading something, it seems like you got two different answers on how to compute host and NAND bytes written. The first guy says you have to multiply the number by 32 first to get MB written. The other guy says the number is already in MB.
Let's say one of them is wrong; you still need to use the same formula to compute both values. Suppose the second guy is right, for simplicity's sake. That means your new drive has 19,503MB in host writes and 715,350MB in NAND writes. 19GB in host writes for a new drive is pretty reasonable: you either image it or install a fresh OS on it plus some software. But the corresponding amount of NAND writes is 700GB. That is just bonkers. As I said above, that's a write amplification ratio of 36-37. That is not healthy at all. A decent controller should manage to stay somewhere between 1 and 2, and if you write mostly data that can be compressed, some controllers may even stay below 1.
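The ratio itself is simple enough to check in a few lines of Python. This is only a sketch of the division described above, using the two figures quoted in this post; `write_amplification` is an illustrative helper name, and the result is the same whichever unit the counters are in, as long as both use the same one.

```python
def write_amplification(nand_written, host_written):
    """Write amplification = NAND writes / host writes (same unit for both)."""
    if host_written <= 0:
        raise ValueError("host_written must be positive")
    return nand_written / host_written


# Figures quoted above for the RMA replacement drive.
host_written = 19_503
nand_written = 715_350

waf = write_amplification(nand_written, host_written)
print(f"WAF = {waf:.1f}")  # between 36 and 37
```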
 

james23

I get what you're saying, which is why I asked the Intel tech in my support case directly (because I thought the formula given by the other Intel rep in the community forum seemed off). I do know for a fact that a few days after I had started using the "new" RMA 750 drive, I ran the Intel SSD Toolbox's "SSD Optimization tool - Long Run", and my Windows process write gadget showed that Toolbox.exe had written 273 GB of data to the 750 PCIe. That makes sense, as the "SSD Optimization tool - Long Run" states that it will defrag (or something similar) all of your unused space, which was exactly 273GB when I ran the tool. (I might have that backwards, and the Optimization tool uses your *used* space, but either way the 273GB matched up exactly with my used or unused space at the time.) So if I have this all right, that 273GB should count towards "Host Writes", and ideally the actual "NAND Writes" (writes to the physical flash chips) should be a much lower value than 273GB for this particular operation.

I wish the 750's SMART data showed the actual write amplification value like my other Intel SSDs do (and you are correct, most of them show 1-4 as the value for WAF, and some of these are in servers in 24/7 use).

It's just such a mess and really annoying / disappointing :( Regardless of the types of writes being done, it's just frustrating that there is *some type of write* that can wear the 750's flash chips down this quickly. I mean, at this rate my 750 could end up at 0 in less than one year :((( In the past, when stress testing/experimenting with SATA SSDs, I've even tried (using fio on Linux) to craft .fio test files that would wear down an SSD's SMART Wear Leveling Count (just as personal, unrelated experiments) and never could come close to doing material damage to the SMART Wear Leveling Count, even after many days of 24/7 writes.
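A rough check of the "less than one year" estimate, as a Python sketch. It assumes the wear indicator starts at 100 and drops linearly (a simplification; the drive may not behave that way), and it uses the figure from the first post of the replacement drive being at 96/97 after 10 days. `days_until_zero` is an illustrative name.

```python
def days_until_zero(start_value, current_value, elapsed_days):
    """Linear extrapolation: days left until the indicator would reach 0."""
    lost = start_value - current_value
    if lost <= 0 or elapsed_days <= 0:
        raise ValueError("need a positive drop over a positive time span")
    rate_per_day = lost / elapsed_days
    return current_value / rate_per_day


# Replacement drive: assumed to start at 100, at 96/97 after 10 days.
for current in (97, 96):
    remaining = days_until_zero(100, current, 10)
    print(f"at {current} after 10 days: about {remaining:.0f} days left")
```

Both readings give well under 365 days, so the estimate holds under the linear assumption.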

Thanks for your help with all this though, I really appreciate it. I just hope Intel has a real solution at some point (and I don't feel like it's going in that direction; IMO there really shouldn't be any scenario where I, or my system, am to blame for the count falling this fast).
 

keybored

... Regardless of the types of writes being done, it's just frustrating that there is *some type of write* that can wear the 750's flash chips down this quickly. I mean, at this rate my 750 could end up at 0 in less than one year :((( In the past, when stress testing/experimenting with SATA SSDs, I've even tried (using fio on Linux) to craft .fio test files that would wear down an SSD's SMART Wear Leveling Count (just as personal, unrelated experiments) and never could come close to doing material damage to the SMART Wear Leveling Count, even after many days of 24/7 writes. ...
I would ask Intel for an explanation. If the WAF is truly at 37 then there is something seriously wrong with the firmware on those drives. I very much doubt that your PC is some special snowflake capable of generating workloads that increase WAF to those levels.
 

james23

ARGGGhhh... so annoying! As a test, I left my machine booted to the Win10 desktop but with NO apps running while I was on a biz trip from Oct 3 to Oct 8. So the machine has been sitting IDLE... I took a screenshot of the SMART data from Intel Toolbox and a CSV export before I left (Oct 3), and now that I'm back I did the same (screenshot and CSV export)...
The wear endurance has fallen from 93 to 92 while IDLE for about 5 days!!

(numbers below are copied from SMART data via Intel SSD Toolbox; I only added commas)
Oct 3 (5am):
Host Bytes Written: 35,623
NAND Bytes Written: 1,619,045
Wear Leveling Count: 93

Oct 8 (4pm):
Host Bytes Written: 36,110
NAND Bytes Written: 1,872,152
Wear Leveling Count: 92 ;(

Maybe I'm missing something, but since these two numbers both seem to be in the same units/scale (BYTES), it would mean my WAF (Write Amplification Factor) was about 45 on Oct 3 and, after sitting idle, is now at almost 52 on Oct 8!
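For anyone who wants to redo the arithmetic, here is a short Python sketch using only the four numbers listed above. Besides the cumulative ratio at each date, it also divides the change in NAND writes by the change in host writes over the idle period; that second figure is an extra step added for illustration, and it only makes sense if both counters really use the same unit.

```python
# Raw counter values copied from the two snapshots above.
oct3 = {"host": 35_623, "nand": 1_619_045}
oct8 = {"host": 36_110, "nand": 1_872_152}


def waf(snapshot):
    """Cumulative write amplification: NAND writes / host writes."""
    return snapshot["nand"] / snapshot["host"]


def incremental_waf(before, after):
    """Write amplification of only the writes that happened between two snapshots."""
    host_delta = after["host"] - before["host"]
    nand_delta = after["nand"] - before["nand"]
    if host_delta <= 0:
        raise ValueError("no host writes between snapshots")
    return nand_delta / host_delta


print(f"Oct 3 cumulative WAF: {waf(oct3):.1f}")
print(f"Oct 8 cumulative WAF: {waf(oct8):.1f}")
print(f"Idle-period WAF:      {incremental_waf(oct3, oct8):.1f}")
```

The idle-period ratio comes out around 520, far above either cumulative figure. That would fit the idea that most of the NAND writes are not driven by host writes at all.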

I'm attaching the CSV exports from Oct 3 and Oct 8 if anyone is interested.
Thanks
 

Attachments

james23

UGHH, still no real update / help from Intel on this issue (but my ticket is still open)... and now I'm down to 53% wear life (and still only showing 2 TB of drive writes, with about 12 TB of reads).

I've even moved my entire appdata folders for Chrome and Firefox over to separate drives (as of about 2 months ago), as those folders can be pretty write-intensive (and those apps run nearly 24/7 on my system).

I can watch the Windows Resource Monitor "Disk" tab like a hawk and barely see any writes occurring to my C: (the Intel 750), and the SMART data does seem to back this up.

I did also upgrade to the latest 750 PCIe firmware and drivers from Intel (about a month ago; I've always kept up to date on the 750's FW just in case there was a fix or change that would help my issue). It's sad that this PCIe SSD can only seem to last me about 12 months as a boot drive before it will lock and go to read-only mode (if I'm lucky... but at least I will know it's coming :( ).

Also kinda sad that I have to keep checking eBay for a DC P3700 enterprise PCIe drive just to run as the boot drive in my desktop PC (because I'd think that would have the endurance I need, but even that drive might only give me 5 or so years of life). I could fully understand it if it were something on my OS writing, but those SMART numbers (now from 2 separate 750 PCIe drives) have to be accurate, and the writes just aren't there.

So if anyone has any input, or even any experience running an Intel 750 PCIe, please chime in (i.e. what do your wear endurance numbers look like??)

thanks
 

james23

BTW, I did figure out the issue here (of course AFTER I had bought a P3700, once I had given up on finding the cause).

The cause was Win10's built-in DEFRAG (it showed in the GUI as off/disabled, but it really was NOT). Even so, with stock Win10 defrag ON, it should not have caused this level of write IO (but somehow it did).

Once I fully disabled it, the issue stopped. (I had already gotten down to 93 of 100 wear-out on my P3700 after about 3 months. Since making that change, I'm 6 months in and still at 93.)
The fix was to ENABLE defrag in the Windows GUI (even though it showed it was disabled), reboot, and THEN disable it (and reboot).

I'm talking about this screen/GUI:
342332Capture.JPG