Consumer SSD drives in a server


Roelf Zomerman

Active Member
Jan 10, 2019
I've been told not to put consumer drives in my server because they would die really fast. I was wondering which part of that statement is actually true and validated.
I understand enterprise-grade SSDs will last longer due to their enlarged spare capacity, but why would server workloads be so different from "regular" laptop use?

My idea is to create Storage Spaces with the SSDs as cache drives or as tiered storage.
 

alex_stief

Well-Known Member
May 31, 2016
From a pure write-endurance standpoint, there are "server-grade" SSDs that are worse than consumer SSDs. I think the initial statement is a bit oversimplified; I'd rather say "use appropriate hardware for your use case". Of course, there are more factors than just TBW to consider when choosing an SSD for a server application.
 

Iaroslav

Member
Aug 23, 2017
Kyiv
We used consumer drives widely in our servers with 10:1 read/write ratios, and here's what I can say: if your data is backed up or not critical, it makes sense, at least for new drives. Sooner or later you'll find out whether you need enterprise disks.
Our biggest failure was with Samsung 2TB 850 PROs: we have 30 of them, and after nearly 30k hours and 15TBW, all of them started to degrade and fail.
Meanwhile, we've been running nearly 40 used 2TB STECs from eBay for 1.5 years with no problems at all. And now we've come to this point: no more consumer drives in servers!
If you dare: Intel, Kingston (though they now have a DC series as well) and Crucial (Micron) consumer drives caused us no headaches, compared to Samsung and OCZ.
 

ajs

Active Member
Mar 27, 2018
Minnesota
We used consumer drives widely in our servers with 10:1 read/write ratios, and here's what I can say: if your data is backed up or not critical, it makes sense, at least for new drives. Sooner or later you'll find out whether you need enterprise disks.
Our biggest failure was with Samsung 2TB 850 PROs: we have 30 of them, and after nearly 30k hours and 15TBW, all of them started to degrade and fail.
Meanwhile, we've been running nearly 40 used 2TB STECs from eBay for 1.5 years with no problems at all. And now we've come to this point: no more consumer drives in servers!
If you dare: Intel, Kingston (though they now have a DC series as well) and Crucial (Micron) consumer drives caused us no headaches, compared to Samsung and OCZ.
What was the failure on these drives? They should be able to handle way more than 15TBW.
 

acquacow

Well-Known Member
Feb 15, 2017
Your biggest issue with consumer SSDs is the lack of internal capacitors to ensure data in flight gets parked correctly during a sudden crash or power outage.
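If you want to see that difference yourself, here's a rough Python sketch (the file path is made up; point it at a mount on the drive you want to test) that times synchronous 4K writes. A drive with proper power-loss-protection caps can acknowledge the flush from its DRAM, so fdatasync comes back in well under a millisecond; most consumer drives have to push the data to NAND first and take noticeably longer.
Code:
import os, time

# Hypothetical path on the SSD under test -- adjust before running.
path = "/mnt/ssd-under-test/syncfile"
block = os.urandom(4096)

fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
latencies = []
for i in range(1000):
    os.pwrite(fd, block, i * 4096)       # 4K write at offset i*4K
    t0 = time.perf_counter()
    os.fdatasync(fd)                     # force the data to stable media
    latencies.append(time.perf_counter() - t0)
os.close(fd)

latencies.sort()
print("median fdatasync: %.3f ms" % (latencies[len(latencies) // 2] * 1e3))
print("p99 fdatasync:    %.3f ms" % (latencies[int(len(latencies) * 0.99)] * 1e3))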
 

Spartacus

Well-Known Member
May 27, 2019
Austin, TX
Consumer drives die quickly in a server when their endurance wasn't properly accounted for; on the flip side, they often degrade quickly if the OS/RAID controller is unable to send TRIM commands through to the drives.

I understand enterprise-grade SSDs will last longer due to their enlarged spare capacity, but why would server workloads be so different from "regular" laptop use?
Server workloads run 24/7, so server equipment and drives are built robustly enough to handle that, compared to a desktop/laptop that runs 8-12 hours a day at most, with the rest spent off or idle. They can also sit in high-heat environments and are made to withstand them. Server drives are often expected to run close to full (75%+); the additional spare capacity you mentioned gives the drive room to trim and garbage-collect properly, which keeps it at its rated speed and generally gives it a substantially higher TBW/PBW write endurance. (Some enterprise SSDs have garbage collection built into the drive itself, so they don't have to rely on the OS for TRIM.)

What was the failure on these drives? They should be able to handle way more than 15TBW.
The 2TB model is rated for about 450TBW and 10 years, @Iaroslav; if you still have them, you can likely get them replaced under warranty.
One question though: were these drives able to properly trim/garbage-collect? (And were they 80%+ full?)
If they weren't, that is potentially the cause of the degradation. I can't speak to the outright failures, since they're rated for millions of hours and a much higher TBW than 15TB.
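Just to put numbers on it, here's the back-of-the-envelope math in Python, using the figures from this thread (the rating from Samsung's spec sheet, the usage from the SMART data):
Code:
# Back-of-the-envelope endurance math using the numbers in this thread:
# ~15 TB written over ~30,000 power-on hours, against a ~450 TBW rating.
rated_tbw = 450.0            # vendor endurance rating for the 2TB 850 PRO
written_tb = 15.0            # observed writes so far
hours = 30_000               # observed power-on hours

tb_per_day = written_tb / (hours / 24)
years_to_rating = rated_tbw / tb_per_day / 365

print(f"~{tb_per_day:.3f} TB/day, rating reached in ~{years_to_rating:.0f} years")
# -> ~0.012 TB/day, rating reached in ~102 years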
 

i386

Well-Known Member
Mar 18, 2016
Germany
but why would server workloads be so different from "regular" laptop use?
"Server workloads" constantly hammer the storage with io. The performance of consumer ssd (especially sata ones) dramatically decreases to the point where you think that you're using spinning rust (latencies of 200+ ms for random io or throughput of <40mbyte/s for sequential io).

Newer consumer nvme ssds perform far better but there is no guarantee that your data is written safely by the ssd controller to the nand in case of power loss, brownouts etc..
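You can reproduce that collapse with something as simple as this Python sketch (path and sizes are placeholders; use a scratch file, not live data). It hammers the drive with synchronous 4K random writes and prints throughput every few seconds; on a good enterprise drive the numbers stay flat, while on many consumer SATA drives they fall off a cliff once the SLC cache and clean blocks are used up.
Code:
import os, random, time

path = "/mnt/ssd-under-test/scratch"    # placeholder scratch file
size = 8 * 1024**3                      # 8 GiB test area

block = os.urandom(4096)
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
os.ftruncate(fd, size)

start = last = time.perf_counter()
written = 0
while time.perf_counter() - start < 300:            # run for 5 minutes
    off = random.randrange(size // 4096) * 4096     # random aligned offset
    os.pwrite(fd, block, off)
    written += 4096
    now = time.perf_counter()
    if now - last >= 5:                             # report every 5 s
        print(f"{now - start:6.0f}s  {written / (now - last) / 1e6:8.1f} MB/s")
        written, last = 0, now
os.close(fd)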
 

Iaroslav

Member
Aug 23, 2017
Kyiv
What was the failure on these drives? They should be able to handle way more than 15TBW.
Typical disk from the batch:
Code:
Device Model:     Samsung SSD 850 PRO 2TB
LU WWN Device Id: 5 002538 c7000f67a
Firmware Version: EXM02B6Q
Sector Size:      512 bytes logical/physical

 5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       84
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       34849
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       123
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       17
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   099   099   010    Pre-fail  Always       -       84
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   099   099   010    Pre-fail  Always       -       84
187 Reported_Uncorrect      0x0032   088   088   000    Old_age   Always       -       120900
190 Airflow_Temperature_Cel 0x0032   075   063   000    Old_age   Always       -       25
195 Hardware_ECC_Recovered  0x001a   001   001   000    Old_age   Always       -       120900
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       55
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       29618932846

Error 65531 occurred at disk power-on lifetime: 34842 hours (1451 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 51 01 10 00 00 00  Error:  at LBA = 0x00000010 = 16
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 22 aa d2 00 00      00:58:05.903  READ FPDMA QUEUED
  60 00 00 22 a8 d2 00 00      00:58:05.903  READ FPDMA QUEUED
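For reference, the Total_LBAs_Written counter above converts to the ~15TBW mentioned earlier (this drive counts 512-byte sectors):
Code:
# Total_LBAs_Written from the SMART dump, in 512-byte sectors:
lbas = 29_618_932_846
print(f"{lbas * 512 / 1e12:.1f} TB written")   # -> 15.2 TB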
 

Iaroslav

Member
Aug 23, 2017
Kyiv
The 2TB model is rated for about 450TBW and 10 years, @Iaroslav; if you still have them, you can likely get them replaced under warranty.
One question though: were these drives able to properly trim/garbage-collect? (And were they 80%+ full?)
If they weren't, that is potentially the cause of the degradation. I can't speak to the outright failures, since they're rated for millions of hours and a much higher TBW than 15TB.
Thank you, those are good questions!
And now I'm deep in the rabbit hole... The disks are 85-95% full, with rare rewrites.
Continuous TRIM was not running all this time (the discard option in fstab).
It isn't working because this model is apparently on the kernel's blacklist for queued TRIM:
When Solid State Drives are not that solid | Algolia Blog
Bug #1449005 “trim does not work with Samsung 840 EVO after firm...” : Bugs : fstrim package : Ubuntu
magician -T for Linux is of no use either: "This feature is not supported for disks connected to LSI RAID HBA Card."
If I enable TRIM with a patch or use older firmware and decrease disk-space usage, will that help avoid errors and further degradation?
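Here's what I plan to check first, a quick Python sketch (the device and mount names are placeholders for mine; note that discards may not pass through the LSI RAID HBA at all):
Code:
import subprocess

# Non-zero DISC-GRAN / DISC-MAX here means the kernel will issue discards
# for the device; all zeros means TRIM is off (blacklisted drive, or the
# HBA/RAID controller doesn't pass it through).
subprocess.run(["lsblk", "--discard", "/dev/sda"], check=True)

# One-shot trim of a mounted filesystem -- generally preferred over the
# continuous 'discard' mount option, usually run from a weekly timer.
subprocess.run(["fstrim", "-v", "/mnt/storage"], check=True)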
 

Deslok

Well-Known Member
Jul 15, 2015
For another data point, I've had good results with the earlier Intel drives (a lot of 320s, which had capacitors, although not as good as proper server stuff), and we're using Intel S4510 3.84TB drives picked up used on eBay in our primary storage server. The secondary I've got full of 2TB MX500s (14 of them) without a hiccup for a year now, but I still trust the Intel drives more. Most of my workloads are read-intensive, so when I worked it out, even the MX500s' TBW rating was more than enough for ~10 years.