Avoid Samsung 980 and 990 with Windows Server


pimposh

hardware pimp
Nov 19, 2022
269
153
43
There goes my "I only trust Intel and Samsung SSD's" shopping philosophy, as they are the only brands I've ever used that haven't failed on me.

Intel exited the market, and Samsung seems to have gone to shit. Now I have no idea what brands I can actually trust.
Since you're still interested in consumer-grade drives, take a look here and read the valuable comments from malventano.

Depending on your usage scenario, you might actually want to avoid Samsung and favor other brands.
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,229
425
83
(pimposh)
Since you're still interested in consumer-grade drives, take a look here and read the valuable comments from malventano.

Depending on your usage scenario, you might actually want to avoid Samsung and favor other brands.
This looks curious, and I'm about 99% sure that's the case, but I don't see the same thing on my Hynix P41. I did, however, see it on Samsung drives.
 

pimposh

hardware pimp
Nov 19, 2022
269
153
43
I'm not running a Windows Server environment myself, so I'm not sure whether this is really related to the OP's issue, but I can imagine drive dropouts caused by, let's call it, inefficient IO handling.

But in a Linux environment I found that some heavily filled drives (970 EVOs and 980 Pros in my case, filled to capacity minus 20% over-provisioning) started to behave very oddly (that is, slow as f.k), to the point where I nearly thought they had died... until a full NVMe wipe revived them.

tl;dr: it all comes down to this: for real production use, real (enterprise) drives are usually less of a headache. At least they don't get us into unwanted trouble, and that's the only proper way to go... though, as usual, not always (the time bug in Samsung drives, and so on...).
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,229
425
83
What I did experience, however, was a drop in performance on Samsung SSDs as the disks fill up: at 60%, 80% and 90% full their performance kept decreasing, down to around 250MB/s at 90%, even on Samsung 883s (SATA SSDs, not NVMe).
I assume the same thing happens on their NVMe drives...
Hynix seems to always stay at 80-90% of its performance.
Crucial MX500s always stay at around 90% of their performance.
Intel always stays at around 95% of its performance.

But I assume some bottom-drawer SSDs and NVMe drives have the same issue as the Samsung SSDs.
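
A minimal sketch of how the fill-level comparison above could be repeated on your own drive, assuming Python 3 is available. The function names and the 64 MB default are hypothetical choices, not a standard benchmark; a small write like this can land entirely in a drive's fast cache, so a much larger `total_mb` would be needed to see sustained behavior.

```python
import os
import shutil
import tempfile
import time


def free_fraction(directory: str) -> float:
    """Return the fraction of the filesystem holding `directory` that is free."""
    usage = shutil.disk_usage(directory)
    return usage.free / usage.total


def write_throughput_mb_s(directory: str, total_mb: int = 64, chunk_mb: int = 4) -> float:
    """Write about `total_mb` MB to a temp file in `directory`; return MB/s.

    Incompressible data is used so the controller cannot shortcut the write,
    and fsync() is called so the time includes flushing to the device.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    written = 0
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < total_mb:
                f.write(chunk)
                written += chunk_mb
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # do not leave the test file behind
    return written / elapsed
```

Calling `free_fraction(path)` and `write_throughput_mb_s(path)` at several fill levels (for example 60%, 80% and 90% full) gives numbers that can be compared across drives, keeping in mind that filesystem and OS caching also affect the result.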
 

nexox

Well-Known Member
May 3, 2023
1,341
615
113
I haven't had a modern Samsung drive in my hands to benchmark, but based on the way their older consumer SSDs worked and from what I read of the newer ones, they work hard to optimize initial performance, so they have further to fall when little things like "storing some data" get in the way of all the firmware tricks to speed up writes.
 

jthm

New Member
Mar 3, 2016
6
5
3
123
I also experienced hard lockups in Server 2019 & 2022 running late model Samsung SSDs. Using a different brand in same hardware without trouble.

I was also affected by several 2TB 990 Pro w/ heatsinks in client systems randomly rebooting -- replaced with P41s and no trouble since.

That is a large financial loss to have a pile of 990 Pros I cannot trust -- Samsung is on the avoid list for me now.
 

semicycler

New Member
Sep 8, 2022
5
5
3
Found this thread, it matches my issue exactly. Win Server 2022 with a Samsung 4TB 990 Pro m.2 sitting idle at the logon screen. Completely new install without any other apps running. About two days after boot it randomly resets to the BIOS screen and the 990 Pro is missing. A power reset brings the drive back.

For reference this happens with Win 10 under the same circumstances - clean install sitting at the logon screen, within three days it boots to the BIOS screen and the m.2 is missing. A hard reset brings it back.

Likewise booting to a live MemTest USB flash stick and testing the 64GB of DDR memory runs for days without crashing. No memory errors either.

I suspect it's a problem between the Microsoft default NVMe driver in Server 2022/Win 10 and the Samsung 990 Pro M.2 firmware.

I have not tried the 'idle' test using a clean Ubuntu install yet nor with Win 11. I also want to move the m.2 from direct PCIe lanes to the CPU via the M2_1 slot on my motherboard to under the chipset PCH controller in one of the other M2 slots on the motherboard. Other thoughts are to try forcing the PCIe generation from 'auto' to 'Gen4' or 'Gen3' in the BIOS settings to see if the symptoms change.

The hardware is a 12th gen i9-12900KS on a Z790 Asus motherboard with the latest BIOS and Intel ME firmware. The Samsung 990 Pro was updated to the latest firmware before testing too - don't know the exact version off the top of my head but definitely the latest as of this posting.
 

mattlach

Active Member
Aug 1, 2014
404
164
43
Well, for what it is worth, I have been using SATA Samsung 850 EVO and Pro drives in my Linux server for a decade without issues.

I installed my first four Samsung NVMe drives in my server in late December, and thus far it is smooth sailing.

Two 500GB 980 Pro drives are mirrored using ZFS and serving as boot drives. These admittedly see very light use. They have ~380GB of writes each thus far and are only at 0.5% capacity, so not much load, but they are working perfectly.

Two 1TB 980 Pro drives are mirrored using ZFS and are serving as VM/Container datastores. These have ~658GB of writes each, and are at ~10.3% capacity. They have about 12 VM's running off of them, with no issues at all thus far.

Performance is good on both of them, but admittedly average load is on the light side on this server.

OS is Debian Bookworm.

I'm leaning towards this being a Windows Server issue.
 

SRussell

Active Member
Oct 7, 2019
327
154
43
US
(jthm)
I also experienced hard lockups in Server 2019 & 2022 running late model Samsung SSDs. Using a different brand in same hardware without trouble.

I was also affected by several 2TB 990 Pro w/ heatsinks in client systems randomly rebooting -- replaced with P41s and no trouble since.

That is a large financial loss to have a pile of 990 Pros I cannot trust -- Samsung is on the avoid list for me now.
How many Samsung devices did you purchase? Any chance your VAR would replace them?
 

SRussell

Active Member
Oct 7, 2019
327
154
43
US
(semicycler)
Found this thread, it matches my issue exactly. Win Server 2022 with a Samsung 4TB 990 Pro m.2 sitting idle at the logon screen. Completely new install without any other apps running. About two days after boot it randomly resets to the BIOS screen and the 990 Pro is missing. A power reset brings the drive back.

For reference this happens with Win 10 under the same circumstances - clean install sitting at the logon screen, within three days it boots to the BIOS screen and the m.2 is missing. A hard reset brings it back.

Likewise booting to a live MemTest USB flash stick and testing the 64GB of DDR memory runs for days without crashing. No memory errors either.

I suspect it's a problem between the Microsoft default NVMe driver in Server 2022/Win 10 and the Samsung 990 Pro M.2 firmware.

I have not tried the 'idle' test using a clean Ubuntu install yet nor with Win 11. I also want to move the m.2 from direct PCIe lanes to the CPU via the M2_1 slot on my motherboard to under the chipset PCH controller in one of the other M2 slots on the motherboard. Other thoughts are to try forcing the PCIe generation from 'auto' to 'Gen4' or 'Gen3' in the BIOS settings to see if the symptoms change.

The hardware is a 12th gen i9-12900KS on a Z790 Asus motherboard with the latest BIOS and Intel ME firmware. The Samsung 990 Pro was updated to the latest firmware before testing too - don't know the exact version off the top of my head but definitely the latest as of this posting.
Are any of the Samsung drives SSDs, or are they all M.2?

Any idea who is the target market for Samsung PRO drives?
 

mattlach

Active Member
Aug 1, 2014
404
164
43
(SRussell)
Are any of the Samsung drives SSDs, or are they all M.2?
M.2 drives are SSD's :p

I think you mean SATA drives. :p

There are SATA SSD's, and there are NVMe SSD's. M.2 drives can be either (but most newer ones are NVMe)

Any idea who is the target market for Samsung PRO drives?
I tend to think of Samsung's Pro drives as "pro-sumer" tech. In other words, intended to be found in Workstation, HEDT and high end consumer (who are we kidding, "gaming") machines.

If you ask Samsung they probably won't call them server drives, but as far as client drives go, they have a very good reputation. Or at least used to back before the 980 and 990 Pro's had a well publicized firmware bug that caused serious write amplification issues. (which has since been patched, make sure you update the firmware on your drives using "Samsung Magician", folks)

Heck, I've been buying SSD's since ~2010. The only two brands that I have never had fail on me have been Intel and Samsung drives. (Well, technically my Inland Premium drives haven't failed either, but the sample size isn't as large.) Every other brand I've bought has had failures.

The worst were the old OCZ SSD's back in the day. Never had one of those last more than 2 years. They would die like clockwork. I've also had a terrible time with Sabrent Rocket 4 drives. 100% failure rate in the 2-3 year time period. (albeit a smaller sample size)

Samsung and Intel - on the other hand - every last one I have ever bought, no matter how hard I have punished them, is still alive and well. And it has probably been ~12 years now.

So, the problem here seems to be Windows Server, not the Samsung drives. I've punished the crap out of a large variety of Samsung Pro drives, 840 Pro (SATA), 850 Pro (SATA), 970 Pro (NVMe), 980 Pro (NVMe), 990 Pro (NVMe), heck, and even some EVO drives, for years under Linux (Workstation and Server) and client Windows installs. I've used them as cache devices on ZFS pools, as scratch disks, as VM datastores, you name it.

I have noticed some slight performance degradation as they start filling up (primarily on writes) but I have never even once had a Samsung Pro drive drop out, disappear, reset, crash or lose data. And I have used a pretty decent sample size of the things, much much greater than your typical forum anecdotal experience.

The common denominator in this thread seems to be related to Windows Server installs. Maybe the NVMe driver is somehow different in the Server edition than in 7/10/11?
 

throwaway5325235633

New Member
Feb 8, 2024
1
2
3
I have a 980 Pro and run Linux, and I've been experiencing similar instabilities caused by high IO. I was able to reproduce this using `bonnie++ -u root -d /tmp/bonnie -r 32072 -c 32 -n 5240000:1048576:256:1048576`; the crash happens within 5-10 minutes. I had experienced infrequent crashes before, which I often assumed were caused by high IO, but I never determined the exact cause until this thread and the test above.
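
A minimal sketch for watching drive temperature while a stress test like the one above runs, assuming a Linux kernel that exposes NVMe temperatures through hwmon (a device whose `name` file reads `nvme`, with `temp1_input` in millidegrees Celsius). The function names and the `sysfs_root` parameter are hypothetical and exist only so the code can be pointed at a different directory.

```python
import glob
import os
import time


def read_nvme_temps_c(sysfs_root: str = "/sys/class/hwmon") -> dict:
    """Return {hwmon_name: temperature_in_C} for every hwmon device named 'nvme'."""
    temps = {}
    for hwmon in sorted(glob.glob(os.path.join(sysfs_root, "hwmon*"))):
        try:
            with open(os.path.join(hwmon, "name")) as f:
                if f.read().strip() != "nvme":
                    continue  # not an NVMe sensor (e.g. CPU or chipset)
            with open(os.path.join(hwmon, "temp1_input")) as f:
                # The kernel reports millidegrees Celsius.
                temps[os.path.basename(hwmon)] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor vanished or returned something unreadable
    return temps


def poll(samples: int, interval_s: float = 1.0, sysfs_root: str = "/sys/class/hwmon"):
    """Yield (unix_timestamp, temps_dict) pairs, one per sample."""
    for _ in range(samples):
        yield time.time(), read_nvme_temps_c(sysfs_root)
        time.sleep(interval_s)
```

Printing each pair from `poll(600)` to a terminal over SSH, or to a file on a different disk, would leave a record of the last temperature seen before the drive drops, since a log written to the failing drive itself may be lost.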

My case and airflow are fairly poor, and my device did not come with a heatsink. I can fix both of these issues; however, I'd like to address the following comments. To be clear, I'm not throwing shade at the authors, but rather pointing out flaws in the opinions:

(CyklonDX)
so this issue is related to overheating; when its beyond thermal threshold it will shutdown sensor, and later after some time the controller (and disk will 'die' until reboot). This is common issue with nvme's.

Put decent heatsink, or decent airflow over the nvme. They aren't made for constant r/w but bursts.
(drdepasquale)
Servers should be using enterprise grade drives and should be actively cooled if they run too hot.
I fundamentally disagree with both of these opinions. The issue is not overheating as such; it is that the NVMe drive either climbs to the 'unsafe-shutoff' temperature before it correctly thermal throttles, or it does not thermal throttle at all. Furthermore, whether this is a consumer device or not should not affect its ability to thermal throttle in order to remain stable. If anything, I would expect enterprise-grade kit to support some form of override on throttling behavior. If my CPU were to fail in a similar manner with an OEM heatsink, I would call it defective, and I think this device, in the form it has been sold, is defective too.

Samsung either did not test this product sufficiently, or was aware of this defect and did not seek to resolve it, which they could have done via an OEM heatsink (on ALL SKUs) or by implementing better thermal throttling management in firmware (e.g., throttling at an earlier set-point if the device temperature is climbing rapidly). As it stands, I could not ascertain anything meaningful in `dmesg` prior to a crash, and the crash dump messages did not appear meaningful either (caveat: I'm not a kernel developer, but it certainly did not say anything obvious such as 'your primary drive has overheated and disconnected').
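
To make the "earlier set-point when the temperature is climbing rapidly" idea concrete, here is a purely illustrative sketch. It is not Samsung's firmware logic; the function name, the 80 °C limit and the 10 second look-ahead are all hypothetical values chosen for the example.

```python
def should_throttle(samples_c, limit_c=80.0, lookahead_s=10.0, interval_s=1.0):
    """Decide whether to throttle based on level AND rate of temperature change.

    samples_c:   recent temperature readings in Celsius, oldest first,
                 taken every `interval_s` seconds.
    limit_c:     hypothetical temperature the device must stay below.
    lookahead_s: how far ahead to project the current heating rate.
    """
    if not samples_c:
        return False
    current = samples_c[-1]
    if current >= limit_c:
        return True  # classic fixed set-point throttling
    if len(samples_c) < 2:
        return False  # no rate information yet
    # Average heating rate over the sample window, in degrees per second.
    rate = (samples_c[-1] - samples_c[0]) / ((len(samples_c) - 1) * interval_s)
    # Throttle early if a linear projection would reach the limit.
    return current + rate * lookahead_s >= limit_c
```

With these example values, a drive sitting steady at 70 °C is left alone, while one at 66 °C that is heating by 2 °C per second is throttled before it reaches the limit.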

I will install a heatsink and increase airflow and report back, but I do not think this 'solves' the issue of Samsung releasing a defective product. Even if this prevents my test case from reproducing the crash, I still cannot guarantee that my SSD or my computer is stable under load, as the fix will only reduce the problem, not eliminate it. I've lost approximately a hundred hours to lost work, troubleshooting, testing and now fixing the issue, and I'm quite annoyed that Samsung is either incompetent or profit-focused enough to be willing to ship defective products.
 

semicycler

New Member
Sep 8, 2022
5
5
3
I also want to move the m.2 from direct PCIe lanes to the CPU via the M2_1 slot on my motherboard to under the chipset PCH controller in one of the other M2 slots on the motherboard. Other thoughts are to try forcing the PCIe generation from 'auto' to 'Gen4' or 'Gen3' in the BIOS settings to see if the symptoms change.

The hardware is a 12th gen i9-12900KS on a Z790 Asus motherboard with the latest BIOS and Intel ME firmware. The Samsung 990 Pro was updated to the latest firmware before testing too - don't know the exact version off the top of my head but definitely the latest as of this posting.
Update: playing with BIOS settings did nothing. My setup still fails in roughly 2-3 days just sitting idle running Server 2022. I changed power settings, switched PCIe from Auto to forced Gen4, etc., and the problem of the 990 Pro dropping after a few days remains.

BUT: after moving the drive from the M2_1 slot, directly connected to the i9-12900KS CPU, to the M2_2 slot under the Z790 chipset, there have been no failures yet after nearly 6 days. Very promising. This 12th gen processor exposes 20 PCIe lanes: x16 to the GPU slot and x4 to the M2_1 slot. The PCH is connected to the CPU by x8 DMI lanes, so throughput is not an issue when using the M2_2 slot. I'm hoping this is a viable workaround. Stay tuned!
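
A rough back-of-the-envelope check of the throughput claim, assuming the x8 DMI link signals at the same 16 GT/s per lane as PCIe Gen4 and uses the same 128b/130b encoding. The helper name is hypothetical, and the numbers are theoretical link rates before protocol overhead.

```python
def pcie_gb_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a 128b/130b-encoded link."""
    payload_fraction = 128 / 130      # 128b/130b line encoding (PCIe Gen3 and later)
    return gt_per_s * payload_fraction / 8 * lanes  # divide by 8 bits per byte


GEN4_GT_S = 16.0                      # PCIe Gen4 signaling rate per lane

m2_x4 = pcie_gb_s(GEN4_GT_S, 4)       # the SSD's own Gen4 x4 link, about 7.88 GB/s
dmi_x8 = pcie_gb_s(GEN4_GT_S, 8)      # assumed Gen4-class x8 DMI uplink, about 15.75 GB/s

print(f"M.2 Gen4 x4: {m2_x4:.2f} GB/s")
print(f"DMI x8:      {dmi_x8:.2f} GB/s")
```

Under those assumptions the uplink has about twice the bandwidth of a single Gen4 x4 drive, although it is shared with everything else attached to the chipset.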
 

Marjan

New Member
Nov 6, 2016
26
7
3
Interesting problem, it got me a bit intrigued. I bought a 980 Pro a few months ago and have had no issues at all under Windows Server 2022.
It's not the boot disk, it has the latest firmware, and it's in a consumer motherboard with an AMD CPU.
 

semicycler

New Member
Sep 8, 2022
5
5
3
BUT - moving from the M2_1 slot directly connected to the i9-12900KS CPU to the M2_2 slot under the Z790 chipset and no failures yet after nearly 6 days, very promising. This 12th gen processor has 20x PCIe lanes exposed - 16x to the GPU slot, 4X to the M2_1 slot. The PCH is connected by x8 DMI lanes to the CPU so throughput is not an issue when using the M2_2 slot. I'm hoping this is a viable workaround. Stay tuned!
Well bad news, it dropped at day nine attached under the Z790 chipset. Other than swapping out the 12th gen CPU for a 14th gen one, and/or swapping motherboard brands/models I'm running out of things to try. I'm about to give up, sell the Samsung 990 Pro and buy a different brand m.2 for this machine.

I'm curious for others experiencing this problem - 990 Pro M.2 dropping randomly with Win Server 2022 requiring a reboot to recover - are you using an Intel or AMD CPU? And what motherboard make/model? Just trying to see if it happens across the board or only with certain CPUs and motherboard manufacturers.
 

Fritz

Well-Known Member
Apr 6, 2015
3,545
1,499
113
70
Just checked my Server 2022 box and it has an Intel Optane boot drive. I do have others with various Samsung's, all the EVO's are earlier models.
 

tinfoil3d

QSFP28
May 11, 2020
901
427
63
Japan
Chiming in with my own nightmare experience with the 990 Pro 4TB.
I bought 4 of these and put them into an R730xd on Supermicro 2m2 cards (because the ASUS Hyper Gen 4 doesn't work in this system; I tried everything). To this day I've had:
1 random reboot
2 overtemp events, where the drives reported overtemp in syslog but no actual reading was logged, after which they fell off the bus. I fixed the overtemps (I think?) by flashing the latest firmware.
But that didn't help with 1 more random reboot.
Almost every time it's a different drive!

Weeks of wasted time and wasted money. IDK what to do about these.