I struggled for several months to get my M11SDV-8C-LN4F (Supermicro EPYC 3251 motherboard) stable. On average about every two weeks, the M.2 SSD (and in fact the entire PCI slot) would just disappear from the system. Resetting the system from IPMI would not correct the problem; lspci -v would simply no longer list the PCI slot. A full power off/power on cycle would fix the issue until it happened again, anywhere from an hour to a month later.
I ruled out drive overheating, reproduced the issue in multiple OSes, tried multiple memory configurations, and tried multiple M.2 drives (HP EX900 120GB and Intel 760p). When I attempted to RMA the board, Supermicro asked that I try an M.2 drive they had tested as compatible and, despite being skeptical it would fix anything, I gave it a shot.
After 100 days of uptime I'm ready to declare that this new drive (Toshiba XG5-P / HDS-TMN0-KXG50PNV2T04) appears to have made the system stable. I'm surprised to see a Supermicro board being unstable because of which M.2 drive I used, but I guess it happens. Moral of the story? Use a recommended M.2 drive.
I figured I'd post this in the extremely unlikely chance it will help someone.
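For anyone who wants to catch the same failure as it happens, here is a rough sketch of a check that could be run periodically (from cron, say) to notice when the NVMe controller drops out of the lspci listing. It assumes a Linux system with lspci (pciutils) installed, and it assumes the drive shows up under the "Non-Volatile memory controller" class, which is how lspci normally labels NVMe devices. The function names are made up for illustration; this is not what I ran at the time.

```python
import subprocess

# Class name lspci prints for NVMe controllers (assumption: the M.2 drive
# is an NVMe device and not a SATA M.2 drive).
NVME_CLASS = "Non-Volatile memory controller"


def nvme_controllers(lspci_output):
    """Return the lspci lines that describe an NVMe controller."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if NVME_CLASS.lower() in line.lower()
    ]


def check_once():
    """Run plain `lspci` and report whether any NVMe controller is listed."""
    result = subprocess.run(
        ["lspci"], capture_output=True, text=True, check=True
    )
    return bool(nvme_controllers(result.stdout))
```

Calling check_once() on a schedule and logging a timestamp whenever it returns False would at least show when the slot vanished. Note that if the OS itself lives on the M.2 drive, the check may not get a chance to run once the drive is gone.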