It was the NVMes. One time I sat by the server watching it die and got the chance to open Event Viewer.
Both Samsung 990s had the same issue, so I guess it's a driver or firmware issue (yes, the newest firmware was installed).
After swapping the disks there were no more issues.
Yes. I had read that it is enabled by default but was unsure, so I wrote the settings to disk and looked at the eccpoll setting in mt86.cfg, which is set to 1 by default. TL;DR: yes, it is.
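If anyone wants to check the same setting without opening the file by hand, here is a rough sketch. It assumes the saved config is plain KEY=VALUE lines with `#` comments; the helper name `read_cfg_value` and the sample text are made up for illustration and are not my actual file.

```python
def read_cfg_value(text, key):
    """Return the value for `key` from KEY=VALUE config text, or None if absent."""
    for raw in text.splitlines():
        # Assumption: '#' starts a comment; drop everything after it.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Compare case-insensitively so "eccpoll" and "ECCPOLL" both match.
        if name.strip().upper() == key.upper():
            return value.strip()
    return None


# Hypothetical excerpt, only to show the call.
sample = """# example settings
ECCPOLL=1
"""

print(read_cfg_value(sample, "eccpoll"))
```

To use it on a real file, read the file contents into a string first and pass that in.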
OK, I just poked around a bit to see if anything is extraordinarily hot. If you look at page 10 here https://www.supermicro.com/manuals/motherboard/EPYC7000/MNL-2314.pdf the heatsink next to LEDSAS is extremely hot.
It was the exhaust fan. Optimal speed went into "reporting" because it was not spinning fast enough; full speed just cranks up all the fans, and standard speed makes it green again. I don't think overheating is an issue, as the server is hardly under load and temps were always OK when i...
just wanted to say: i appreciate you, thanks!
OK, that I could test with a BIOS downgrade. I'll let this run the last 40%, then see if it's still there after the downgrade and test with multi-core again.
I did not check until now. The idea was to rule out ECC issues first and then do a second run with all cores/threads enabled. (I'm not sure whether it's correct that memtest tells me it has found 16 CPUs, or whether it is supposed to count both the cores and the threads; it's on my list to look up.)
By this you mean the LSI SAS logs and the IPMI health logs, right?
The LSI SAS logs have nothing, and the IPMI health logs show some fan issues with fan 5 and two at lower speed, but only after the crash; maybe it has something to do with the server then going to the UEFI shell. Anyway, there is this:
nothing which leads me into any...