Search results

  1. AMD EPYC Rome on H12SSL-C random crashes

    It was the NVMe drives. One time I sat by the server watching it die and got the chance to open Event Viewer. Both Samsung 990s showed the same issue, so I guess it's a driver or firmware problem (yes, the newest firmware was installed). After swapping the disks there were no more issues.
  2. Just a note: Secure Boot is disabled.
  3. The second run also finished with 0 errors.
  4. Finished memtest: 0 errors, 0 ECC errors. Sigh!
  5. I'm out of ideas.
  6. Two complete PCs in different locations. One has a Zippy redundant PSU, the other a normal 650 W unit. I did not swap them.
  7. Yes, I read that it is enabled by default but was unsure, so I wrote the settings to disk and checked the eccpoll setting in mt86.cfg, which defaults to 1. TL;DR: yes, it is.
  8. Nothing so far; it's currently at 64% with the old BIOS.
  9. Again with 2.1, the UEFI firmware bug still persists: when the block move test runs with 64-byte blocks, it immediately starts
  10. Yes, I'm getting "[UEFI Firmware error] could not start CPU 4", along with 5 and 7. Now trying to downgrade to 2.1.
  11. OK, I just poked around to see whether anything was unusually hot. On page 10 of https://www.supermicro.com/manuals/motherboard/EPYC7000/MNL-2314.pdf you can see the heatsink next to LEDSAS; it is extremely hot.
  12. It was the exhaust fan: in Optimal speed mode it went into "reporting" because it wasn't spinning fast enough. Full Speed just ramps up all the fans, and Standard Speed makes it green again. I don't think overheating is the issue, since the server is hardly under load and temps were always OK when I...
  13. Just wanted to say: I appreciate you, thanks! OK, I could test that with a BIOS downgrade. I'll let this run the last 40%, then see whether it's still there after the downgrade and test with multi-core again.
  14. I haven't checked that yet. The idea was to rule out ECC issues first and then do a second run with all cores/threads enabled. (I'm not sure it's correct that memtest reports 16 CPUs, i.e. whether it's supposed to count cores plus threads; that's on my list to look up.)
  15. By this you mean the LSI SAS logs and the IPMI health logs, right? The LSI SAS logs have nothing, and the IPMI health logs show some fan issues (fan 5 and two, lower speed), but only after the crash; maybe that has something to do with it then dropping to the UEFI shell. Anyway, there is this: nothing which leads me into any...
  16. I meant they're the same type, but I just looked in HWiNFO64 and they are slightly different: 7252, and 7262 in the first build. I thought I'd ordered the same CPU, but the second build was a bit more budget.
  17. The issue with "testing" is that it always crashes on weekends, or at least 5-7 days after a clean boot. I have no idea what could trigger this. All my "that could be it" candidates, like overheating and disk issues (the disks themselves or how they are mounted/attached), are off the table.
  18. Two bad CPUs would be possible but unlikely. The first BIOS is at 2.5, the other at 2.4. The first has 4x 8 GB of RAM (Samsung M393A1K43DB2-CWE), the other 2x 16 GB (M393A2K43DB3-CWE RDIMM). Currently MemTest86 Pro is running with ECC checking and nothing pops up. Again, the system runs with...
  19. Just ordered MemTest86 Pro; will test that.
  20. OK, the server crashed again on a Sunday between 12:00 and 15:00; I need to check the BMC later to see the exact time. So Global C-State is out of the equation.