Hello,
I have a Supermicro 7039A-i workstation. Motherboard is an X11DAi-N. Running an Intel Xeon Silver 4114. 64GB RAM via 4 Crucial 16GB DIMMs. No video card (using onboard video). Primary drive is a Samsung EVO 860 SATA SSD. Also using a Samsung EVO 970 M.2 SSD and a Western Digital Black 6TB hard drive. Running Debian Linux 11.4 with kernel 5.10.136-1.
This system has been running flawlessly for nearly three years. Recently it has been locking up solid, roughly once every three days. I cannot find any pattern to when it happens, and there are no diagnostic messages from the kernel, neither on the console nor in any log files. I've done three runs of memtest86 and all of them passed without errors.
I did a thorough cleaning yesterday, but a lockup happened again this morning.
Through IPMI I've monitored the system's sensors on auto-refresh and they all fall within thresholds up to the point of the lockup.
The only possible clue I have is from the IPMI system event log. At each lockup, it records:
Sensor Name: (blank) Sensor Type: Processor Description: IERR - Assertion
I honestly have no idea where to even start troubleshooting.
Would anyone happen to have any suggestions?