A couple of weeks ago I assembled two identical servers. Each has the following:
Supermicro X10SLM+-LN4F Motherboard
Intel Xeon E3 1230 V3 3.3G 4C 8T 8M CPU
4x Kingston KVR16LE11/8EF Memory
Both are "headless" and are managed from a networked Windows workstation via IPMI View and the Java iKVM Viewer.
Ubuntu 13.10 Server installed normally on both servers and has run without apparent errors during configuration and testing. Even so, one server has exhibited a couple of anomalies:
Anomaly 1:
Memtest86 was started on both servers at about midnight. By this morning two passes had completed with ECC off, and Memtest86 had found no errors. However, while those tests were running, the following error records appeared in the IPMI View "System Event Log" for one of the servers (time is UTC):
1,System Event,03/01/2014 09:18:21 Sat,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMA1(CPU1)
2,System Event,03/01/2014 09:18:23 Sat,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB1(CPU1)
Notice they occurred two seconds apart, but one was in DIMMA1 and the other was in DIMMB1.
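For anyone who wants to check the timing across a longer log, here is a minimal Python sketch that parses records in the comma-separated form pasted above and reports the gap between consecutive events. It assumes the export always has the same six fields as these two records and that dates are MM/DD/YYYY (03/01/2014 was a Saturday, which matches). The name `parse_sel` is just a hypothetical helper, not part of IPMI View.

```python
import csv
from datetime import datetime
from io import StringIO

# The two records from the System Event Log, as pasted above.
SEL_EXPORT = """\
1,System Event,03/01/2014 09:18:21 Sat,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMA1(CPU1)
2,System Event,03/01/2014 09:18:23 Sat,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB1(CPU1)
"""

def parse_sel(text):
    """Return a list of (timestamp, dimm) tuples from a comma-separated SEL export.

    Assumption: field 3 is the timestamp followed by a weekday name, and
    field 6 ends in '@<DIMM>(<CPU>)', as in the two records shown.
    """
    events = []
    for row in csv.reader(StringIO(text)):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        stamp = row[2].rsplit(" ", 1)[0]  # drop the trailing weekday
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        dimm = row[5].split("@", 1)[1].split("(", 1)[0]
        events.append((when, dimm))
    return events

events = parse_sel(SEL_EXPORT)
# Seconds between each event and the one before it.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]

for (when, dimm), gap in zip(events[1:], gaps):
    print(f"{dimm} logged {gap:.0f} s after the previous event")
```

Run against a full export, this makes it easy to see whether errors cluster in time across DIMMs or are spread evenly over the test.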
Question 1: Why did Memtest86 not find those errors while testing with ECC off? Without error correction, it seems the data read back should have differed from the data written.
Question 2: Why would two unrelated errors happen to occur two seconds apart in different DIMMs over a many-hour test? That is possible, of course, but it seems more likely that they are related.
Question 3: Have you seen memory errors in the IPMI View "System Event Log" during memory testing that were not logged by Memtest86?
Anomaly 2:
IPMI View always displays voltage, temperature and fan data for one of the servers, regardless of whether the iKVM Console is running, but not for the other. That data is displayed for both servers immediately after logging in to them via IPMI. However, the data display for one server (the one that logged memory errors) disappears after launching the iKVM Console.
Question 4: Have you seen that happen or have any idea of cause?