I came across the following errors in my server syslogs:
I then checked my IPMI event logs and saw this:
Time to RMA the module?
Code:
kernel: mce: [Hardware Error]: Machine check events logged
kernel: [Hardware Error]: Corrected error, no action required.
kernel: [Hardware Error]: CPU:2 (17:31:0) MC17_STATUS[Over|CE|MiscV|-|AddrV|-|-|SyndV|-|CECC]: 0xdc2040000000011b
kernel: [Hardware Error]: Error Addr: 0x000000034c5beec0
kernel: [Hardware Error]: IPID: 0x0000009600450f00, Syndrome: 0x400040000a801201
kernel: [Hardware Error]: Unified Memory Controller Extended Error Code: 0
kernel: [Hardware Error]: Unified Memory Controller Error: DRAM ECC error.
kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
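For anyone who wants to double-check the flags in the MC17_STATUS line, here is a rough Python sketch that decodes the raw value by hand. The bit positions are the ones I understand the Linux kernel to use for MCA status registers (arch/x86/include/asm/mce.h); the names `STATUS_BITS` and `decode_mca_status` are just made up for this example, and it does not try to decode the error-code or syndrome fields.

```python
# Sketch only: decode the flag bits of an x86 MCA status register value.
# Bit positions follow the MCI_STATUS_* definitions in the Linux kernel's
# arch/x86/include/asm/mce.h; the names below are illustrative, not an API.
STATUS_BITS = {
    "Val": 63,       # register contains a valid error
    "Over": 62,      # overflow: another error arrived before this one was read
    "UC": 61,        # uncorrected error
    "En": 60,        # error reporting enabled
    "MiscV": 59,     # MISC register valid
    "AddrV": 58,     # address register valid
    "PCC": 57,       # processor context corrupt
    "TCC": 55,       # task context corrupt (AMD)
    "SyndV": 53,     # syndrome register valid (AMD)
    "CECC": 46,      # corrected by ECC (AMD)
    "UECC": 45,      # uncorrected ECC error (AMD)
    "Deferred": 44,  # deferred error (AMD)
    "Poison": 43,    # poisoned data consumed (AMD)
}


def decode_mca_status(status):
    """Return the names of all flag bits that are set in `status`."""
    return [name for name, bit in STATUS_BITS.items() if (status >> bit) & 1]


if __name__ == "__main__":
    # Value copied from the MC17_STATUS line in the log above.
    print(decode_mca_status(0xDC2040000000011B))
```

For the value in the log this gives Val, Over, En, MiscV, AddrV, SyndV and CECC, which matches the kernel's own summary: UC is clear and CECC is set, so the error was corrected by ECC, and Over being set means at least one earlier error in that bank was overwritten before it was read.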