LSI 9260-8i single-bit ECC errors


misan1834

New Member
Apr 30, 2017
Hi guys, I have recently started seeing random single-bit ECC errors reported by MegaRAID. I'm running the latest firmware, and the RAID is otherwise functioning fine.

Will these errors be corrected automatically or am I looking at a replacement card?

Many thanks
 

pricklypunter

Well-Known Member
Nov 10, 2015
Canada
You'll need to provide much more information before anyone can make an educated guess here. We need to know what mode you are running in, the OS, disk array details, etc. :)
 

misan1834

New Member
Apr 30, 2017
My apologies!

RAID adapter - LSI 9260-8i and BBU
Disks - 6 x Toshiba 3TB (TOSHIBADT01ACA3) in a RAID 6
Latest firmware from LSI
OS - Running ESXi 6.5 with the RAID controller set up in PCI passthrough to a Windows 2012 R2 VM (4 vCPUs, 8GB RAM)
Hardware - Supermicro 3U chassis with an X7DWN+ and 128GB PC2 ECC.

The RAID adapter has been running solidly for over a year, and MegaRAID reports the ECC errors only infrequently. I have yet to fully power down the ESXi host (uptime is about 4 months).

Hope this helps :)
 

pricklypunter

Well-Known Member
Nov 10, 2015
Canada
It could be failing cache on the controller, or it could simply be a screw-up in the reporting. I've often seen a simple warm boot clear up odd things like this. If it's reporting the error, it's also correcting it, at least for the moment. If this were in production rather than my own data, I would replace that card the next time I was passing by; at home, I would probably shut down and reboot a few times first to see if it clears up :)
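The point that a reported single-bit error is also a corrected one follows from how SEC-DED (single-error-correct, double-error-detect) ECC generally works. Below is a toy sketch, assuming an extended Hamming(8,4) code; it is not the 9260-8i's actual cache ECC scheme (real memory ECC typically protects much wider words), just an illustration of why one flipped bit can be fixed while two flipped bits can only be detected. The function names `encode` and `decode` are hypothetical.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a toy 8-bit extended Hamming (SEC-DED) codeword.

    Index 0 holds an overall parity bit; indices 1..7 use the classic
    Hamming(7,4) layout with parity bits at positions 1, 2 and 4.
    """
    assert len(data) == 4 and all(b in (0, 1) for b in data)
    cw = [0] * 8
    # Data bits occupy the non-power-of-two positions.
    cw[3], cw[5], cw[6], cw[7] = data
    # Each parity bit covers every position whose index has that bit set.
    for p in (1, 2, 4):
        for i in range(1, 8):
            if i & p and i != p:
                cw[p] ^= cw[i]
    # Overall parity makes the XOR of all 8 bits equal to 0.
    cw[0] = sum(cw[1:]) % 2
    return cw


def decode(cw: List[int]) -> Tuple[List[int], str]:
    """Return (data bits, status) where status is ok/corrected/uncorrectable."""
    cw = list(cw)  # work on a copy so the caller's word is untouched
    # XOR of the indices of all set bits: 0 for a valid word,
    # otherwise the position of a single flipped bit.
    syndrome = 0
    for i in range(1, 8):
        if cw[i]:
            syndrome ^= i
    overall = sum(cw) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and flip it back.
        # A syndrome of 0 means the overall parity bit itself flipped.
        cw[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a non-zero syndrome: two bits flipped.
        status = "uncorrectable"
    return [cw[3], cw[5], cw[6], cw[7]], status


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    word[5] ^= 1            # simulate a single flipped bit
    print(decode(word))     # ([1, 0, 1, 1], 'corrected')
    word[6] ^= 1            # a second flip in the same word
    print(decode(word)[1])  # uncorrectable
```

In this model a steady trickle of "corrected" events is survivable, but a rising rate would suggest cells degrading toward the uncorrectable case, which is consistent with the advice to replace the card in production.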
 

misan1834

New Member
Apr 30, 2017
4
0
1
39
It could be failing cache on the controller or it could simply be a screw up in reporting it. Often I have seen a simple warm boot clear matters right up with odd things like this. If it's reporting the error, it's also correcting it, at least for the moment, if it were in production rather than my own data, I would be looking at replacing that card whenever I was passing by next, for me at home I would probably shutdown and reboot a few times to see if it clears up first :)
I'm due to apply some updates to this ESXi host this week, so I'll see what the score is after a host reboot.

Many thanks for the response