Hi there, I have a strange problem with RAID on a Supermicro X10DRL-i that has 2 x S4510 in RAID1 and 2 x WD RED in RAID1.
Sometimes, after 1-2 weeks, one disk/SSD is randomly thrown out of the RAID; the SMART reports show no read/write errors or bad sectors.
The board has 4 x S-SATA and 6 x I-SATA connectors, and the HDDs/SSDs are connected to the I-SATA ports.
The OS is Windows Server 2019 with the latest drivers, firmware and BIOS.
Looking at the Event Viewer, I saw that between 1:00 and 1:30 (both AM and PM) I get multiple instances of the errors listed below. I don't know what happens during that window; I even went into Task Scheduler and checked the tasks one by one, and none of them start between 1:00AM/PM and 1:30AM/PM.
I also get these errors right after each backup finishes (Windows Server Backup). For example, the backup is scheduled to start at 3:00AM and it finishes at 3:57AM; the moment the backup is finished, I am spammed with the errors listed below.
After multiple errors, sometimes one of the disks is thrown out of the RAID.
There are days when I don't get any errors even after a backup.
Event ID 4155 "I/O on "DISK_NAME" has failed."
Event ID 51 "An error was detected on device \Device\"DISK_NAME" during a paging operation."
NOTE: "DISK_NAME" stands for the name of the affected disk, which varies at random.
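To check whether the errors really cluster around those times, the exported event log could be counted per time-of-day window. The following is only a minimal sketch: the helper name `bucket_events`, the timestamp format, and the sample records are all made up for illustration, and only the two event IDs come from the errors above.

```python
from collections import Counter
from datetime import datetime

# Event IDs taken from the errors above; everything else is a hypothetical helper.
WATCHED_IDS = {51, 4155}


def bucket_events(records, minutes=30):
    """Count watched events per time-of-day bucket.

    records: iterable of (timestamp_string, event_id) pairs, where the
    timestamp uses the assumed format "YYYY-MM-DD HH:MM:SS".
    Returns a Counter keyed by the bucket start time as "HH:MM".
    """
    counts = Counter()
    for stamp, event_id in records:
        if int(event_id) not in WATCHED_IDS:
            continue  # skip events that are not disk I/O errors
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        total = when.hour * 60 + when.minute
        start = total - total % minutes  # round down to the bucket start
        counts[f"{start // 60:02d}:{start % 60:02d}"] += 1
    return counts


# Made-up sample records, for illustration only (not real log data).
sample = [
    ("2020-01-10 01:05:12", 51),
    ("2020-01-10 01:17:40", 4155),
    ("2020-01-10 03:57:02", 51),
    ("2020-01-10 13:10:00", 51),
    ("2020-01-10 09:00:00", 7),  # unrelated event ID, ignored
]

for bucket, count in sorted(bucket_events(sample).items()):
    print(bucket, count)
```

Buckets that keep showing up across many days would point at something periodic rather than at a single failing drive.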
What I did:
Ran CHKDSK
Stress tested the RAIDs for hours
Tried different versions of the controller driver
Changed OS: WS2012R2 > WS2019
Updated BIOS
Replaced SATA Cables
Connected drives directly (Bypassing the backplane CSE-825TQ-R740LPB)
PS: Before the setup above, I had 2 x Samsung 850 Evo instead of the S4510, with Windows Server 2012 R2. One day one of the SSDs was thrown out of the RAID. I wanted to replace it the next day because I thought it was bad, but after the nightly backup ran, the second SSD was failed as well and the RAID was gone.
I thought that the problem was the Samsung 850 Evos, but after purchasing the S4510s I saw I was wrong; the problem lies somewhere else.