CSE-847 Drive Disconnecting Issues

Carvel

New Member
Jul 19, 2019
13
0
1
I'm struggling a bit getting my server running well and was hoping you could answer some questions that I have.

Current setup:
- Supermicro CSE-847 case
- Supermicro SAS2-846EL1 backplane
- Supermicro SAS2-826EL1 backplane
- Supermicro AOC-SAS2LP-MV8 HBA
- Intel i7-4770k CPU
- ASRock Z97 Fatal1ty motherboard
- 16GB Corsair Dominator Platinum DDR3
- Samsung 960 Evo 250GB M.2
- 22 assorted SATA drives (WD Red, WD Green, Seagate, etc., making a 64.1TB pool)
- Windows 10 Pro with StableBit DrivePool; currently trying to get SnapRAID going

When I was first setting it up I saw that sometimes I would get a drive disconnection during heavy load. Windows Event Viewer would show it disconnecting and then reconnecting later. I checked the SFF-8087 cables and tried replacing them. The problem seemed to go away.

I wanted to check the firmware versions of the backplanes, but whenever I run ExpanderXTools it doesn't show anything. Is this because I'm not using a Supermicro motherboard, or is there another reason I can never see the backplanes?

Now, about a year later, I'm trying to get SnapRAID working. Every time I run the first sync, it gets about half a day to a day in (out of an estimated 53 hours total) and then all the drives start disconnecting.

Questions:
- Does anyone have any idea why my drives keep disconnecting under load?
- Do you think a firmware update on my HBA or backplanes would help? If so, do you have any idea how I get my backplanes to show up in ExpanderXTools?
- Would a Supermicro motherboard help? If so, can anyone recommend one, either for my existing hardware (i7-4770k, non-ECC DDR3, etc.) or for a modern system (Xeon, DDR4, built-in HBA, etc.)?

Thanks in advance.
 

Carvel

After a lot of reading, it seems like this could be an HBA/backplane conflict and that an LSI 2308-based card might be a better choice. Should I grab an HP H220? They seem like a pretty good deal. Or should I go with a 3008-based one like the Supermicro 3008?
 

nthu9280

Well-Known Member
Feb 3, 2016
1,588
441
83
San Antonio, TX
How is the airflow over the drives and HBA? You may need to check/monitor the drive temps during heavy load. I'm assuming you have drives populated just in the front bays. I've heard drives in the rear bays tend to run warmer due to less airflow.
 

Carvel

> How is the airflow over the drives and HBA? You may need to check/monitor the drive temps during heavy load. I'm assuming you have drives populated just in the front bays. I've heard drives in the rear bays tend to run warmer due to less airflow.
Hmm, I had turned the fans down for noise reasons and the temps are fine during regular usage. This is a good question though so I just cranked the fans up and started another sync. I'll let you know how it goes. I can hear it clearly from the main floor of our house now (server is in the basement mechanical room). :)
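
For watching drive temps while a sync runs, something along these lines might help. It is only a sketch, and it assumes smartmontools 7.0 or newer is installed so that `smartctl --json` is available; the helper names and the 45 C threshold are made up for illustration and are not from this thread.

```python
import json
import subprocess


def parse_temperature(smartctl_json):
    """Return the current temperature in Celsius from smartctl JSON, or None."""
    try:
        data = json.loads(smartctl_json)
    except json.JSONDecodeError:
        # Empty or non-JSON output, e.g. when smartctl could not open the device.
        return None
    return data.get("temperature", {}).get("current")


def read_temperature(device):
    """Run smartctl for one device (needs admin rights) and return its temperature."""
    result = subprocess.run(
        ["smartctl", "--json", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return parse_temperature(result.stdout)


def hot_drives(temps, limit_c=45):
    """Return (device, temp) pairs at or above limit_c, hottest first.

    limit_c is an assumed threshold; check the drive datasheets for real limits.
    """
    hot = [(dev, t) for dev, t in temps.items() if t is not None and t >= limit_c]
    return sorted(hot, key=lambda item: item[1], reverse=True)
```

Calling `read_temperature` for each drive every minute or so during the sync and logging the result of `hot_drives` would show whether the disconnects line up with a temperature spike.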
 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,521
968
113
artofserver.com
I am totally clueless on Windows, but if this were on Linux, I would be looking at the kernel logs to see what the driver tells me when the drives go offline. That might help identify the problem. I'm guessing there's an equivalent to that in the Windows event logs, perhaps?
 

Carvel

Yeah, I can check when it fails again to get the exact message from the Windows event log. But it basically just says that the drive has disconnected from the system.

The SnapRAID error says that it's Windows error 1167, which according to Microsoft's "System Error Codes (1000-1299)" page means ERROR_DEVICE_NOT_CONNECTED, which makes sense.
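
To pull the exact messages out of the event log after the next failure, a rough sketch like the following could work. It shells out to the built-in `wevtutil` tool. The provider name `disk` is an assumption about where these events land; the HBA's own driver may log under a different provider name, and the function names are hypothetical.

```python
import subprocess

# Error code named in this thread, handy for matching against SnapRAID output.
KNOWN_CODES = {1167: "ERROR_DEVICE_NOT_CONNECTED"}


def build_event_query(provider="disk", max_events=50):
    """Build a wevtutil command listing the newest System events from one provider."""
    xpath = f"*[System[Provider[@Name='{provider}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",       # XPath filter on the event provider (assumed name)
        f"/c:{max_events}",  # maximum number of events to return
        "/rd:true",          # reverse direction: newest events first
        "/f:text",           # plain-text output instead of XML
    ]


def fetch_events(provider="disk", max_events=50):
    """Run the query on a Windows host and return the raw text output."""
    cmd = build_event_query(provider, max_events)
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
```

Saving the output of `fetch_events()` right after a failed sync gives timestamps and event IDs that can be posted here instead of a paraphrase.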
 

Carvel

I also noticed that the speed of the SnapRAID sync was really jumping around while it was going, from a bit over 1GB/s to as low as 50MB/s. Is that normal or weird?
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,242
420
83
Firstly, is your PSU beefy enough to support this setup? I've seen random crashes and disconnects before when too much power was drawn and the voltage dropped enough to crash various subsystems. An easy way to test is if you can trigger the behaviour by running something CPU-intensive (e.g. some random all-core CPU benchmark) to see if anything marginal breaks.

Secondly, you're using a Marvell SATA controller? Personally I wouldn't trust those things as far as I could spit them for anything other than occasional usage. More specifically, there was definitely a bug in the 9200 controller series that required a firmware update to fix spurious disconnects and SMART errors - it might be worthwhile chasing down whether there was anything similar afflicting the 9400 series and if there's any updates for them.

If you find yourself running out of ports, lots of people on these forums, myself included, will point you in the direction of the LSI-based HBAs which many of us here have been running for years - they're generally cheap, easily available and very reliable workhorses.

> I also noticed that the speed of the SnapRAID sync was really jumping around while it was going, from a bit over 1GB/s to as low as 50MB/s. Is that normal or weird?
I don't know SnapRAID, so I can't say whether this is normal behaviour, but I discovered the LSI HBAs on this site out of frustration while looking for something better than the Marvell SATA ports I was using in my file server at the time; they'd often suffer inexplicable problems and the performance was mediocre at best. They held back my whole RAID array.
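
For a rough sense of scale on those sync speeds, here is some back-of-envelope arithmetic as a sketch. It assumes the sync reads all 22 drives at once and that the traffic crosses a single 4-lane SFF-8087 SAS2 link; neither is confirmed in this thread. A SAS2 lane runs at 6 Gb/s with 8b/10b encoding, so about 600 MB/s usable per lane.

```python
SAS2_LANE_MB_PER_S = 600   # 6 Gb/s line rate with 8b/10b encoding
LANES_PER_SFF8087 = 4      # one SFF-8087 cable carries four lanes


def wide_port_mb_per_s(lanes=LANES_PER_SFF8087):
    """Theoretical ceiling of one SAS2 wide port in MB/s."""
    return lanes * SAS2_LANE_MB_PER_S


def per_drive_mb_per_s(total_mb_per_s, drives):
    """Average per-drive rate if the total is spread evenly over all drives."""
    return total_mb_per_s / drives


ceiling = wide_port_mb_per_s()             # 2400 MB/s
fast = per_drive_mb_per_s(1000, 22)        # per drive at the ~1 GB/s peak
slow = per_drive_mb_per_s(50, 22)          # per drive at the 50 MB/s trough
print(f"link ceiling {ceiling} MB/s, peak {fast:.1f} MB/s/drive, trough {slow:.1f} MB/s/drive")
```

Under those assumptions the 1GB/s peak sits well below the link ceiling, while the 50MB/s trough works out to only a couple of MB/s per drive, far less than a healthy spinning disk reads sequentially.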
 

Carvel

> Firstly, is your PSU beefy enough to support this setup? I've seen random crashes and disconnects before when too much power was drawn and the voltage dropped enough to crash various subsystems. An easy way to test is if you can trigger the behaviour by running something CPU-intensive (e.g. some random all-core CPU benchmark) to see if anything marginal breaks.
It's a redundant 1280W PSU, so I think it's got plenty of juice there.

> Secondly, you're using a Marvell SATA controller? Personally I wouldn't trust those things as far as I could spit them for anything other than occasional usage. More specifically, there was definitely a bug in the 9200 controller series that required a firmware update to fix spurious disconnects and SMART errors - it might be worthwhile chasing down whether there was anything similar afflicting the 9400 series and if there's any updates for them.
I haven't heard anything, but I am leaning towards this possibly being due to my HBA. It's a Supermicro AOC-SAS2LP-MV8, which I figured would be good with Supermicro backplanes, but I guess maybe not.

> If you find yourself running out of ports, lots of people on these forums, myself included, will point you in the direction of the LSI-based HBAs which many of us here have been running for years - they're generally cheap, easily available and very reliable workhorses.
Yeah, I don't need any more ports. I have expanders in the backplanes, so I have 36 bays, which should be enough for me.

> I don't know SnapRAID, so I can't say whether this is normal behaviour, but I discovered the LSI HBAs on this site out of frustration while looking for something better than the Marvell SATA ports I was using in my file server at the time; they'd often suffer inexplicable problems and the performance was mediocre at best. They held back my whole RAID array.
Thanks, yeah I might do this.
 

Carvel

Would you guys get an HP H220 (LSI SAS2308) or a Supermicro AOC-S3008L-L8e (LSI SAS3008) for use with Supermicro SAS2-846EL1/826EL1 backplanes?
 

EffrafaxOfWug

Hmm, I didn't spot that the AOC-SAS2LP-MV8 HBA you're using is a Marvell controller. SM are notoriously reticent about changelogs, but are you able to check that it's running the latest firmware (v4.0.0.1812 according to their site)?

In terms of the HBA (and I'm not saying that's definitely the cause of your woes), replacing it with a newer one based on the SAS3008 would be better and more future-proof, but there are many other models using the same chips underneath, so there might be better/cheaper cards available. You might not want to spend too much money replacing something that might not even be broken.
 

Carvel

It finished successfully after changing the power management settings. Hopefully that fixes the flakiness for good now. Thanks guys.
 

Carvel

Never mind, it's still flaky. Blah. I just tried to run another sync and all the drives disconnected again.
 

Carvel

> Hmm, I didn't spot that the AOC-SAS2LP-MV8 HBA you're using is a Marvell controller. SM are notoriously reticent about changelogs, but are you able to check that it's running the latest firmware (v4.0.0.1812 according to their site)?
Yup, I flashed it to the latest and it's still flaky.
 

Carvel

So I got my Supermicro AOC-S2308L-L8e today and I still can't see my backplanes in ExpanderXTools. Does anyone have any ideas why I've never been able to use that software to see/flash them?

I'm going to kick off another sync now and see if it's fixed my stability issues.