Issues with HBAs not recognizing drives


Biribiri

New Member
Aug 26, 2023
6
0
1
Hello, I've been having issues with my HBAs not detecting any drives for months now, and I have no idea how to fix it. I have spent hours googling and asking on forums, but sadly nothing has worked yet. I hope someone here can give me some advice and help me solve this issue. This forum is my last idea...
First of all, a short hardware list:
  • "Old" Server:
    • CPU : AMD Ryzen Threadripper 1920X
    • Motherboard: AsRock Taichi x399
    • RAM : 64GB with ECC
  • "New" Server (old prebuilt Fujitsu Primergy TX1330 M2):
    • CPU: Intel Xeon E3-1270 v5
    • Motherboard: FUJITSU D3373-A1
    • RAM: 32GB with ECC (different from old RAM)
  • HBA 1:
    • LSI SAS 9200-16e
  • HBA 2:
    • LSI SAS 9200-8e
  • JBOD:
    • Supermicro CSE-846 4U with a CSE-PTJBOD-CB2 "motherboard"
How it's connected:
The card is the only PCIe device in the motherboard and sits in the first slot. It connects to the JBOD via 2 cables. From the backplane of the Supermicro JBOD, 2 cables go to a small internal-to-external "adapter" that simply passes them straight through to the HBA. It's a plain 2-in, 2-out passthrough, not a 4-internal-to-2-external HBA. Here is a link to the eBay listing with some pictures, which might make it clearer.

The issue:
The 9200-16e was the first one I used. It worked perfectly on the Ryzen system, and I could see all my drives during boot and in UnRaid. After some months I had a power outage, and after that the issue appeared for the first time: the card did not recognize any of my drives. After another reboot they sometimes showed up again. From that day on, whenever I shut down, there was a chance that the drives would not be detected. Sometimes it worked, sometimes it didn't. I restarted the PC a couple of times and moved the card to different slots, and after many tries the drives showed up again as if nothing had ever happened.
Then one day I could no longer get them to show up on my Ryzen system at all. All of that happened before I flashed the newest firmware, by the way; after the firmware update nothing changed. I had a chat in the UnRaid forum, and we concluded that the Ryzen must be the problem, since there were some compatibility issues between Ryzen and legacy HBAs. That was when I bought the Xeon machine.

On the Xeon it worked immediately: all drives were displayed, even after a reboot. Then, a month or so later, the same issue appeared as on the Ryzen server, without a dirty shutdown ever happening. The drives aren't displayed in UnRaid, they don't show up at startup, nothing. Then I bought the 9200-8e card since I thought the 9200-16e might be defective. I tested it and flashed it to the newest firmware as well, but again nothing...

As for error messages and the troubleshooting I did:
  • I can link the UnRaid forum post.
  • I can also share this image. It's a picture of the screen while the card (9200-8e) is booting. Normally all the drives should be listed here.
  • The card is detected in the BIOS, obviously, since it tries to boot and detect drives. I also switched the PCIe mode from Auto to 2.0 in case the link negotiation was causing the issue. Again, it didn't change anything.
  • Also tried different slots on the motherboard, no change.
  • I have the HBA connected to my JBOD with 2 cables. I tried plugging in both at the same time as well as each one alone, and I tried both ports on the JBOD as well as all ports on the HBAs (both on the 9200-8e, all 4 on the 9200-16e). I also borrowed a cable from work just to test this, and it made no difference either.
  • When connected directly via SATA, the drives all show up and work.
At this point I have no idea what else I could try to get the system working again as it is. My biggest fear is that the backplane of the Supermicro has an issue, but before I go and waste even more money on hardware that might not work, I wanted to ask if anyone has an idea why it's not working anymore, what causes it, and whether and how I could fix it.
Since it doesn't work on 2 different hardware platforms, I don't think the systems are the problem. So in my head the issue is either the HBAs, the connection from the backplane to the internal-to-external adapter, or the backplane itself. Hopefully not the last one.

So please, if anyone has an idea, share it; I would greatly appreciate it. Thank you all in advance.
 

Markess

Well-Known Member
May 19, 2018
1,162
780
113
Northern California
You mention a power outage. Any chance there was an accompanying power spike and a power supply got damaged? I assume the JBOD still has the same power supply? Did you reuse the power supply from your "old server" in your "new server"? If you haven't already, you may want to get a power supply tester (even the inexpensive ones work pretty well) and verify that the power supplies are OK.
 

Biribiri

Hello, I didn't use the old PSU in the "new server". It was a prebuilt, and I just left its PSU in there; it seems to be working fine with no real issues. As for the JBOD, it has 2 redundant PSUs and they're still the same ones. They also seem to be running without any real issues. I don't think the power outage damaged both PSUs so badly that they can't power a JBOD that isn't even half full, or could that be the case? I can clearly hear and feel all the drives spinning when it's on, and it still worked after the power outage, at least sometimes.
 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,672
1,081
113
artofserver.com
If you're using unraid, which is based on Linux, you can use this video to analyze the kernel logs:


Since you said your drives disappeared while the system was online, I suspect there will be messages in the logs from when the event happened, and they might reveal what is going on at the HBA level. For example, are the SAS connections dropping off, is the HBA resetting itself and failing to come back online, or is the SAS expander going offline? The log messages should reveal that.
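If you'd rather not scroll through the whole log by eye, a small script can count the relevant messages. This is a minimal Python sketch, not a tool from the videos: the message patterns (`expander_add:`, `expander_remove:`, `diag reset`, and the SCSI scan's `Direct-Access`/`Enclosure` lines) are assumptions based on typical mpt2sas/mpt3sas and SCSI-layer output, and the exact wording can differ between kernel and driver versions. The script name `scan_hba_log.py` is hypothetical.

```python
import re
import sys
from collections import Counter

# ASSUMPTION: typical mpt2sas/mpt3sas and SCSI scan wording; verify against
# your own kernel's output, since messages vary between versions.
PATTERNS = {
    "expander_added": re.compile(r"expander_add:"),
    "expander_removed": re.compile(r"expander_remove:"),
    "hba_reset": re.compile(r"diag reset", re.IGNORECASE),
    "disk_found": re.compile(r"\bDirect-Access\b"),
    "enclosure_found": re.compile(r"\bEnclosure\b"),
}


def classify(lines):
    """Count how many log lines match each event category."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def summarize(counts):
    """Turn the counts into a rough hint about where the fault might be."""
    if counts["expander_added"] == 0:
        return "no expander seen: check cabling, backplane power, or the expander itself"
    if counts["disk_found"] == 0:
        return "expander seen but no disks: look downstream of the expander"
    if counts["expander_removed"] or counts["hba_reset"]:
        return "expander removal or HBA reset logged: check the timestamps"
    return "expander and disks detected"


def main(argv):
    if len(argv) != 2:
        print("usage: scan_hba_log.py LOGFILE   (e.g. a saved copy of dmesg or syslog)")
        return
    with open(argv[1], errors="replace") as log:
        counts = classify(log)
    print(dict(counts))
    print(summarize(counts))


if __name__ == "__main__":
    main(sys.argv)
```

For example, save the kernel log with `dmesg > dmesg.txt` and run `python3 scan_hba_log.py dmesg.txt`. A result of "no expander seen" only means none of the assumed patterns matched, so treat it as a pointer for where to look, not a diagnosis.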

Another thing you can do is physically look at the HBA when the drives fall offline. Does it still have a heartbeat? If not, the HBA has stopped working. If it does, something downstream is likely at fault, such as the SAS expander.

Also, with PCIe 2.0 cards, I've had a lot of people tell me that PCIe 4.0/5.0 systems seem to have problems negotiating link speed/width. Sometimes this gets fixed by a BIOS/firmware update, but some motherboards still have the problem. If your motherboard has settings to fix the PCIe link speed and width, setting them tends to work around the issue. This video explains and demonstrates:


Based on what you've mentioned, I don't think the other videos are going to help you too much (other than what is mentioned above), but in case you're interested, this is my playlist of HBA troubleshooting videos:

 

Biribiri


Hello! Sorry if my post wasn't super clear, but there seems to be a small misunderstanding. The drives don't randomly drop while in use; the issue is that the HBA doesn't detect them on boot. In the past it randomly detected them on some boots but not on others, and currently it never detects them on boot. I never had an issue with them disappearing once they were recognized after boot.

As for the log analysis, I did most of my troubleshooting based on a video in the playlist you linked, and my findings were shown in the UnRaid post.
I also just watched the first video you linked and... sadly it doesn't help. As said above, the drives aren't even being initialized/detected on boot, so there isn't much info in the log file. Here is a screenshot of the syslog; that's literally all that gets displayed when running the command from your video.
[Screenshot: syslog output]

Looking at the HBAs, I don't think either of them has an issue, and the heartbeat LED is working on both.

Lastly, the PCIe topic: I had also watched that video before and tried both Auto and manually setting the slots to PCIe 2.0, on both the Ryzen and the Xeon, without any success.
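Whether the BIOS setting actually took effect can be checked from Linux, since sysfs exposes the negotiated and maximum link speed/width of each PCI device (`current_link_speed`, `max_link_speed`, `current_link_width`, `max_link_width`). Below is a minimal Python sketch that reads those files for every LSI SAS controller (PCI vendor `0x1000`, class `0x0107xx`); the vendor/class filter and the "below max" rule are choices of this sketch. A value below maximum is not automatically a fault: a card in a slot that is only wired for x4, or one deliberately limited to an older PCIe generation in the BIOS, will legitimately report less than it supports.

```python
from pathlib import Path

LSI_VENDOR = "0x1000"        # LSI / Broadcom PCI vendor ID
SAS_CLASS_PREFIX = "0x0107"  # PCI class: mass storage / SAS controller


def gt_per_s(speed):
    """'5.0 GT/s PCIe' or '5.0 GT/s' -> 5.0; anything unparsable -> 0.0."""
    try:
        return float(speed.split()[0])
    except (IndexError, ValueError):
        return 0.0


def read_link(dev_dir):
    """Read the negotiated and maximum PCIe link speed/width from sysfs."""
    dev = Path(dev_dir)

    def val(name):
        return (dev / name).read_text().strip()

    return {
        "cur_speed": gt_per_s(val("current_link_speed")),
        "max_speed": gt_per_s(val("max_link_speed")),
        "cur_width": int(val("current_link_width")),
        "max_width": int(val("max_link_width")),
    }


def is_below_max(link):
    """True if the link trained slower or narrower than the device supports."""
    return link["cur_speed"] < link["max_speed"] or link["cur_width"] < link["max_width"]


def main(root="/sys/bus/pci/devices"):
    base = Path(root)
    if not base.is_dir():
        print("no PCI sysfs tree found")
        return
    for dev in sorted(base.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip()
            pci_class = (dev / "class").read_text().strip()
            if vendor != LSI_VENDOR or not pci_class.startswith(SAS_CLASS_PREFIX):
                continue
            link = read_link(dev)
        except (OSError, ValueError):
            continue  # missing or unreadable attributes: skip this device
        note = "below max" if is_below_max(link) else "at max"
        print(f"{dev.name}: {link['cur_speed']} GT/s x{link['cur_width']} "
              f"(max {link['max_speed']} GT/s x{link['max_width']}) -> {note}")


if __name__ == "__main__":
    main()
```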

Do you have any ideas what else I could try to troubleshoot? It does seem that the HBAs are fine and it's probably the JBOD...
Also I think you are the actual AoS? Wanted to say I enjoy your videos a lot! :3
 

BLinux

Thanks for clarifying my misunderstanding. When the drives are not detected on boot, do the log messages show that the HBA detects the SAS expander? And which SAS expander backplane do you have in your 846? Is it a BPN-SAS2-846EL1? Or do you have an "A" or "TQ" backplane with a separate SAS expander board inside the JBOD?

Yes, I am the "AoS" guy. Thanks for your kind words; I hope my videos have been helpful, though perhaps not in this case. :)
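Besides the logs, the kernel's SAS transport class registers every expander it discovers under `/sys/class/sas_expander/` (entries named like `expander-1:0`), so an empty or missing directory while the HBA driver is loaded is another hint that no expander was found. A minimal Python sketch follows; reading the `product_id` attribute is an assumption about what your kernel exposes, so the code falls back to `?` if the file is missing.

```python
from pathlib import Path


def list_expanders(sysfs_root="/sys"):
    """Names of SAS expanders registered by the kernel's SAS transport class."""
    base = Path(sysfs_root) / "class" / "sas_expander"
    if not base.is_dir():
        return []  # transport class not loaded, or nothing ever registered
    return sorted(entry.name for entry in base.iterdir())


def product_id(name, sysfs_root="/sys"):
    """Best-effort read of an expander's product ID; '?' if unavailable."""
    path = Path(sysfs_root) / "class" / "sas_expander" / name / "product_id"
    try:
        return path.read_text().strip()
    except OSError:
        return "?"


if __name__ == "__main__":
    expanders = list_expanders()
    if not expanders:
        print("no SAS expanders registered")
    for name in expanders:
        print(name, product_id(name))
```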
 

Biribiri

I checked today, and it is a BPN-SAS2-846EL1 backplane. It has 3 SAS connectors, and 2 of the 3 have SAS cables plugged in, which go directly to an internal-to-external "adapter", which in turn connects directly to the HBA with 2 cables. For pictures, you can look at the eBay listing I linked in the original post.

Also, how exactly would I check whether the HBA detects the expander? During the HBA's boot I see the following:
[Screenshot: HBA option ROM boot screen]
I see the top part with MPT2BIOS up to the copyright message, with "Initializing" below it, and then the HBA's info appears. After 2 or so seconds the error messages appear. Besides that, I don't see anything about a SAS expander during the HBA's boot, and iirc I didn't when it worked either; just all the drives being listed.
I ran the "lsscsi" command and the only device it showed was my Unraid USB stick, so I guess the expander isn't being recognized...?
[Screenshot: lsscsi output]

And your videos have definitely helped me better understand storage hardware and co. as a junior sysadmin and hobby tinkerer. Even if they didn't help much in fixing this issue, I still learned a lot :)
 

BLinux

If you follow the log analysis video above, I point out the log messages that show a SAS expander involved in that setup. If you don't remember, maybe re-watch the video to refresh yourself, paying special attention to where I talk about a SAS expander being identified (you can jump around the video using the timestamps in the description). Look for that type of message in your logs and see whether the expander goes offline along with all the drives or whether it's just the drives.

Additionally, yes, "lsscsi" should show a SAS expander device in the list of SCSI devices. Maybe try "lsscsi -g" to get the SCSI generic interfaces too, because SAS expanders usually have a /dev/sgX SCSI generic interface for communication, since they do not present as a SCSI block device.
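For a scripted version of the same check, the SCSI peripheral type of every `/dev/sgX` device can be read from sysfs: type 0 is a disk and type 13 is an enclosure services (SES) device, which is how the SES function of many expander backplanes shows up (`lsscsi` abbreviates it as `enclosu`). A minimal Python sketch, assuming the usual `/sys/class/scsi_generic/sgN/device/type` layout; not every expander exposes an SES device, so its absence is a hint rather than proof.

```python
from pathlib import Path

# Subset of SCSI peripheral device types (SPC); others are printed numerically.
TYPE_NAMES = {0: "disk", 5: "cd/dvd", 13: "enclosure"}


def scan_sg(sysfs_root="/sys"):
    """Return (sg name, device type) for every SCSI generic device."""
    base = Path(sysfs_root) / "class" / "scsi_generic"
    if not base.is_dir():
        return []
    found = []
    # Sort sg2 before sg10 by comparing name length first.
    for sg in sorted(base.iterdir(), key=lambda p: (len(p.name), p.name)):
        try:
            dev_type = int((sg / "device" / "type").read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable entry: skip it
        found.append((sg.name, TYPE_NAMES.get(dev_type, f"type {dev_type}")))
    return found


if __name__ == "__main__":
    devices = scan_sg()
    for name, kind in devices:
        print(f"/dev/{name}: {kind}")
    if not any(kind == "enclosure" for _, kind in devices):
        print("no enclosure (SES) device visible")
```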
 

Biribiri

Okay, so if I understood correctly, I just have to check the logs I already had with the command, but look specifically at the "scsi layer" logs. If that's the case, then these are all the logs from the scsi (and sd) layer. It doesn't seem to be detecting the SAS expander backplane.
[Screenshot: scsi and sd layer log messages]

I also tried the lsscsi -g command: same result, just with the generic devices listed. Only the USB stick shows up, like in the earlier screenshot.
 

BLinux

In that case, you might have a failing or failed SAS expander on the backplane. Double-check that all the power connections and SAS connectors are properly seated, but if those check out, then yeah... a SAS expander problem, perhaps...
 

Biribiri

Hmm... yeah... I checked the cabling, and I guess it is an issue with my backplane... Not very happy to reach this conclusion, but at least I (hopefully) now know what the issue is.
Thank you very much for your support during all this!

I'll probably get a new backplane for my 846, and I watched your Supermicro backplane video to understand them a bit better.
I found a TQ backplane for 120€ and an A backplane for 200€. I'll probably go with the TQ since it's a lot cheaper and, from what I got from your video, it isn't really worse in any way than the A one that costs 80€ more, except that it needs more cables.


There are just two last things I would like to ask you about how to go forward.
First: are the two I've linked below the correct ones for my case, i.e., for my CSE-846 4U chassis and my SATA drives? I just want to make sure they are the right model and support SATA drives.
120€ TQ backplane
200€ A backplane

Second: is there anything I should look out for when getting a SAS expander card, and do you have recommendations for good ones?
I know how to mount and power it, with one of those external PCIe expanders/slots and some Molex power.
For the expander I would need 6 internal ports and (preferably) 2 external ones. Are there any good, not-too-expensive ones you could recommend, like your Adaptec AEC-82885T, or would it be better to go with 2 cards with 4 internal ports each?
 

Ezekial66

New Member
Sep 9, 2023
2
0
1
I just stumbled upon this thread, and given I'm only 9 days behind, I thought I would tag on, lol. @BLinux, I too must thank you for your videos; they have been an irreplaceable part of my journey into the server life, lol.

I have been facing a similar issue to Biribiri here: I am using Unraid as my server OS, and while it recognizes my HBA, it does not see the drives connected to it.

My initial build used an LSI SAS9212-4i4e with a single SFF-8088 to SFF-8470 cable going out to two daisy-chained MD1000 JBODs. It worked just fine, but the SAS1 backplane was a limitation I was looking to get rid of.

My new build (on the same server hardware but with a new HBA) is a Dell-branded LSI SAS9206-16e with four SFF-8644 cables coming out, two to the primary IN ports of each Supermicro 836 JBOD enclosure (Supermicro SC836BE2C-R1K03JBOD, 3U chassis).

As I said, the HBA is seen both at boot and in Unraid, but in neither place are the drives visible. Both enclosures were a local purchase, and all drive bays of each enclosure were tested and working before pickup; the enclosures themselves seem to see all the drives as they should. I have also connected to each enclosure over IPMI, and everything shows as normal as far as I can tell.

The HBA is F/W 20.00.11.00 and has no BIOS (00.00.00.00)

Any help you could lend sir would be greatly appreciated! And thank you again for all you do to support this community!
 

Sean Ho

seanho.com
Nov 19, 2019
774
357
63
Vancouver, BC
seanho.com
9206-16e has two 2308 controllers; do both of them enumerate on the PCIe bus? I believe it's IT-only, yes? Do the drives enumerate on SCSI bus, i.e., do you see /dev/sg* SCSI generic devices? Those JBOD cases have the SAS3 dual-expander backplanes; you are connecting to only one expander on each of the two backplanes, yes? Try a 8644-4x8482 breakout cable directly to a drive (with SATA power without 3.3v), bypassing the backplane?
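The first question (do both SAS2308 controllers enumerate on PCIe?) can also be answered from a shell on the Unraid box. A minimal Python sketch that lists PCI functions with the LSI/Broadcom vendor ID `0x1000` and a SAS-controller class code (`0x0107xx`), plus the kernel driver bound to each; on a 9206-16e you would expect two entries, normally bound to `mpt2sas` or, on newer kernels, `mpt3sas`. The vendor/class filter is this sketch's choice, not something stated in the thread.

```python
from pathlib import Path

LSI_VENDOR = "0x1000"        # LSI / Broadcom PCI vendor ID
SAS_CLASS_PREFIX = "0x0107"  # PCI class: mass storage / SAS controller


def sas_controllers(sysfs_root="/sys"):
    """Return (PCI address, bound driver or None) for each LSI SAS controller."""
    base = Path(sysfs_root) / "bus" / "pci" / "devices"
    if not base.is_dir():
        return []
    found = []
    for dev in sorted(base.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip()
            pci_class = (dev / "class").read_text().strip()
        except OSError:
            continue
        if vendor != LSI_VENDOR or not pci_class.startswith(SAS_CLASS_PREFIX):
            continue
        driver_link = dev / "driver"
        # 'driver' is a symlink to the bound driver; it is absent if none is bound.
        driver = driver_link.resolve().name if driver_link.exists() else None
        found.append((dev.name, driver))
    return found


if __name__ == "__main__":
    controllers = sas_controllers()
    print(f"{len(controllers)} SAS controller(s) found")
    for address, driver in controllers:
        print(address, driver or "(no driver bound)")
```

A controller that is listed but shows no bound driver would point at the driver failing to attach rather than at the cabling or the JBOD.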
 

Ezekial66

Thank you for taking the time to reply!

9206-16e has two 2308 controllers; do both of them enumerate on the PCIe bus?
Yes, when I am in the Unraid menu I can see both controllers in the System Devices section.

I believe it's IT-only, yes?
I was not aware of that, but there is currently no BIOS either way.

Do the drives enumerate on SCSI bus, i.e., do you see /dev/sg* SCSI generic devices?
I'm sorry, I'm not sure where I should be looking for this. On boot, maybe? Everything moves pretty quickly, so it's been difficult to take it all in.

Those JBOD cases have the SAS3 dual-expander backplanes; you are connecting to only one expander on each of the two backplanes, yes?
You are correct. I have the port diagram for the JBODs and currently have the cables installed as described: four cables coming from the HBA, two per JBOD. In each JBOD they go to the top-right IN ports (right is primary, left is secondary; the top two are IN, the bottom two are IN/OUT).

Try a 8644-4x8482 breakout cable directly to a drive (with SATA power without 3.3v), bypassing the backplane?
Unfortunately I don't have one of those cables; I would need to order one to test it, but I think I will at this point.

Any other recommendations on how I should proceed?