FW BUG?! LSI 9280-16i4e "Adapter at Baseport is not responding" but only when drives are plugged in!


Sniper_X

Member
Mar 11, 2021
115
16
18
I have an LSI 9280-16i4e, and 14 HGST 6TB drives.

The array spans all 14 of them, with a RAID1 pair of HGST 12Gb/s SAS SSDs for the Cachecade 2.0 R/W cache.

After reading the release notes, I performed a firmware upgrade to get the card to the latest FW and it said it could do it online.
I'm used to this and there have never been any issues EVER with these upgrades.

Except this time the FW update completed, and the array lights responded!
The Megaraid application response seemed fine.

All drives were still operating fine - no problems.

Then I rebooted (for good measure) since I figured a fresh power-cycle would be a good idea for the card.

BAD IDEA.

Now, I get:

Adapter at Baseport is not responding
No MegaRAID Installed


Once it continues (after about 4 minutes), yep - NO LSI card. Of course, Megaraid Storage Manager doesn't start - no server found.

I remove the card to try and reboot clean.

Booting Windows bluescreens - something about "Machine Exception".
(No, I don't need the array partitions to boot Windows - so it must be driver related.)

I replace the RAID card with a spare identical card (maybe not same firmware)

SAME BEHAVIOR

I leave that card in, and unplug all the drives.
The card scans and boots normally - seemingly no issues found!

If I plug in the drives, it sees all the drives and they have a good, foreign config.
I import that config.

The card halts.
No green heartbeat light.

So, if drives are installed, the card hangs and gets baseport not responding.
No drive installed, no problems.

What do I do NOW?!
 

j_h_o

Active Member
Apr 21, 2015
644
180
43
California, US
Assuming you have a backup of the array, I would disconnect your drives, downgrade the firmware of the card, power cycle, power down again, then reconnect the drives.
 

Sniper_X

I'll start by saying that I am making headway, but it's slow.
I would still REALLY appreciate some advice from folks who have been through similar issues.



----------------------------------------------------------------------------
I have tried to reflash/downgrade the card.

This resulted in a new problem at boot...

Firmware Failed Validation!!!
Adapter needs to be reflashed.
Press Any Key to Continue...

Okay, so I yanked that card and grabbed my other 9280-16i4e card.

I started with no drives installed.
It gets through the boot process - no issues.

So, I have four Chenbro enclosures that hold 4 SAS/SATA drives each.
I have 14 HGST 6TB SATA drives and 2 HGST SAS SSDs.

All this was part of the config.

I then power off and begin to add things one at a time.

I started with the empty enclosures only - no problems.

I then added one SAS SSD into its appropriate enclosure/slot.
Power up.

After scanning and getting to 100%, the card heartbeat stops and it hangs again.
Power down

I then remove the SSD, and add one HGST 6TB drive into its appropriate enclosure/slot.
It gets past the 100% point, enumerates that SATA drive and starts to boot normally (which I prevent)
Power down

I then add all HGST 6TB SATA drives and power up.
It gets past the 100% point, enumerates all the SATA drives and starts to boot normally (which I ALLOW to proceed).
Once in the OS, I open Megaraid Storage Manager and import the foreign config.
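
The one-drive-at-a-time isolation described above can be written down as a small procedure. This is only a sketch: `posts_ok` is a hypothetical stand-in for "power up with only these drives inserted and see whether the card gets past 100% with the heartbeat still blinking", and the drive names are made up for illustration.

```python
from typing import Callable, Iterable, List


def isolate_suspects(drives: Iterable[str],
                     posts_ok: Callable[[List[str]], bool]) -> List[str]:
    """Test each drive on its own and return the ones that hang the card."""
    suspects = []
    for drive in drives:
        # One power cycle per drive: only this drive is inserted.
        if not posts_ok([drive]):
            suspects.append(drive)
    return suspects


def simulated_post(inserted: List[str]) -> bool:
    """Hypothetical stand-in for a real power cycle.

    Assumption for the simulation only: any SAS SSD present hangs the card.
    """
    return not any(name.startswith("sas-ssd") for name in inserted)


# Made-up labels: 2 SAS SSDs plus 14 SATA drives, as in the setup above.
drives = ["sas-ssd-0", "sas-ssd-1"] + [f"sata-hdd-{i}" for i in range(14)]

suspects = isolate_suspects(drives, simulated_post)
remaining = [d for d in drives if d not in suspects]

print("suspects:", suspects)
# Final check: everything that is not a suspect should POST together.
print("remaining drives POST together:", simulated_post(remaining))
```

Testing each drive alone costs one power cycle per drive, but it avoids the case where two bad drives mask each other.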

Everything comes back, but is administratively blocked due to its previously associated Cachecade 2 drive not being present.
I choose to unblock the VD and confirm "yes" to the warning about data loss. (remember, the previous card hung during an online flash.)

There are now a few (so far only MINOR) issues with data on the drive in some folders.
CHKDSK has resolved them mostly - now I need to examine what isn't correct anymore (but I'll do that later).

So, here is what I'm wondering so far:
Did something destroy my SAS HGST SSDs?
Are they merely in possession of a corrupt RAID config that halts the card when loaded?

I am building a test rig to diagnose this and also to unbrick / reflash my other card.

So, again...
I can likely figure out how to fix the card and diagnose issues myself, but could REALLY use some advice from folks who have been through similar issues.
 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,672
1,081
113
artofserver.com
Sniper_X said:
Firmware Failed Validation!!!
Adapter needs to be reflashed.
Press Any Key to Continue...

I believe this means the contents of the firmware failed a checksum. Probably means the firmware is corrupted. Could be a failed flash write, although that usually does a check at the end. Or, it could be a failing flash chip with sections that are not reading back correctly.
 

Sniper_X

BLinux said:
I believe this means the contents of the firmware failed a checksum. Probably means the firmware is corrupted. Could be a failed flash write, although that usually does a check at the end. Or, it could be a failing flash chip with sections that are not reading back correctly.

I have seen several posts on using a "mode0" reflash. It's just that none of those appear to be for my card.

Does anyone know of a good unbricking walk-through that I can follow to blast this firmware back to health?
 

Sniper_X

UPDATE
  • ROOT CAUSE FOUND
  • CARD RECOVERED
  • ARRAY LOST - RECOVERED FROM BACKUP
ROOT CAUSE
To recap, I have a 65TB array running in RAID6 on an LSI/AVAGO/BROADCOM 9280-16i4e card.
It was Cachecade 2 enabled - using two 800GB HGST SAS SSDs (HUSMM8080ASS201) in RAID1.
(Yes I know CC is only 512GB max, but I have 6 of these drives available & they are enterprise grade SSD and SAS, so I used those)
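
As a sanity check on the 65TB figure, here is a rough calculation. It is a sketch under two assumptions that are not stated outright in the thread: all 14 of the 6TB drives sit in a single RAID6 group, and the "65TB" is what the OS reports in binary units (TiB).

```python
def raid6_usable_bytes(drive_count: int, drive_bytes: int) -> int:
    """Usable capacity of a single RAID6 group: two drives' worth goes to parity."""
    if drive_count <= 2:
        raise ValueError("RAID6 needs more than two drives")
    return (drive_count - 2) * drive_bytes


# Assumption: a "6TB" drive is 6 * 10**12 bytes (decimal, as drive vendors count).
DRIVE_BYTES = 6 * 10**12

usable = raid6_usable_bytes(14, DRIVE_BYTES)
usable_tb = usable / 10**12   # decimal terabytes
usable_tib = usable / 2**40   # binary tebibytes, what most OSes label "TB"

print(f"{usable_tb:.0f} TB decimal, {usable_tib:.1f} TiB")  # 72 TB decimal, 65.5 TiB
```

So 12 data drives at 6TB each give 72 TB decimal, which shows up as roughly 65 "TB" in the OS.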

The firmware on the card was:
12.14.0-0167_SAS_2108_Fw_Image_APP_2.130.393-2551

I was researching to ensure I had the latest FW, drivers and all that when I came across a newer FW for the card at the Broadcom site.
That firmware was:
12.15.0-0239_MR_2108_SAS_FW_2.130.403-4660

I see there is a naming difference here (SAS_2108_Fw_Image_APP versus MR_2108_SAS_FW).
At the time I'm writing this, I am unsure what the implications are here - but I will continue to look around for what this means, if anything.
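
To make the difference between the two names easier to see, here is a small sketch that splits each one into fields. The field meanings are an assumption on my part (leading number = package version, trailing number = firmware version, middle = a free-form label); the thread itself does not confirm what the label change implies.

```python
import re
from typing import NamedTuple, Tuple


class FwName(NamedTuple):
    package: Tuple[int, ...]   # assumed: package version, e.g. 12.14.0-0167
    label: str                 # the middle part that differs in naming
    firmware: Tuple[int, ...]  # assumed: firmware version, e.g. 2.130.393-2551


# <package>_<label>_<firmware>, where both versions look like a.b.c-d
PATTERN = re.compile(r"^(\d+\.\d+\.\d+-\d+)_(.+)_(\d+\.\d+\.\d+-\d+)$")


def _numbers(text: str) -> Tuple[int, ...]:
    """Turn '12.14.0-0167' into (12, 14, 0, 167) so versions compare numerically."""
    return tuple(int(part) for part in re.split(r"[.-]", text))


def parse_fw_name(name: str) -> FwName:
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised firmware name: {name!r}")
    package, label, firmware = match.groups()
    return FwName(_numbers(package), label, _numbers(firmware))


old = parse_fw_name("12.14.0-0167_SAS_2108_Fw_Image_APP_2.130.393-2551")
new = parse_fw_name("12.15.0-0239_MR_2108_SAS_FW_2.130.403-4660")

print("labels differ:", old.label != new.label)
print("package is newer:", new.package > old.package)
print("firmware is newer:", new.firmware > old.firmware)
```

Both version fields move forward between the two files; only the label in the middle changes style.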

TLDR: this is what killed everything.
Version 12.15 did something to the Cachecade 2 drive and completely borked the card's ability to run.
Meaning HEARTBEAT LED STOPPED

Moving on...

So, as I have done MANY TIMES with no issue, I flashed this firmware using Megaraid Storage Manager.
As usual, it said that the FW could be upgraded online, so I did it.
Then it said that a reboot was required to enable the new firmware - something new I had not seen before.

I also saw my array drive activity LEDs flash oddly, and then several drives (ALL cache SSDs and SOME spinning drives) began to flash red.

So, I feel nervous now - but I reboot as it requested.

During POST, the card spun up the drives, all the activity lights strobed to indicate the enumeration pass, and the 100% mark appeared in the Megaraid POST message.

Then the card hung there at the flashing underline "_" cursor.

After about 3-4 minutes, I saw:

Adapter at Baseport is not responding
No MegaRAID Adapter Installed

Yay. :oops:

The card heartbeat LED stopped too.

I began to disconnect drives and I started with the cache drives.
I removed one (drive 0 in the array) and rebooted.

Same behavior - after 100% scan. (Heartbeat LED stopped)

Removed drive 1 in the array (a spinning HGST 6TB SATA).

Same behavior - hung after 100% scan. (Heartbeat LED stopped)

In fact, I had the same behavior until I removed drive 4 in the array (the second HGST 800GB SAS SSD cache drive).

The POST then enumerated the remainder of the drives.

I powered off before it could proceed.
I added back all the 6TB HGST SATA drives - leaving out the two SAS SSDs.

The array enumerated and the POST proceeded as normal.

I then attempted to access the volumes that were on the array and got sporadic, odd behavior.
Some files were unreadable and others were corrupted.

80% of this array is occupied by movies and TV shows, and spotty file corruption leaves many of these riddled with playback errors.
(i.e., functionally, each of these multi-gigabyte files is lost even if only one block is lost.)

I also had many multi-gigabyte PST (email database) files hosted on this array - same issue there.
One lost block practically kills a large file.

I then attempted to flash the previous firmware to the card and that hung.
This made things worse - I lost the card.

After several days, I found that if I used MEGAREC to COMPLETELY erase the flash and then COMPLETELY re-flash it with the firmware I had tried to roll back to, the card was un-bricked and came back to factory operation.

I then reassembled and rebuilt the array, and restored from backup.

Nothing like this has happened before.
I have used Cachecade for years and upgraded FW for years while using it.
I don't know if it was the SAS drives being present (this was my first time using actual enterprise-grade SAS SSDs for Cachecade) or what.

I have reproduced this issue on a test rig.
Same thing happens.

Going from 12.14 to 12.15 blows up the config (or something equivalent) on the Cachecade SAS SSDs, so that when the card tries to read them it HALTS the card's processor.

Since these cards are way past EOL, I don't think telling Broadcom will be of any value, but I warn you all - until I (or someone) can figure out why this happened, and why MSM ALLOWED the FW flash ONLINE NO LESS, I would shy away from flashing this on a live system.

I plan to test if it's possible to do this safely with drives disconnected.

More to come.
 

Sniper_X

New wrinkle found.

On the card that I used MEGAREC to erase and reflash, the 12.14 FW I used wouldn't allow me to enter the Cachecade 2 trial key.
(I wanted to test flashing 12.15 while cachecade 2 was in use like before)

Nope. -Invalid key

So, I googled a bit and found that one of the Cachecade pages linked to a page of FW for all cards that was specifically stated to work with Cachecade 2.

So, I flashed the one they said was for the 9280-16i4e.
It was a downgrade, but after I used MSM to flash it, I rebooted and retried the trial key.
It worked.

I then flashed the 12.14 FW.
Things went as expected - all worked.

I then flashed the 12.15 FW and.... >BOOM<

ok.

Then, after a reboot it returned to not getting past 100% on the drive scan.
But it was different this time.
It just looped, rescanning over and over from 0% to 100%, with the drive array flashing red, then going green "ok", then repeating.

I pulled the SAS cache drives and - voila - it passed the POST scan at 100% and went on to boot the OS.

I replaced the SAS SSD Cache drives and the foreign config was there for the SSD VD.
For kicks I scanned the foreign config.
It found the Cachecade VD...
I imported it.

The MSM interface locked up for about 7-10 minutes and popped up an error importing the config.
The card was GONE - heartbeat LED ... dark.

I rebooted without the SSDs and everything was fine.

I then reinserted the SSDs and cleared the config from them.

Only then was I able to use them and enable Cachecade 2 SSD caching using the same trial key.
(I didn't have to reenter the key - it was there from before.)

I have not tried removing it and attempting to re-apply it.
I think it might not accept it like before but I will find out.

More to come.
 

Sniper_X

After further testing (between the 12.14 and 12.15 FW only), I have found that any flashing of this card between these two firmware versions causes the Cachecade 2 drives to halt the heartbeat on the card.

It doesn't matter what SSDs you're using (I tried three types - 1 pair of SAS and 2 other pairs of SATA SSDs).

I've never experienced this before when I have done online FW upgrades as LSI/AVAGO/BROADCOM has said it was allowable.

I will now test to see if disconnecting the CC2 drives before FW flash prevents this.
Then I'll reconnect the drives after the flash to see what happens then.

After that, I'll have to try offline (DOS/EFI) flashing to see if this behavior occurs under those circumstances.
 

Sniper_X

Upon reboot - before I started to test if disconnecting the SSD's during flash prevented the halt, I saw new (bad) behavior.

The Cache drives are still disconnected, and the POST process started.
The spinning drives remain connected and are therefore enumerated by the card in POST. (100% scan completed - PC booted)

Upon launching the MSM (v17), I see that no LSI card is shown, but MSM is running.
NO card, no drives - nothing.

I then shut down (power off - I also pulled the card and reinserted it a few moments later), and retried.

Same behavior.

So I shutdown, power off.

I pulled the SATA drives and powered back up.

Card POSTs fine, 100% scan complete - PC boots.
Card is visible and okay in MSM.

I then re-insert all the SATA drives.
They come in as "Foreign, Unconfigured (BAD)"
unconf_bad_aftr_fw14_flash.png

I then manually mark them "Unconfigured good", and then scan for a foreign configuration.

It finds the config, and I import it.
Drive comes online.

CHKDSK shows no errors anywhere in the File System.

IMPORT-DRIVE-SCAN-OK.png

So, that's another bad thing that happens on these FW flashing events on the 9280-16i4e, I suppose?

I'll check my 9260-8i cards' behavior with their 2 latest firmware versions too - later though.

Back to the SSD disconnect FLASH test.
 

Sniper_X

I have tried several "should work fine" scenarios on a test rig with the 9260-8i and the 9280-16i4e in various combinations, using Cachecade 2.0 drives (SATA and SAS).

In all cases below, I would first remove all SSD caching from your VDs.
Next, I would delete all Cachecade drives and Cachecade config.

Do this before you do anything below.

If you want to change between 12.14 and 12.15 on the 928x cards, do it offline, with no drives attached, and do it in EFI or DOS.

I would also have a plan for clearing the config on your drives, re-carving the array and restoring from backup.
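
The precautions above can be summed up as a simple go/no-go check. This is just a sketch: the class and field names are made up for illustration, nothing here talks to the card, and passing the check only means the precautions listed in this post were followed. It is no guarantee the flash will go well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreFlashState:
    """Hypothetical record of the manual steps, filled in by hand before flashing."""
    ssd_caching_removed_from_vds: bool
    cachecade_config_deleted: bool
    drives_disconnected: bool
    flashing_offline: bool  # flashing from EFI or DOS, not online via MSM
    backup_verified: bool


def unmet_preconditions(state: PreFlashState) -> List[str]:
    """Return the precautions from this post that have not been taken yet."""
    checks = [
        (state.ssd_caching_removed_from_vds, "remove SSD caching from all VDs"),
        (state.cachecade_config_deleted, "delete all Cachecade drives and config"),
        (state.drives_disconnected, "disconnect all drives from the card"),
        (state.flashing_offline, "flash offline from EFI or DOS"),
        (state.backup_verified, "have a verified backup and a rebuild plan"),
    ]
    return [todo for done, todo in checks if not done]


# The situation from the start of the thread: online flash, Cachecade active,
# all drives attached, but a backup existed.
original_attempt = PreFlashState(False, False, False, False, True)
for todo in unmet_preconditions(original_attempt):
    print("NOT DONE:", todo)
```

An empty list means every precaution was taken; anything else is a reason to stop before flashing.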

The inconsistency in the tests I did was maddening - but they mostly failed between these FW changes - even with a DOS flash and drives disconnected.

It also turns out that if you have a 2 port 926x and you want to switch to a 928x, you should tear down all Cachecade drives first.
You can move the array to the 928x card, but depending on firmware versions (not all of which I tested) it will either allow the importation of a foreign config, or it will freak out the entire array.

Even if you can import the config, it will still likely act "odd" during boot and/or during use.

This is true even if you have a 926x + expander board, although I did experience more occurrences of "odd" behavior when I moved that setup to a 9280-16i4e card.

It used to be that you could move these arrays, between very similar cards like the 9260/8x series, but that seems to no longer be the case in the newer FW releases.

Just don't do it. :confused: