Fun times with LSI HBA and mpt2sas on AIO configs


whitey

Moderator
Jun 30, 2014
2,766
868
113
41
OK, I'm about tired of this crap. Seems like there's a kernel regression in the Linux 4.x kernel series with mpt2sas when using an LSI 2008 HBA. It has happened to me now trying to use Rockstor (CentOS 7 based w/ 4.x kernel) as well as just now trying to build out a test ZoL box on Ubuntu 14.04 LTS w/ 4.2.x kernel. Some seem to be having success with the kernel parameter 'pci=realloc=off', but no love here. The only fix, for me anyway, was going down to a lower 3.19.x kernel in Ubuntu (simply going from 14.04.4 back to 14.04.3 LTS), and bam, LSI HBA and mpt2sas playin' nice again.

Guess I am off to go play 'bug report sucker' and hit the IRC channels.

SMFH!

Anyone else seeing this garbage?
 

whitey

I do believe you are incorrect on this one, buddy :-D See the other post. Sorry to turd up two posts, but this is a REAL issue. I'll happily flash an HBA back to v19 if you think there's a snowball's chance in hell of v19 working. I'm pretty certain I've tested that on CentOS w/ a 4.3 kernel anyway and got the same result. Linux distros are building against v20 drivers btw, and I don't have any issues w/ the v18 driver and v20 FW.
 

whitey

Proof's in the puddin': that's a 2008-based HBA reflashed down to v19 LSI IT-mode FW. Same symptoms... Someone else please reproduce so we can confirm/deny. I've done all I can to shed some light.

[Attached screenshot: zol-ubuntu-14.04.4-mpt2sas-NOT-happy-v19-LSI-IT-FW.png]
 

izx

Active Member
Jan 17, 2016
115
52
28
40
If you think there's a chance this bug was quashed upstream, try the 4.3/4.4/4.5 kernels from the mainline PPA.
 

cptbjorn

Member
Aug 16, 2013
100
19
18
I'm getting it on a machine running Fedora 22/23 that I use to play with bcache; it has been going on since 4.1.something for me. I've been running an early fc22 4.0.4 kernel that works fine, but recently I discovered that I'm able to cold boot 4.0.4-301 and then reboot once into 4.3.5-300. If I reboot again or cold boot 4.3.5, it crashes and burns.

Mine's an H310 flashed to LSI v19.
 

whitey

Some gent mentioned something similar over in the Rockstor forums (hell, maybe it was you, lol). I just tried installing the SAME identical linux-image/linux-headers for 3.19-25 (the same one LTS 14.04.3 ships with), and that bastard cannot even see disks. Tried booting into the 4.2.x kernel first, then a reboot into 3.19.x, no love either. Nothing even in dmesg... Even tried to just gut the 4.2.x kernel w/ apt-get remove and reboot into the 3.19-25 kernel, which works on a fresh build of LTS 14.04.3... nothing... Perplexing.
 

whitey

And the fix for me was:

mpt2sas.msix_disable=1 (for 4.3 or older kernels)
or
mpt3sas.msix_disable=1 (for 4.4 or newer kernels)

I added it to the kernel boot line parameters; posting it here as well for posterity's sake.
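A minimal Python sketch of the version rule above, assuming an LSI 2008 (SAS2) HBA as in this thread and a kernel release string like the one `uname -r` prints; `msix_param_for` is a hypothetical helper name, not part of any tool.

```python
import platform
import re


def msix_param_for(release: str) -> str:
    """Pick the msix_disable boot parameter for a kernel release string.

    Follows the rule from the post above for a SAS2 HBA such as the
    LSI 2008: mpt2sas on 4.3 and older kernels, mpt3sas on 4.4 and newer.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    driver = "mpt3sas" if version >= (4, 4) else "mpt2sas"
    return f"{driver}.msix_disable=1"


if __name__ == "__main__":
    # On Linux, platform.release() returns the same string as `uname -r`.
    print(msix_param_for(platform.release()))
```

For example, `msix_param_for("3.19.0-25-generic")` returns `"mpt2sas.msix_disable=1"`, while `msix_param_for("4.17.0-0.bpo.3-amd64")` returns `"mpt3sas.msix_disable=1"`.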
 

whitey

Update/more info

mpt2sas.msix_disable=1 (for 4.3 or older kernels)
or
mpt3sas.msix_disable=1 (for 4.4 or newer kernels)

I added it to the kernel boot line parameters; posting it here as well for posterity's sake.

Under Ubuntu or CentOS using grub2, make it stick by editing /etc/default/grub as follows:

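GRUB_CMDLINE_LINUX_DEFAULT="mpt2sas.msix_disable=1"
(mpt3sas.msix_disable=1 for 4.4 or newer kernels; if the quotes already contain options such as "quiet splash", add the parameter after them instead of replacing them)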

Save the file.

Then regenerate the grub2 config:

Ubuntu: update-grub
CentOS (BIOS-based machines): grub2-mkconfig -o /boot/grub2/grub.cfg
CentOS (UEFI-based machines): grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg (the directory is EFI/redhat on RHEL)
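The grub2 steps above can be scripted. A minimal Python sketch, assuming /etc/default/grub holds a single double-quoted `GRUB_CMDLINE_LINUX_DEFAULT="..."` line (single quotes or repeated assignments are not handled); `add_kernel_param`, `is_uefi`, `regen_command` and `param_active` are hypothetical helper names. It appends the parameter to any existing options instead of replacing them, and checks /proc/cmdline after the reboot. The UEFI path assumes CentOS 7's /boot/efi/EFI/centos/ layout; pass `efi_vendor="redhat"` on RHEL.

```python
import os
import re
from typing import List, Optional


def add_kernel_param(grub_text: str, param: str,
                     var: str = "GRUB_CMDLINE_LINUX_DEFAULT") -> str:
    """Append `param` to the double-quoted value of `var`, keeping existing options.

    Idempotent: a parameter that is already present is not added again.
    If `var` is missing entirely, a new assignment line is appended.
    """
    pattern = re.compile(rf'^{re.escape(var)}="(.*)"[ \t]*$', re.MULTILINE)
    match = pattern.search(grub_text)
    if match is None:
        sep = "" if not grub_text or grub_text.endswith("\n") else "\n"
        return f'{grub_text}{sep}{var}="{param}"\n'
    options = match.group(1).split()
    if param not in options:
        options.append(param)
    new_line = f'{var}="{" ".join(options)}"'
    return grub_text[:match.start()] + new_line + grub_text[match.end():]


def is_uefi() -> bool:
    """Linux exposes /sys/firmware/efi only when it was booted via UEFI."""
    return os.path.isdir("/sys/firmware/efi")


def regen_command(distro: str, uefi: bool, efi_vendor: str = "centos") -> List[str]:
    """Command that rebuilds grub.cfg, following the distro notes above."""
    if distro == "ubuntu":
        return ["update-grub"]
    if distro == "centos":
        if uefi:
            return ["grub2-mkconfig", "-o", f"/boot/efi/EFI/{efi_vendor}/grub.cfg"]
        return ["grub2-mkconfig", "-o", "/boot/grub2/grub.cfg"]
    raise ValueError(f"no grub command known for {distro!r}")


def param_active(param: str, cmdline: Optional[str] = None) -> bool:
    """True if `param` is on the running kernel's command line (check after rebooting)."""
    if cmdline is None:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    return param in cmdline.split()


if __name__ == "__main__":
    # Sample file contents only; real use would read /etc/default/grub.
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample, "mpt2sas.msix_disable=1"), end="")
```

Writing the result back to /etc/default/grub and running the regen command both need root. After the reboot, `param_active("mpt2sas.msix_disable=1")` (or the mpt3sas variant) should return True if the parameter made it onto the kernel command line.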
 

ed209

New Member
Apr 20, 2018
1
0
1
65
I know this is an old thread, but I really want to thank you for posting this information. It has been a lifesaver after 3 days of nearly going nuts.
 

whitey

Glad this thread helped you resolve your issue. I lived the nightmare for a couple of weeks, so I'm happy to see you only burned a few days. Maddening, I know; this is why I love this forum, gold nuggets hidden all over the place!
 

chinesestunna

Active Member
Jan 23, 2015
621
191
43
56
Thanks for posting this, @whitey! I believe this may also be related to the issue of drives not being detected after going to sleep and resuming, then being dropped from various RAID arrays, documented here: ZFS io error when disks are in idle/standby/spindown mode · Issue #4713 · zfsonlinux/zfs

They seem to have come to the same conclusion, and I'm testing out this fix now, running an OpenMediaVault (Debian 9) VM with kernel 4.17 on top of the ESXi 6.7 July 2018 build. Fingers crossed my drives will stay put in the array now.
 

chinesestunna

Sigh... nope :( Unfortunately I'm still getting random disk drops all over the place. In fact, I just had a new error I've never seen before: it appears something went wrong at the ESXi level, and the mdadm array within the VM was completely hung.
Does anyone know how to set mpt3sas.msix_disable at the ESXi level? Even though I'm using passthrough, I've also experienced a few failures just trying to boot ESXi 6.7, with the screen stuck on loading the mpt3sas module.
(Should've stuck with ESXi 5.5 and Linux 2.6.)

Code:
[Wed Sep  5 01:07:44 2018] general protection fault: 0000 [#1] SMP PTI
[Wed Sep  5 01:07:44 2018] Modules linked in: softdog cpufreq_conservative cpufreq_userspace cpufreq_powersave xfs sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf vmw_balloon joydev serio_raw pcspkr shpchp vmwgfx evdev ttm drm_kms_helper sg drm button ac vmw_vsock_vmci_transport vsock vmw_vmci sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb btrfs zstd_decompress zstd_compress xxhash raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod hid_generic usbhid hid ses enclosure sr_mod cdrom sd_mod ata_generic crc32c_intel ata_piix aesni_intel aes_x86_64 crypto_simd cryptd glue_helper psmouse ahci uhci_hcd ehci_pci libahci mpt3sas vmxnet3 ehci_hcd mptsas raid_class libata scsi_transport_sas
[Wed Sep  5 01:07:44 2018]  mptscsih mptbase usbcore scsi_mod usb_common i2c_piix4
[Wed Sep  5 01:07:44 2018] CPU: 1 PID: 141 Comm: scsi_eh_0 Not tainted 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[Wed Sep  5 01:07:44 2018] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/28/2017
[Wed Sep  5 01:07:44 2018] RIP: 0010:_scsih_set_satl_pending+0xc/0x50 [mpt3sas]
[Wed Sep  5 01:07:44 2018] RSP: 0018:ffffb18a41073c60 EFLAGS: 00010202
[Wed Sep  5 01:07:44 2018] RAX: 6b736168001a0000 RBX: ffff9485e72db190 RCX: 0000000000000000
[Wed Sep  5 01:07:44 2018] RDX: ffff9485e72db040 RSI: 0000000000000000 RDI: ffff9485e72db190
[Wed Sep  5 01:07:44 2018] RBP: 00000000000007b8 R08: 0000000000000001 R09: 0000000000000000
[Wed Sep  5 01:07:44 2018] R10: ffff9485ee388d80 R11: 00000000000007ae R12: ffff9485ee1be798
[Wed Sep  5 01:07:44 2018] R13: 0000000000000001 R14: ffff9485ee1be940 R15: 0000000000000000
[Wed Sep  5 01:07:44 2018] FS:  0000000000000000(0000) GS:ffff9485ffc40000(0000) knlGS:0000000000000000
[Wed Sep  5 01:07:44 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Sep  5 01:07:44 2018] CR2: 00007ff0b1908540 CR3: 00000001e3e0a003 CR4: 00000000000606e0
[Wed Sep  5 01:07:44 2018] Call Trace:
[Wed Sep  5 01:07:44 2018]  _scsih_flush_running_cmds+0x79/0xe0 [mpt3sas]
[Wed Sep  5 01:07:44 2018]  mpt3sas_scsih_reset_handler+0x317/0x760 [mpt3sas]
[Wed Sep  5 01:07:44 2018]  ? vprintk_emit+0x385/0x450
[Wed Sep  5 01:07:44 2018]  ? printk+0x52/0x6e
[Wed Sep  5 01:07:44 2018]  mpt3sas_base_hard_reset_handler+0x1ac/0x540 [mpt3sas]
[Wed Sep  5 01:07:44 2018]  scsih_host_reset+0x59/0xc0 [mpt3sas]
[Wed Sep  5 01:07:44 2018]  scsi_try_host_reset+0x44/0xe0 [scsi_mod]
[Wed Sep  5 01:07:44 2018]  scsi_eh_ready_devs+0xb5d/0xe50 [scsi_mod]
[Wed Sep  5 01:07:44 2018]  ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[Wed Sep  5 01:07:44 2018]  scsi_error_handler+0x4df/0x5e0 [scsi_mod]
[Wed Sep  5 01:07:44 2018]  kthread+0xf8/0x130
[Wed Sep  5 01:07:44 2018]  ? scsi_eh_get_sense+0x260/0x260 [scsi_mod]
[Wed Sep  5 01:07:44 2018]  ? kthread_create_worker_on_cpu+0x70/0x70
[Wed Sep  5 01:07:44 2018]  ret_from_fork+0x35/0x40
[Wed Sep  5 01:07:44 2018] Code: 08 c3 48 c1 ee 0b b8 20 00 00 00 ba 40 00 00 00 89 41 04 89 11 31 c0 89 71 08 c3 0f 1f 40 00 66 66 66 66 90 48 8b 87 f8 00 00 00 <0f> b6 10 80 fa a1 74 09 31 c0 80 fa 85 74 02 f3 c3 48 8b 47 38
[Wed Sep  5 01:07:44 2018] RIP: _scsih_set_satl_pending+0xc/0x50 [mpt3sas] RSP: ffffb18a41073c60
[Wed Sep  5 01:07:44 2018] ---[ end trace f935f580e9872343 ]---
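Reading that trace by hand: the fault is in mpt3sas's `_scsih_set_satl_pending`, reached from the SCSI error handler's host-reset path (`scsih_host_reset` → `mpt3sas_base_hard_reset_handler` → `mpt3sas_scsih_reset_handler` → `_scsih_flush_running_cmds`). When comparing several such traces, a small Python sketch like this one can pull out the faulting function and the reliable call chain. It assumes the x86 oops layout shown above (a `RIP: seg:func+off/size [module]` line and `[timestamp]`-prefixed frames where a leading `?` marks an unreliable entry); `fault_site` and `reliable_frames` are hypothetical helpers, not part of any kernel tool.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class FaultSite(NamedTuple):
    function: str
    offset: int   # bytes into the function where it faulted
    size: int     # total size of the function in bytes
    module: Optional[str]


# e.g. "RIP: 0010:_scsih_set_satl_pending+0xc/0x50 [mpt3sas]"; the "0010:" part is optional.
RIP_RE = re.compile(
    r"RIP: (?:[0-9a-f]{4}:)?(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<mod>[\w-]+)\])?"
)

# A call-trace frame after a "[timestamp]" prefix; a leading "? " marks an unreliable entry.
FRAME_RE = re.compile(
    r"^\[[^\]]*\][ \t]+(?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>[\w-]+)\])?[ \t]*$",
    re.MULTILINE,
)


def fault_site(oops: str) -> Optional[FaultSite]:
    """Where the oops faulted, taken from the first RIP line, or None if there is none."""
    m = RIP_RE.search(oops)
    if m is None:
        return None
    return FaultSite(m["func"], int(m["off"], 16), int(m["size"], 16), m["mod"])


def reliable_frames(oops: str) -> List[Tuple[str, Optional[str]]]:
    """(function, module) for each call-trace frame, innermost first, skipping '?' frames."""
    return [(m["func"], m["mod"]) for m in FRAME_RE.finditer(oops) if not m["unreliable"]]
```

Fed the trace above, `fault_site` returns `FaultSite("_scsih_set_satl_pending", 12, 80, "mpt3sas")`, and `reliable_frames` drops the `?` entries such as `vprintk_emit` and `printk`.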