Fun times with LSI HBA and mpt2sas on AIO configs

Discussion in 'Software Stuff' started by whitey, Feb 22, 2016.

  1. whitey

    whitey Moderator

    Joined: Jun 30, 2014 | Messages: 2,734 | Likes Received: 846
    OK, I am about tired of this crap. There seems to be a kernel regression in the Linux 4.x kernel series affecting mpt2sas with an LSI 2008 HBA. It has happened to me now trying to use Rockstor (CentOS 7-based w/ a 4.x kernel), and again just now trying to build out a test ZoL box on Ubuntu LTS 14.04 w/ a 4.2.x kernel. Some people seem to be having success with the kernel parameter 'pci=realloc=off', but no love here. The only fix, for me anyway on Ubuntu, is to go down to a lower 3.19.x kernel (simply going from 14.04.4 to 14.04.3 LTS), and bam, LSI HBA and mpt2sas playin' nice again.

    Guess I am off to go play 'bug report sucker' and hit the IRC channels.

    SMFH!

    Anyone else seeing this garbage?
     
    #1
    Last edited: Feb 22, 2016
  2. PigLover

    PigLover Moderator

    Joined: Jan 26, 2011 | Messages: 2,659 | Likes Received: 1,041
    It's LSI v20...
     
    #2
  3. whitey

    whitey Moderator

    I do believe you are incorrect on this one, buddy :-D See the other post. Sorry to turd up two posts, but this is a REAL issue. I'll happily flash an HBA back to v19 if you think there is a snowball's chance in hell of v19 working. I'm pretty certain I've tested that on CentOS w/ a 4.3 kernel anyway and got the same result. Linux distros are building against v20 drivers, btw, and I don't have any issues w/ the v18 driver and v20 FW.
     
    #3
  4. whitey

    whitey Moderator

    Proof in the puddin': that's a 2008-based HBA reflashed down to v19 IT-mode LSI FW. Same symptoms... Someone else please reproduce so we can confirm/deny. I've done all I can to shed light on it.

    [Screenshot: zol-ubuntu-14.04.4-mpt2sas-NOT-happy-v19-LSI-IT-FW.png]
     
    #4
  5. izx

    izx Active Member

    Joined: Jan 17, 2016 | Messages: 113 | Likes Received: 36
    If you think there's a chance this bug was squashed upstream, try the 4.3/4.4/4.5 kernels from the mainline PPA.
     
    #5
  6. cptbjorn

    cptbjorn Member

    Joined: Aug 16, 2013 | Messages: 100 | Likes Received: 19
    I'm getting it on a machine running Fedora 22/23 that I use to play with bcache; it has been going on since 4.1.something for me. I've been running an early fc22 4.0.4 kernel that works fine, but recently I discovered that I'm able to cold boot 4.0.4-301 and then reboot once into 4.3.5-300. If I reboot again or cold boot 4.3.5, it crashes and burns.

    Mine's an H310 flashed to LSI v19.
     
    #6
  7. whitey

    whitey Moderator

    Some gent mentioned something similar over in the Rockstor forums (hell, maybe it was you, lol). I just tried installing the SAME identical linux-image/linux-headers for 3.19-25 (the same one LTS 14.04.3 ships with), and that bastard cannot even see disks. I tried booting into the 4.2.x kernel first, then rebooting into 3.19.x: no luv either. Nothing even in dmesg... I even tried gutting the 4.2.x kernel w/ apt-get remove and rebooting into the 3.19-25 kernel, which works on a fresh build of LTS 14.04.3... nothing. Perplexing.
     
    #7
  8. whitey

    whitey Moderator

    And the fix for me was:

    mpt2sas.msix_disable=1 (for 4.3 or older kernels)
    or
    mpt3sas.msix_disable=1 (for 4.4 or newer kernels)

    Added to the kernel boot line parameters; adding it here as well for posterity's sake.
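    A small sketch of the version rule above, assuming the kernel release string starts with "major.minor" (as platform.release() returns on Linux). The function name msix_disable_option is hypothetical; it only picks which spelling of the boot option applies and does not change anything on the system.

```python
import platform
import re
from typing import Optional


def msix_disable_option(release: Optional[str] = None) -> str:
    """Pick the msix_disable boot option for a kernel release (hypothetical helper).

    Rule from the thread: mpt2sas.msix_disable=1 for 4.3 or older kernels,
    mpt3sas.msix_disable=1 for 4.4 or newer kernels.
    """
    if release is None:
        # Release string of the running kernel, such as "4.17.0-0.bpo.3-amd64".
        release = platform.release()
    # Only the leading "major.minor" matters for the 4.3 / 4.4 split.
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    driver = "mpt3sas" if version >= (4, 4) else "mpt2sas"
    return f"{driver}.msix_disable=1"
```

    For example, msix_disable_option("4.17.0-0.bpo.3-amd64") gives "mpt3sas.msix_disable=1", while a 3.19 or 4.3 release gives the mpt2sas spelling.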
     
    #8
    niekbergboer and T_Minus like this.
  9. T_Minus

    T_Minus Moderator

    Joined:
    Feb 15, 2015
    Messages:
    6,369
    Likes Received:
    1,299
    Good to know! Noting this thread.
     
    #9
  10. whitey

    whitey Moderator

    Update/more info

    mpt2sas.msix_disable=1 (for 4.3 or older kernels)
    or
    mpt3sas.msix_disable=1 (for 4.4 or newer kernels)

    Added to the kernel boot line parameters; adding it here as well for posterity's sake.

    To make it stick under Ubuntu or CentOS using GRUB2, edit /etc/default/grub as follows (keep any options already on that line, such as Ubuntu's default "quiet splash"):

    GRUB_CMDLINE_LINUX_DEFAULT="mpt2sas.msix_disable=1"
    (mpt3sas.msix_disable=1 for 4.4 or newer kernels)

    Save the file.

    Then regenerate the GRUB2 config:

    Ubuntu - update-grub
    CentOS - grub2-mkconfig -o /boot/grub2/grub.cfg (BIOS-based machines)
    grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg (UEFI-based machines; /boot/efi/EFI/redhat/grub.cfg on RHEL)
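    The /etc/default/grub edit can also be scripted. A minimal sketch, assuming the option sits on a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line with no escaped quotes or line continuations; add_kernel_option is a hypothetical name. It appends to whatever options are already there rather than overwriting them, and returns the text unchanged if the option is already present.

```python
import re


def add_kernel_option(grub_text: str, option: str,
                      key: str = "GRUB_CMDLINE_LINUX_DEFAULT") -> str:
    """Return grub_text with `option` appended to the `key="..."` line (hypothetical helper).

    Assumes a simple one-line, double-quoted value. Idempotent: if the option is
    already one of the space-separated tokens, the text is returned unchanged.
    """
    # ^ anchors at line starts, so commented-out "#GRUB_..." lines are ignored.
    pattern = re.compile(rf'^({re.escape(key)}=")([^"]*)(")[ \t]*$', re.MULTILINE)
    match = pattern.search(grub_text)
    if match is None:
        # No such line yet: add one at the end of the file.
        sep = "" if grub_text.endswith("\n") or not grub_text else "\n"
        return f'{grub_text}{sep}{key}="{option}"\n'
    current = match.group(2).split()
    if option in current:
        return grub_text
    new_value = " ".join(current + [option])
    # Replace only the quoted value, leaving the rest of the file untouched.
    return grub_text[:match.start(2)] + new_value + grub_text[match.end(2):]
```

    On a real box you would read /etc/default/grub, write the result back as root, and still run update-grub or grub2-mkconfig as above for the change to take effect.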
     
    #10
    Last edited: May 2, 2016
    cptbjorn likes this.
  11. ed209

    ed209 New Member

    Joined: Apr 20, 2018 | Messages: 1 | Likes Received: 0
    I know this is an old thread, but I really want to thank you for posting this information. It has been a lifesaver after 3 days of nearly going nuts.
     
    #11
  12. whitey

    whitey Moderator

    Glad this thread helped you resolve your issue. I lived the nightmare for a couple of weeks, so I'm happy to see you only burned a few days. Maddening, I know. This is why I love this forum: gold nuggets hidden all over the place!
     
    #12
  13. chinesestunna

    chinesestunna Active Member

    Joined: Jan 23, 2015 | Messages: 516 | Likes Received: 101
    Thanks for posting this, @whitey! I believe this may also be related to the issue of drives not being detected after going to sleep and resuming, and then being dropped from various RAID arrays, documented here: ZFS io error when disks are in idle/standby/spindown mode · Issue #4713 · zfsonlinux/zfs

    They seem to have come to the same conclusion, and I'm testing out this fix now, running an OpenMediaVault (Debian 9) VM with kernel 4.17 on top of an ESXi 6.7 July 2018 build. Fingers crossed my drives will stay put in the array now.
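    When testing the fix, one way to confirm after a reboot that the option actually reached the kernel is to look for it in /proc/cmdline. A minimal sketch; boot_option_present is a hypothetical name, and it only shows that the option was passed at boot, not that the driver acted on it. It matches exact whitespace-separated tokens, so it won't recognise an equivalent spelling such as msix-disable (the kernel treats dashes and underscores in parameter names alike).

```python
from pathlib import Path
from typing import Optional


def boot_option_present(option: str, cmdline: Optional[str] = None) -> bool:
    """Return True if `option` is a whole token on the kernel command line (hypothetical helper).

    With no cmdline given, reads /proc/cmdline of the running Linux system.
    Whole-token matching means "msix_disable=10" does not count as "msix_disable=1".
    """
    if cmdline is None:
        cmdline = Path("/proc/cmdline").read_text()
    return option in cmdline.split()
```

    For example, boot_option_present("mpt3sas.msix_disable=1") returns False on a 4.17 box that was booted without the edited GRUB entry.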
     
    #13
    whitey likes this.
  14. chinesestunna

    chinesestunna Active Member

    Sigh... nope :( Unfortunately I'm still getting random disk drops all over the place. In fact, I just had a new error I'd never seen before: it appears something went wrong at the ESXi level, and the mdadm array within the VM was completely hung.
    Does anyone know how to set mpt3sas.msix_disable at the ESXi level? Even though I'm using passthrough, I've also experienced a few failures just trying to boot ESXi 6.7, with the screen stuck on loading the mpt3sas module.
    (Should've stuck with ESXi 5.5 and Linux 2.6.)

    Code:
    [Wed Sep  5 01:07:44 2018] general protection fault: 0000 [#1] SMP PTI
    [Wed Sep  5 01:07:44 2018] Modules linked in: softdog cpufreq_conservative cpufreq_userspace cpufreq_powersave xfs sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf vmw_balloon joydev serio_raw pcspkr shpchp vmwgfx evdev ttm drm_kms_helper sg drm button ac vmw_vsock_vmci_transport vsock vmw_vmci sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb btrfs zstd_decompress zstd_compress xxhash raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod hid_generic usbhid hid ses enclosure sr_mod cdrom sd_mod ata_generic crc32c_intel ata_piix aesni_intel aes_x86_64 crypto_simd cryptd glue_helper psmouse ahci uhci_hcd ehci_pci libahci mpt3sas vmxnet3 ehci_hcd mptsas raid_class libata scsi_transport_sas
    [Wed Sep  5 01:07:44 2018]  mptscsih mptbase usbcore scsi_mod usb_common i2c_piix4
    [Wed Sep  5 01:07:44 2018] CPU: 1 PID: 141 Comm: scsi_eh_0 Not tainted 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
    [Wed Sep  5 01:07:44 2018] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/28/2017
    [Wed Sep  5 01:07:44 2018] RIP: 0010:_scsih_set_satl_pending+0xc/0x50 [mpt3sas]
    [Wed Sep  5 01:07:44 2018] RSP: 0018:ffffb18a41073c60 EFLAGS: 00010202
    [Wed Sep  5 01:07:44 2018] RAX: 6b736168001a0000 RBX: ffff9485e72db190 RCX: 0000000000000000
    [Wed Sep  5 01:07:44 2018] RDX: ffff9485e72db040 RSI: 0000000000000000 RDI: ffff9485e72db190
    [Wed Sep  5 01:07:44 2018] RBP: 00000000000007b8 R08: 0000000000000001 R09: 0000000000000000
    [Wed Sep  5 01:07:44 2018] R10: ffff9485ee388d80 R11: 00000000000007ae R12: ffff9485ee1be798
    [Wed Sep  5 01:07:44 2018] R13: 0000000000000001 R14: ffff9485ee1be940 R15: 0000000000000000
    [Wed Sep  5 01:07:44 2018] FS:  0000000000000000(0000) GS:ffff9485ffc40000(0000) knlGS:0000000000000000
    [Wed Sep  5 01:07:44 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [Wed Sep  5 01:07:44 2018] CR2: 00007ff0b1908540 CR3: 00000001e3e0a003 CR4: 00000000000606e0
    [Wed Sep  5 01:07:44 2018] Call Trace:
    [Wed Sep  5 01:07:44 2018]  _scsih_flush_running_cmds+0x79/0xe0 [mpt3sas]
    [Wed Sep  5 01:07:44 2018]  mpt3sas_scsih_reset_handler+0x317/0x760 [mpt3sas]
    [Wed Sep  5 01:07:44 2018]  ? vprintk_emit+0x385/0x450
    [Wed Sep  5 01:07:44 2018]  ? printk+0x52/0x6e
    [Wed Sep  5 01:07:44 2018]  mpt3sas_base_hard_reset_handler+0x1ac/0x540 [mpt3sas]
    [Wed Sep  5 01:07:44 2018]  scsih_host_reset+0x59/0xc0 [mpt3sas]
    [Wed Sep  5 01:07:44 2018]  scsi_try_host_reset+0x44/0xe0 [scsi_mod]
    [Wed Sep  5 01:07:44 2018]  scsi_eh_ready_devs+0xb5d/0xe50 [scsi_mod]
    [Wed Sep  5 01:07:44 2018]  ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
    [Wed Sep  5 01:07:44 2018]  scsi_error_handler+0x4df/0x5e0 [scsi_mod]
    [Wed Sep  5 01:07:44 2018]  kthread+0xf8/0x130
    [Wed Sep  5 01:07:44 2018]  ? scsi_eh_get_sense+0x260/0x260 [scsi_mod]
    [Wed Sep  5 01:07:44 2018]  ? kthread_create_worker_on_cpu+0x70/0x70
    [Wed Sep  5 01:07:44 2018]  ret_from_fork+0x35/0x40
    [Wed Sep  5 01:07:44 2018] Code: 08 c3 48 c1 ee 0b b8 20 00 00 00 ba 40 00 00 00 89 41 04 89 11 31 c0 89 71 08 c3 0f 1f 40 00 66 66 66 66 90 48 8b 87 f8 00 00 00 <0f> b6 10 80 fa a1 74 09 31 c0 80 fa 85 74 02 f3 c3 48 8b 47 38
    [Wed Sep  5 01:07:44 2018] RIP: _scsih_set_satl_pending+0xc/0x50 [mpt3sas] RSP: ffffb18a41073c60
    [Wed Sep  5 01:07:44 2018] ---[ end trace f935f580e9872343 ]---
     
    #14
