LSI 9211-8i timeouts on Ubuntu 15.10


gr8ape

Member
Jun 15, 2015
So I have a problem I've been beating my head against the wall over, and I'm hoping someone can help me solve it.

I have a virtualized Ubuntu 15.10 server with an LSI 9211-8i passed through via PCIe. Previously, on 14.10 and 15.04, it worked flawlessly, always detecting my HDDs and booting. Since upgrading to 15.10, however, the mpt2sas driver times out and no drives are presented to Linux. I installed the previous 3.19 kernel and that works great, but there seems to be some inconsistency that occasionally causes the machine to panic. Anything newer than 3.19 (including the new 4.0 kernels) hits the mpt2sas timeout and no drives are presented to the OS.

Any ideas? I'd like to take advantage of the improvements made to BTRFS in the 4.0 kernel. Thanks for any help and suggestions.
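For anyone else hitting this, a quick sanity check is to confirm the failure signature in dmesg and to hold the known-good kernel so an upgrade doesn't remove it. This is just a minimal sketch; the kernel package name is an example, so substitute whatever `uname -r` reports on your working 3.19 boot:

Code:
# Confirm the mpt2sas timeout signature after a failed boot
dmesg | grep -i mpt2sas | grep -iE "timeout|failure"

# List installed kernels and hold the working one (example version shown)
dpkg -l 'linux-image-*' | grep '^ii'
sudo apt-mark hold linux-image-3.19.0-47-generic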
 

mattlach

Active Member
Aug 1, 2014
LSI hardware can be pretty particular about the driver and firmware being on the same "phase" (or version, as the rest of us would call it).

In fact, LSI won't even support configurations with a driver/firmware mismatch.

For instance, my current FreeNAS install recently (with an update) went from shipping with the P16 driver to shipping with the P20 driver, so I went in and flashed my two LSI controllers (in IT mode) to the P20 firmware so they match.

I would check your firmware revision and make sure it matches the driver version included with your Linux distribution, and if it doesn't, flash them to be the same; a quick way to compare the two is sketched below.
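A minimal sketch of how to compare them, assuming the stock mpt2sas module plus LSI's sas2flash utility (from the 9211-8i support downloads) are available; the exact version strings will of course differ:

Code:
# Driver version as reported by the kernel module
modinfo -F version mpt2sas

# Firmware phase as reported by the controller itself
sas2flash -listall

# The driver also logs the firmware it found at probe time
dmesg | grep -i fwversion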

People have reported some pretty nasty problems on occasion when they are mismatched.
 

whitey

Moderator
Jun 30, 2014
I've tried matched and mismatched (v19 and v20); neither helps. There were suspicions that v19 was a 'less buggy/more solid' FW release, but even that gave me hell. No such luck; I'm afraid they are non-functional. I see this on Rockstor and a DIY ZoL setup I have recently tried w/ newer kernels. Maybe I will hop on the freenode Linux kernel channel and see what I can drum up one of these days. I've damn near begged here for someone else w/ a similar setup or access to hardware to try this, and no takers yet.
 

mattlach

Active Member
Aug 1, 2014
whitey said:
I've tried matched and mismatched (v19 and v20); neither helps. There were suspicions that v19 was a 'less buggy/more solid' FW release, but even that gave me hell. No such luck; I'm afraid they are non-functional. I see this on Rockstor and a DIY ZoL setup I have recently tried w/ newer kernels. Maybe I will hop on the freenode Linux kernel channel and see what I can drum up one of these days. I've damn near begged here for someone else w/ a similar setup or access to hardware to try this, and no takers yet.
Well, I'm following this with avid interest. My two 9211-8i's in IT mode are currently working perfectly in my ESXi box, forwarded to a FreeNAS guest, but I have been considering migrating to Proxmox, so if they don't work well in Linux, I'll have a real problem.

Unfortunately since my ESXi box is a "home production" system of sorts, I can't just take it down to test :(
 

mattlach

Active Member
Aug 1, 2014
I just had a flashback to when I originally set up my first passthrough with an LSI controller a few years back, and thought of something else you might try.

The little nugget that popped up in my head while I was doing something else is as follows:

Apparently LSI controllers don't work well with MSI/MSI-X interrupts when passed through. The recommendation is to disable these and fall back on traditional interrupts. When I did it, it was a BSD-specific setting I had to set in the guest, but I think you can do it in the host configuration too; at least I could on my ESXi box. It required manually editing the guest config file and adding a line to disable MSI.

What host are you using? If you need it, I'll see if I can find the config line for ESXi.
 

rubylaser

Active Member
Jan 4, 2013
Michigan, USA
whitey said:
I've tried matched and mismatched (v19 and v20); neither helps. There were suspicions that v19 was a 'less buggy/more solid' FW release, but even that gave me hell. No such luck; I'm afraid they are non-functional. I see this on Rockstor and a DIY ZoL setup I have recently tried w/ newer kernels. Maybe I will hop on the freenode Linux kernel channel and see what I can drum up one of these days. I've damn near begged here for someone else w/ a similar setup or access to hardware to try this, and no takers yet.
@whitey how are you checking the mpt2sas version number? I just checked on my Ubuntu 14.04.4 server that's been upgraded a few times from 10.04 -> 12.04 -> 14.04, and realized that the version of mpt2sas appears to be super old in my install (unless I'm looking at this wrong). I'm currently running a 4.4 kernel.
Code:
root@fileserver:~# modinfo -F version mpt2sas
09.102.00.00
root@fileserver:~# modinfo -F version mpt3sas
09.102.00.00

In dmesg at boot time, my cards all initialize and are shown to be running v19 firmware.
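For what it's worth, the modinfo string is the module's own version and doesn't necessarily line up one-for-one with an LSI firmware phase, so it also helps to look at what the driver and card report at probe time. A hedged sketch (the exact output format can vary a little between kernel versions):

Code:
# Version string the driver announces when it loads
dmesg | grep -iE "mpt2sas version|mpt3sas version"

# Firmware and BIOS versions the card reports during probe
dmesg | grep -iE "fwversion|biosversion"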
 

whitey

Moderator
Jun 30, 2014
mattlach said:
I just had a flashback to when I originally set up my first passthrough with an LSI controller a few years back, and thought of something else you might try.

The little nugget that popped up in my head while I was doing something else is as follows:

Apparently LSI controllers don't work well with MSI/MSI-X interrupts when passed through. The recommendation is to disable these and fall back on traditional interrupts. When I did it, it was a BSD-specific setting I had to set in the guest, but I think you can do it in the host configuration too; at least I could on my ESXi box. It required manually editing the guest config file and adding a line to disable MSI.

What host are you using? If you need it, I'll see if I can find the config line for ESXi.
ESXi 6.0U1
 

whitey

Moderator
Jun 30, 2014
rubylaser said:
@whitey how are you checking the mpt2sas version number? I just checked on my Ubuntu 14.04.4 server that's been upgraded a few times from 10.04 -> 12.04 -> 14.04, and realized that the version of mpt2sas appears to be super old in my install (unless I'm looking at this wrong). I'm currently running a 4.4 kernel.
Code:
root@fileserver:~# modinfo -F version mpt2sas
09.102.00.00
root@fileserver:~# modinfo -F version mpt3sas
09.102.00.00

In dmesg at boot time, my cards all initialize and are shown to be running v19 firmware.
Recent Linux distros (i.e., CentOS 7, Ubuntu 14.04.4 LTS, and the newer 16.04 release) never seem to work for me. I see they do include mpt2sas v20, though. I've tried HBAs flashed to v19 and v20 to no avail.
 

whitey

Moderator
Jun 30, 2014
rubylaser said:
@whitey how are you checking the mpt2sas version number? I just checked on my Ubuntu 14.04.4 server that's been upgraded a few times from 10.04 -> 12.04 -> 14.04, and realized that the version of mpt2sas appears to be super old in my install (unless I'm looking at this wrong). I'm currently running a 4.4 kernel.
Code:
root@fileserver:~# modinfo -F version mpt2sas
09.102.00.00
root@fileserver:~# modinfo -F version mpt3sas
09.102.00.00

In dmesg at boot time, my cards all initialize and are shown to be running v19 firmware.
Code:
root@xenial:~# dmesg | grep -i mpt
[ 0.000000] Device empty
[ 1.137911] Fusion MPT base driver 3.04.20
[ 1.142031] Fusion MPT SPI Host driver 3.04.20
[ 1.161487] mptbase: ioc0: Initiating bringup
[ 1.162573] mpt3sas version 12.100.00.00 loaded
[ 1.165466] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (8175500 kB)
[ 1.222127] mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 2, max_msix_vectors: -1
[ 1.222830] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 59
[ 1.222930] mpt2sas_cm0: iomem(0x00000000fd4fc000), mapped(0xffffc90000e48000), size(16384)
[ 1.223130] mpt2sas_cm0: ioport(0x0000000000005000), size(256)
[ 1.316046] mpt2sas_cm0: Allocated physical memory: size(7579 kB)
[ 1.316156] mpt2sas_cm0: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
[ 1.316315] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[ 31.362130] mpt2sas_cm0: _base_event_notification: timeout
[ 31.366929] mpt2sas_cm0: sending message unit reset !!
[ 31.368915] mpt2sas_cm0: message unit reset: SUCCESS
[ 31.560499] mpt2sas_cm0: failure at /build/linux-lAMkDx/linux-4.4.0/drivers/scsi/mpt3sas/mpt3sas_scsih.c:8800/_scsih_probe()!
root@xenial:~# dmesg | grep -i lsi
[ 1.138006] Copyright (c) 1999-2008 LSI Corporation
[ 1.230249] ioc0: LSI53C1030 B0: Capabilities={Initiator}
[ 1.390718] scsi host3: ioc0: LSI53C1030 B0, FwRev=01032920h, Ports=1, MaxQ=128, IRQ=17
 

mattlach

Active Member
Aug 1, 2014
whitey said:
ESXi 6.0U1
Alright, try this.

You'll need to manually edit the .vmx file for the guest. There are many ways to do this, but I usually just download the file to my local machine, edit it in my favorite text file editor, and then save it and upload it, overwriting the old one.

When you edit the file, you'll want to find the correct passthrough identifier. If you only have one passed-through device, it will be pciPassthru0, but if you have more, it might be pciPassthru1, pciPassthru2, etc. Search for pciPassthru and you'll find the section. It will look something like this:

Code:
pciPassthru0.id = "00000:006:00.0"
pciPassthru0.deviceId = "0x0dd8"
pciPassthru0.vendorId = "0x10de"
pciPassthru0.systemId = "541a111c-2f01-cfda-9ad4-001517cd448e"
pciPassthru0.present = "TRUE"
Once you identify which one is the LSI device (you can do this by googling the vendorId/deviceId values you see listed in the file; if the guest only has one passthrough device it's easy, it's 0), you add the following line to the file:

Code:
pciPassthru0.msiEnabled = "FALSE"
(Where you change that 0 to whatever the number of your passthrough device is)

Save the file, upload it back to your ESXi server overwriting the old one, and MSI SHOULD now be disabled for that pass-through device.

Hopefully this will solve your timeout issues.
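For reference, putting the two snippets above together, the edited stanza in the .vmx ends up looking something like this (the IDs are just the example values from earlier in the post; yours will differ):

Code:
pciPassthru0.id = "00000:006:00.0"
pciPassthru0.deviceId = "0x0dd8"
pciPassthru0.vendorId = "0x10de"
pciPassthru0.systemId = "541a111c-2f01-cfda-9ad4-001517cd448e"
pciPassthru0.present = "TRUE"
pciPassthru0.msiEnabled = "FALSE"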
 

whitey

Moderator
Jun 30, 2014
No go, good sir.

Edited the .vmx properly and left 'pci=realloc=off' enabled on the kernel boot line. Also tried with 'pci=realloc=off' removed from the kernel boot line but the .vmx still tweaked; no luv either.

Code:
root@xenial:~# dmesg | grep -i mpt
[    0.000000]   Device   empty
[    1.200914] Fusion MPT base driver 3.04.20
[    1.214758] Fusion MPT SPI Host driver 3.04.20
[    1.225542] mpt3sas version 12.100.00.00 loaded
[    1.226318] mptbase: ioc0: Initiating bringup
[    1.226424] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (8175500 kB)
[    1.285560] mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 2, max_msix_vectors: -1
[    1.286254] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 59
[    1.286348] mpt2sas_cm0: iomem(0x00000000fd4fc000), mapped(0xffffc90000e48000), size(16384)
[    1.286502] mpt2sas_cm0: ioport(0x0000000000005000), size(256)
[    1.383195] mpt2sas_cm0: Allocated physical memory: size(7579 kB)
[    1.383301] mpt2sas_cm0: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
[    1.383453] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[   31.428456] mpt2sas_cm0: _base_event_notification: timeout
[   31.433241] mpt2sas_cm0: sending message unit reset !!
[   31.435235] mpt2sas_cm0: message unit reset: SUCCESS
[   31.648210] mpt2sas_cm0: failure at /build/linux-lAMkDx/linux-4.4.0/drivers/scsi/mpt3sas/mpt3sas_scsih.c:8800/_scsih_probe()!
root@xenial:~# dmesg | grep -i lsi
[    1.201001] Copyright (c) 1999-2008 LSI Corporation
[    1.296621] ioc0: LSI53C1030 B0: Capabilities={Initiator}
[    1.462024] scsi host3: ioc0: LSI53C1030 B0, FwRev=01032920h, Ports=1, MaxQ=128, IRQ=17
root@xenial:~# lspci | grep -i lsi
00:10.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 01)
0b:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
root@xenial:~#
[Attached screenshot: msiEnabled-FALSE.png]
 

mattlach

Active Member
Aug 1, 2014
whitey said:
No go, good sir.

Edited the .vmx properly and left 'pci=realloc=off' enabled on the kernel boot line. Also tried with 'pci=realloc=off' removed from the kernel boot line but the .vmx still tweaked; no luv either.

Code:
root@xenial:~# dmesg | grep -i mpt
[    0.000000]   Device   empty
[    1.200914] Fusion MPT base driver 3.04.20
[    1.214758] Fusion MPT SPI Host driver 3.04.20
[    1.225542] mpt3sas version 12.100.00.00 loaded
[    1.226318] mptbase: ioc0: Initiating bringup
[    1.226424] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (8175500 kB)
[    1.285560] mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 2, max_msix_vectors: -1
[    1.286254] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 59
[    1.286348] mpt2sas_cm0: iomem(0x00000000fd4fc000), mapped(0xffffc90000e48000), size(16384)
[    1.286502] mpt2sas_cm0: ioport(0x0000000000005000), size(256)
[    1.383195] mpt2sas_cm0: Allocated physical memory: size(7579 kB)
[    1.383301] mpt2sas_cm0: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
[    1.383453] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[   31.428456] mpt2sas_cm0: _base_event_notification: timeout
[   31.433241] mpt2sas_cm0: sending message unit reset !!
[   31.435235] mpt2sas_cm0: message unit reset: SUCCESS
[   31.648210] mpt2sas_cm0: failure at /build/linux-lAMkDx/linux-4.4.0/drivers/scsi/mpt3sas/mpt3sas_scsih.c:8800/_scsih_probe()!
root@xenial:~# dmesg | grep -i lsi
[    1.201001] Copyright (c) 1999-2008 LSI Corporation
[    1.296621] ioc0: LSI53C1030 B0: Capabilities={Initiator}
[    1.462024] scsi host3: ioc0: LSI53C1030 B0, FwRev=01032920h, Ports=1, MaxQ=128, IRQ=17
root@xenial:~# lspci | grep -i lsi
00:10.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 01)
0b:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
root@xenial:~#
Hmm.

That is too bad... This is what solved timeout problems due to interrupt storms on IBM M1015's and LSI controllers passed through to FreeNAS guests.

In FreeNAS, though, I didn't disable it in the .vmx file, as there is an in-guest config option (system tunable) to do so (hw.pci.enable_msi/msix). Maybe it needs to be done in the guest? If so, maybe there is a Linux equivalent configuration setting somewhere? (I'm not sure where it would be.)
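If anyone wants to try the guest-side route on a BSD-based guest, the tunables I'm referring to go in /boot/loader.conf; a rough sketch (reboot required, and obviously only applicable to FreeBSD/FreeNAS guests):

Code:
# /boot/loader.conf -- disable MSI/MSI-X for PCI devices inside the guest
hw.pci.enable_msi="0"
hw.pci.enable_msix="0"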

Also, the obvious question I forgot to ask: does this problem present itself on a bare-metal Ubuntu 15.10 install, or only with passthrough? If it shows up on bare metal too, then my suggestion won't help, as MSI is a passthrough-only issue, and the problem is somewhere else, like the kernel or kernel driver.
 

rubylaser

Active Member
Jan 4, 2013
Michigan, USA
mattlach said:
Hmm.

That is too bad... This is what solved timeout problems due to interrupt storms on IBM M1015's and LSI controllers passed through to FreeNAS guests.

In FreeNAS though, I didn't disable it in the vmx file, as there is an in guest config option (system tunable) to do so (hw.pci.enable_msi/msix). Maybe it needs to be done in the guest? If so, maybe there is a linux equivalent configuration setting somewhere? (I'm not sure where it would be)

Also, the obvious question I forgot to ask: does this problem present itself on a bare-metal Ubuntu 15.10 install, or only with passthrough? If it shows up on bare metal too, then my suggestion won't help, as MSI is a passthrough-only issue, and the problem is somewhere else, like the kernel or kernel driver.
It appears to happen in VMware with VT-d passthrough enabled. I tested this morning on a bare-metal install of Ubuntu 16.04 Server with two different SAS2008-based cards (an M1015 and an H310) flashed to P19 firmware, and they were both initialized and detected without issue.

ZFS native on Ubuntu 16.04 LTS
 

whitey

Moderator
Jun 30, 2014
mattlach said:
Hmm.

That is too bad... This is what solved timeout problems due to interrupt storms on IBM M1015's and LSI controllers passed through to FreeNAS guests.

In FreeNAS though, I didn't disable it in the vmx file, as there is an in guest config option (system tunable) to do so (hw.pci.enable_msi/msix). Maybe it needs to be done in the guest? If so, maybe there is a linux equivalent configuration setting somewhere? (I'm not sure where it would be)

Also, the obvious question I forgot to ask: does this problem present itself on a bare-metal Ubuntu 15.10 install, or only with passthrough? If it shows up on bare metal too, then my suggestion won't help, as MSI is a passthrough-only issue, and the problem is somewhere else, like the kernel or kernel driver.
The Linux equivalent is supposedly 'pci=realloc=off' as a kernel boot-line parameter... it does nothing for me.
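For anyone wanting to test boot-line parameters like that on Ubuntu, the usual way is via GRUB; a sketch below (pci=nomsi is my guess at the closer equivalent for actually disabling MSI inside a Linux guest, so treat it as an experiment, not a confirmed fix):

Code:
# Edit the default kernel command line
sudo nano /etc/default/grub
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc=off"
#   (or, to try disabling MSI entirely in the guest: "quiet splash pci=nomsi")

# Apply and reboot
sudo update-grub
sudo reboot

# Verify what the running kernel actually booted with
cat /proc/cmdline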

Another member, @rubylaser, confirmed that there is no issue on bare-metal hardware, so it certainly seems to be VMs w/ VT-d passthrough and newer Linux distros/kernel releases. It would be nice to see whether this is affecting other hypervisors as well.
 

gr8ape

Member
Jun 15, 2015
Drat! I keep following this hoping for a solution.

I did have a kernel 4.3 boot up once with all the drives detected. Then never again. What a tease. Whitey, did you already try downgrading the firmware?
 

rubylaser

Active Member
Jan 4, 2013
Michigan, USA
gr8ape said:
Drat! I keep following this hoping for a solution.

I did have a kernel 4.3 boot up once with all the drives detected. Then never again. What a tease. Whitey, did you already try downgrading the firmware?
Are you on a baremetal install or virtualized?
 

whitey

Moderator
Jun 30, 2014
gr8ape said:
Drat! I keep following this hoping for a solution.

I did have a kernel 4.3 boot up once with all the drives detected. Then never again. What a tease. Whitey, did you already try downgrading the firmware?
Yep, downgraded the firmware to v19... no luv.
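For anyone who wants to try the same downgrade, the usual sas2flash dance looks roughly like this. It's a hedged sketch: the filenames are the usual ones from LSI's 9211-8i IT firmware packages (adjust to whatever you downloaded), and an erase wipes the SAS address, so record it first.

Code:
# Record the current firmware level and the SAS address before touching anything
sas2flash -listall

# Reflash with the P19 IT firmware and BIOS images
sas2flash -o -f 2118it.bin -b mptsas2.rom

# If the tool refuses to downgrade, the common (riskier) workaround is to erase,
# re-flash, and then restore the SAS address noted earlier:
# sas2flash -o -e 6
# sas2flash -o -f 2118it.bin -b mptsas2.rom
# sas2flash -o -sasadd 500605bxxxxxxxxx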