Occasional OmniOS iSCSI disconnect from VMware

TechTrend · Jun 4, 2019

I use a VM with OmniOS r151024ap and napp-it v. 18.01b as a storage server for a VMware 6.0u3 cluster. Around 30 VMs use 8 iSCSI LUNs for storage. It is mostly very stable, but about twice a year the storage VM stops responding to iSCSI requests from ESXi hosts. After an incident, vmkernel log shows messages of the form:

Code:

2019-06-02T15:11:57.699Z cpu23:32870)nmlx4_en: vmnic3: nmlx4_en_RxQAlloc - (partners/mlnx/nmlx4/nmlx4_en/nmlx4_en_multiq.c:628) RX queue 2 is allocated
2019-06-02T15:11:57.708Z cpu23:32870)nmlx4_en: vmnic3: nmlx4_en_QueueApplyFilter - (partners/mlnx/nmlx4/nmlx4_en/nmlx4_en_multiq.c:2145) MAC RX filter (class 1) at index 0 is applied on
2019-06-02T15:11:57.708Z cpu23:32870)nmlx4_en: vmnic3: nmlx4_en_QueueApplyFilter - (partners/mlnx/nmlx4/nmlx4_en/nmlx4_en_multiq.c:2152) RX ring 2, QP[0x5a], Mac address 00:0c:29:e3:b6:ca
2019-06-02T15:13:22.708Z cpu26:32870)nmlx4_en: vmnic3: nmlx4_en_QueueRemoveFilter - (partners/mlnx/nmlx4/nmlx4_en/nmlx4_en_multiq.c:2325) MAC RX filter (class 1) at index 0 is removed from
2019-06-02T15:13:22.708Z cpu26:32870)nmlx4_en: vmnic3: nmlx4_en_QueueRemoveFilter - (partners/mlnx/nmlx4/nmlx4_en/nmlx4_en_multiq.c:2332) RX ring 2, QP[0x5a], Mac address 00:0c:29:e3:b6:ca
2019-06-02T15:13:22.708Z cpu26:32870)nmlx4_en: vmnic3: nmlx4_en_RxQFree - (partners/mlnx/nmlx4/nmlx4_en/nmlx4_en_multiq.c:789) RX queue 2 is freed
...
2019-06-02T15:14:48.081Z cpu0:33527)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba37:CH:0 T:1 CN:0: Failed to receive data: Connection closed by peer
2019-06-02T15:14:48.081Z cpu0:33527)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: 00023d000001 TARGET: iqn.2010-08.org.illumos:san04-t0 TPGT: 1 TSIH: 0]
2019-06-02T15:14:48.081Z cpu0:33527)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 10.2.5.14:34960 R: 10.2.5.32:3260]
2019-06-02T15:14:48.081Z cpu0:33527)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: vmhba37:CH:0 T:1 CN:0: Connection rx notifying failure: Failed to Receive. State=Online
2019-06-02T15:14:48.081Z cpu0:33527)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Sess [ISID: 00023d000001 TARGET: iqn.2010-08.org.illumos:san04-t0 TPGT: 1 TSIH: 0]
2019-06-02T15:14:48.081Z cpu0:33527)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Conn [CID: 0 L: 10.2.5.14:34960 R: 10.2.5.32:3260]
2019-06-02T15:14:48.081Z cpu0:33527)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba37:CH:0 T:1 CN:0: iSCSI connection is being marked "OFFLINE" (Event:6)
2019-06-02T15:14:48.081Z cpu0:33527)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d000001 TARGET: iqn.2010-08.org.illumos:san04-t0 TPGT: 1 TSIH: 0]
2019-06-02T15:14:48.081Z cpu0:33527)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 10.2.5.14:34960 R: 10.2.5.32:3260]

hostd log shows all LUNs lost shortly after the iSCSI "connection closed by peer".

Code:

2019-06-02T15:14:57.519Z info hostd[42A03B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 101143 : Lost access to volume 5b9fdd93-bdc2efeb-3fe7-002590c7c9c0 (l0401) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
2019-06-02T15:14:57.520Z info hostd[42A03B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 101144 : Lost access to volume 5a5e0f16-7d3c0820-d1c0-90e2ba160560 (l0402) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
2019-06-02T15:14:57.521Z info hostd[42A03B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 101145 : Lost access to volume 57c089bc-d0040083-f68c-90e2ba160560 (l0403) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
...

iSCSI LUNs remain disconnected until the storage VM is rebooted. Before reboot, VM responds to ping on all its NICs and does not appear overloaded. After reboot, all LUNs quickly come back up and VMware datastores are operational again.

Physical host uses Connect-X 3 Pro 10G NICs. nmlx4 driver reports no errors on the logs. Physical storage NIC is on a separate storage VLAN.

Storage VM is configured with 1 e1000 NIC for LAN access and 2 vmxnet3 NICs serving 2 storage subnets. open-vm-tools version is 10.1.15 from r151024ap.

Incidents don't seem correlated with periods of highest storage activity. Haven't increased logging levels in OmniOS. Default logs don't show much.

Has anyone seen a similar behavior? Any suggestions to prevent this?

Thanks.

gea · Jun 5, 2019

I would mainly expect a timeout problem due a weak disk. ZFS has higher timeouts until it report problems per default than ESXi. Check System > Log, System > faults and under load check iostat ex busy%. With iostat all disks should behave quite similar. If one is significant worse, replace.

You can also check pool state. High iostat error values on a disk can be an indication of a disk problem.

If it is only a random problem without clear reason: Do you use sync write without Slog or a slow Slog?
Then you may care about pool iops on sync writes. Run a Pool > Benchmark for write values (compare sync vs async).

Some tunings may also help for very random problems like more RAM, higher values for tcp, NFS and vmxnet3 buffers

TechTrend · Jun 5, 2019

Gea,

Thanks for your suggestions.

There are ten HGST He8 8TB SATA drives on a mirrored (RAID10) pool. Looked at %w and %b on iostat for several iterations, but didn't see significant differences.

Code:

# iostat -xn 1 2
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 fd0
  242.0  585.2 30572.1 55926.5 35.1  5.4   42.4    6.6  16  60 pool0
    0.1    0.4    1.0    4.4  0.0  0.0    3.1    0.3   0   0 rpool
    0.1    0.5    1.0    4.4  0.0  0.0    0.0    0.3   0   0 c2t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t0d0
   24.3   59.2 3066.9 5563.0  0.0  0.5    0.0    6.5   0  22 c4t5000CCA254E925FEd0
   24.2   59.2 3064.0 5563.0  0.0  0.5    0.0    6.5   0  22 c4t5000CCA254E86659d0
   24.2   58.0 3055.9 5562.8  0.0  0.5    0.0    6.1   0  22 c4t5000CCA254E9D7A7d0
   24.2   57.9 3052.4 5562.8  0.0  0.5    0.0    6.2   0  22 c4t5000CCA254E9BF81d0
   23.9   58.1 3021.0 5531.9  0.0  0.6    0.0    7.0   0  22 c4t5000CCA263C066EDd0
   24.3   60.4 3063.9 5659.4  0.0  0.6    0.0    6.8   0  23 c4t5000CCA263C20C0Cd0
   24.4   60.5 3080.9 5648.0  0.0  0.5    0.0    6.4   0  22 c4t5000CCA263C178C5d0
   24.0   58.1 3029.5 5531.9  0.0  0.5    0.0    6.1   0  21 c4t5000CCA254D24538d0
   24.4   60.5 3079.6 5648.0  0.0  0.5    0.0    6.4   0  22 c4t5000CCA260D44BC0d0
   24.2   60.4 3060.1 5659.4  0.0  0.5    0.0    6.4   0  22 c4t5000CCA263C3451Bd0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 fd0
   20.0    0.0 2443.1    0.0  0.0  0.2    0.0    8.7   0  15 pool0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 rpool
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c2t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t5000CCA254E925FEd0
    6.0    0.0  769.0    0.0  0.0  0.0    0.0    2.1   0   1 c4t5000CCA254E86659d0
    7.0    0.0  897.1    0.0  0.0  0.0    0.0    1.4   0   1 c4t5000CCA254E9D7A7d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t5000CCA254E9BF81d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t5000CCA263C066EDd0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   4 c4t5000CCA263C20C0Cd0
    5.0    0.0  640.8    0.0  0.0  0.1    0.0   17.5   0   9 c4t5000CCA263C178C5d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t5000CCA254D24538d0
    2.0    0.0  136.2    0.0  0.0  0.0    0.0    9.5   0   2 c4t5000CCA260D44BC0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t5000CCA263C3451Bd0

# zpool iostat -v
                              capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
pool0                      10.4T  25.9T    242    613  29.9M  27.3M
  mirror                   2.08T  5.17T     48    120  5.99M  5.43M
    c4t5000CCA254E86659d0      -      -     24     58  2.99M  5.43M
    c4t5000CCA254E925FEd0      -      -     24     58  3.00M  5.43M
  mirror                   2.10T  5.15T     48    115  5.97M  5.43M
    c4t5000CCA254E9BF81d0      -      -     24     57  2.98M  5.43M
    c4t5000CCA254E9D7A7d0      -      -     24     57  2.98M  5.43M
  mirror                   2.12T  5.13T     47    120  5.91M  5.40M
    c4t5000CCA263C066EDd0      -      -     23     57  2.95M  5.40M
    c4t5000CCA254D24538d0      -      -     23     57  2.96M  5.40M
  mirror                   2.05T  5.20T     48    127  5.98M  5.52M
    c4t5000CCA263C3451Bd0      -      -     24     59  2.99M  5.53M
    c4t5000CCA263C20C0Cd0      -      -     24     59  2.99M  5.53M
  mirror                   2.04T  5.21T     48    128  6.02M  5.51M
    c4t5000CCA263C178C5d0      -      -     24     59  3.01M  5.52M
    c4t5000CCA260D44BC0d0      -      -     24     59  3.01M  5.52M
-------------------------  -----  -----  -----  -----  -----  -----
rpool                      9.15G  15.7G      0      0  1.02K  4.37K
  c2t0d0s0                 9.15G  15.7G      0      0  1.02K  4.37K
-------------------------  -----  -----  -----  -----  -----  -----

On the day of the incident there is an error on System, Faults, Error Details. Yet the timestamp is after the reboot, when operations were normal. That is the only fmdump error in the past 10 months.

Code:

Jun 02 2019 16:13:19.059504221 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
    class = ereport.io.scsi.cmd.disk.dev.rqs.derr
    ena = 0xabf6a65cbc400001
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        device-path = /pci@0,0/pci15ad,1976@10/sd@0,0
        devid = id1,sd@n6000c29dc4e185e3b6641dc0cc1692a0
    (end detector)

    devid = id1,sd@n6000c29dc4e185e3b6641dc0cc1692a0
    driver-assessment = fail
    op-code = 0x1a
    cdb = 0x1a 0x0 0x48 0x0 0x20 0x0
    pkt-reason = 0x0
    pkt-state = 0x37
    pkt-stats = 0x0
    stat-code = 0x2
    key = 0x5
    asc = 0x24
    ascq = 0x0
    sense-data = 0x70 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 0x24 0x0 0x0 0xc0 0x0 0x2 0x0 0x0
    __ttl = 0x1
    __tod = 0x5cf3f59f 0x38bf65d

There are no ZIL or SLOG devices on that server. Have considered adding a SLOG device, although performance has been acceptable without it.

The storage VM is configured with 4 vCPUs and 16GB RAM. Will consider increasing vCPUs and RAM on the next maintenance window.

Is there a way to make the iSCSI service recover automatically (e.g. without a manual reboot)? That would not solve the issue, but its impact would be reduced significantly.

gea · Jun 5, 2019

If you have enabled sync write (or the same setting for a LUN: write back cache disabled) I would recommend an Slog (Optane 800P or 900P or the upcoming datacenter Optane DC 4801X-100/200 with guaranteed plp) as sync write is quite bad on disks without.

Run a Pools > Benchmark to check sync performance

An auto iscsi reconnect must be done on client side.
An option would be NFS where an auto reconnect is quite usual,

Search

Occasional OmniOS iSCSI disconnect from VMware

TechTrend

Member

gea

Well-Known Member

TechTrend

Member

gea

Well-Known Member