CX3 iSER


zxv

This is a continuation of a previous discussion here:
NFSv3 vs NFSv4 vs iSCSI for ESXI datastores

OK, so there is good news and bad news about ConnectX-3 and iSER on ESXi.

First the bad news.

The bad news is that there are flow control issues which may or may not be resolvable. More importantly, when iSER hangs on ESXi, it is often fatal to the extent that the system cannot boot. So use this info at your own risk, with the following warning:

If the iSER client or target is misconfigured in such a way that the LUN attaches but cannot transfer data, it may hang the system. It may then hang during shutdown, and again at reboot. This can require a reinstall of ESXi (which is one motivation for kickstart installs).

If the client connects to a target but cannot send/receive RDMA, or the target or LUN goes offline:
- Certain functions in ESXi hang, including any filesystem access.
- ESXi will hang during shutdown at "stopping scsi periodic probe".
- ESXi will hang during boot. It will not reach a shell or a prompt.

This can occur when a RoCE v1 client tries to connect to a RoCE v2-only server: the connection fails silently.

Even with a valid configuration, if an iSER datastore is still mounted when shutdown occurs, the system will hang at "stopping scsi periodic probe". I just hit reset.

So, on to the good news.

The inbox drivers and iSER do work with the CX3, on both RoCE v1 and v2.

One of the most crucial pieces of missing information, for me, was that after a static discovery target is configured, one or more VMkernel adapters (vmkN) must be 'bound' to the iSER adapter, and each vmkN must be tagged 'VSAN' (see 'interface tag add' below).

Second, if you add only one path to an iSCSI target, the device may be marked 'degraded' because it has no failover candidate; however, it will work. Just be aware that 'degraded' does not mean dead (see 'statictarget add' below).

If you have a CX3 Pro, consider using RoCE v2 if possible. It provides better performance and far more consistent bandwidth/throughput than v1.

I found no RDMA performance or troubleshooting tools for ESXi 6.7. Mellanox has an installable RDMA sniffer, but it does not work with CX3/CX3 Pro cards.

You need either global flow control (pausing all traffic on the switch port) or, better yet, priority flow control, which pauses only the priority class named in the pause request. I have tested primarily using RoCE v2 and CX3 Pro cards connected to an Arista DCS-7050QX.
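For reference, enabling PFC per port on Arista EOS looks roughly like the sketch below. The exact keywords are from memory and may differ between EOS versions, so treat them as an assumption and verify against your switch's documentation; priority 3 matches the pcp_force=3 / pfc 0x08 settings used in the ESXi commands further down.

Code:
interface Ethernet1
   priority-flow-control on
   priority-flow-control priority 3 no-drop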

There are still flow control issues even with the configuration shown below; however, given the lack of troubleshooting tools, I haven't yet been able to identify whether the issue is on the initiator or target side. Capturing RoCE traffic would help. I haven't done that yet, but I didn't want to delay getting this info out any further.

Here are some example steps for configuring iSER on ESXi 6.7 with a CX3 Pro card, just to fill in some of the gaps discussed above. This installs no additional drivers; it uses only the inbox drivers provided by the install CD. Be careful if you attempt these, and consider using kickstart installs if you want to pursue it (a minimal kickstart sketch follows the command listing below).

Code:
# Warning! This can hang your system if misconfigured.
# Enable priority flow control for priority 3 (bit 3 = 0x08) on the mlx4 ethernet driver.
esxcli system module parameters set -m nmlx4_en -p "pfctx=0x08 pfcrx=0x08"
# Enable QoS and RoCE v2 in the core driver (RoCE v2 requires a CX3 Pro).
esxcli system module parameters set -m nmlx4_core -p "enable_qos=1 enable_rocev2=1"
# Force RDMA traffic onto PCP 3 / DSCP 24 so it lands in the no-drop class.
esxcli system module parameters set -m nmlx4_rdma -p "pcp_force=3 dscp_force=24 dscp_to_pcp=0"
# Open the firewall for iSCSI and RDMA, and enable the software iSCSI stack.
esxcli network firewall ruleset set -e true -r iSCSI
esxcli network firewall ruleset set -e true -r pvrdma
esxcli iscsi software set -e true
# Load the iser module and register the iSER adapter (it shows up as a new vmhba).
esxcli system module set --enabled=true --module=iser
vmkload_mod iser
esxcli rdma iser add
# Persist the configuration, then reboot.
/sbin/auto-backup.sh
reboot
# After reboot: tag the VMkernel interface, bind it to the iSER vmhba,
# add the static targets (one per path), and rescan.
esxcli network ip interface tag add -i vmk4 -t VSAN
esxcli iscsi networkportal add -A vmhba67 -n vmk4
esxcli iscsi adapter discovery statictarget add -A vmhba67 -a 172.16.1.99 -n iqn.2001-04.com.my-server
esxcli iscsi adapter discovery statictarget add -A vmhba67 -a 172.16.2.99 -n iqn.2001-04.com.my-server
esxcli iscsi adapter discovery rediscover -A vmhba67
# Verify that the device shows up.
esxcli storage core device list
esxcli storage vflash device list
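Since a misconfigured iSER setup can leave the host unbootable, here is the kind of minimal kickstart file I mean above. This is only a sketch with placeholder values (password, disk and network choices are assumptions), not a complete production ks.cfg; the installer picks it up via a ks= boot option (e.g. ks=http://server/ks.cfg).

Code:
# Minimal ESXi scripted install so a hosed host can be rebuilt quickly.
vmaccepteula
rootpw MyPlaceholderPassword
# Install to the first detected disk, overwriting any existing VMFS.
install --firstdisk --overwritevmfs
# Basic DHCP management network; replace with a static config if needed.
network --bootproto=dhcp --device=vmnic0
reboot
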
Resources:
https://community.mellanox.com/s/article/what-is-iser-x
https://community.mellanox.com/s/ar...r-for-vmware-vsphere-esxi-6-5-6-7-host--cli-x
http://www.mellanox.com/page/products_dyn?product_family=36&mtag=vmware_drivers
http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_ESXi_User_Manual_v2.4.0.pdf
 

efschu2

Are you sure about CX3 RoCE v2 support? Everywhere else I read that the CX3 only supports RoCE v1.
Will the ESXi host also hang with a CX4 and an unreachable iSER target, or is this only related to the CX3?
 

zxv

Yes, the CX3 does only RoCE v1.
The CX3 Pro does both RoCE v1 and RoCE v2.

I have not seen ESXi hang with the CX4. I've only seen it hang with the CX3 and CX3 Pro.

For the CX3/3Pro doing RoCE, the manuals suggest it's best to use a dedicated port that isolates RDMA traffic and eliminates most other traffic (other than the required iSCSI TCP connection). This is a hint that, while they can do full-bandwidth RoCE, they are sensitive to any competition for bandwidth.

Below is a bandwidth test from a Windows 10 VM to an iSER target ramdisk. This is not intended to show expected performance; it's intended to show that the CX3 is capable of filling a 40Gb link with iSER. I don't have bandwidth estimates for ESXi itself.

What these tests do not show is how performance drops when back pressure results from the limited bandwidth of the storage media. When the iSCSI target blocks for any reason (buffer full, drive queue full, disk bandwidth limit, IOPS limit, etc.), the target uses priority flow control to signal the initiator to stop sending data. If the initiator does not get that signal from priority flow control, it just sees the target stop responding and treats it as a dead path.

Without priority flow control working in both directions, performance drops drastically on real storage media (SSD, NVMe or hard drives).

Flow control is the last part of the puzzle here. It's possible that with RoCE traffic isolated on a given port of the CX3/3Pro, and flow control working in both directions, it could perform very well.
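On a Linux target with Mellanox OFED installed, the matching no-drop priority can be enabled with mlnx_qos. This is only a sketch: the interface name is a placeholder, and it assumes priority 3 to match the pcp_force=3 / pfc 0x08 settings used on the ESXi side.

Code:
# Enable PFC on priority 3 only (one flag per priority 0-7).
mlnx_qos -i enp3s0 --pfc 0,0,0,1,0,0,0,0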


[Attachment: ATTO Disk Benchmark screenshot, CX3 RoCE v2 iSER ramdisk]
 


efschu2

...
When the iSCSI target blocks for any reason (buffer full, drive queue full, disk bandwidth limit, IOPS limit, etc.), the target uses priority flow control to signal the initiator to stop sending data. If the initiator does not get that signal from priority flow control, it just sees the target stop responding and treats it as a dead path.
...
What about multiple paths to the target? Will ESXi try the next one, which should also be busy, and then what happens? Explode? o_O :D
 

zxv

Yes, it'll try all of the paths. For both the round-robin policy and the default policy (failover) it will only fail when no paths are working ('all paths down').

To catch some of these issues, F-12 on the ESXi console will switch to a view of the kernel log. Spacebar will pause/unpause the log, and the arrow keys or page up/down will scroll.
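If SSH is enabled, the same information can be followed remotely; something along these lines works as a filter (the pattern is just a suggestion, and a similar one appears later in this thread):

Code:
# Follow the VMkernel log and show only iSER/RDMA-related lines.
tail -f /var/log/vmkernel.log | egrep -i 'iser|rdma|vmhba|naa\.'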
 

efschu2

So if there is really heavy load on the target, ESXi will mark the storage as offline? Then what happens to all the VMs on this storage, will they just crash? I hope there is some kind of timeout setting that ESXi waits for before crashing them?

In one or two weeks I'll be able to do some tests on my own, but reading about the behaviour of ESXi is already frustrating me :confused:

edit:
OK, I should RTFM. RecoveryTimeout should do that, correct?
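If RecoveryTimeout turns out to be the right knob, it can be read and set per adapter with esxcli. A hedged sketch (the vmhba number and value are placeholders; check the current value first):

Code:
# Show current parameters for the iSER/iSCSI adapter.
esxcli iscsi adapter param get -A vmhba67
# Raise the recovery timeout (in seconds) before paths are declared dead.
esxcli iscsi adapter param set -A vmhba67 -k RecoveryTimeout -v 25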
 

zxv

A heavy load won't generally make it go offline. Without flow control working in both directions, though, performance will degrade rapidly.

It's RDMA misconfiguration, or a completely down network, that can cause the system to hang.
 

efschu2

No, you misunderstood. I was thinking about the case where ESXi host A is maxing out the iSER target and PFC tells ESXi host B to wait; host B would then mark the datastore offline and its VMs would crash. But never mind.
 

zxv

The PFC pause is likely very short and intermittent -- it's used to throttle the rate when a receiver is running out of buffer space and would otherwise start dropping received data.
 

zxv

For those using Linux, here are some performance tuning commands for the ConnectX-3.
For more info, see: https://community.mellanox.com/s/article/performance-tuning-for-mellanox-adapters

These will enhance TCP/IP as well as iSER.

1. Use the 'performance' governor. It still allows low power consumption when the CPU is idle, but ramps to higher clocks as demand increases. Note: in recent distros, 'ondemand' is deprecated and replaced by 'performance'.

Code:
for i in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor ;do echo performance >$i ;done
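To confirm the change took effect on every core, something like this can be used (each line should end in 'performance'):

Code:
grep . /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor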
2. Set static IRQ assignments for the Mellanox driver. This tunes IRQs for ConnectX-3, -4 and -5.

Code:
IRQBALANCE_ARGS=""
for DRIVER in mlx4 mlx5 ;do
    # Collect the IRQ numbers belonging to this driver.
    IRQS=$(cat /proc/interrupts | grep ${DRIVER} | awk '{print $1}' | sed 's/://')
    cores=($(seq 1 $(grep -c processor /proc/cpuinfo)))
    i=0
    # Pin each IRQ to one core, round-robin across all cores,
    # and build a ban list so irqbalance leaves these IRQs alone.
    for IRQ in $IRQS ;do
        IRQBALANCE_ARGS="${IRQBALANCE_ARGS} -i ${IRQ}"
        core=${cores[$i]}
        let "mask=2**(core-1)"
        echo $(printf "%x" $mask) > /proc/irq/${IRQ}/smp_affinity
        let "i+=1"
        if [[ $i == ${#cores[@]} ]]; then
            i=0
        fi
    done
done
# Keep whichever FILE line applies to your distro:
# if on Debian/Ubuntu:
FILE=/etc/default/irqbalance
# if on CentOS/RHEL:
FILE=/etc/sysconfig/irqbalance
sed -i "s/^#*IRQBALANCE_ARGS=.*/IRQBALANCE_ARGS=\"${IRQBALANCE_ARGS}\"/" ${FILE}
systemctl restart irqbalance
3. If you installed the Mellanox OFED drivers (that is, you are not using the distro's built-in 'inbox' drivers), then you can also invoke:
Code:
mlnx_tune -p HIGH_THROUGHPUT
With this I'm seeing 3 GBytes/sec NFS sequential read or write, and 4 to 4.5 GBytes/sec with iSER.

In fact, after applying this tuning on the ZFS file servers here, I see IOPS numbers for iSER and NFS that are so close it's questionable whether iSER has any advantage unless lower latency is crucial. I'm seeing average Q16T16 latencies of 300 usec over NFSv4/TCP, and 180 usec over iSER.
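The post doesn't say which tool produced those Q16T16 numbers. For anyone wanting to run a comparable test from the Linux side, an fio invocation along these lines (file path and size are placeholders) reports queue-depth-16, 16-job random-read latency:

Code:
# 4k random reads, iodepth 16 x 16 jobs, direct I/O; reports average completion latency.
fio --name=q16t16-randread --filename=/mnt/testlun/fio.dat --size=4G \
    --rw=randread --bs=4k --iodepth=16 --numjobs=16 --group_reporting \
    --direct=1 --ioengine=libaio --runtime=60 --time_based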
 

efschu2

Finally I have gotten my cables, and now I have set up a test environment with an Eth-upgraded SX6012, but found out that the SwitchX-based ones do not support ECN. So is there any way to get iSER working with only PFC, or any other mechanism?

And if ECN is always needed, will iSER not work without a switch that supports it?

Will RDMA-based NFS require ECN too? Does ESXi 6.7 support RDMA-based NFS, and if yes, how do you set it up?
 

zxv

I don't have a SX6012, though I've been considering trying one.
And I'm not a network guru, so I can't answer definitively.

The docs say the switch must support ECN, but it's possible that's just to handle congestion within the switch.

ECN is two bits in the IPv4 or IPv6 header; however, for RoCE it's only used by RoCE v2, according to this:
https://community.mellanox.com/s/article/understanding-rocev2-congestion-management

So, in theory, two cards can send ECN to each other for RoCE v2.
That could still handle the bottleneck at the receiving end, which may be what matters most.

I wish there were some RDMA bandwidth measurement tools for ESXi 6.7.
That would make verifying this far easier.

I believe ESXi 6.7 does not support NFS over RDMA.

I believe ECN is handled by the RoCE firmware in the NIC, so it may apply to anything using RoCE v2, including NFS. I tested it briefly between Linux hosts, but not enough to know whether there are obstacles. I'm definitely thinking of using it for containers.
 

efschu2

RoCE v1 is working with global pause, no VLANs, and without ECN on the Eth-upgraded SX6012.
The ESXi initiator to a LIO/SCST target is losing the connection after ~1 sec; I could not find out the cause.
With a tgt target it is working "wonderfully", aside from the high load of the userspace target. On a PCIe 2.0 x8-limited system I get 0.2 ms response time, 3.5 GB/s sequential and ~50k 4k IOPS in a Windows VM. With this "multipath trick" I get 4.2 GB/s sequential:
Code:
esxcli storage nmp device set --device <naa_id> --psp VMW_PSP_RR
esxcli storage nmp psp roundrobin deviceconfig set --device <naa_id> --iops 1 --type iops
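To find the <naa_id> to plug into those commands, listing the devices is enough; for example:

Code:
# Device identifiers (naa.*) appear at the start of each device block.
esxcli storage core device list | grep '^naa\.'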
MTU is set to 4200 on the switch and NICs (to allow 4096-byte RDMA traffic plus overhead).
On ESXi:
Code:
esxcli system module parameters set -m nmlx4_core -p "enable_rocev2=0"
On the Ubuntu 18.04 target:
install the OFED driver (I wasn't able to get it working with the inbox one)
Code:
cat /etc/modprobe.d/mlx4_core.conf
options mlx4_core roce_mode=1
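One way to check that the option took effect after the driver is reloaded is to read the parameter back from sysfs (assuming it is exposed there, as module options normally are):

Code:
# Should echo back the value set above (1) once mlx4_core is reloaded.
cat /sys/module/mlx4_core/parameters/roce_mode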
Maybe the kernel-space iSER targets do not work because of virtualization.

I will update when the final storage hardware arrives. This test was done on ESXi 6.7 on a dual E5620 + 144 GB box, with an Ubuntu 18.04 target VM using a passed-through dual CX3 Pro.
 

zxv

iSER should be compatible between an LIO or SCST target on Ubuntu and the vSphere initiator.

I recall you may have said you are using multiple target addresses (or ports) for multipathing, so that shouldn't be the cause.

What I do is watch the iSER-related messages, for example:
tail -f /var/log/*.log|egrep 'iscsi|iser\>|vmhba|rdma|roce|naa\.'

The 'fails after a few seconds' symptom can also be caused by compatibility issues with the backing store or the iSCSI target parameters. Zvols are relatively trouble-free; mdadm arrays can have issues. Watching the log will at least distinguish between failure to log in, failure to attach, and no data.
 

efschu2

On LIO target
Code:
2019-05-09T07:54:20Z iscsid: SessionResolve for 10.10.101.1 (via vmk1) started)
2019-05-09T07:54:20Z iscsid: SessionResolve for 10.10.103.1 (via vmk3) started)
2019-05-09T07:54:20.475Z cpu0:2098056)iser: iser_EPConnect: src ipv4 10.10.101.2
2019-05-09T07:54:20.475Z cpu0:2098056)iser: iser_EPConnect: dst ipv4 10.10.101.1
2019-05-09T07:54:20.476Z cpu2:2097653)iser: iser_RDMACMEventHandler: iser conn 0x430aff6005c0 cm event 0
2019-05-09T07:54:20.476Z cpu0:2098056)iser: iser_EPConnect: src ipv4 10.10.103.2
2019-05-09T07:54:20.476Z cpu0:2098056)iser: iser_EPConnect: dst ipv4 10.10.103.1
2019-05-09T07:54:20.476Z cpu10:2097683)iser: iser_RDMACMEventHandler: iser conn 0x430aff6005c0 cm event 2
2019-05-09T07:54:20.477Z cpu2:2097653)iser: iser_RDMACMEventHandler: iser conn 0x430aff601760 cm event 0
2019-05-09T07:54:20.477Z cpu10:2097683)iser: iser_ComplSlabInit: min 0 max 41048
2019-05-09T07:54:20.477Z cpu10:2097683)iser: iser_RouteResolvedHandler: iser conn: 0x430aff6005c0 uplink:  vmnic2
2019-05-09T07:54:20.477Z cpu0:2102212)iser: iser_DisconnectWorld: 0x430aff6005c0 iser conn disconnect world init
2019-05-09T07:54:20.478Z cpu10:2097683)iser: iser_RDMACMEventHandler: iser conn 0x430aff601760 cm event 2
2019-05-09T07:54:20.479Z cpu10:2097683)iser: iser_ComplSlabInit: min 0 max 41048
2019-05-09T07:54:20.479Z cpu10:2097683)iser: iser_RouteResolvedHandler: iser conn: 0x430aff601760 uplink:  vmnic3
2019-05-09T07:54:20.479Z cpu12:2102213)iser: iser_DisconnectWorld: 0x430aff601760 iser conn disconnect world init
2019-05-09T07:54:20.481Z cpu11:2097663)iser: iser_RDMACMEventHandler: iser conn 0x430aff6005c0 cm event 9
2019-05-09T07:54:20.482Z cpu11:2097663)iser: iser_RDMACMEventHandler: iser conn 0x430aff601760 cm event 9
2019-05-09T07:54:20.729Z cpu0:2098056)iser: iser_EPPoll: Polling ep/iser_conn 0x430aff6005c0 transport 0x430aff5a56f0 timeoutMS 1
2019-05-09T07:54:20.732Z cpu0:2098056)iser: iser_TransportBindConnection: session 0x430affd73ea0 iscsi_conn 0x430aff994f10 iser_conn 0x430aff6005c0
2019-05-09T07:54:20.738Z cpu0:2098056)iser: iser_EPPoll: Polling ep/iser_conn 0x430aff601760 transport 0x430aff5a5130 timeoutMS 1
2019-05-09T07:54:20.740Z cpu0:2098056)iser: iser_TransportBindConnection: session 0x430aff84eea0 iscsi_conn 0x430affabf7c0 iser_conn 0x430aff601760
-->          value = "naa.60014051faf320d8c6846f3974815575"
-->          value = "vmhba68:C0:T0:L0"
2019-05-09T07:54:21.002Z info hostd[2099474] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 404 : Connectivity to storage device naa.60014051faf320d8c6846f3974815575 (Datastores: Unknown) restored. Path vmhba68:C0:T0:L0 is active again.
2019-05-09T07:54:20Z iscsid: connection 1:0 (iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704 if=iser-vmnic2@vmk1 addr=10.10.101.1:3260 (TPGT:1 ISID:0x2)  (T0 C0)) has recovered (2 attempts)
2019-05-09T07:54:21Z iscsid: connection 9:0 (iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704 if=iser-vmnic3@vmk3 addr=10.10.103.1:3260 (TPGT:1 ISID:0x1)  (T2 C0)) has recovered (2 attempts)
2019-05-09T07:54:20.999Z cpu0:2098056)iser: iser_TransportStartConnection: session 0x430affd73ea0 conn 0x430aff994f10
2019-05-09T07:54:21.000Z cpu0:2098056)iser: iser_TransportStartConnection: session 0x430aff84eea0 conn 0x430affabf7c0
2019-05-09T07:54:21.000Z: [scsiCorrelator] 5767096644us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba68:C0:T0:L0 changed state from dead
2019-05-09T07:54:21.001Z: [scsiCorrelator] 5767097650us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60014051faf320d8c6846f3974815575 (Datastores: Unknown) restored. Path vmhba68:C0:T0:L0 is active again.
-->          value = "naa.60014051faf320d8c6846f3974815575"
-->          value = "vmhba68:C0:T0:L0"
2019-05-09T07:54:21.953Z info hostd[2100711] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 405 : Lost path redundancy to storage device naa.60014051faf320d8c6846f3974815575. Path vmhba68:C0:T0:L0 is down. Affected datastores: Unknown.
2019-05-09T07:54:22.002Z info hostd[2098865] [Originator@6876 sub=Hostsvc.StorageSystem] StorageEventUpdateInt: PathMsg: [N11HostdCommon25VmkernelUpdateStoragePathE:0x000000d373231750] timestamp=5767096844 pathname=vmhba68:C0:T0:L0 uid=naa.60014051faf320d8c6846f3974815575 isValid=true
2019-05-09T07:54:22.002Z info hostd[2099476] [Originator@6876 sub=Hostsvc.DatastoreSystem] VmfsVobUpdate: Processing Vob message [N11HostdCommon6VobMsgE:0x000000d373398af0] timestamp=5767098266
2019-05-09T07:54:22.002Z info hostd[2099476] [Originator@6876 sub=Hostsvc.DatastoreSystem] VmfsVobUpdate: Processing Vob message [N11HostdCommon6VobMsgE:0x000000d372878bc0] timestamp=5768049905
2019-05-09T07:54:22.002Z warning hostd[2098865] [Originator@6876 sub=Hostsvc] UpdateStoragePathInformation: Hostctl Path naa.60014051faf320d8c6846f3974815575 not found. Refreshing storage cache
2019-05-09T07:54:22.024Z warning hostd[2098865] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:5cd032e2-cdb7-e172-f723-0019999d9330-2dfb60a8: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:22.035Z warning hostd[2098865] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:66: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:22.038Z warning hostd[2098865] [Originator@6876 sub=Libs] GetStaticDiscoveryTargetList: Discovery Method unknown on vmhba66 for target iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704
2019-05-09T07:54:22.038Z warning hostd[2098865] [Originator@6876 sub=Default] Unrecognized discovery method 4 for target iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704
2019-05-09T07:54:22.050Z warning hostd[2098865] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:67: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:22.059Z warning hostd[2098865] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:68: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:22.075Z warning hostd[2098865] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:69: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
-->          value = "naa.60014051faf320d8c6846f3974815575"
-->          value = "vmhba66:C0:T2:L0"
2019-05-09T07:54:22.255Z info hostd[2098864] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 406 : Lost connectivity to storage device naa.60014051faf320d8c6846f3974815575. Path vmhba66:C0:T2:L0 is down. Affected datastores: Unknown.
2019-05-09T07:54:22Z iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
2019-05-09T07:54:22Z iscsid: Kernel reported iSCSI connection 9:0 error (1011) state (3)
2019-05-09T07:54:21.906Z cpu8:2098002)NMP: nmpCompleteRetryForPath:327: Retry world recovered device "naa.60014051faf320d8c6846f3974815575"
2019-05-09T07:54:21.951Z cpu9:2097861)iser: iser_HandleWorkCompletion: iser_conn: 0x430aff6005c0 Completion with error: status 4 ID 0x430aff7280a0 syndrom 82
2019-05-09T07:54:21.951Z cpu0:2097619)iser: iser_TransportStopConnection: stopping iser_conn 0x430aff6005c0, iscsi_conn 0x430aff994f10, mode 255
2019-05-09T07:54:21.952Z cpu0:2097619)iser: iser_CleanupTask: QP in err state, PostSend cannot proceed
2019-05-09T07:54:21.952Z cpu0:2097619)iser: iscsi_free_task: Adding dirty task 0x430aff728000 to list...
2019-05-09T07:54:21.952Z cpu9:2097861)iser: iser_HandleWorkCompletion: Completion with error: status 5 ID 0xffffffffffffffff syndrom 249
2019-05-09T07:54:21.952Z cpu8:2098002)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60014051faf320d8c6846f3974815575" state in doubt; requested fast path state update...
2019-05-09T07:54:21.952Z cpu8:2098002)ScsiDeviceIO: 2994: Cmd(0x459a40cb5440) 0x2a, CmdSN 0xffffe001c3ec9220 from world 2102023 to dev "naa.60014051faf320d8c6846f3974815575" failed H:0x8 D:0x0 P:0x0 Invalid sense data: 0x42 0x80 0x41.
2019-05-09T07:54:21.952Z cpu8:2097663)iser: iser_RDMACMEventHandler: iser conn 0x430aff6005c0 cm event 10
2019-05-09T07:54:22.006Z cpu1:2098056)iser: iser_TransportStopConnection: stopping iser_conn 0x0, iscsi_conn 0x430aff994f10, mode 3
2019-05-09T07:54:22.006Z cpu1:2098056)iser: iser_EPDisconnect: transport 0x430aff5a56f0 ep 0x430aff6005c0
2019-05-09T07:54:22.006Z cpu1:2098056)iser: iser_EPDisconnect: Cleaning up dirty iscsi tasks...
2019-05-09T07:54:22.006Z cpu1:2098056)iser: iser_EPDisconnect: Cleaning up ep: 0x430aff6005c0
2019-05-09T07:54:22.006Z cpu1:2098056)iser: iser_FreeIbConnRes: freeing iser_conn 0x430aff6005c0 ib resources
2019-05-09T07:54:22.017Z cpu0:2102212)iser: iser_DisconnectWorld: 0x430aff6005c0 iser conn disconnect world wake up: World is marked for death
2019-05-09T07:54:22.017Z cpu0:2102212)iser: iser_DisconnectWorld: 0x430aff6005c0 disconnect world wokeup
2019-05-09T07:54:22.253Z cpu1:2097855)iser: iser_HandleWorkCompletion: iser_conn: 0x430aff601760 Completion with error: status 4 ID 0x430affd6f820 syndrom 82
2019-05-09T07:54:22.253Z cpu5:2097619)iser: iser_TransportStopConnection: stopping iser_conn 0x430aff601760, iscsi_conn 0x430affabf7c0, mode 255
2019-05-09T07:54:22.253Z cpu5:2097619)iser: iser_CleanupTask: QP in err state, PostSend cannot proceed
2019-05-09T07:54:22.253Z cpu5:2097619)iser: iscsi_free_task: Adding dirty task 0x430affd6f780 to list...
2019-05-09T07:54:22.253Z cpu1:2097855)iser: iser_HandleWorkCompletion: Completion with error: status 5 ID 0xffffffffffffffff syndrom 249
2019-05-09T07:54:22.253Z cpu7:2098000)ScsiDeviceIO: 2994: Cmd(0x459a40d2d080) 0x2a, CmdSN 0xffffe001c3ec9220 from world 2102023 to dev "naa.60014051faf320d8c6846f3974815575" failed H:0x8 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2019-05-09T07:54:22.254Z cpu11:2097663)iser: iser_RDMACMEventHandler: iser conn 0x430aff601760 cm event 10
2019-05-09T07:54:22.490Z cpu6:2102023)WARNING: NMP: nmp_IssueCommandToDevice:4990: I/O could not be issued to device "naa.60014051faf320d8c6846f3974815575" due to Not found
2019-05-09T07:54:22.490Z cpu6:2102023)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2019-05-09T07:54:22.490Z cpu6:2102023)WARNING: NMP: nmp_DeviceStartLoop:728: NMP Device "naa.60014051faf320d8c6846f3974815575" is blocked. Not starting I/O from device.
2019-05-09T07:54:22.506Z cpu1:2098056)iser: iser_TransportStopConnection: stopping iser_conn 0x0, iscsi_conn 0x430affabf7c0, mode 3
2019-05-09T07:54:22.506Z cpu1:2098056)iser: iser_EPDisconnect: transport 0x430aff5a5130 ep 0x430aff601760
2019-05-09T07:54:22.506Z cpu1:2098056)iser: iser_EPDisconnect: Cleaning up dirty iscsi tasks...
2019-05-09T07:54:22.506Z cpu1:2098056)iser: iser_EPDisconnect: Cleaning up ep: 0x430aff601760
2019-05-09T07:54:22.506Z cpu1:2098056)iser: iser_FreeIbConnRes: freeing iser_conn 0x430aff601760 ib resources
2019-05-09T07:54:21.952Z cpu8:2098002)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60014051faf320d8c6846f3974815575" state in doubt; requested fast path state update...
2019-05-09T07:54:22.490Z cpu6:2102023)WARNING: NMP: nmp_IssueCommandToDevice:4990: I/O could not be issued to device "naa.60014051faf320d8c6846f3974815575" due to Not found
2019-05-09T07:54:22.490Z cpu6:2102023)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2019-05-09T07:54:22.490Z cpu6:2102023)WARNING: NMP: nmp_DeviceStartLoop:728: NMP Device "naa.60014051faf320d8c6846f3974815575" is blocked. Not starting I/O from device.
2019-05-09T07:54:21.905Z: [scsiCorrelator] 5768001699us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba66:C0:T2:L0 changed state from dead
2019-05-09T07:54:21.952Z: [scsiCorrelator] 5768048287us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba68:C0:T0:L0 changed state from on
2019-05-09T07:54:21.953Z: [scsiCorrelator] 5768049265us: [esx.problem.storage.redundancy.lost] Lost path redundancy to storage device naa.60014051faf320d8c6846f3974815575. Path vmhba68:C0:T0:L0 is down. Affected datastores: Unknown.
2019-05-09T07:54:22.254Z: [scsiCorrelator] 5768350086us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba66:C0:T2:L0 changed state from on
2019-05-09T07:54:22.255Z: [scsiCorrelator] 5768351065us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.60014051faf320d8c6846f3974815575. Path vmhba66:C0:T2:L0 is down. Affected datastores: Unknown.
2019-05-09T07:54:23.257Z info hostd[2098936] [Originator@6876 sub=Hostsvc.StorageSystem] StorageEventUpdateInt: PathMsg: [N11HostdCommon25VmkernelUpdateStoragePathE:0x000000d374002900] timestamp=5768350310 pathname=vmhba66:C0:T2:L0 uid=naa.60014051faf320d8c6846f3974815575 isValid=true
2019-05-09T07:54:23.257Z info hostd[2099474] [Originator@6876 sub=Hostsvc.DatastoreSystem] VmfsVobUpdate: Processing Vob message [N11HostdCommon6VobMsgE:0x000000d372f62320] timestamp=5768351675
2019-05-09T07:54:23.257Z warning hostd[2098936] [Originator@6876 sub=Hostsvc] UpdateStoragePathInformation: Hostctl Path naa.60014051faf320d8c6846f3974815575 not found. Refreshing storage cache
2019-05-09T07:54:23.268Z warning hostd[2098936] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:5cd032e2-cdb7-e172-f723-0019999d9330-2dfb60a8: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:23.280Z warning hostd[2098936] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:66: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:23.282Z warning hostd[2098936] [Originator@6876 sub=Libs] GetStaticDiscoveryTargetList: Discovery Method unknown on vmhba66 for target iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704
2019-05-09T07:54:23.282Z warning hostd[2098936] [Originator@6876 sub=Default] Unrecognized discovery method 4 for target iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704
2019-05-09T07:54:23.294Z warning hostd[2098936] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:67: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:23.303Z warning hostd[2098936] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:68: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:23.318Z warning hostd[2098936] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:69: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:22.774Z cpu12:2102213)iser: iser_DisconnectWorld: 0x430aff601760 iser conn disconnect world wake up: World is marked for death
2019-05-09T07:54:22.774Z cpu12:2102213)iser: iser_DisconnectWorld: 0x430aff601760 disconnect world wokeup
2019-05-09T07:54:22.904Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - issuing command 0x459a40dbffc0
2019-05-09T07:54:22.904Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - failed to issue command due to Not found (APD), try again...
2019-05-09T07:54:22.904Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update...
2019-05-09T07:54:22.904Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - issuing command 0x459a40dbffc0
2019-05-09T07:54:22.904Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - failed to issue command due to Not found (APD), try again...
2019-05-09T07:54:22.904Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update...
2019-05-09T07:54:23.904Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - issuing command 0x459a40dbffc0
2019-05-09T07:54:23.904Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - failed to issue command due to Not found (APD), try again...
2019-05-09T07:54:23.905Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update...
2019-05-09T07:54:23.904Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - issuing command 0x459a40dbffc0
2019-05-09T07:54:23.904Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - failed to issue command due to Not found (APD), try again...
2019-05-09T07:54:23.905Z cpu0:2097879)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update...
2019-05-09T07:54:25Z iscsid: SessionResolve for 10.10.101.1 (via vmk1) started)
2019-05-09T07:54:25Z iscsid: SessionResolve for 10.10.103.1 (via vmk3) started)
2019-05-09T07:54:25.589Z cpu2:2098056)iser: iser_EPConnect: src ipv4 10.10.101.2
2019-05-09T07:54:25.589Z cpu2:2098056)iser: iser_EPConnect: dst ipv4 10.10.101.1
2019-05-09T07:54:25.590Z cpu2:2097653)iser: iser_RDMACMEventHandler: iser conn 0x430aff6005c0 cm event 0
2019-05-09T07:54:25.590Z cpu10:2097683)iser: iser_RDMACMEventHandler: iser conn 0x430aff6005c0 cm event 2
2019-05-09T07:54:25.590Z cpu1:2098056)iser: iser_EPConnect: src ipv4 10.10.103.2
2019-05-09T07:54:25.590Z cpu1:2098056)iser: iser_EPConnect: dst ipv4 10.10.103.1
2019-05-09T07:54:25.591Z cpu2:2097653)iser: iser_RDMACMEventHandler: iser conn 0x430aff601760 cm event 0
2019-05-09T07:54:25.591Z cpu10:2097683)iser: iser_ComplSlabInit: min 0 max 41048
2019-05-09T07:54:25.591Z cpu10:2097683)iser: iser_RouteResolvedHandler: iser conn: 0x430aff6005c0 uplink:  vmnic2
2019-05-09T07:54:25.592Z cpu3:2102214)iser: iser_DisconnectWorld: 0x430aff6005c0 iser conn disconnect world init
2019-05-09T07:54:25.592Z cpu10:2097683)iser: iser_RDMACMEventHandler: iser conn 0x430aff601760 cm event 2
2019-05-09T07:54:25.593Z cpu10:2097683)iser: iser_ComplSlabInit: min 0 max 41048
2019-05-09T07:54:25.593Z cpu10:2097683)iser: iser_RouteResolvedHandler: iser conn: 0x430aff601760 uplink:  vmnic3
2019-05-09T07:54:25.593Z cpu6:2102215)iser: iser_DisconnectWorld: 0x430aff601760 iser conn disconnect world init
2019-05-09T07:54:25.595Z cpu11:2097663)iser: iser_RDMACMEventHandler: iser conn 0x430aff6005c0 cm event 9
2019-05-09T07:54:25.596Z cpu11:2097663)iser: iser_RDMACMEventHandler: iser conn 0x430aff601760 cm event 9
-->          value = "naa.60014051faf320d8c6846f3974815575"
-->          value = "vmhba68:C0:T0:L0"
2019-05-09T07:54:26.117Z info hostd[2099026] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 407 : Connectivity to storage device naa.60014051faf320d8c6846f3974815575 (Datastores: Unknown) restored. Path vmhba68:C0:T0:L0 is active again.
2019-05-09T07:54:26Z iscsid: connection 1:0 (iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704 if=iser-vmnic2@vmk1 addr=10.10.101.1:3260 (TPGT:1 ISID:0x2)  (T0 C0)) has recovered (2 attempts)
2019-05-09T07:54:26Z iscsid: connection 9:0 (iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704 if=iser-vmnic3@vmk3 addr=10.10.103.1:3260 (TPGT:1 ISID:0x1)  (T2 C0)) has recovered (2 attempts)
2019-05-09T07:54:25.843Z cpu1:2098056)iser: iser_EPPoll: Polling ep/iser_conn 0x430aff6005c0 transport 0x430aff5a56f0 timeoutMS 1
2019-05-09T07:54:25.846Z cpu1:2098056)iser: iser_TransportBindConnection: session 0x430affd73ea0 iscsi_conn 0x430aff994f10 iser_conn 0x430aff6005c0
2019-05-09T07:54:25.852Z cpu1:2098056)iser: iser_EPPoll: Polling ep/iser_conn 0x430aff601760 transport 0x430aff5a5130 timeoutMS 1
2019-05-09T07:54:25.855Z cpu1:2098056)iser: iser_TransportBindConnection: session 0x430aff84eea0 iscsi_conn 0x430affabf7c0 iser_conn 0x430aff601760
2019-05-09T07:54:26.114Z cpu1:2098056)iser: iser_TransportStartConnection: session 0x430affd73ea0 conn 0x430aff994f10
2019-05-09T07:54:26.115Z cpu0:2098056)iser: iser_TransportStartConnection: session 0x430aff84eea0 conn 0x430affabf7c0
2019-05-09T07:54:26.115Z: [scsiCorrelator] 5772211172us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba68:C0:T0:L0 changed state from dead
2019-05-09T07:54:26.116Z: [scsiCorrelator] 5772212277us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60014051faf320d8c6846f3974815575 (Datastores: Unknown) restored. Path vmhba68:C0:T0:L0 is active again.
-->          value = "naa.60014051faf320d8c6846f3974815575"
-->          value = "vmhba68:C0:T0:L0"
2019-05-09T07:54:26.952Z info hostd[2099145] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 408 : Lost path redundancy to storage device naa.60014051faf320d8c6846f3974815575. Path vmhba68:C0:T0:L0 is down. Affected datastores: Unknown.
2019-05-09T07:54:27.116Z info hostd[2099472] [Originator@6876 sub=Hostsvc.StorageSystem] StorageEventUpdateInt: PathMsg: [N11HostdCommon25VmkernelUpdateStoragePathE:0x000000d372d1d8e0] timestamp=5772211386 pathname=vmhba68:C0:T0:L0 uid=naa.60014051faf320d8c6846f3974815575 isValid=true
2019-05-09T07:54:27.116Z warning hostd[2099472] [Originator@6876 sub=Hostsvc] UpdateStoragePathInformation: Hostctl Path naa.60014051faf320d8c6846f3974815575 not found. Refreshing storage cache
2019-05-09T07:54:27.119Z info hostd[2099025] [Originator@6876 sub=Hostsvc.DatastoreSystem] VmfsVobUpdate: Processing Vob message [N11HostdCommon6VobMsgE:0x000000d3682a5370] timestamp=5772213006
2019-05-09T07:54:27.119Z info hostd[2099025] [Originator@6876 sub=Hostsvc.DatastoreSystem] VmfsVobUpdate: Processing Vob message [N11HostdCommon6VobMsgE:0x000000d3739465e0] timestamp=5773048178
2019-05-09T07:54:27.190Z warning hostd[2099472] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:5cd032e2-cdb7-e172-f723-0019999d9330-2dfb60a8: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:27.203Z warning hostd[2099472] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:66: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:27.206Z warning hostd[2099472] [Originator@6876 sub=Libs] GetStaticDiscoveryTargetList: Discovery Method unknown on vmhba66 for target iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704
2019-05-09T07:54:27.206Z warning hostd[2099472] [Originator@6876 sub=Default] Unrecognized discovery method 4 for target iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704
2019-05-09T07:54:27.218Z warning hostd[2099472] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:67: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:27.229Z warning hostd[2099472] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:68: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
-->          value = "naa.60014051faf320d8c6846f3974815575"
-->          value = "vmhba66:C0:T2:L0"
2019-05-09T07:54:27.244Z info hostd[2100710] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 409 : Lost connectivity to storage device naa.60014051faf320d8c6846f3974815575. Path vmhba66:C0:T2:L0 is down. Affected datastores: Unknown.
2019-05-09T07:54:27.246Z warning hostd[2099472] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:69: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:27Z iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
2019-05-09T07:54:27Z iscsid: Kernel reported iSCSI connection 9:0 error (1011) state (3)
2019-05-09T07:54:26.906Z cpu8:2098002)NMP: nmpCompleteRetryForPath:327: Retry world recovered device "naa.60014051faf320d8c6846f3974815575"
2019-05-09T07:54:26.949Z cpu12:2097861)iser: iser_HandleWorkCompletion: iser_conn: 0x430aff6005c0 Completion with error: status 4 ID 0x430aff72a560 syndrom 82
2019-05-09T07:54:26.949Z cpu0:2097619)iser: iser_TransportStopConnection: stopping iser_conn 0x430aff6005c0, iscsi_conn 0x430aff994f10, mode 255
2019-05-09T07:54:26.950Z cpu0:2097619)iser: iser_CleanupTask: QP in err state, PostSend cannot proceed
2019-05-09T07:54:26.950Z cpu0:2097619)iser: iscsi_free_task: Adding dirty task 0x430aff72a4c0 to list...
2019-05-09T07:54:26.950Z cpu12:2097861)iser: iser_HandleWorkCompletion: Completion with error: status 5 ID 0xffffffffffffffff syndrom 249
2019-05-09T07:54:26.950Z cpu8:2098002)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60014051faf320d8c6846f3974815575" state in doubt; requested fast path state update...
2019-05-09T07:54:26.950Z cpu8:2098002)ScsiDeviceIO: 2994: Cmd(0x459a6a649680) 0x2a, CmdSN 0xffffe001c3ec9220 from world 2102023 to dev "naa.60014051faf320d8c6846f3974815575" failed H:0x8 D:0x0 P:0x0 Invalid sense data: 0x0 0x1a 0x45.
2019-05-09T07:54:26.950Z cpu11:2097663)iser: iser_RDMACMEventHandler: iser conn 0x430aff6005c0 cm event 10
2019-05-09T07:54:27.120Z cpu1:2098056)iser: iser_TransportStopConnection: stopping iser_conn 0x0, iscsi_conn 0x430aff994f10, mode 3
2019-05-09T07:54:27.120Z cpu1:2098056)iser: iser_EPDisconnect: transport 0x430aff5a56f0 ep 0x430aff6005c0
2019-05-09T07:54:27.120Z cpu1:2098056)iser: iser_EPDisconnect: Cleaning up dirty iscsi tasks...
2019-05-09T07:54:27.120Z cpu1:2098056)iser: iser_EPDisconnect: Cleaning up ep: 0x430aff6005c0
2019-05-09T07:54:27.120Z cpu1:2098056)iser: iser_FreeIbConnRes: freeing iser_conn 0x430aff6005c0 ib resources
2019-05-09T07:54:27.183Z cpu3:2102214)iser: iser_DisconnectWorld: 0x430aff6005c0 iser conn disconnect world wake up: World is marked for death
2019-05-09T07:54:27.183Z cpu3:2102214)iser: iser_DisconnectWorld: 0x430aff6005c0 disconnect world wokeup
2019-05-09T07:54:27.241Z cpu0:2097855)iser: iser_HandleWorkCompletion: iser_conn: 0x430aff601760 Completion with error: status 4 ID 0x430affd71ce0 syndrom 82
2019-05-09T07:54:27.241Z cpu0:2097619)iser: iser_TransportStopConnection: stopping iser_conn 0x430aff601760, iscsi_conn 0x430affabf7c0, mode 255
2019-05-09T07:54:27.242Z cpu0:2097619)iser: iser_CleanupTask: QP in err state, PostSend cannot proceed
2019-05-09T07:54:27.242Z cpu0:2097619)iser: iscsi_free_task: Adding dirty task 0x430affd71c40 to list...
2019-05-09T07:54:27.242Z cpu0:2097855)iser: iser_HandleWorkCompletion: Completion with error: status 5 ID 0xffffffffffffffff syndrom 249
2019-05-09T07:54:27.242Z cpu7:2098000)ScsiDeviceIO: 2994: Cmd(0x459a6a66a380) 0x2a, CmdSN 0xffffe001c3ec9220 from world 2102023 to dev "naa.60014051faf320d8c6846f3974815575" failed H:0x8 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2019-05-09T07:54:27.242Z cpu11:2097663)iser: iser_RDMACMEventHandler: iser conn 0x430aff601760 cm event 10
2019-05-09T07:54:27.491Z cpu7:2102023)WARNING: NMP: nmp_IssueCommandToDevice:4990: I/O could not be issued to device "naa.60014051faf320d8c6846f3974815575" due to Not found
2019-05-09T07:54:27.491Z cpu7:2102023)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2019-05-09T07:54:27.491Z cpu7:2102023)WARNING: NMP: nmp_DeviceStartLoop:728: NMP Device "naa.60014051faf320d8c6846f3974815575" is blocked. Not starting I/O from device.
2019-05-09T07:54:27.500Z cpu1:2098056)iser: iser_TransportStopConnection: stopping iser_conn 0x0, iscsi_conn 0x430affabf7c0, mode 3
2019-05-09T07:54:27.501Z cpu1:2098056)iser: iser_EPDisconnect: transport 0x430aff5a5130 ep 0x430aff601760
2019-05-09T07:54:27.501Z cpu1:2098056)iser: iser_EPDisconnect: Cleaning up dirty iscsi tasks...
2019-05-09T07:54:27.501Z cpu1:2098056)iser: iser_EPDisconnect: Cleaning up ep: 0x430aff601760
2019-05-09T07:54:27.501Z cpu1:2098056)iser: iser_FreeIbConnRes: freeing iser_conn 0x430aff601760 ib resources
2019-05-09T07:54:27.514Z cpu6:2102215)iser: iser_DisconnectWorld: 0x430aff601760 iser conn disconnect world wake up: World is marked for death
2019-05-09T07:54:27.514Z cpu6:2102215)iser: iser_DisconnectWorld: 0x430aff601760 disconnect world wokeup
2019-05-09T07:54:26.950Z cpu8:2098002)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60014051faf320d8c6846f3974815575" state in doubt; requested fast path state update...
2019-05-09T07:54:27.491Z cpu7:2102023)WARNING: NMP: nmp_IssueCommandToDevice:4990: I/O could not be issued to device "naa.60014051faf320d8c6846f3974815575" due to Not found
2019-05-09T07:54:27.491Z cpu7:2102023)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2019-05-09T07:54:27.491Z cpu7:2102023)WARNING: NMP: nmp_DeviceStartLoop:728: NMP Device "naa.60014051faf320d8c6846f3974815575" is blocked. Not starting I/O from device.
2019-05-09T07:54:26.905Z: [scsiCorrelator] 5773001340us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba66:C0:T2:L0 changed state from dead
2019-05-09T07:54:26.950Z: [scsiCorrelator] 5773046592us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba68:C0:T0:L0 changed state from on
2019-05-09T07:54:26.951Z: [scsiCorrelator] 5773047409us: [esx.problem.storage.redundancy.lost] Lost path redundancy to storage device naa.60014051faf320d8c6846f3974815575. Path vmhba68:C0:T0:L0 is down. Affected datastores: Unknown.
2019-05-09T07:54:27.242Z: [scsiCorrelator] 5773338726us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba66:C0:T2:L0 changed state from on
2019-05-09T07:54:27.243Z: [scsiCorrelator] 5773339591us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.60014051faf320d8c6846f3974815575. Path vmhba66:C0:T2:L0 is down. Affected datastores: Unknown.
2019-05-09T07:54:28.245Z info hostd[2099475] [Originator@6876 sub=Hostsvc.StorageSystem] StorageEventUpdateInt: PathMsg: [N11HostdCommon25VmkernelUpdateStoragePathE:0x000000d3727a20f0] timestamp=5773338882 pathname=vmhba66:C0:T2:L0 uid=naa.60014051faf320d8c6846f3974815575 isValid=true
2019-05-09T07:54:28.245Z info hostd[2098936] [Originator@6876 sub=Hostsvc.DatastoreSystem] VmfsVobUpdate: Processing Vob message [N11HostdCommon6VobMsgE:0x000000d37279a550] timestamp=5773340136
2019-05-09T07:54:28.245Z warning hostd[2099475] [Originator@6876 sub=Hostsvc] UpdateStoragePathInformation: Hostctl Path naa.60014051faf320d8c6846f3974815575 not found. Refreshing storage cache
2019-05-09T07:54:28.256Z warning hostd[2099475] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:5cd032e2-cdb7-e172-f723-0019999d9330-2dfb60a8: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:28.268Z warning hostd[2099475] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:66: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:28.270Z warning hostd[2099475] [Originator@6876 sub=Libs] GetStaticDiscoveryTargetList: Discovery Method unknown on vmhba66 for target iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704
2019-05-09T07:54:28.271Z warning hostd[2099475] [Originator@6876 sub=Default] Unrecognized discovery method 4 for target iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704
2019-05-09T07:54:28.282Z warning hostd[2099475] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:67: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:28.292Z warning hostd[2099475] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:68: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:28.312Z warning hostd[2099475] [Originator@6876 sub=Default] IP Capability discovery failure on iSCSI HBA: iqn.1998-01.com.vmware:localhost.something.local:825039958:69: Check vmkernel tcp/ip settings for network properties for software and dependent hardware iscsi adapters. It's not supported from this API.
2019-05-09T07:54:27.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - issuing command 0x459a6a66a380
2019-05-09T07:54:27.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - failed to issue command due to Not found (APD), try again...
2019-05-09T07:54:27.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update...
2019-05-09T07:54:27.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - issuing command 0x459a6a66a380
2019-05-09T07:54:27.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - failed to issue command due to Not found (APD), try again...
2019-05-09T07:54:27.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update...
2019-05-09T07:54:28.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - issuing command 0x459a6a66a380
2019-05-09T07:54:28.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - failed to issue command due to Not found (APD), try again...
2019-05-09T07:54:28.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update...
2019-05-09T07:54:28.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - issuing command 0x459a6a66a380
2019-05-09T07:54:28.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.60014051faf320d8c6846f3974815575" - failed to issue command due to Not found (APD), try again...
2019-05-09T07:54:28.904Z cpu1:2097879)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.60014051faf320d8c6846f3974815575": awaiting fast path state update...
target configuration
Code:
targetcli
targetcli shell version 2.1.fb43
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> ls
o- / ................................................................................................................... [...]
  o- backstores ........................................................................................................ [...]
  | o- block ............................................................................................ [Storage Objects: 1]
  | | o- zfs ...................................................... [/dev/zvol/iser/isertarget (13.0GiB) write-thru activated]
  | o- fileio ........................................................................................... [Storage Objects: 0]
  | o- pscsi ............................................................................................ [Storage Objects: 0]
  | o- ramdisk .......................................................................................... [Storage Objects: 0]
  o- iscsi ...................................................................................................... [Targets: 1]
  | o- iqn.2003-01.org.linux-iscsi.iscsi-test.x8664:sn.a3569180d704 ................................................ [TPGs: 1]
  |   o- tpg1 ......................................................................................... [no-gen-acls, no-auth]
  |     o- acls .................................................................................................... [ACLs: 4]
  |     | o- iqn.1998-01.com.vmware:localhost.something.local:825039958:66 ..................................... [Mapped LUNs: 1]
  |     | | o- mapped_lun0 ............................................................................. [lun0 block/zfs (rw)]
  |     | o- iqn.1998-01.com.vmware:localhost.something.local:825039958:67 ..................................... [Mapped LUNs: 1]
  |     | | o- mapped_lun0 ............................................................................. [lun0 block/zfs (rw)]
  |     | o- iqn.1998-01.com.vmware:localhost.something.local:825039958:68 ..................................... [Mapped LUNs: 1]
  |     | | o- mapped_lun0 ............................................................................. [lun0 block/zfs (rw)]
  |     | o- iqn.1998-01.com.vmware:localhost.something.local:825039958:69 ..................................... [Mapped LUNs: 1]
  |     |   o- mapped_lun0 ............................................................................. [lun0 block/zfs (rw)]
  |     o- luns .................................................................................................... [LUNs: 1]
  |     | o- lun0 .................................................................... [block/zfs (/dev/zvol/iser/isertarget)]
  |     o- portals .............................................................................................. [Portals: 2]
  |       o- 10.10.101.1:3260 ......................................................................................... [iser]
  |       o- 10.10.103.1:3260 ......................................................................................... [iser]
  o- loopback ................................................................................................... [Targets: 0]
  o- srpt ....................................................................................................... [Targets: 0]
  o- vhost ...................................................................................................... [Targets: 0]
target block attributes
Code:
/backstores/block/zfs> get attribute
ATTRIBUTE CONFIG GROUP
======================
alua_support=1 [ro]
-------------------


block_size=512
--------------
Block size of the underlying device.

emulate_3pc=1
-------------
If set to 1, enable Third Party Copy.

emulate_caw=1
-------------
If set to 1, enable Compare and Write.

emulate_dpo=1
-------------
If set to 1, turn on Disable Page Out.

emulate_fua_read=1
------------------
If set to 1, enable Force Unit Access read.

emulate_fua_write=1
-------------------
If set to 1, enable Force Unit Access write.

emulate_model_alias=1
---------------------
If set to 1, use the backend device name for the model alias.

emulate_rest_reord=0
--------------------
If set to 0, the Queue Algorithm Modifier is Restricted Reordering.

emulate_tas=1
-------------
If set to 1, enable Task Aborted Status.

emulate_tpu=0
-------------
If set to 1, enable Thin Provisioning Unmap.

emulate_tpws=0
--------------
If set to 1, enable Thin Provisioning Write Same.

emulate_ua_intlck_ctrl=0
------------------------
If set to 1, enable Unit Attention Interlock.

emulate_write_cache=0
---------------------
If set to 1, turn on Write Cache Enable.

enforce_pr_isids=1
------------------
If set to 1, enforce persistent reservation ISIDs.

force_pr_aptpl=0
----------------
If set to 1, force SPC-3 PR Activate Persistence across Target Power Loss operation.

hw_block_size=512 [ro]
----------------------
Hardware block size in bytes.

hw_max_sectors=32768 [ro]
-------------------------
Maximum number of sectors the hardware can transfer at once.

hw_pi_prot_type=0 [ro]
----------------------
If non-zero, DIF protection is enabled on the underlying hardware.

hw_queue_depth=128 [ro]
-----------------------
Hardware queue depth.

is_nonrot=1
-----------
If set to 1, the backstore is a non rotational device.

max_unmap_block_desc_count=1
----------------------------
Maximum number of block descriptors for UNMAP.

max_unmap_lba_count=131072
--------------------------
Maximum number of LBA for UNMAP.

max_write_same_len=65535
------------------------
Maximum length for WRITE_SAME.

optimal_sectors=32768
---------------------
Optimal request size in sectors.

pgr_support=1 [ro]
------------------


pi_prot_format=0
----------------
DIF protection format.

pi_prot_type=0
--------------
DIF protection type.

pi_prot_verify=0
----------------


queue_depth=128
---------------
Queue depth.

unmap_granularity=8
-------------------
UNMAP granularity.

unmap_granularity_alignment=0
-----------------------------
UNMAP granularity alignment.

unmap_zeroes_data=0
-------------------
If set to 1, zeroes are read back after an UNMAP.
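For completeness, any of the writable attributes above can be changed in place with targetcli; a minimal sketch (emulate_tpu is used purely as an illustration here, not as a recommendation for this setup):

Code:
# non-interactive equivalent of 'set attribute' from a root shell on the target;
# emulate_tpu is just an example attribute, pick whichever one you actually need
targetcli /backstores/block/zfs set attribute emulate_tpu=1
targetcli saveconfig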
 

efschu2

Member
Feb 14, 2019
68
12
8
On the tgt target:

ESXi logs:
2019-05-09T08:50:02.143Z info hostd-probe[2098514] [Originator@6876 sub=Default] - Pastebin.com

tgt target configuration
Code:
tgtadm --lld iscsi --op show --mode target
Target 1: iqn.2001-04.com.iscsi-test-tgt-3
    System information:
        Driver: iser
        State: ready
    I_T nexus information:
        I_T nexus: 1
            Initiator: iqn.1998-01.com.vmware:localhost.something.local:825039958:66 alias: iser-vmnic3
            Connection: 0
                RDMA IP Address: 10.10.103.2
        I_T nexus: 3
            Initiator: iqn.1998-01.com.vmware:localhost.something.local:825039958:68 alias: iser-vmnic2
            Connection: 0
                RDMA IP Address: 10.10.101.2
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET     00010001
            SCSI SN: beaf11
            Size: 13959 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: rdwr
            Backing store path: /dev/zvol/iser/isertarget
            Backing store flags:
    Account information:
    ACL information:
        ALL
Target 2: iqn.2001-04.com.iscsi-test-tgt-4
    System information:
        Driver: iser
        State: ready
    I_T nexus information:
        I_T nexus: 2
            Initiator: iqn.1998-01.com.vmware:localhost.something.local:825039958:66 alias: iser-vmnic3
            Connection: 0
                RDMA IP Address: 10.10.103.2
        I_T nexus: 4
            Initiator: iqn.1998-01.com.vmware:localhost.something.local:825039958:68 alias: iser-vmnic2
            Connection: 0
                RDMA IP Address: 10.10.101.2
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00020000
            SCSI SN: beaf20
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET     00020001
            SCSI SN: beaf21
            Size: 2199023 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: rdwr
            Backing store path: /dev/zvol/whole/isertarget2
            Backing store flags:
    Account information:
    ACL information:
        ALL
Code:
cat /etc/tgt/conf.d/ubuntu-1804.conf
default-driver iser

<target iqn.2001-04.com.iscsi-test-tgt-3>
backing-store /dev/zvol/iser/isertarget
allow-in-use yes
</target>
<target iqn.2001-04.com.iscsi-test-tgt-4>
backing-store /dev/zvol/whole/isertarget2
allow-in-use yes
</target>

tgtadm --op show --mode target --tid 1
MaxRecvDataSegmentLength=8192
HeaderDigest=None
DataDigest=None
InitialR2T=Yes
MaxOutstandingR2T=1
ImmediateData=No
FirstBurstLength=65536
MaxBurstLength=262144
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
IFMarker=No
OFMarker=No
DefaultTime2Wait=2
DefaultTime2Retain=20
OFMarkInt=Reject
IFMarkInt=Reject
MaxConnections=1
RDMAExtensions=Yes
TargetRecvDataSegmentLength=262144
InitiatorRecvDataSegmentLength=262144
MaxOutstandingUnexpectedPDUs=0
MaxXmitDataSegmentLength=8192
MaxQueueCmd=128
The only difference I see is the queue depth ESXi reports for the targets: with LIO a queue depth of 128 is discovered (as set), while with tgt 113 is discovered even though 128 is set on the tgt target.
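For what it's worth, one way to see which queue depth ESXi actually negotiated per device is the core storage device listing; treat the grep below as a convenience sketch:

Code:
# on the ESXi host: show each device and the queue depth ESXi reports for it
esxcli storage core device list | grep -E 'Display Name|Device Max Queue Depth'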
 

zxv

The more I C, the less I see.
Sep 10, 2017
156
57
28
Sorry for the delay.

That's interesting that downgrading OFED improved SCST target support.
Which flavor of Linux is that? I might try it when I get a chance.

Looking at your ESXi logs, there are some login failures. If no CHAP user/pass is set, I wouldn't have expected login failures.

When I do enable CHAP, I find it useful to set ESXi to require CHAP; otherwise it makes one attempt without a user/pass and logs one failure every time it logs in.

esxcli iscsi adapter auth chap set -A ${ADAPTER} -d uni -l required -N <user> -S <pass>

ESXi generally wants multipath, so on the LIO/SCST target I set MaxConnections=4. I'm not sure that's critical here. If you have only one connection, ESXi will report the path or device as degraded (no failover candidate).
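If it helps, a quick way to check what ESXi thinks of the paths (this is generic esxcli, nothing iSER-specific):

Code:
# list every path with its state (active, dead, standby)
esxcli storage core path list
# per-device multipathing policy and working paths
esxcli storage nmp device list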

I see a queue depth of 113 in LIO for zvol block devices. I'm not sure why mine is lower than the 128 you see; I've never seen 128 at all.

There are some 'could not establish a network connection to the discovery address' messages in the ESXi log.
Is your ESXi vSwitch set to jumbo frames, as well as the VMkernel interface (vmkN) that is bound to the iSER adapter?
esxcli network vswitch standard set --vswitch-name=${switchname} --cdp-status=both -m 9000
esxcli network ip interface set -i vmkN --enabled true --mtu 9000
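A quick sanity check that jumbo frames actually pass end-to-end, from the ESXi shell (the vmk name and target address below are placeholders for your bound VMkernel port and target portal):

Code:
# -d sets the don't-fragment bit; 8972 = 9000 minus 28 bytes of IP/ICMP headers
vmkping -d -s 8972 -I vmk1 10.10.101.1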

Hope that helps!
 

efschu2

Member
Feb 14, 2019
68
12
8
Mellanox suggests using an MTU of 4200 for iSER, which is what I'm doing.
This setup is on Ubuntu 18.04. tgt and SCST are working fine, but ESXi reports the exact same errors with OFED 4.3 as with 4.5 on LIO, so I'm skipping LIO because I'm out of ideas (I think there is something going on between RoCE v1 and LIO that keeps it from working, but I have no way to debug that).
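For anyone following along, a minimal sketch of setting that 4200 MTU on the Linux target side (the interface name is a placeholder; persist it via netplan or whatever your distro uses):

Code:
# set the target NIC to the 4200 MTU Mellanox suggests for iSER
sudo ip link set dev enp3s0f0 mtu 4200
# confirm it took effect
ip link show enp3s0f0 | grep mtu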
 

zxv

The more I C, the less I see.
Sep 10, 2017
156
57
28
Ah, yeah, sorry, I was confused about whether ESXi allows MTU values between 1500 and 9000. It does after all.

I'll set up an Ubuntu 18.04 host with SCST or tgt so I can compare results. Which do you prefer, SCST or tgt?

Here's how I set up a LIO target for ESXi, for comparison:

Code:
# helper functions for building a LIO iSER target with targetcli

tgtpar () {
    # raise MaxConnections on the target portal group, then list its non-zero attributes
    local TARGET=$1
    targetcli /iscsi/${TARGET}/tpg1 set parameter MaxConnections=64
    targetcli /iscsi/${TARGET}/tpg1 get attribute |grep = |grep -v =0
}
tgbs () {
    # force a 512-byte block size on the backstore, then list its non-zero attributes
    local TYPE=$1
    local VOL=$2
    targetcli /backstores/${TYPE}/${VOL} set attribute block_size=512
    targetcli /backstores/${TYPE}/${VOL} get attribute |grep = |grep -v =0
}
tgtmk () {
    # create a backstore, an iSCSI target, a LUN, permissive node ACLs, CHAP auth,
    # and an iSER-enabled portal, then save the config
    local TARGET=$1
    local VOL=$2
    local TYPE=$3
    shift 3
    targetcli /backstores/${TYPE}/ create ${VOL} $*
    tgbs ${TYPE} ${VOL}
    targetcli /iscsi create ${TARGET}
    tgtpar ${TARGET}
    targetcli /iscsi/${TARGET}/tpg1/luns create /backstores/${TYPE}/${VOL}
    # demo mode: writable, auto-generated node ACLs
    targetcli /iscsi/${TARGET}/tpg1/ set attribute demo_mode_write_protect=0 \
        generate_node_acls=1 cache_dynamic_acls=1
    targetcli /iscsi/${TARGET}/tpg1 set attribute authentication=1
    targetcli /iscsi/${TARGET}/tpg1 set auth userid=chap_user password=chap_password
    # listen on all addresses and switch the portal to iSER
    targetcli /iscsi/${TARGET}/tpg1/portals create 0.0.0.0 3260
    targetcli /iscsi/${TARGET}/tpg1/portals/0.0.0.0:3260 enable_iser true
    targetcli saveconfig
    targetcli ls / 9
}

tgtmk iqn.2019-01.local.host vol1 block /dev/zvol/pool/vol1
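Once the target exists, picking it up from the ESXi side looks roughly like the following, assuming static discovery is already configured as described earlier in the thread (the vmhba name is a placeholder for whatever the iSER adapter enumerates as):

Code:
# placeholder adapter name; substitute the vmhba your iSER adapter shows up as
esxcli iscsi adapter discovery rediscover -A vmhba64
esxcli storage core adapter rescan --adapter vmhba64
# the new LUN should appear here once the login succeeds
esxcli storage core device list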