ESXi iSER iSCSI


Bjorn Smith

Well-Known Member
Hi,

I have a couple of servers connected via a Mellanox SX6018 - all servers are running ConnectX-3s (non-Pro).

I am trying to get the holy trinity working: ESXi with iSCSI over iSER.

I have done all the prerequisite steps: created the iSER adapter and updated both my switch and the network driver modules with flow-control configuration - roughly the sequence sketched below.
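
In outline, the ESXi side was something like this (the PFC parameter names are taken from Mellanox's flow-control guide for the ESXi driver, so double-check them against your driver version - treat this as a sketch, not my verbatim history):
Code:
# enable priority flow control on the Mellanox NIC driver (0x08 = priority 3 bitmask)
esxcli system module parameters set -m nmlx4_en -p "pfctx=0x08 pfcrx=0x08"
# create the software iSER adapter
esxcli rdma iser add
# reboot, then rescan the new vmhba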

But every time I rescan my iSER iSCSI adapter in ESXi, nothing happens.

Looking into vmkernel.log, I see errors being logged:
Code:
2020-05-20T14:52:13.953Z cpu14:2098050)WARNING: rdmaDriver: RDMAGetValidGidType:1896: Protocol not supported by device
2020-05-20T14:52:13.953Z cpu14:2098050)WARNING: rdmaDriver: RDMACM_BindLegacy:3290: Underlying device does not support requested gid/RoCE type. Failed with status: Protocol not supported
My ESXi module has RoCE v2 disabled, since the card obviously does not support it:

Code:
[root@vms1:~] esxcfg-module -g nmlx4_core
nmlx4_core enabled = 1 options = 'enable_rocev2=0'
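
For reference, that option is set the same way it is queried, followed by a reboot:
Code:
esxcfg-module -s 'enable_rocev2=0' nmlx4_core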
The test machine where I have the iSCSI target:

Code:
[bbs@testnas ~]$ dmesg|grep Mellanox
[ 3.560060] mlx4_core: Mellanox ConnectX core driver v5.0-2.1.8
[ 11.382985] mlx4_en: Mellanox ConnectX HCA Ethernet driver v5.0-2.1.8
[ 11.403233] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v5.0-2.1.8
[ 28.274089] mlx4_core: Mellanox ConnectX core driver v5.0-2.1.8
[ 34.724035] mlx4_en: Mellanox ConnectX HCA Ethernet driver v5.0-2.1.8
[ 34.762727] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v5.0-2.1.8
[bbs@testnas ~]$


[bbs@testnas ~]$ dmesg|grep iser
[ 0.000000] ACPI: SSDT 0x00000000EDDE83C0 000573 (v03 HP riser0 00000002 INTL 20030228)
[ 35.432518] iscsi: registered transport (iser)
[bbs@testnas ~]$
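
(The ACPI "riser0" line is just a false match on "iser". The "registered transport (iser)" line comes from the initiator-side module; the LIO target side uses its own iSER fabric module. A quick sanity check - module names as I understand upstream LIO, so verify on your kernel:)
Code:
# ib_iser = initiator-side iSER transport, ib_isert = LIO target-side iSER fabric
sudo modprobe ib_isert
lsmod | grep -E 'ib_iser|isert'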
Linux module parameters
Code:
[bbs@testnas ~]$ sudo sh ./shmoduleparam.sh mlx4
mlx4
Module: mlx4_ib
Parameter: dev_assign_str -->
Parameter: en_ecn --> N
Parameter: sm_guid_assign --> 0

Module: ib_core
Parameter: netns_mode --> Y
Parameter: recv_queue_size --> 512
Parameter: roce_v1_noncompat_gid --> Y
Parameter: send_queue_size --> 128

Module: mlx4_en
Parameter: inline_thold --> 104
Parameter: pfcrx --> 3
Parameter: pfctx --> 3
Parameter: udev_dev_port_dev_id --> 0
Parameter: udp_rss --> 1

Module: mlx4_core
Parameter: block_loopback --> 1
Parameter: debug_level --> 1
Parameter: enable_4k_uar --> Y
Parameter: enable_64b_cqe_eqe --> Y
Parameter: enable_qos --> Y
Parameter: enable_sys_tune --> 0
Parameter: enable_vfs_qos --> N
Parameter: fast_drop --> 0
Parameter: high_rate_steer --> 0
Parameter: ingress_parser_mode --> 0
Parameter: internal_err_reset --> 1
Parameter: log_mtts_per_seg --> 0
Parameter: log_num_cq --> 16
Parameter: log_num_mac --> 7
Parameter: log_num_mcg --> 13
Parameter: log_num_mgm_entry_size --> -10
Parameter: log_num_mpt --> 19
Parameter: log_num_mtt --> 21
Parameter: log_num_qp --> 19
Parameter: log_num_srq --> 16
Parameter: log_num_vlan --> 0
Parameter: log_rdmarc_per_qp --> 4
Parameter: mlx4_en_only_mode --> 0
Parameter: msi_x --> 1
Parameter: num_vfs -->
Parameter: port_type_array -->
Parameter: probe_vf -->
Parameter: roce_mode --> 1
Parameter: rr_proto --> 0
Parameter: ud_gid_type -->
Parameter: use_prio --> N

Module: mlx_compat
Parameter: compat_base --> mlnx-ofa_kernel-compat-20200401-1937-5f67178
Parameter: compat_base_tree --> mlnx_ofed/mlnx-ofa_kernel-4.0.git
Parameter: compat_base_tree_version --> 5f67178
Parameter: compat_version --> 5f67178
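
shmoduleparam.sh is just a small helper of mine that dumps module parameters from /sys. A minimal sketch of something equivalent - note this naive version only matches modules whose names contain the pattern, so unlike the output above it would miss related modules such as ib_core and mlx_compat:
Code:
#!/bin/sh
# List the parameters of every loaded kernel module whose name matches a pattern.
pattern="$1"
echo "$pattern"
for mod in /sys/module/*"$pattern"*; do
    [ -d "$mod/parameters" ] || continue
    echo "Module: $(basename "$mod")"
    for p in "$mod"/parameters/*; do
        # some parameters are write-only; ignore read errors
        echo "Parameter: $(basename "$p") --> $(cat "$p" 2>/dev/null)"
    done
    echo
done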
Target configuration
Code:
[bbs@testnas ~]$ sudo targetcli
targetcli shell version 2.1.fb49
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/iscsi> ls * 10
o- iscsi .............................................................................................................. [Targets: 1]
  o- iqn.2020-05.root.dom:esxi ........................................................................................... [TPGs: 1]
    o- tpg1 .................................................................................................... [gen-acls, no-auth]
      o- acls ............................................................................................................ [ACLs: 0]
      o- luns ............................................................................................................ [LUNs: 1]
      | o- lun0 ......................................................................... [block/esxi (/dev/zd0) (default_tg_pt_gp)]
      o- portals ...................................................................................................... [Portals: 1]
        o- 0.0.0.0:3260 ..................................................................................................... [iser]
/iscsi>
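For reference, the target above was built with roughly these targetcli commands - the enable_iser step on the portal is what switches it from plain iSCSI to iSER:
Code:
/backstores/block create esxi /dev/zd0
/iscsi create iqn.2020-05.root.dom:esxi
/iscsi/iqn.2020-05.root.dom:esxi/tpg1/luns create /backstores/block/esxi
/iscsi/iqn.2020-05.root.dom:esxi/tpg1/portals create 0.0.0.0 3260
/iscsi/iqn.2020-05.root.dom:esxi/tpg1/portals/0.0.0.0:3260 enable_iser boolean=true
/iscsi/iqn.2020-05.root.dom:esxi/tpg1 set attribute generate_node_acls=1 demo_mode_write_protect=0 authentication=0
saveconfig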
I have tried creating an iSCSI connection from the same machine that is running the target and that works fine.

I have also tried different OSes to see whether it was my CentOS install that ESXi did not like, but it is the same issue: ESXi says "Protocol not supported by device".

I have been in contact with Mellanox support, and they say that the non-Pro version of the card should support RoCE v1 just fine and should run in ESXi - but I don't know if the driver in ESXi is somehow gimped because VMware does not want us to run old hardware, or if it's something else.

Any ideas from those of you who have had success with iSER and ESXi on ConnectX-3s? Please speak up :)
 

Rand__

Well-Known Member
IIRC there was some virtual (?) RDMA interface that needed to be created on the ESXi side - do you have that?

tsteine

Active Member
First off, we need more information about your setup on the ESXi side.

I had iSER running with my ConnectX-3 (non-Pro) adapters on ESXi, albeit on vSphere 6.7.

How have you set this up on the ESXi side? Did you set up Network Port Binding on your iSER interface?
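
For reference, the binding can also be done from the CLI - the vmhba and vmk names below are from my setup and will differ on yours:
Code:
# find the name the iSER adapter was given
esxcli iscsi adapter list
# bind the VMkernel NIC that sits on the RDMA-capable uplink to it
esxcli iscsi networkportal add -A vmhba64 -n vmk1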
 

tsteine

Active Member
I will also note, since it appears you're using ZFS with LIO, that my experience with LIO and ZFS has been atrociously poor performance; SCST was the only Linux iSCSI target that gave me great performance with ZFS and iSER.
 

Bjorn Smith

Well-Known Member
Rand__ said:
IIRC there was some virtual (?) RDMA interface that needed to be created on the ESXi side - do you have that?
Yes I have done all the required ESXi setup as far as I know:
Code:
esxcli rdma iser add
which is the command that adds the iSER adapter that you bind the iSCSI disks to.
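
After that, the new adapter should be visible:
Code:
esxcli rdma device list
esxcli iscsi adapter list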

tsteine said:
Did you set up Network Port Binding on your iSER interface?
Yes, I have done this - it's a good question though; it's not obvious that you need to do it.

tsteine said:
I will also note, since it appears you're using ZFS with LIO
I have also tested with SCST - same issue; the targets appear, but no dice from ESXi.

I have updated to the latest version of the 6.7 driver for the ConnectX-3: 3.17.70.1.

And just to be clear - I am trying to run this over Ethernet, not InfiniBand - if that makes any difference.
 

Bjorn Smith

Well-Known Member
flint output on my Linux box:
Code:
[bbs@testnas ~]$ sudo flint -d 05:00.0 q
[sudo] password for bbs:
Image type:            FS2
FW Version:            2.42.5000
FW Release Date:       5.9.2017
Product Version:       02.42.50.00
Rom Info:              version_id=8025 type=CLP
                       type=PXE version=3.4.752
Device ID:             4099
Description:           Node             Port1            Port2            Sys image
GUIDs:                 0002c9030037b4a0 0002c9030037b4a1 0002c9030037b4a2 0002c9030037b4a3
MACs:                                       0002c937b4a0     0002c937b4a1
VSD:
PSID:                  HP_0280210019
flint output on my ESXi host:
Code:
[root@vms1:~] /opt/mellanox/bin/flint -d 02:00.0 q
Image type:            FS2
FW Version:            2.42.5000
FW Release Date:       5.9.2017
Product Version:       02.42.50.00
Rom Info:              version_id=8025 type=CLP
                       type=PXE version=3.4.752
Device ID:             4099
Description:           Node             Port1            Port2            Sys image
GUIDs:                 0002c903003e5260 0002c903003e5261 0002c903003e5262 0002c903003e5263
MACs:                                       0002c93e5260     0002c93e5261
VSD:
PSID:                  HP_0280210019
So I should be running the latest firmware as well.
 

Bjorn Smith

Well-Known Member
Oh - and I am also running vSphere 6.7, but the hypervisor is called ESXi - vSphere is just the management platform.

But please ask for whatever output you need from my ESXi setup - I might have missed a crucial step.
 

tsteine

Active Member
I recall having to restart the ESXi host after adding an iSER interface, enabling PFC, and setting PCP_FORCE for the nmlx4 ESXi module.

It's a pity I've upgraded my ConnectX-3s to ConnectX-5s, upgraded ESXi to 7.0, and switched to RoCEv2, or we could've simply compared setups directly.


Just to cover all our bases here, have you set up jumbo frames?
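
If not, on the ESXi side the MTU has to be raised on both the vSwitch and the VMkernel interface (names below are placeholders), and the physical switch ports need jumbo frames as well:
Code:
esxcli network vswitch standard set -m 9000 -v vSwitch1
esxcli network ip interface set -m 9000 -i vmk1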
 

tsteine

Active Member
Actually, hold on a moment here.

I have 2 hosts and a spare SSD; let me throw one of my ConnectX-3s in there and replace the drive with one running ESXi 6.7.
 

Bjorn Smith

Well-Known Member
I am wondering if it could also be my Linux setup that is somehow wrong - that it just appears to be running iSER, but in fact it's not.

tsteine said:
let me throw one of my ConnectX-3s in there and replace the drive with one running ESXi 6.7
Awesome - you are the best :)
 

tsteine

Active Member
If we're going to get into semantics, to quote VMware's own page:

vSphere Hypervisor is a bare-metal hypervisor that virtualizes servers; allowing you to consolidate your applications while saving time and money managing your IT infrastructure. Our free vSphere Hypervisor is built on the world’s smallest and most robust architecture: VMware vSphere ESXi, which sets the industry standard for reliability, performance, and support.
Anyhow, I'm about to boot up the new installation; give me about 20 minutes to do the necessary iSCSI and iSER setup on the host.
 

Rand__

Well-Known Member
@tsteine any idea which vSphere services actually utilize RDMA?
Is it only iSCSI, or others too? I have not been able to find a proper answer, and I never saw any RDMA traffic in my test setup (vSphere only, no iSCSI SAN).
 

tsteine

Active Member
As far as I am aware, it's purely iSCSI over iSER, plus PVRDMA for virtual machines.

The only RDMA traffic I've recorded in my setup is iSER, since I don't run PVRDMA - I like to be able to vMotion virtual machines.
 

Bjorn Smith

Well-Known Member
Hmm,
I did something :)
And it turns out the following fixed it:
Code:
sudo mlxconfig -d 05:00.0 s SRIOV_EN=0 NUM_OF_VFS=0 LINK_TYPE_P1=2 LINK_TYPE_P2=2
I did read somewhere that SR-IOV is not supported with iSER - but when I looked at the hardware tab on the ESXi server it was not enabled, so I thought I was already running without SR-IOV. (For the record, the command above disables SR-IOV on the card itself and forces both ports from VPI auto-sensing to Ethernet; LINK_TYPE 2 = ETH, versus the VPI(3) default shown below.)

But mlxconfig q showed me otherwise - and note that these values are listed under "Next Boot", so the card needs a reboot/power cycle before they take effect:
Code:
[bbs@testnas ~]$ sudo mlxconfig -d 05:00.0 q
[sudo] password for bbs:

Device #1:
----------

Device type: ConnectX3
Device: 05:00.0

Configurations: Next Boot
SRIOV_EN True(1)
NUM_OF_VFS 16
LINK_TYPE_P1 VPI(3)
LINK_TYPE_P2 VPI(3)
LOG_BAR_SIZE 5
BOOT_PKEY_P1 0
BOOT_PKEY_P2 0
BOOT_OPTION_ROM_EN_P1 True(1)
BOOT_VLAN_EN_P1 False(0)
BOOT_RETRY_CNT_P1 0
LEGACY_BOOT_PROTOCOL_P1 PXE(1)
BOOT_VLAN_P1 1
BOOT_OPTION_ROM_EN_P2 True(1)
BOOT_VLAN_EN_P2 False(0)
BOOT_RETRY_CNT_P2 0
LEGACY_BOOT_PROTOCOL_P2 PXE(1)
BOOT_VLAN_P2 1
IP_VER_P1 IPv4(0)
IP_VER_P2 IPv4(0)
CQ_TIMESTAMP True(1)

I am sorry if I have wasted your time - I will create a blog post with everything I have learnt.
 

Bjorn Smith

Well-Known Member
Oh - and I still get errors in the vmkernel log.

Code:
2020-05-20T17:05:13.732Z cpu14:2098050)WARNING: rdmaDriver: RDMAGetValidGidType:1896: Protocol not supported by device
2020-05-20T17:05:13.732Z cpu14:2098050)WARNING: rdmaDriver: RDMACM_BindLegacy:3290: Underlying device does not support requested gid/RoCE type. Failed with status: Protocol not supported

But it works, so I don't know what it is complaining about - possibly RoCEv2.
 

tsteine

Active Member
Right, so this appears to work out of the box with my ConnectX-3.

On a clean setup of 6.7 with a ConnectX-3 (MCX354A-FCBT):

Physical device:
[screenshot: adapter.JPG]

VMkernel adapter:
[screenshot: vmkernel.JPG]

iSER port binding:
[screenshot: iser port binding.JPG]

iSER devices seen:
[screenshot: iser devices.JPG]