ESXi iSER iSCSI


Bjorn Smith

Well-Known Member
Sep 3, 2019
r00t.dk
Hmm,

Ok - I was too quick - it shows up under devices now, but when I try to create a datastore it dies and the device disappears again.
 

tsteine

Active Member
May 15, 2019
I am running a Mellanox SN2700.

[root@localhost:/var/log] cat vmkernel.log | grep 'WARNING: rdma'
2020-05-20T17:05:50.362Z cpu24:2098164)WARNING: rdmaDriver: RDMAGetValidGidType:1896: Protocol not supported by device
2020-05-20T17:05:50.362Z cpu24:2098164)WARNING: rdmaDriver: RDMACM_BindLegacy:3290: Underlying device does not support requested gid/RoCE type. Failed with status: Protocol not supported
[root@localhost:/var/log]
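That "gid/RoCE type... Protocol not supported" warning usually points at a RoCE version mismatch between initiator and target. A hedged sketch of how one might check both ends - the device names (mlx5_1, the vmkernel side) are placeholders taken from this thread's hardware, and cma_roce_mode assumes Mellanox OFED is installed on the Linux box:

```shell
# Linux target: show the default RoCE mode used for CM connections
# (RoCE v1 vs v2) on port 1 of the mlx5_1 device.
cma_roce_mode -d mlx5_1 -p 1

# Force RoCE v2 for CM connections, which iSER setups generally expect.
cma_roce_mode -d mlx5_1 -p 1 -m 2

# ESXi host: list the RDMA-capable devices the host sees.
esxcli rdma device list
```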
 

Bjorn Smith

Well-Known Member
Sep 3, 2019
Ok, so you also get the WARNING - that's "good". So my changes caused the iSCSI device to appear briefly, then die with an "all paths down" error and never come up again.
Restarting my Linux box (possibly just the scst service) makes it reappear in ESXi.
At least I can work with this.
Thanks so far :)
 

tsteine

Active Member
May 15, 2019
No problem.

If restarting your Linux box makes it reappear in ESXi, then it does sound somewhat like the Linux box setup could be the problem.

I'm not going to be able to help you reproduce the physical setup though, since I really don't want to rip the ConnectX-5 out of the storage box and put a ConnectX-3 in there. It's simple enough with one of the ESXi hosts, but the storage box is a real pain in the ass to tinker with for this kind of troubleshooting.
 

tsteine

Active Member
May 15, 2019
I should note that I did have similar problems when I had an MTU mismatch in the chain between the servers - but I'm assuming you have checked the setup from A to Z several times and eliminated that issue?
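An end-to-end jumbo-frame check is quick to run from both sides. With MTU 9000, the largest ICMP payload that fits without fragmentation is 8972 bytes (9000 minus 20 bytes IP header and 8 bytes ICMP header). The addresses and vmkernel NIC name below are placeholders:

```shell
# From the Linux box: don't-fragment ping at max jumbo payload.
ping -M do -s 8972 -c 3 192.168.100.10

# From the ESXi host: same test with vmkping, bound to the storage vmk NIC.
vmkping -d -s 8972 -I vmk1 192.168.100.20
```

If either command reports fragmentation needed or times out while a normal ping works, something in the chain is still at the default MTU.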
 

Rand__

Well-Known Member
Mar 6, 2014
Sorry for going slightly off topic - @tsteine, what kind of performance increase did you see with RDMA as opposed to non-RDMA traffic? Any benchmarks, by any chance?
 

tsteine

Active Member
May 15, 2019
My Linux setup - minus ECN for TCP traffic.

Crontab of what that looks like:

@reboot /usr/sbin/ip link set ens4f1.100 type vlan egress-qos-map 0:3
@reboot /usr/sbin/ip link set ens4f1.100 type vlan egress-qos-map 1:3
@reboot /usr/sbin/ip link set ens4f1.100 type vlan egress-qos-map 2:3
@reboot /usr/sbin/ip link set ens4f1.100 type vlan egress-qos-map 3:3
@reboot /usr/sbin/ip link set ens4f1.100 type vlan egress-qos-map 4:3
@reboot /usr/sbin/ip link set ens4f1.100 type vlan egress-qos-map 5:3
@reboot /usr/sbin/ip link set ens4f1.100 type vlan egress-qos-map 6:3
@reboot /usr/sbin/ip link set ens4f1.100 type vlan egress-qos-map 7:3
@reboot /usr/bin/mlnx_qos -i ens4f1 --pfc 0,0,0,1,0,0,0,0
@reboot /usr/bin/mlnx_qos -i ens4f1 --trust dscp
@reboot /usr/bin/echo 106 > /sys/class/infiniband/mlx5_1/tc/1/traffic_class
@reboot /usr/sbin/cma_roce_tos -d mlx5_1 -t 106
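On the traffic_class value: 106 is the full ToS byte, which packs the DSCP codepoint into the upper six bits and the ECN field into the lower two, so it decodes to DSCP 26 with ECT(0) set. A quick check of that decoding:

```python
tos = 106            # value written to .../tc/1/traffic_class and passed to cma_roce_tos
dscp = tos >> 2      # upper 6 bits: the DSCP codepoint
ecn = tos & 0b11     # lower 2 bits: ECN field (0b10 = ECT(0))
print(dscp, ecn)     # 26 2
```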
 

tsteine

Active Member
May 15, 2019
@Rand__ Pretty drastic. I didn't save any benchmarks since I wasn't planning on doing any kind of write-up on it.

But I did test with the VMware I/O Analyzer.

Without RDMA I capped out at about 150k IOPS in aggregate, with pretty ugly CPU usage and overhead. With RDMA on SCST, in aggregate across all 6 LUNs, the box managed to supply about 450k IOPS to 6 VMs, or 75k per VM testing on its LUN.
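The numbers above work out like this:

```python
no_rdma_iops = 150_000        # aggregate ceiling without RDMA
rdma_iops = 450_000           # aggregate with iSER/RDMA across 6 LUNs
vms = 6

per_vm = rdma_iops // vms     # each VM's share of the aggregate
speedup = rdma_iops / no_rdma_iops
print(per_vm, speedup)        # 75000 3.0
```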
 

Rand__

Well-Known Member
Mar 6, 2014
Nice :)
Any idea re: the impact at a lower QD/job level? Or did you only look at the aggregate?
 

vangoose

Active Member
May 21, 2019
Canada
Did you see the RDMA adapter under storage adapters?
You may have created the iSCSI software adapter, and it falls back to iSCSI instead of RDMA. When RDMA iSER works, you don't need the iSCSI adapter.
 

Bjorn Smith

Well-Known Member
Sep 3, 2019
tsteine said: "when I had an MTU mismatch"
My Linux MTU was at the default - setting it to 9000 now and rebooting - will report back. But at least it seems like I got it working on ESXi, which is awesome - now I can test different settings on Linux.

Thanks a lot for your time:)
 

tsteine

Active Member
May 15, 2019
@Rand__ I never did test that; I was mostly interested in putting the storage box through its paces for the maximum performance I could get out of it for all my VMs.

It manages about 11k IOPS at QD1, 1 thread, though, and 110k at QD32, 16 threads, on a single VM, if that is of interest.

Edit: that's for 4k random IO.
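At QD1 with a single thread, only one IO is ever in flight, so IOPS is just the reciprocal of the mean per-IO latency - 11k IOPS implies roughly 91 µs per 4k IO end to end:

```python
qd1_iops = 11_000
# one IO in flight at a time: mean latency = 1 / IOPS
mean_latency_us = 1_000_000 / qd1_iops
print(round(mean_latency_us, 1))   # 90.9
```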
 

Rand__

Well-Known Member
Mar 6, 2014
tsteine said:
> @Rand__ I never did test that, I was mostly interested in putting the storage box through it's paces for the max amount of performance I could get out of it for all my vms.
> it manages about 11k iops on qd1, 1 thread though, and 110k for QD32 16 threads on a single vm if that is information of interest.
How many VMs are you running?
And yes, very helpful - what block size? And what filesystem behind it (SAN side)?

Edit - is that CDM or another tool?
 

tsteine

Active Member
May 15, 2019
@Rand__ Currently running 19 VMs; block size is 4k and random.

The SAN filesystem is ZFS, so the benchmark is hitting the RAM cache - but even then, since I run 256GB of RAM on the box and a 280GB 900p Optane L2ARC, the data needed by VMs is usually either in RAM or on the very fast Optane drive.

I have toyed with the idea of trying some fun with NVMe-oF on the Optane drive, though.

Edit: that is CrystalDiskMark, yes.
 

Rand__

Well-Known Member
Mar 6, 2014
Excellent - very helpful, thanks.
One last thing - could you run the same benchmark (or similar) on the SAN directly?
The question is how much loss the network induces - I can then run the same comparison on my filer to see the loss there, which lets me deduce the potential improvement :)
 

tsteine

Active Member
May 15, 2019
@Rand__

I haven't really worked with fio a whole lot, so I'm not sure whether I used the correct settings, but this is what I got:


Code:
tsteine@san:/SAN$ fio --max-jobs=1 --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=1 --size=4G --readwrite=randread
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=874MiB/s][r=224k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2144248: Wed May 20 18:37:58 2020
  read: IOPS=226k, BW=884MiB/s (927MB/s)(4096MiB/4631msec)
   bw (  KiB/s): min=888502, max=929584, per=99.74%, avg=903370.22, stdev=13424.85, samples=9
   iops        : min=222125, max=232396, avg=225842.44, stdev=3356.17, samples=9
  cpu          : usr=15.44%, sys=84.54%, ctx=8, majf=0, minf=9
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=884MiB/s (927MB/s), 884MiB/s-884MiB/s (927MB/s-927MB/s), io=4096MiB (4295MB), run=4631-4631msec
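Sanity check on those read numbers: IOPS times block size reproduces the reported bandwidth almost exactly:

```python
iops = 226_000                     # reported average read IOPS
block = 4096                       # 4k random reads
mib_per_s = iops * block / 2**20   # bytes/s converted to MiB/s
print(round(mib_per_s))            # 883, vs the reported 884 MiB/s
```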
 

Rand__

Well-Known Member
Mar 6, 2014
Could you do a randwrite run, please?
Running the same on my box now :)
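For the write pass, the same invocation with --readwrite=randwrite should be directly comparable - a sketch, untested here, using a fresh --filename so the already-laid-out read file isn't reused:

```shell
fio --max-jobs=1 --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=test-write --filename=test-write --bs=4k --iodepth=1 --size=4G \
    --readwrite=randwrite
```

Note the ioengine difference between the two runs already posted (libaio on the SAN vs posixaio on the other box); for a fair loss comparison both sides should use the same engine.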

Edit:
Code:
 fio --max-jobs=1 --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=1 --size=4G --readwrite=randread
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.16
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=483MiB/s][r=124k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=14967: Wed May 20 20:44:40 2020
  read: IOPS=122k, BW=478MiB/s (501MB/s)(4096MiB/8574msec)
   bw (  KiB/s): min=469976, max=502387, per=99.72%, avg=487828.06, stdev=8262.23, samples=17
   iops        : min=117494, max=125596, avg=121956.76, stdev=2065.59, samples=17
  cpu          : usr=13.02%, sys=37.49%, ctx=1048735, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=478MiB/s (501MB/s), 478MiB/s-478MiB/s (501MB/s-501MB/s), io=4096MiB (4295MB), run=8574-8574msec