ESXi iSER iSCSI


tsteine

Active Member
@Rand__
Writing to a zvol and increasing the size gave a far more reasonable result:

Code:
tsteine@san:/SAN$ sudo fio --max-jobs=1 --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/zvol/SAN/TESTVOL --bs=4k --iodepth=1 --size=12G --readwrite=randread
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=281MiB/s][r=71.9k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2216158: Wed May 20 18:45:01 2020
  read: IOPS=69.4k, BW=271MiB/s (284MB/s)(12.0GiB/45346msec)
   bw (  KiB/s): min=77336, max=400160, per=100.00%, avg=281326.07, stdev=46263.62, samples=89
   iops        : min=19334, max=100040, avg=70331.39, stdev=11565.87, samples=89
  cpu          : usr=14.14%, sys=40.82%, ctx=3145792, majf=0, minf=9
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=3145728,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=271MiB/s (284MB/s), 271MiB/s-271MiB/s (284MB/s-284MB/s), io=12.0GiB (12.9GB), run=45346-45346msec
tsteine@san:/SAN$
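
For reference, a test zvol like /dev/zvol/SAN/TESTVOL above would typically be created with something along these lines; the sparse flag and the 32k volblocksize are assumptions (the 32k figure is mentioned later in the thread), not confirmed settings:

Code:
# Hypothetical creation of the test zvol used above. The -s (sparse)
# flag and the 32k volblocksize are assumptions, not confirmed settings.
sudo zfs create -s -V 12G -o volblocksize=32k SAN/TESTVOL
ls -l /dev/zvol/SAN/TESTVOL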
 

Rand__

Well-Known Member
And here I was wondering what you were running that made this so fast :p

Edit: that's an async file system, is it not?
 

tsteine

Active Member
write:

Code:
tsteine@san:/SAN$ sudo fio --max-jobs=1 --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/zvol/SAN/TESTVOL --bs=4k --iodepth=1 --size=12G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2231044: Wed May 20 18:47:42 2020
  write: IOPS=32.8k, BW=128MiB/s (134MB/s)(12.0GiB/95987msec); 0 zone resets
   bw (  KiB/s): min=23552, max=170040, per=99.81%, avg=130845.55, stdev=32341.62, samples=191
   iops        : min= 5888, max=42510, avg=32711.35, stdev=8085.37, samples=191
  cpu          : usr=11.35%, sys=29.15%, ctx=3151661, majf=0, minf=8
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,3145728,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=128MiB/s (134MB/s), 128MiB/s-128MiB/s (134MB/s-134MB/s), io=12.0GiB (12.9GB), run=95987-95987msec
 

tsteine

Active Member
Yes, I'm running this async, but I also have redundant PSUs, a UPS, and a reasonable backup scheme, so I'm not overly worried. Let me give it a try with the Optane drive as a SLOG, though, so that it writes synchronously.
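
For anyone following along, whether the pool services writes asynchronously is governed by the ZFS sync property; a quick check, using the pool name from this thread:

Code:
# Show how synchronous writes are currently handled for the pool.
# "standard" honors sync requests from clients, "disabled" treats
# everything as async, "always" forces everything through the ZIL/SLOG.
sudo zfs get sync SAN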
 

Rand__

Well-Known Member
Nice - I found an old screenshot of yours with CDM showing only about 55 MB/s - was that before RDMA?
 

tsteine

Active Member
@Rand__

Oh yeah, that tanked the performance hard:

Code:
tsteine@san:/dev/disk/by-id$ sudo zpool add SAN log nvme-INTEL_SSDPE21D280GA_PHM27490022X280AGN
tsteine@san:/dev/disk/by-id$ sudo zfs set sync=always SAN
tsteine@san:/dev/disk/by-id$ sudo fio --max-jobs=1 --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/zvol/SAN/TESTVOL --bs=4k --iodepth=1 --size=12G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [w(1)][6.0%][w=24.5MiB/s][w=6282 IOPS][eta 08m:04s]
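
For completeness, undoing that experiment would look roughly like the following; sync=standard is the ZFS default rather than a confirmed previous setting, and the device name is copied from the zpool add command above:

Code:
# Hypothetical revert: put the sync property back to the default and
# remove the Optane log vdev again (sync=standard is an assumption
# about what the pool was set to before the test).
sudo zfs set sync=standard SAN
sudo zpool remove SAN nvme-INTEL_SSDPE21D280GA_PHM27490022X280AGN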
 

Rand__

Well-Known Member
With increased size (synced fs), RandRead
Code:
fio --max-jobs=1 --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=TESTVOL --bs=4k --iodepth=1 --size=12G --readwrite=randread
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.16
Starting 1 process
test: Laying out IO file (1 file / 12288MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=444MiB/s][r=114k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=15067: Wed May 20 20:50:40 2020
  read: IOPS=112k, BW=437MiB/s (458MB/s)(12.0GiB/28106msec)
   bw (  KiB/s): min=335568, max=458632, per=99.78%, avg=446708.80, stdev=18032.97, samples=56
   iops        : min=83892, max=114658, avg=111676.93, stdev=4508.25, samples=56
  cpu          : usr=12.69%, sys=34.98%, ctx=3149776, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=3145728,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=437MiB/s (458MB/s), 437MiB/s-437MiB/s (458MB/s-458MB/s), io=12.0GiB (12.9GB), run=28106-28106msec
And the same with Rand Write
Code:
fio --max-jobs=1 --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=TESTVOL2 --bs=4k --iodepth=1 --size=12G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.16
Starting 1 process
test: Laying out IO file (1 file / 12288MiB)
Jobs: 1 (f=1): [w(1)][99.2%][w=111MiB/s][w=28.4k IOPS][eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=15122: Wed May 20 20:54:46 2020
  write: IOPS=24.6k, BW=96.2MiB/s (101MB/s)(12.0GiB/127737msec)
   bw (  KiB/s): min=77132, max=127712, per=99.45%, avg=97960.64, stdev=6557.91, samples=255
   iops        : min=19283, max=31928, avg=24489.80, stdev=1639.49, samples=255
  cpu          : usr=3.35%, sys=8.06%, ctx=3164995, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,3145728,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=96.2MiB/s (101MB/s), 96.2MiB/s-96.2MiB/s (101MB/s-101MB/s), io=12.0GiB (12.9GB), run=127737-127737msec
 

Rand__

Well-Known Member
So you couldn't perchance run that on a client VM too? ;) (async only then, of course)


Older Ubuntu client VM (running on NFS, synced), so an ancient fio version here.
Write:

Code:
sudo fio --max-jobs=1 --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=write --bs=4k --iodepth=1 --size=12G --readwrite=randwrite
[sudo] password for thomas:
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.2.10
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 12288MB)
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/37556KB/0KB /s] [0/9389/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2440: Wed May 20 21:04:30 2020
  write: io=12288MB, bw=32144KB/s, iops=8036, runt=391450msec
  cpu          : usr=1.82%, sys=11.80%, ctx=3151417, majf=1, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=3145728/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=12288MB, aggrb=32144KB/s, minb=32144KB/s, maxb=32144KB/s, mint=391450msec, maxt=391450msec

Disk stats (read/write):
  sda: ios=32935/3173394, merge=22/555197, ticks=6584/421676, in_queue=427872, util=84.00%
Rand Read
Code:
sudo fio --max-jobs=1 --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=write --bs=4k --iodepth=1 --size=12G --readwrite=randread
test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [61848KB/0KB/0KB /s] [15.5K/0/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3163: Wed May 20 21:08:21 2020
  read : io=12288MB, bw=60125KB/s, iops=15031, runt=209278msec
  cpu          : usr=2.77%, sys=20.35%, ctx=3149788, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=3145728/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=12288MB, aggrb=60125KB/s, minb=60125KB/s, maxb=60125KB/s, mint=209278msec, maxt=209278msec

Disk stats (read/write):
  sda: ios=3154268/2681, merge=44/24574, ticks=174344/2960, in_queue=177040, util=82.41%
 

tsteine

Active Member
@Rand__
Takes forever with 12GB, so I changed it to 1GB, though that didn't seem to impact IOPS anyway; they stayed the same.

I seem to be running into hard caps here in terms of latency; funny that CDM was faster.

I'm running the VMware paravirtual SCSI controller for the VM, by the way, and the OS is Ubuntu 20.04 LTS.
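
Since latency is the suspect here, one note: --gtod_reduce=1 suppresses fio's per-IO latency reporting; a sketch of the same random-read run with latency stats enabled (same file name and size as the runs below):

Code:
# Same workload, but without --gtod_reduce=1 so fio also reports
# slat/clat averages and completion-latency percentiles per IO.
fio --max-jobs=1 --randrepeat=1 --ioengine=posixaio --direct=1 \
    --name=lat-test --filename=TESTVOL --bs=4k --iodepth=1 \
    --size=1G --readwrite=randread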

randread:
Code:
tsteine@nextcloud:~$ fio --max-jobs=1 --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=TESTVOL --bs=4k --iodepth=1 --size=1G --readwrite=randread
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=34.8MiB/s][r=8913 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=72627: Wed May 20 19:03:34 2020
  read: IOPS=8826, BW=34.5MiB/s (36.2MB/s)(1024MiB/29701msec)
   bw (  KiB/s): min=31112, max=39032, per=99.98%, avg=35297.39, stdev=1551.41, samples=59
   iops        : min= 7778, max= 9758, avg=8824.34, stdev=387.85, samples=59
  cpu          : usr=0.73%, sys=5.32%, ctx=262329, majf=0, minf=48
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=34.5MiB/s (36.2MB/s), 34.5MiB/s-34.5MiB/s (36.2MB/s-36.2MB/s), io=1024MiB (1074MB), run=29701-29701msec

Disk stats (read/write):
  sda: ios=261675/33, merge=0/30, ticks=23194/7, in_queue=16, util=99.70%
randwrite:

Code:
tsteine@nextcloud:~$ fio --max-jobs=1 --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=TESTVOL --bs=4k --iodepth=1 --size=1G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=28.1MiB/s][w=7191 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=72736: Wed May 20 19:04:50 2020
  write: IOPS=7252, BW=28.3MiB/s (29.7MB/s)(1024MiB/36145msec); 0 zone resets
   bw (  KiB/s): min=20472, max=32472, per=100.00%, avg=29010.65, stdev=2296.96, samples=72
   iops        : min= 5118, max= 8118, avg=7252.64, stdev=574.24, samples=72
  cpu          : usr=0.85%, sys=4.33%, ctx=262492, majf=0, minf=45
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,262144,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=28.3MiB/s (29.7MB/s), 28.3MiB/s-28.3MiB/s (29.7MB/s-29.7MB/s), io=1024MiB (1074MB), run=36145-36145msec

Disk stats (read/write):
 

Rand__

Well-Known Member
Thanks a million!
But it's weird that even reads are impacted so heavily going from local (NAS) to remote, despite RDMA.
 

tsteine

Active Member
@Rand__
From a Windows VM, also with the VMware paravirtual SCSI controller:

I could potentially increase the Q1T1 random 4K performance by using smaller than 32K blocks for the ZVOL, but that also tanks the sequential performance a bit. I feel like this is a pretty good middle ground, not to mention my VMs do feel pretty damn snappy.

Not sure I'd get much better performance overall without getting a bunch of NVMe drives and switching to NVMe-oF.
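
One caveat if anyone wants to test that: volblocksize is fixed at zvol creation time, so comparing block sizes means creating a fresh zvol and re-running the benchmark. A minimal sketch, with an assumed name and size:

Code:
# Hypothetical: create a second zvol with a smaller volblocksize for
# comparison (volblocksize cannot be changed after creation).
sudo zfs create -s -V 100G -o volblocksize=16k SAN/TESTVOL16K
sudo zfs get volblocksize SAN/TESTVOL16K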

[Attachment: Capture.JPG]
 

Rand__

Well-Known Member
Here are mine... (sync writes though, need to rerun with async).
At least the impact of RDMA on latency is quite visible here (not to mention write performance).
What's the HW behind it?

[Attachment: 1590003687028.png]
[Attachment: 1590003499761.png]

ASYNC
[Attachment: 1590004259419.png]
[Attachment: 1590004405594.png]
 

tsteine

Active Member
@Rand__ For fun, I tried exposing the Optane 900p drive through SCST with iSER.

Definitely hitting some kind of hard cap in terms of latency and overhead in the iSCSI/iSER and VMware/VMFS stack.
[Attachment: optane.JPG]
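
For context, exporting a raw NVMe device through SCST as a blockio-backed LUN looks roughly like the scst.conf sketch below; the device path, IQN, and LUN numbering are placeholders, and the iSER (isert) portal setup is left out since it depends on the local RDMA stack:

Code:
# Hypothetical /etc/scst.conf sketch: expose an NVMe drive via SCST's
# blockio handler. Path, IQN and LUN numbering are placeholders; the
# iSER (isert) portal configuration is not shown here.
HANDLER vdisk_blockio {
        DEVICE optane900p {
                filename /dev/nvme0n1
                rotational 0
        }
}

TARGET_DRIVER iscsi {
        enabled 1
        TARGET iqn.2020-05.san.local:optane {
                enabled 1
                rel_tgt_id 1
                LUN 0 optane900p
        }
}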
 

tsteine

Active Member
@Rand__
It's running a C422 chipset with an Intel Xeon W-2155 10-core CPU and 8 DIMMs of 32GB 2666MHz ECC memory for a total of 256GB RAM.

The HBA is an LSI 9300-24i, with 24 2TB WD Red drives running in a 12x two-way mirror configuration.

L2ARC is an Optane 900p 280GB drive.

Edit:

The ESXi hypervisors are running AMD EPYC 7302 16-core CPUs with 128GB of 2666MHz ECC memory each.
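
For anyone picturing that layout, the pool would have been built roughly like this: 12 two-way mirror vdevs from the 24 WD Reds, plus the Optane as cache. All device names below are placeholders, not the real by-id names:

Code:
# Hypothetical pool layout matching the description above: 12 two-way
# mirror vdevs built from 24 disks, plus the Optane 900p as L2ARC.
# The wd-red-* and nvme-* device names are placeholders.
sudo zpool create SAN \
        mirror wd-red-01 wd-red-02 \
        mirror wd-red-03 wd-red-04 \
        mirror wd-red-05 wd-red-06
# ...continue the "mirror diskX diskY" pairs for all 12 vdevs, then:
sudo zpool add SAN cache nvme-INTEL_OPTANE_900P_PLACEHOLDER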
 

Rand__

Well-Known Member
Ah, that explains the 10G read speed :)
For completeness' sake, my box is a Xeon Gold 5122 with 512GB of memory, 12 mirrored SS300 800GB SAS3 SSDs, and an NVDIMM SLOG (with a CX3) on FreeNAS 11.3.
 

Rand__

Well-Known Member
No, unfortunately not; all my results are without it.
Still contemplating a move to Linux with ZoL to get RDMA capabilities.
 

Bjorn Smith

Well-Known Member
OK, that's what I thought. I was thinking about doing the same, but I like the management features of FreeNAS - I hate the new UI - and still haven't upgraded from 11.1 since nothing new that I need is there.
Perhaps when they finally add either iSER support or NFS 4.1 I might consider it, but for now I get nothing from upgrading except a shitty UI :)