Shared storage ideas...vSphere 6.5

BSDguy · Nov 4, 2017

I've been really happy with the Samsung SM863s. Good performance and an amazing endurance (3000TB).

I'm tempted to get a few more and increase my RAID10 volume to 8 drives as I think the current 860GB is not going to be enough.

Is dedupe still a no no in ZFS?

Out of the 860GB volume I have how much space can I use before things start to go wrong with fragmentation/poor performance/etc?

SlickNetAaron · Nov 4, 2017

whitey said:
In a sad attempt to get close to where you are I just blew away my raidz 4 disk 400gb husmm's w/ a matching 200GB husmm SLOG. Wonder why I can't make these sing, hell raidz was just as good w/ these devices.

Can try iSCSI I guess, may get a 10-20% gain. Starting to get a bit dejected abt these husmm's.

Just like our OP, Using a single husmm as a SLOG for 2x mirrors of the same (or faster) drives will definitely slow you down. The 2x mirrors are faster by themselves! That's probably the same or worse performance as your RAIDZ. Remove your SLOG and your speeds will go up significantly.. Like our OP.

whitey · Nov 4, 2017

SlickNetAaron said:
Just like our OP, Using a single husmm as a SLOG for 2x mirrors of the same (or faster) drives will definitely slow you down. The 2x mirrors are faster by themselves! That's probably the same or worse performance as your RAIDZ. Remove your SLOG and your speeds will go up significantly.. Like our OP.

Started w/ a 2 disk mirror, was getting 200-250MB/s, added 2 more, same thing roughly, added in a SLOG to those 4 drives (stripped mirror) and then see 300-350MB/s.

So I don't think your assumption is entirely correct.

Not cool

Will go run single hussm, mirror, stripped mirror, and raidz, ALL w/ no SLOG for now and report back shortly.

whitey · Nov 4, 2017

BSDguy said:
I've been really happy with the Samsung SM863s. Good performance and an amazing endurance (3000TB).

I'm tempted to get a few more and increase my RAID10 volume to 8 drives as I think the current 860GB is not going to be enough.

Is dedupe still a no no in ZFS?

Out of the 860GB volume I have how much space can I use before things start to go wrong with fragmentation/poor performance/etc?

De-dup is still a 'you had better have plenty of memory to deal w/ that HOG', 80% threshold is where ZFS starts to get pissy/drop off performance from my experience.

BSDguy · Nov 4, 2017

whitey said:
De-dup is still a 'you had better have plenty of memory to deal w/ that HOG', 80% threshold is where ZFS starts to get pissy/drop off performance from my experience.

I have 32GB of RAM in the server but I remember reading many issues/challenges with dedupe so I think I'll give it a miss.

Ok, so 80% is the maximum I should use in the zpool/volume...so I can use about 688GB out of the 860GB volume

One thing that is confusing me is UNMAP with FreeNAS. I copied 7GB of files to an 18GB VM and in the datastore I could see the vmdk increase in size to 25GB. I then deleted the 7GB of files (and emptied the recycle bin) but the vmdk file is still 25GB in size. This was done over an hour ago. When I used Starwinds the vmdk file would shrink in size within minutes.

Am I missing something here with UNMAP and FreeNAS?

Also, should I be overprovisioning in FreeNAS? I have moved 2 VMs to the FreeNAS datastore and vSphere shows this:

But FreeNAS shows this:

So 53GB used in vSphere vs. 26GB used in FreeNAS (I assume this is the compression) so should I rather recreate the volume/zpool and create it to be bigger, like say 1.2TB?

whitey · Nov 4, 2017

I am not an UNMAP aficionado as I typically use NFS, file does not suffer this oddity ONLY block...I'd have to research how to effectively/properly test UNMAP unless you are certain you are on the right path.

I DO see this:

whitey · Nov 4, 2017

Did you make that iSCSI zvol thin/sparse? I'd bet the space difference is related. Thin on thin madness going on maybe (thin zvol, thin vdisks), hell I am grasping.

BSDguy · Nov 4, 2017

whitey said:
Did you make that iSCSI zvol thin/sparse? I'd bet the space difference is related. Thin on thin madness going on maybe (thin zvol, thin vdisks), hell I am grasping.

Yes, it is sparse. I read that you *had* to have it sparse for unmap to work.

whitey · Nov 4, 2017

See you're further along than me, remember iSCSI is NOT my forte' so we are both learning here :-D

SlickNetAaron · Nov 4, 2017

whitey said:
Started w/ a 2 disk mirror, was getting 200-250MB/s, added 2 more, same thing roughly, added in a SLOG to those 4 drives (stripped mirror) and then see 300-350MB/s.

So I don't think your assumption is entirely correct.

Not cool

Hmm.. I did get threads mixed up. Turns out I was thinking of your thread actually! Around post #90 of whitey's FreeNAS ZFS ZIL testing.

whitey · Nov 4, 2017

No worries, was thinking of that thread/comments as well. For sure if ya use a slower slog than pool devs then you 'probably' will run into that situation. Same devs here, they just dont seem to be able to perform at expected levels for this use case...they are not bad but spec sheet makes me think they 'should' have more juice under hood.

BSDguy · Nov 7, 2017

Disaster in the lab! It was all going so well until today...

Both my ESXi hosts were show their 3 datastores as having zero bytes of disk space on them. I rebooted the hosts but this didn't help. I rescanned the iSCSI adapter but that didn't help either.

I logged onto FreeNAS via the webgui and I could see all the volumes were in a healthy state (no errors). So I rebooted FreeNAS and after doing this both ESXi hosts connected to the iSCSI targets again and I could see my VMs boot up.

Annoyingly, I've had this issue before with Starwinds when I lost connectivity to all my iSCSI targets so I'm assuming that the Mellanox ConnectX2 10Gb NICs I am using the hosts and FreeNAS box are to blame (or the cables)?

When I checked /var/log/messages I have loads of these errors:

Code:

Nov  7 08:25:51 san WARNING: 192.168.61.3 (iqn.1998-01.com.vmware:esxi2-4d9a7f4c): no ping reply (NOP-Out) after 5 seconds;

So what are my options here? Is it drivers? Firmware? Config issue? Hardware? I'm not even using a switch for my storage traffic, its all direct connect.

For some reason ESXi is losing the connection to the iSCSI target and I'm not sure why...

audio catalyst · Nov 7, 2017

Using the mellanox's in my esx boxes, and they have been rock solid.

Are you using a switch ?
If yes check those logs

I woukd check the esx logs too, and see if there might be a bad cable

Are you using fibermodules ?

Also, make sure your mellanox devices are running the latest firmware

send from a mobile device, so typo's are to be expected

BSDguy · Nov 7, 2017

audio catalyst said:
Using the mellanox's in my esx boxes, and they have been rock solid.

Are you using a switch ?
If yes check those logs

I woukd check the esx logs too, and see if there might be a bad cable

Are you using fibermodules ?

Also, make sure your mellanox devices are running the latest firmware

send from a mobile device, so typo's are to be expected

No switch for storage traffic...using DAC cables only for direct connect between hosts and SAN.

Not using fiber modules. SFP+ connections.

When the Mellanox NICs arrived earlier this years I updated the firmware on all the cards (but I did this in a Windows machine).

Which ESXi logs should I check?

Would using jumbo frames be causing an issue? I have set the MTU to 9000 on all the vmkernels and vswitches as well as in FreeNAS so I think I have set it everywhere and a test ping of 8972 comes back successful.

BSDguy · Nov 7, 2017

Some further info from the vmkernel.log off one of the hosts:

Code:

2017-11-07T21:56:14.464Z cpu5:66376)ScsiDeviceIO: 2962: Cmd(0x43950120b400) 0x89, CmdSN 0x2bd5e2 from world 66556 to dev "naa.6589cfc0000006f22a5c1eb41598028b" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.
2017-11-07T21:56:54.496Z cpu2:66376)NMP: nmp_ThrottleLogForDevice:3617: Cmd 0x89 (0x4395013d0800, 66553) to dev "naa.6589cfc0000006f22a5c1eb41598028b" on path "vmhba64:C0:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE
2017-11-07T21:56:54.496Z cpu2:66376)ScsiDeviceIO: 2962: Cmd(0x4395013d0800) 0x89, CmdSN 0x2bd66d from world 66553 to dev "naa.6589cfc0000006f22a5c1eb41598028b" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.
2017-11-07T21:57:30.514Z cpu13:66376)ScsiDeviceIO: 2962: Cmd(0x4395012afc00) 0x89, CmdSN 0x2bd6c6 from world 66563 to dev "naa.6589cfc0000006f22a5c1eb41598028b" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.
2017-11-07T21:57:52.721Z cpu5:66169)NMP: nmp_ResetDeviceLogThrottling:3348: last error status from device naa.6589cfc0000006f22a5c1eb41598028b repeated 1 times
2017-11-07T22:00:04.608Z cpu5:66376)NMP: nmp_ThrottleLogForDevice:3617: Cmd 0x89 (0x439501001dc0, 66558) to dev "naa.6589cfc0000006f22a5c1eb41598028b" on path "vmhba64:C0:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE
2017-11-07T22:00:04.608Z cpu5:66376)ScsiDeviceIO: 2962: Cmd(0x439501001dc0) 0x89, CmdSN 0x2bd881 from world 66558 to dev "naa.6589cfc0000006f22a5c1eb41598028b" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.
2017-11-07T22:00:30.296Z cpu7:66376)NMP: nmp_ThrottleLogForDevice:3617: Cmd 0x89 (0x4395012a3100, 65559) to dev "naa.6589cfc0000006f22a5c1eb41598028b" on path "vmhba64:C0:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x38 0x7. Act:NONE
2017-11-07T22:00:30.296Z cpu7:66376)WARNING: ScsiDeviceIO: 2728: Space utilization on thin-provisioned device naa.6589cfc0000006f22a5c1eb41598028b exceeded configured threshold
2017-11-07T22:00:30.296Z cpu7:66376)ScsiDeviceIO: 2927: Cmd(0x4395012a3100) 0x89, CmdSN 0x2bd8b0 from world 65559 to dev "naa.6589cfc0000006f22a5c1eb41598028b" failed H:0x0 D:0x2 P:0x7 Valid sense data: 0x6 0x38 0x7.

I'm a bit confused by the "Space utilization on thin-provisioned device" error:

Code:

naa.6589cfc0000006f22a5c1eb41598028b exceeded configured threshold

On FreeNAS if I do a zfs list I get:

Code:

root@san:/var/log # zfs list
NAME                                                            USED  AVAIL  REFER  MOUNTPOINT
freenas-boot                                                   1.13G   114G   176K  none
freenas-boot/.system                                           24.8M   114G   176K  legacy
freenas-boot/.system/configs-2f1f44bcdd194541bc55d43e518f686d   556K   114G   556K  legacy
freenas-boot/.system/cores                                      568K   114G   568K  legacy
freenas-boot/.system/rrd-2f1f44bcdd194541bc55d43e518f686d      22.8M   114G  22.8M  legacy
freenas-boot/.system/samba4                                     204K   114G   204K  legacy
freenas-boot/.system/syslog-2f1f44bcdd194541bc55d43e518f686d    512K   114G   512K  legacy
freenas-boot/ROOT                                              1.09G   114G   136K  none
freenas-boot/ROOT/Initial-Install                                 8K   114G  1.08G  legacy
freenas-boot/ROOT/default                                      1.09G   114G  1.08G  legacy
freenas-boot/grub                                              7.82M   114G  7.82M  legacy
pro512                                                          159G   298G    88K  /mnt/pro512
pro512/iscsi                                                    159G   298G   159G  -
sm863                                                           267G   593G    88K  /mnt/sm863
sm863/iscsi                                                     267G   593G   267G  -
wd4tb                                                           409G  3.11T    88K  /mnt/wd4tb
wd4tb/iscsi                                                     409G  3.11T   409G  -

naa.6589cfc0000006f22a5c1eb41598028b = sm863 volume

So I have 593GB free out of 860GB so why the warning??

I'm not sure what the other errors are/mean?

BSDguy · Nov 8, 2017

Ok, so I'm not getting anywhere with my troubleshooting and am pretty tired of the hosts getting disconnected from the iSCSI storage. I can only think its the dual port 10Gb NIC in the SAN and/or the cables between the hosts and SAN as when this happens it affects BOTH hosts at the same time.

So I'm considering my options!

1) Infiniband 40/56 but running it in Ethernet mode. No switch, just direct connect. I assume I don't need a subnet manager if I use cards like these in Ethernet mode?

OR

2) Fiber channel - again, no switch, just direct connect from both hosts to the SAN. I was thinking of installing a quad fiber channel card in the SAN (QLOGIC QLE2564) and low profile dual port cards for the hosts (QLOGIC QLE2562)

Still researching details but can anyone comment on the above? Good? Bad?

whitey · Nov 8, 2017

I think you are sorely mistaken that you will run IB 40/56 gear in ethernet mode...not for cheap, takes a protocol bridge license that is $$$, hence why I advised IB is a PITA path. Ohh I see direct connect, yeah you could try that but I feel like that is grasping. FC makes me cringe even more, I had my heyday w/ Comstar FC target mode years ago but when I got to 10GbE why bother.

Have you considered setting up a NFS share just to validate reliability? I KNOW you want iSCSI/thin/UNMAP/VAAI goodness that iSCSI delivers but I'd say it's at least worth a shot to see if it does not experience the same disconnects/drops.

EDIT: on your 'I can ping @ jumbo frames'...did you use vmkping and target the proper vmk interface to send that traffic out of on the stg network? Make sure you didn't miss any hops along the way. In your case should be on vSwitch, vmkernel (vmk#) port, and FreeNAS AIO vmxnet3 nics.

BSDguy · Nov 8, 2017

whitey said:
I think you are sorely mistaken that you will run IB 40/56 gear in ethernet mode...not for cheap, takes a protocol bridge license that is $$$, hence why I advised IB is a PITA path. Ohh I see direct connect, yeah you could try that but I feel like that is grasping. FC makes me cringe even more, I had my heyday w/ Comstar FC target mode years ago but when I got to 10GbE why bother.

Have you considered setting up a NFS share just to validate reliability? I KNOW you want iSCSI/thin/UNMAP/VAAI goodness that iSCSI delivers but I'd say it's at least worth a shot to see if it does not experience the same disconnects/drops.

I am grasping because I have been struggling with this for a year now. Had endless storage issues. When my storage crashed yesterday morning at least I could login to the FreeNAS gui and see that all the ZFS pools/volumes were online so I think its safe to say that the disks and controllers are ok.

Which means its an issue with the 10Gb NICS, DAC cables and/or a config issue with iSCSI/VMware. I can't even swap out the NICs/cables to troubleshoot as I have no spares.

I was thinking FC with FreeNAS 11 but I'm just considering options at this stage. When 10Gb works the performance is amazing but the stability is a nightmare.

I could delete my smaller volume and set it up as an NFS datastore but I am still baffled by the errors/failures logged in the vmkernel.logs.

Could jumbo frames be an issue? I have done some test jumbo frame pings using 8972 packet size and all tests were successful.

I guess what I'm saying is: I'm desperate for a stable storage setup!!

BSDguy · Nov 8, 2017

So I've been keeping my eye on the errors in vmkernel.log and I get LOADS of these iSCSI sense codes:

Code:

2017-11-08T21:54:29.391Z cpu15:66050)ScsiDeviceIO: 2962: Cmd(0x43950118d940) 0x89, CmdSN 0x62c from world 134150 to dev "naa.6589cfc000000a63b770ad1ddd260d2a" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.

2017-11-08T21:55:04.755Z cpu2:66169)NMP: nmp_ResetDeviceLogThrottling:3348: last error status from device naa.6589cfc000000a63b770ad1ddd260d2a repeated 3 times

2017-11-08T21:51:44.551Z cpu6:66376)NMP: nmp_ThrottleLogForDevice:3617: Cmd 0x2a (0x4395011968c0, 129566) to dev "naa.6589cfc0000006f22a5c1eb41598028b" on path "vmhba64:C0:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x38 0x7. Act:NONE

2017-11-08T21:51:44.552Z cpu6:66376)ScsiDeviceIO: 2927: Cmd(0x4395011968c0) 0x2a, CmdSN 0xdc0001 from world 129566 to dev "naa.6589cfc0000006f22a5c1eb41598028b" failed H:0x0 D:0x2 P:0x7 Valid sense data: 0x6 0x38 0x7.

2017-11-08T21:56:44.833Z cpu15:66376)NMP: nmp_ThrottleLogForDevice:3617: Cmd 0x2a (0x439501012ec0, 129131) to dev "naa.6589cfc0000006f22a5c1eb41598028b" on path "vmhba64:C0:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x38 0x7. Act:NONE

2017-11-08T21:56:44.833Z cpu15:66376)WARNING: ScsiDeviceIO: 2728: Space utilization on thin-provisioned device naa.6589cfc0000006f22a5c1eb41598028b exceeded configured threshold

2017-11-08T21:56:44.833Z cpu15:66376)ScsiDeviceIO: 2927: Cmd(0x439501012ec0) 0x2a, CmdSN 0x410001 from world 129131 to dev "naa.6589cfc0000006f22a5c1eb41598028b" failed H:0x0 D:0x2 P:0x7 Valid sense data: 0x6 0x38 0x7.

I found a really good SCSI sense decoder and this was the result of the one error:

Still not sure what to make of this? Why is thin provisioning saying a soft limited has exceeded when I have loads of free space on the volumes:

Code:

root@san:/var/log # zfs list
NAME                                                            USED  AVAIL  REFER  MOUNTPOINT
freenas-boot                                                   1.13G   114G   176K  none
freenas-boot/.system                                           26.6M   114G   176K  legacy
freenas-boot/.system/configs-2f1f44bcdd194541bc55d43e518f686d   660K   114G   660K  legacy
freenas-boot/.system/cores                                      636K   114G   636K  legacy
freenas-boot/.system/rrd-2f1f44bcdd194541bc55d43e518f686d      24.4M   114G  24.4M  legacy
freenas-boot/.system/samba4                                     204K   114G   204K  legacy
freenas-boot/.system/syslog-2f1f44bcdd194541bc55d43e518f686d    556K   114G   556K  legacy
freenas-boot/ROOT                                              1.09G   114G   136K  none
freenas-boot/ROOT/Initial-Install                                 8K   114G  1.08G  legacy
freenas-boot/ROOT/default                                      1.09G   114G  1.08G  legacy
freenas-boot/grub                                              7.82M   114G  7.82M  legacy
pro512                                                          157G   301G    88K  /mnt/pro512
pro512/iscsi                                                    157G   301G   157G  -
sm863                                                           258G   602G    88K  /mnt/sm863
sm863/iscsi                                                     258G   602G   258G  -
wd4tb                                                           503G  3.02T    88K  /mnt/wd4tb
wd4tb/iscsi                                                     503G  3.02T   503G  -

dswartz · Nov 8, 2017

I use connectx3-en cards in point to point mode between a vsphere 6.5 host and a CentOS7 host serving NFS. Rock solid...

Shared storage ideas...vSphere 6.5

Member

Member

Moderator

Moderator

Member

Moderator

Attachments

Moderator

Member

Moderator

Member

Moderator

Member

Member

Member

Member

Member

Moderator

Member

Member

Active Member