10G fiber link between VMware and TrueNAS Scale

Janus006

New Member
Jan 28, 2021
12
2
3
I’m trying to configure a 10Gb fiber link between my two ESX hosts and my TrueNAS Scale box. I configured all my IP addresses correctly (subnet/netmask), but I’m unable to ping the destination interface. As the connections are direct between the machines, no VLAN is configured.

My first idea is to think my configuration in VMWare is not correct.

The links show as UP. I created my vSwitch and VMkernel interfaces and configured jumbo frames on both sides, but I'm unable to ping from either side.

Do you have a reference/guide to help me with my configuration?

The objective is to add an iSCSI datastore to my VMware hosts.

Thanks
 

Janus006

ESX1 10.0.30.31/24
ESX2 10.0.31.31/24

TrueNAS_FC_int1 10.0.30.254/24
TrueNAS_FC_int2 10.0.31.254/24
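You can quickly double-check this addressing plan by confirming that each ESXi vmk and the TrueNAS port it is cabled to fall in the same subnet. Below is a minimal sketch using Python's standard `ipaddress` module. The pairing (ESX1 to TrueNAS_FC_int1, ESX2 to TrueNAS_FC_int2) is assumed from the addresses above.

```python
import ipaddress

# Assumed cabling: each ESXi host is direct-connected to one TrueNAS port.
LINKS = {
    "ESX1 <-> TrueNAS_FC_int1": ("10.0.30.31/24", "10.0.30.254/24"),
    "ESX2 <-> TrueNAS_FC_int2": ("10.0.31.31/24", "10.0.31.254/24"),
}

def same_subnet(a: str, b: str) -> bool:
    """True if both interface addresses belong to the same network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network

def check_links(links):
    """Return {link_name: bool} for every point-to-point link."""
    return {name: same_subnet(a, b) for name, (a, b) in links.items()}

if __name__ == "__main__":
    for name, ok in check_links(LINKS).items():
        print(f"{name}: {'OK' if ok else 'SUBNET MISMATCH'}")
```

A matching subnet only rules out an addressing mistake. It says nothing about cabling or which vmnic is actually in use.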
 

Rand__

Well-Known Member
Mar 6, 2014
5,566
1,214
113
I think I had a similar issue years ago. I believe I solved it by adding a gateway VM and using that for the vmk.
It didn't make any sense on a local connection, but I think it helped.
 

itronin

Well-Known Member
Nov 24, 2018
673
383
63
Denver, Colorado
> ESX1 10.0.30.31/24
> ESX2 10.0.31.31/24
>
> TrueNAS_FC_int1 10.0.30.254/24
> TrueNAS_FC_int2 10.0.31.254/24
So you don't have any other interfaces/vmks on your ESXi system (no management, for example), and no other interfaces on your TNS? Do you have default gateways configured on your ESXi hosts and your TNS? Just trying to provide you with another set of eyes.

You didn't mention (though you kind of implied) whether you can ping the local interface on each system, i.e. can ESX1 ping 10.0.30.31? Can TNS ping either of the .254s?

I have not had to do what @Rand__ suggested, but I have seen instances where an ESX host without a default gateway configured behaves oddly on the network.

Lastly, you said fiber and direct connect. Did you try flipping the strands to make your direct connection a crossover?
 

Janus006

> So you don't have any other interfaces/vmks on your ESXi system (no management, for example), and no other interfaces on your TNS? Do you have default gateways configured on your ESXi hosts and your TNS? Just trying to provide you with another set of eyes.
Sorry, I believed you wanted to know about the interconnection only...

Here are settings:
For the moment, I'm configuring only TNS and ESX2 (as a POC).
ESX1:
vmk Mgmt 192.168.50.30/24 with a gateway
vmk vMotion 10.10.10.30/24
vmk iSCSI 10.0.30.30/24 (Not connected for now)

ESX2:
vmk Mgmt 192.168.50.31/24 with a gateway
vmk vMotion 10.10.10.31/24
vmk iSCSI 10.0.31.31/24

a vSwitch for my vmotion configured with a dedicated 10G adapter (ESX1/ESX2)
a vSwitch for iSCSI configured with a dedicated 10G adapter ESX2 only
a vSwitch for a planned HA with a dedicated 1G adapter (ESX1/ESX2)

TNS:
bond with 2x 1G interfaces 192.168.50.254/24
1x FC connected to ESX2 10.0.31.254/24
1x FC not connected to ESX1 10.0.30.254/24 (planned)

I don't want to use 172.16.x.x for iSCSI because TNS uses that range for Docker/apps, which may cause issues.
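The same `ipaddress` module can confirm that the planned subnets stay clear of each other and of the range the apps use. This is only a sketch. `APPS_RANGE` is an assumption standing in for "172.16.x.x", read here as 172.16.0.0/16. Check the ranges actually configured in TrueNAS SCALE before relying on it.

```python
import ipaddress
from itertools import combinations

# Subnets taken from the configuration above.
PLANNED = {
    "Mgmt": "192.168.50.0/24",
    "vMotion": "10.10.10.0/24",
    "iSCSI ESX1": "10.0.30.0/24",
    "iSCSI ESX2": "10.0.31.0/24",
}
# Assumption: "172.16.x.x" interpreted as 172.16.0.0/16.
APPS_RANGE = "172.16.0.0/16"

def conflicts(planned, reserved):
    """List (name_a, name_b) pairs whose networks overlap, including the reserved range."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in planned.items()}
    nets["apps (reserved)"] = ipaddress.ip_network(reserved)
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]

if __name__ == "__main__":
    found = conflicts(PLANNED, APPS_RANGE)
    print("no overlaps" if not found else found)
```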
 

Janus006

> You didn't mention (though you kind of implied) whether you can ping the local interface on each system, i.e. can ESX1 ping 10.0.30.31? Can TNS ping either of the .254s?
I'm only able to ping the local IP on each machine.
ESX2 can only ping 10.0.31.31 and TNS can only ping 10.0.31.254, so communication does not seem to work at all.
 

itronin

Not sure if you saw this at the end of my post:

> Lastly, you said fiber and direct connect. Did you try flipping the strands to make your direct connection a crossover?

It looks like your vMotion interfaces are direct-connect fiber too? Are you able to ping the other side of your vMotion connections, e.g. ESX1 to ESX2 and vice versa?
 

Janus006

Sorry, I did not see the last comment/line.

As they are directly connected, I will try swapping the strands. (Aren't fiber adapters smart enough to detect a direct connection, the way RJ45 ports can nowadays?)

And no, I cannot ping from the vMotion interface, but I believed that was because it uses the vMotion stack and not the default TCP/IP stack.
 

Janus006

I just flipped them and now I've lost the physical link, so it does not seem that was the issue :(
 

Rand__

But you're using vmkping and not regular ping, aren't you? Just to be sure...
Same issue if you direct connect the two esxi hosts or the two scale ones?
 

itronin

> I just flipped them and now I've lost the physical link, so it does not seem that was the issue :(
Yes, it was a shot in the dark. I didn't know what kind of fiber optic cards you had, nor how old they were.

> And no, I cannot ping from the vMotion interface, but I believed that was because it uses the vMotion stack and not the default TCP/IP stack.
Uhmmm... hmmmm... I have not seen an instance where I cannot ping the vMotion-configured vmk. In fact, if I can't, I assume something is wrong and start working through the configuration to figure out why...

for example:

Two hosts, each with a vMotion vmk configured: one at 242.81 and one at 242.61. Screenshot of the vmk configuration:
[Attachment: vmotion-vmk.png]

I enabled the shell on 250.61 and logged in to test via ping, first to itself and then between the two ESXi hosts:

[root@vmware61:~] ping -I vmk2 192.168.242.61
PING 192.168.242.61 (192.168.242.61): 56 data bytes
64 bytes from 192.168.242.61: icmp_seq=0 ttl=64 time=0.088 ms
64 bytes from 192.168.242.61: icmp_seq=1 ttl=64 time=0.065 ms
64 bytes from 192.168.242.61: icmp_seq=2 ttl=64 time=0.046 ms

--- 192.168.242.61 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.046/0.066/0.088 ms

[root@vmware61:~]
[root@vmware61:~] ping -I vmk2 192.168.242.81
PING 192.168.242.81 (192.168.242.81): 56 data bytes
64 bytes from 192.168.242.81: icmp_seq=0 ttl=64 time=0.267 ms
64 bytes from 192.168.242.81: icmp_seq=1 ttl=64 time=0.417 ms
64 bytes from 192.168.242.81: icmp_seq=2 ttl=64 time=0.407 ms

--- 192.168.242.81 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.267/0.364/0.417 ms

[root@vmware61:~]

My configuration is a bit different: both hosts are connected via AOC to an ICX6610, with VLANs for vMotion, storage and obs rather than dedicated interfaces...

Edit: Do you have the same optics in the cards in all three hosts? Are they all SFP+ 10GbE optics or QSFP 40GbE optics?
Edit: Might as well tell us your optics and NIC part numbers...

I really think you should be able to ping your vMotion interfaces from each side.

If ESXi-to-ESXi doesn't work either, that shows your issue is not specific to the TNS-ESXi link... At least that's how it looks to me.

Same type of NIC in all three hosts? ESX1, ESX2 and TNS?

Edit: I used both ping and vmkping to test on my end

[root@vmware61:~] vmkping 192.168.242.81
PING 192.168.242.81 (192.168.242.81): 56 data bytes
64 bytes from 192.168.242.81: icmp_seq=0 ttl=64 time=0.354 ms
64 bytes from 192.168.242.81: icmp_seq=1 ttl=64 time=0.441 ms
64 bytes from 192.168.242.81: icmp_seq=2 ttl=64 time=0.433 ms

--- 192.168.242.81 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.354/0.409/0.441 ms
 

Janus006

Now switching to vMotion stack ... (vmnic5 for vmotion / vmnic4 for iSCSI)
---------------------------------------------------
As far as I can see, your vMotion interface is configured on the default TCP/IP stack. I used the vMotion stack (I read somewhere that it is better; is it really?)

Just to be sure, I added a VLAN to "isolate" the traffic.
Same issue.
I also found the command to ping using the vMotion stack:

esxcli network diag ping -I vmk1 --netstack=vmotion -H 10.10.10.31
No communication, but link is UP

I also tried the same config, except on the Standard TCP/IP stack: same result.


For the parts, here are the network cards used.
All of them have SFP+ ports.

ESX1: esxcli network nic list
vmnic4 0000:41:00.0 qfle3 Up Down 0 Half 00:0e:1e:b4:1e:30 1500 QLogic Inc. QLogic 57810 10 Gigabit Ethernet Adapter
vmnic5 0000:41:00.1 qfle3 Up Up 10000 Full 00:0e:1e:b4:1e:32 9000 QLogic Inc. QLogic 57810 10 Gigabit Ethernet Adapter


ESX2: esxcli network nic list
vmnic4 0000:05:00.0 qfle3 Up Up 10000 Full 00:0a:f7:5a:b5:10 9000 QLogic Corporation NetXtreme II BCM57810 10 Gigabit Ethernet
vmnic5 0000:05:00.1 qfle3 Up Up 10000 Full 00:0a:f7:5a:b5:12 9000 QLogic Corporation NetXtreme II BCM57810 10 Gigabit Ethernet
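These `esxcli network nic list` rows can be read column by column: Name, PCI Device, Driver, Admin Status, Link Status, Speed, Duplex, MAC Address, MTU, Description. Read that way, ESX1's vmnic4 shows link Down at 0 Mb/s. Below is a small Python sketch that parses rows pasted as above. It assumes whitespace-separated columns with the free-text description last, which is how the rows appear in this post and not necessarily the exact spacing esxcli prints.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Nic:
    name: str
    pci: str
    driver: str
    admin: str
    link: str
    speed: int
    duplex: str
    mac: str
    mtu: int
    description: str

def parse_nic_list(text: str) -> List[Nic]:
    """Parse vmnic rows of `esxcli network nic list` output; other lines are skipped."""
    nics = []
    for line in text.splitlines():
        if not line.startswith("vmnic"):
            continue  # skip headers, separators and blank lines
        f = line.split(None, 9)  # the last field (description) may contain spaces
        nics.append(Nic(f[0], f[1], f[2], f[3], f[4], int(f[5]), f[6], f[7], int(f[8]), f[9]))
    return nics

# ESX1 rows as posted above.
ESX1 = """\
vmnic4 0000:41:00.0 qfle3 Up Down 0 Half 00:0e:1e:b4:1e:30 1500 QLogic Inc. QLogic 57810 10 Gigabit Ethernet Adapter
vmnic5 0000:41:00.1 qfle3 Up Up 10000 Full 00:0e:1e:b4:1e:32 9000 QLogic Inc. QLogic 57810 10 Gigabit Ethernet Adapter
"""

if __name__ == "__main__":
    for nic in parse_nic_list(ESX1):
        print(f"{nic.name}: link {nic.link}, {nic.speed} Mb/s, MTU {nic.mtu}")
```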

TNS: lspci | grep Ethernet
03:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
03:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
And the interface linked to ESX2:
enp3s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000


[Attachment: Net_config.png]

Thank you very much for your help/time...
 

itronin

So, my thoughts (and you have a very odd problem for sure)...
> enp3s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
When things get weird, try to make them as simple as they can be... You have no switch involved here, so this shouldn't matter, but I see you have jumbo frames enabled on the TNS side. I don't know if you did that on the vMotion NICs as well. Revert it to the default.

Simplest config first, and then try advanced stuff. My gut says you have something else going on, so I'll be surprised if this fixes it.
There's something base-level going on, and it's affecting all three systems.

So I'd turn off jumbo frames on any interfaces where you've enabled them and work it from there.
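Once the plain 1500-byte path works, you can re-test jumbo frames end to end with a don't-fragment ping. On ESXi this is commonly done as `vmkping -I <vmk> -d -s <size> <target>`, where the payload size is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. Below is a sketch of that arithmetic. The vmk name and target used in the example run (`vmk2`, 10.0.31.254) are placeholders.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def vmkping_command(vmk: str, target: str, mtu: int) -> str:
    """Build a don't-fragment (-d) vmkping command line for the given MTU."""
    return f"vmkping -I {vmk} -d -s {icmp_payload_for_mtu(mtu)} {target}"

if __name__ == "__main__":
    # Placeholder vmk and target; substitute your own.
    for mtu in (1500, 9000):
        print(vmkping_command("vmk2", "10.0.31.254", mtu))
```

If the 1472-byte ping succeeds but the 8972-byte one fails, some hop in the path is still at a 1500 MTU.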
 

Janus006

> Simplest config first, and then try advanced stuff.
You're completely right. I've reconfigured the frame size to 1500 and still have the same issue.

One setting I'm not sure about: since we configured everything on the Standard TCP/IP stack, when I checked the stack's config I saw that the default gateway is set for this stack. As everything is on that stack, will all interfaces try to use it? I know that "normally" if you're on the same subnet you're not supposed to use your gateway, but... only a guess...
[Attachment: default_gw_ipstack.png]
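On the gateway question: a host normally picks the most specific matching route. A directly connected /24 therefore wins over the 0.0.0.0/0 default route, and traffic to a same-subnet peer is delivered on-link rather than sent to the gateway. Below is a simplified longest-prefix-match sketch of ESX2's table. The vmk numbers and the gateway address 192.168.50.1 are hypothetical. The real table is what `esxcli network ip route ipv4 list` reports.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical routing table for ESX2: (destination, interface, gateway or None if on-link).
ROUTES: List[Tuple[str, str, Optional[str]]] = [
    ("192.168.50.0/24", "vmk0", None),      # management
    ("10.10.10.0/24", "vmk1", None),        # vMotion
    ("10.0.31.0/24", "vmk2", None),         # iSCSI to TrueNAS
    ("0.0.0.0/0", "vmk0", "192.168.50.1"),  # default route (gateway address assumed)
]

def lookup(dest: str, routes=ROUTES) -> Tuple[str, str]:
    """Return (interface, next_hop) for dest using longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    matches = [(ipaddress.ip_network(net), vmk, gw) for net, vmk, gw in routes
               if addr in ipaddress.ip_network(net)]
    if not matches:
        raise LookupError(f"no route to {dest}")
    net, vmk, gw = max(matches, key=lambda m: m[0].prefixlen)
    return vmk, gw or dest  # on-link routes deliver straight to the destination

if __name__ == "__main__":
    for ip in ("10.0.31.254", "8.8.8.8"):
        print(ip, "->", lookup(ip))
```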
 

itronin

At this point I'd be pulling my hair out. I don't know if this is feasible, but if you are more familiar with another operating system and have spare disks/SSDs, you might swap drives out and install that OS to get an even simpler config for additional testing. If you can't get that to work, maybe use some sniffer software on one end and see what your pings show up as...

I'm not familiar with the qlogic or emulex nics so I am not sure what options they might have that can be configured. If these were used somewhere else before you received them its possible they were specifically configured for a use case and that (or those) option(s) is (are) interfering with your use.
 

Janus006

Thanks for your help. I will continue to look at the problem. I'm currently trying to see something with tcpdump...
 

kapone

Well-Known Member
May 23, 2015
1,044
615
113
Interesting problem. The Qlogic 57810 Nics are nothing special or exotic, they should work. I have a few Tyan boards that have these, and ESXI works OOTB on those...however, I'm not using a direct connect config like yours. So, let's see if we can create a checklist to troubleshoot.

1. ESXI is very finicky about DNS. So, disconnect the ESX servers from TNS, make sure these 10g nics that you plan to use for direct connect are not used for anything else in ESXI.
2. Install (assuming not installed yet) and configure ESXI, do the standard MGMT, vMotion etc networking, and make sure that is working as expected.
3. Take one ESX host, direct connect it to the TNS box.
4. Running "esxcli network nic list" should now show that NIC as "Connected/Up", whether you've defined an IP address on it or not.
5. If all you want on each ESX box for this NIC, is a direct connection to the TNS box, make sure that this connection/route is not doing anything else. Which in turn means, you don't need VLANs or anything like that.
6. Create a vSwitch tied to that NIC. Standard config, keep mtu at default, jumbo frames can be configured later, assuming everything works.
7. What you need (I think) at this point, is to define a VMKernel NIC, which is what will be configured with a static IP, provides the routing/connectivity for that subnet, and any upstream services (like iSCSI) will need to be configured with IP address based hosts (we don't have DNS on this direct connection)
8. Create a standard VMKernel NIC --> New Port group (give it a meaningful name) --> VLAN ID = 0 --> MTU = 1500 --> IP version IPv4 only -->IPv4 settings = static --> Give it an IP address and subnet mask --> Default TCP/IP stack --> Don't check any of the services checkboxes (yet)

This assumes that you've already done the config on the TNS box with a static IP as well. Test...

It should work.
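Steps 6 to 8 can also be done from the ESXi shell. The sketch below only builds the `esxcli` commands so you can review them before pasting; it does not run them. The vSwitch, port group and vmk names (vSwitch2, iSCSI, vmk2) are hypothetical. vmnic4 and 10.0.31.31/24 are what the thread assigns to ESX2's iSCSI link, so first confirm which vmnic is actually cabled. Check the options against `esxcli ... --help` on your ESXi build. With no VLAN or MTU options the defaults (VLAN 0, MTU 1500) should apply, matching the checklist.

```python
import ipaddress
from typing import List

def iscsi_vmk_commands(vswitch: str, uplink: str, portgroup: str,
                       vmk: str, cidr: str) -> List[str]:
    """Return esxcli commands for a standard vSwitch, port group and static-IP vmk."""
    iface = ipaddress.ip_interface(cidr)  # splits "a.b.c.d/len" into address and netmask
    return [
        f"esxcli network vswitch standard add --vswitch-name={vswitch}",
        f"esxcli network vswitch standard uplink add --uplink-name={uplink} --vswitch-name={vswitch}",
        f"esxcli network vswitch standard portgroup add --portgroup-name={portgroup} --vswitch-name={vswitch}",
        f"esxcli network ip interface add --interface-name={vmk} --portgroup-name={portgroup}",
        f"esxcli network ip interface ipv4 set --interface-name={vmk} "
        f"--ipv4={iface.ip} --netmask={iface.netmask} --type=static",
    ]

if __name__ == "__main__":
    # Hypothetical names; uplink and address taken from ESX2's iSCSI config in this thread.
    for cmd in iscsi_vmk_commands("vSwitch2", "vmnic4", "iSCSI", "vmk2", "10.0.31.31/24"):
        print(cmd)
```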
 

Janus006

Hey, I solved my communication issue... and sorry, guys, for making you lose time and effort. It was basically a code 18 (user error) :(
Unplugging every cable and connecting only the one you're working on helps make sure the vmnic you're working on is the right one...
I was working on the wrong vmnic from the beginning.
Now I'm able to ping. I will configure storage and other stuff later.
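The "unplug everything" tip can be made a bit more systematic. Capture `esxcli network nic list` before and after plugging in a single cable, then see which vmnic's Link Status changed. Below is a minimal sketch. It assumes the output is pasted in as text with Link Status as the fifth whitespace-separated column, as in the rows posted earlier in this thread. The two snapshots are made up for illustration.

```python
from typing import Dict, List

def link_states(nic_list_output: str) -> Dict[str, str]:
    """Map vmnic name -> Link Status from `esxcli network nic list` rows."""
    states = {}
    for line in nic_list_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("vmnic") and len(fields) >= 5:
            states[fields[0]] = fields[4]  # Name, PCI Device, Driver, Admin Status, Link Status
    return states

def changed_links(before: str, after: str) -> List[str]:
    """Return the vmnics whose link status differs between two snapshots."""
    old, new = link_states(before), link_states(after)
    return sorted(n for n in new if old.get(n) != new[n])

# Hypothetical snapshots: only the cable into vmnic5 was plugged in between them.
BEFORE = """\
vmnic4 0000:05:00.0 qfle3 Up Down 0 Half 00:0a:f7:5a:b5:10 1500 QLogic Corporation NetXtreme II BCM57810 10 Gigabit Ethernet
vmnic5 0000:05:00.1 qfle3 Up Down 0 Half 00:0a:f7:5a:b5:12 1500 QLogic Corporation NetXtreme II BCM57810 10 Gigabit Ethernet
"""
AFTER = BEFORE.replace("vmnic5 0000:05:00.1 qfle3 Up Down 0 Half",
                       "vmnic5 0000:05:00.1 qfle3 Up Up 10000 Full")

if __name__ == "__main__":
    print("link changed on:", changed_links(BEFORE, AFTER))
```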

Thanks all for your help
 