Quanta LB6M 10GBE Switch and VSAN Setup

Emulsifide

Active Member
Dec 1, 2014
212
93
28
Ouch :( I'm a failure. I wish I could figure this one out. Looks like the switch cannot ping any of the hosts via the VSAN IP.


Keep it simple. Reset the switch settings to 1500 MTU for starters. Set each physical connection on the switch to trunk. Set up a vDS for the nodes with VMkernel adapters defined for vMotion, VSAN, and management. Use a different VLAN for each one to isolate the networks. From there, start pinging each host using ping -I vmkX ipaddress; the -I restricts the ping to a specific interface. Once everything looks good, bump up the MTU and make sure everything is still pingable. When that's good, provision the VSAN by enabling it on the cluster and creating your disk groups.
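
For the "bump up the MTU" step, note that a default ping carries a small payload, so it passes whether or not jumbo frames work end to end. Sizing the payload to the MTU and setting don't-fragment exercises the full frame size. A rough sketch of that arithmetic in Python, assuming IPv4 without options (20-byte header) plus an 8-byte ICMP header. The helper names, the vmk1 interface, and the 192.168.5.3 target are placeholders made up for illustration; -d and -s are vmkping's don't-fragment and size flags.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def vmkping_command(interface: str, target: str, mtu: int) -> str:
    """Build a vmkping line with don't-fragment (-d) and a payload sized to the MTU (-s)."""
    return f"vmkping -I {interface} -d -s {max_icmp_payload(mtu)} {target}"


# vmk1 and 192.168.5.3 are placeholders; substitute your own VSAN vmknic and peer IP.
for mtu in (1500, 9000):
    print(vmkping_command("vmk1", "192.168.5.3", mtu))
```

If the 1500-sized ping works but the 9000-sized one does not, something in the path (vmknic, vSwitch, or a switch port) is probably not passing jumbo frames.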
 

alex1002

Member
Apr 9, 2013
519
19
18
I'm going to try this tomorrow. Thank you for the advice.


 

alex1002

Member
Apr 9, 2013
519
19
18
I tried your advice, still no luck. It does the same thing: everything is green, then a host goes to a different group.
 

Emulsifide

Active Member
Dec 1, 2014
212
93
28
At this point, you need more information to get to the bottom of the problem. Is all of your hardware on the VSAN HCL (I don't think I see everything in your past screenshots)? Does your health check for the HCL pass? What events do you see in the host's event log leading up to the VSAN partition split?

I recommend some light reading from Chapters 10 and 11 in the following guide:

https://www.vmware.com/content/dam/...san/vsan-troubleshooting-reference-manual.pdf

Give some of the suggestions in there a shot and let us know what you discover.
 

alex1002

Member
Apr 9, 2013
519
19
18
I read the document. At this point it's for sure something I misconfigured on the switch, and I am confident it is the multicast settings. I read the manuals for the switch, and none of the commands to enable ip igmp worked :(

 

alex1002

Member
Apr 9, 2013
519
19
18
Yeah, this sounds like a jumbo-frames type of issue... Are you on 1500 MTU? Ensure that all VMkernel adapters used for vSAN have 1500 MTU, because a mismatch could cause what you are seeing. Another easy way to check is from the command line.

Start with these to see how they are configured...

Check the adapters and what MTU they are reading:
esxcli network nic list
or
esxcfg-nics -l
[root@SNODE2V:~] esxcfg-nics -l
Name          PCI           Driver    Link  Speed      Duplex  MAC Address        MTU   Description
vmnic0        0000:04:00.0  igbn      Up    1000Mbps   Full    0c:c4:7a:84:4a:7e  1500  Intel Corporation I350 Gigabit Network Connection
vmnic1        0000:04:00.1  igbn      Down  0Mbps      Half    0c:c4:7a:84:4a:7f  1500  Intel Corporation I350 Gigabit Network Connection
vmnic1000202  0000:02:00.0  nmlx4_en  Up    10000Mbps  Full    f4:52:14:60:53:01  9000  Mellanox Technologies MT27500 Family [ConnectX-3]
vmnic2        0000:02:00.0  nmlx4_en  Down  0Mbps      Half    f4:52:14:60:53:00  1500  Mellanox Technologies MT27500 Family [ConnectX-3]


Check the VMkernel adapters on all hosts and see what MTU they are reading:
esxcli network ip interface list
or
esxcfg-vmknic -l

[root@SNODE2V:~] esxcfg-vmknic -l
Interface  Port Group/DVPort/Opaque Network  IP Family  IP Address                Netmask          Broadcast      MAC Address        MTU   TSO MSS  Enabled  Type               NetStack
vmk0       Management Network                IPv4       192.168.0.26              255.255.255.0    192.168.0.255  0c:c4:7a:84:4a:7e  1500  65535    true     STATIC             defaultTcpipStack
vmk0       Management Network                IPv6       fe80::ec4:7aff:fe84:4a7e  64                              0c:c4:7a:84:4a:7e  1500  65535    true     STATIC, PREFERRED  defaultTcpipStack
vmk1       VSAN                              IPv4       192.168.5.2               255.255.255.248  192.168.5.7    00:50:56:6b:9e:0d  9000  65535    true     STATIC             defaultTcpipStack
vmk1       VSAN                              IPv6       fe80::250:56ff:fe6b:9e0d  64                              00:50:56:6b:9e:0d  9000  65535    true     STATIC, PREFERRED  defaultTcpipStack
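
If you collect this output from every host, the MTU comparison can be scripted rather than eyeballed. A minimal sketch in Python, assuming the MTU is the column right after the MAC address as in the rows above. The function names vmk_mtus and mtu_mismatches are made up for illustration, and the expected value of 1500 is just the starting point suggested earlier in the thread.

```python
import re

# One row of `esxcfg-vmknic -l`: interface name, then anything, then MAC, then MTU.
ROW = re.compile(
    r"^(vmk\d+)\s.*?\s((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+(\d+)\s",
    re.IGNORECASE,
)


def vmk_mtus(output: str) -> dict:
    """Map each vmk interface to the MTU found right after its MAC address."""
    mtus = {}
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            mtus[match.group(1)] = int(match.group(3))
    return mtus


def mtu_mismatches(mtus: dict, expected: int) -> dict:
    """Return only the interfaces whose MTU differs from the expected value."""
    return {name: mtu for name, mtu in mtus.items() if mtu != expected}


# Two of the rows from the output above, used as sample input.
SAMPLE = """\
vmk0 Management Network IPv4 192.168.0.26 255.255.255.0 192.168.0.255 0c:c4:7a:84:4a:7e 1500 65535 true STATIC defaultTcpipStack
vmk1 VSAN IPv4 192.168.5.2 255.255.255.248 192.168.5.7 00:50:56:6b:9e:0d 9000 65535 true STATIC defaultTcpipStack
"""

mtus = vmk_mtus(SAMPLE)
print(mtus)
print(mtu_mismatches(mtus, expected=1500))
```

With these rows, vmk1 is the interface that gets flagged, since it reads 9000 rather than 1500.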


The next thing I would check is that multicast isn't being blocked, since vSAN also uses multicast to communicate with the other nodes.
Do this on all 3 hosts... typically, if there's a multicast issue, it's in the physical switch.

Check your vSAN multicast settings:
esxcli vsan network list

Also check whether your switch has IGMP Querier turned on for the VLAN/ports you are using. See, for example:
Quanta LB6M (10GbE) -- Discussion
 

alex1002

Member
Apr 9, 2013
519
19
18
(FASTPATH Routing) #show run
!Current Configuration:
!
!System Description "Quanta LB6M, 1.2.0.14, Linux 2.6.21.7"
!System Software Version "1.2.0.14"
!System Up Time "0 days 0 hrs 12 mins 7 secs"
!Additional Packages FASTPATH QOS
!Current SNTP Synchronized Time: Not Synchronized
!
vlan database
vlan 20
vlan routing 20 20
exit
configure
ip routing
aaa authentication enable "enableList" enable
line console
exit
line telnet
exit
line ssh
exit
spanning-tree configuration name "60-EB-69-BA-BF-72"
!
interface 0/2

set igmp
vlan pvid 20
vlan participation include 20
exit
interface 0/3
vlan pvid 20
vlan participation include 20
exit
interface 0/4
vlan pvid 20
vlan participation include 20
exit
interface 0/5
vlan pvid 20
vlan participation include 20
exit
interface 2/20
routing
ip address 192.168.5.5 255.255.255.248
exit
router rip
exit
router ospf
exit

exit
 

alex1002

Member
Apr 9, 2013
519
19
18
Update: I took down one of the hosts, and now almost all of the host connectivity checks pass. I tried to ping this host via its VSAN interface from the switch, and it is the only host that fails to respond...


VSAN.JPG

(FASTPATH Routing) #ping 10.10.10.2
Pinging 10.10.10.2 with 0 bytes of data:
(FASTPATH Routing) #ping 10.10.10.3
Pinging 10.10.10.3 with 0 bytes of data:

Reply From 10.10.10.3: icmp_seq = 0. time= 4772 usec.

----10.10.10.3 PING statistics----
1 packets transmitted, 1 packets received, 0% packet loss
round-trip (msec) min/avg/max = 4/4/4

(FASTPATH Routing) #ping 10.10.10.4
Pinging 10.10.10.4 with 0 bytes of data:

Reply From 10.10.10.4: icmp_seq = 0. time= 4785 usec.

----10.10.10.4 PING statistics----
1 packets transmitted, 1 packets received, 0% packet loss
round-trip (msec) min/avg/max = 4/4/4

(FASTPATH Routing) #ping 10.10.10.5
Pinging 10.10.10.5 with 0 bytes of data:

Reply From 10.10.10.5: icmp_seq = 0. time= 4773 usec.

----10.10.10.5 PING statistics----
1 packets transmitted, 1 packets received, 0% packet loss
round-trip (msec) min/avg/max = 4/4/4

I don't understand why one host fails. I tried removing the config and re-adding it; same issue.
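
With more hosts, a long switch transcript like the one above gets tedious to scan. A small sketch in Python that lists the targets with no reply, assuming the FASTPATH output keeps the "Pinging ... with" and "Reply From" wording shown above. The function name hosts_without_reply is made up for illustration.

```python
import re


def hosts_without_reply(transcript: str) -> list:
    """Return ping targets whose 'Pinging' line is never followed by a 'Reply From' line."""
    failed = []
    current = None
    replied = False
    for raw in transcript.splitlines():
        line = raw.strip()
        match = re.match(r"Pinging (\S+) with", line)
        if match:
            # A new ping starts; record the previous target if it never replied.
            if current is not None and not replied:
                failed.append(current)
            current, replied = match.group(1), False
        elif line.startswith("Reply From") and current is not None:
            replied = True
    if current is not None and not replied:
        failed.append(current)
    return failed


# First part of the transcript above, used as sample input.
TRANSCRIPT = """\
(FASTPATH Routing) #ping 10.10.10.2
Pinging 10.10.10.2 with 0 bytes of data:
(FASTPATH Routing) #ping 10.10.10.3
Pinging 10.10.10.3 with 0 bytes of data:

Reply From 10.10.10.3: icmp_seq = 0. time= 4772 usec.
"""

print(hosts_without_reply(TRANSCRIPT))
```

On this sample, only 10.10.10.2 is reported, which matches what the raw output shows.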
 

Emulsifide

Active Member
Dec 1, 2014
212
93
28
Try pulling the 4th host out completely (re-add it, destroy the disk groups on it, and then move it out of the cluster) and see if things stabilize. If they do, you obviously have an issue with a network port (if you've kept the host on the same port since starting this), a cable/transceiver issue, or a NIC problem.