Quanta LB6M 10GBE Switch and VSAN Setup

Emulsifide · Mar 15, 2017

alex1002 said:
Ouch, I'm a failure. I wish I could figure this one out. Looks like the switch cannot ping any of the hosts via the vSAN IP.

Keep it simple. Reset the switch settings to 1500 MTU for starters. Set each physical connection on the switch to trunk. Set up a vDS for the nodes that has defined vmkernels for vMotion, vSAN, and management. Use a different VLAN for each one to isolate the networks. From there, start pinging each host using ping -I vmkX ipaddress. The -I restricts your pings to a specific interface. Once everything looks good, bump up the MTU and make sure everything is still pingable. When that's good, provision the vSAN by enabling it on the cluster and creating your disk groups.
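
A full-size ping only proves jumbo frames if it goes out unfragmented, so the payload has to be sized to the MTU. A minimal sketch of that arithmetic, assuming IPv4 with no IP options (20-byte header) and an 8-byte ICMP header; the function name is made up for illustration:

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP echo")
    return payload


if __name__ == "__main__":
    # Standard and jumbo MTUs mentioned in the thread.
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: payload {max_icmp_payload(mtu)} bytes")
```

On ESXi this would correspond to something like vmkping -I vmkX -d -s 8972 ipaddress for a 9000 MTU, where -d sets don't-fragment and -s sets the payload size; check the flags against your ESXi version.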

alex1002 · Mar 15, 2017

Emulsifide said:
Keep it simple. Reset the switch settings to 1500 MTU for starters. Set each physical connection on the switch to trunk. Set up a vDS for the nodes that has defined vmkernels for vMotion, vSAN, and management. Use a different VLAN for each one to isolate the networks. From there, start pinging each host using ping -I vmkX ipaddress. The -I restricts your pings to a specific interface. Once everything looks good, bump up the MTU and make sure everything is still pingable. When that's good, provision the vSAN by enabling it on the cluster and creating your disk groups.

I'm going to try this tomorrow. Thank you for the advice.

alex1002 · Mar 20, 2017

I tried your advice, still no luck. It does the same thing: everything is green, then a host goes to a different group.

Emulsifide · Mar 20, 2017

At this point, you need more information to get to the bottom of the problem. Is all of your hardware on the VSAN HCL (I don't think I see everything in your past screenshots)? Does your health check for the HCL pass? What are the events that you see in the event log for the host that lead up to the VSAN partition split?

I recommend some light reading from Chapters 10 and 11 in the following guide:

https://www.vmware.com/content/dam/...san/vsan-troubleshooting-reference-manual.pdf

Give some of the suggestions in there a shot and let us know what you discover.

alex1002 · Mar 22, 2017

I read the document. At this point it's for sure something I misconfigured on the switch, and I am confident it is the multicast settings. I read the manuals for the switch, and none of the commands to enable ip igmp worked.

alex1002 · Mar 24, 2017

DaSaint said:
Yeah, this sounds like a jumbo frames type of issue... are you on 1500 MTU? Ensure that all VMkernels used for vSAN have 1500 MTU, because a mismatch could cause what you are seeing... another easy way to check is on the command line.

Start with these to see how they are configured...

Check the adapters and what MTU they are reporting:
esxcli network nic list
or
esxcfg-nics -l
[root@SNODE2V:~] esxcfg-nics -l
Name          PCI           Driver    Link  Speed      Duplex  MAC Address        MTU   Description
vmnic0        0000:04:00.0  igbn      Up    1000Mbps   Full    0c:c4:7a:84:4a:7e  1500  Intel Corporation I350 Gigabit Network Connection
vmnic1        0000:04:00.1  igbn      Down  0Mbps      Half    0c:c4:7a:84:4a:7f  1500  Intel Corporation I350 Gigabit Network Connection
vmnic1000202  0000:02:00.0  nmlx4_en  Up    10000Mbps  Full    f4:52:14:60:53:01  9000  Mellanox Technologies MT27500 Family [ConnectX-3]
vmnic2        0000:02:00.0  nmlx4_en  Down  0Mbps      Half    f4:52:14:60:53:00  1500  Mellanox Technologies MT27500 Family [ConnectX-3]

Check the VMkernel interfaces on all hosts and see what MTU they are reporting:
esxcli network ip interface list
or
esxcfg-vmknic -l

[root@SNODE2V:~] esxcfg-vmknic -l
Interface  Port Group/DVPort/Opaque Network  IP Family  IP Address                Netmask          Broadcast      MAC Address        MTU   TSO MSS  Enabled  Type               NetStack
vmk0       Management Network                IPv4       192.168.0.26              255.255.255.0    192.168.0.255  0c:c4:7a:84:4a:7e  1500  65535    true     STATIC             defaultTcpipStack
vmk0       Management Network                IPv6       fe80::ec4:7aff:fe84:4a7e  64                              0c:c4:7a:84:4a:7e  1500  65535    true     STATIC, PREFERRED  defaultTcpipStack
vmk1       VSAN                              IPv4       192.168.5.2               255.255.255.248  192.168.5.7    00:50:56:6b:9e:0d  9000  65535    true     STATIC             defaultTcpipStack
vmk1       VSAN                              IPv6       fe80::250:56ff:fe6b:9e0d  64                              00:50:56:6b:9e:0d  9000  65535    true     STATIC, PREFERRED  defaultTcpipStack
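
A mismatch anywhere along the path is enough to break large frames, so it helps to compare the vmkernel MTU against every hop it crosses. A rough sketch, assuming you copy the MTU values by hand from the command output; the vmk1 and vmnic1000202 values come from the listings above, while the "vSwitch" and "switch port" entries are hypothetical placeholders:

```python
def find_mtu_bottlenecks(vmk_mtu: int, path_mtus: dict[str, int]) -> list[str]:
    """Return the hops whose MTU is smaller than the vmkernel interface MTU."""
    return [name for name, mtu in path_mtus.items() if mtu < vmk_mtu]


if __name__ == "__main__":
    vmk1_mtu = 9000  # from the esxcfg-vmknic listing
    path = {
        "vSwitch": 9000,       # hypothetical value, not shown in the thread
        "vmnic1000202": 9000,  # from the esxcfg-nics listing
        "switch port": 1500,   # hypothetical value, not shown in the thread
    }
    print(find_mtu_bottlenecks(vmk1_mtu, path))
```

An empty list means every hop can carry a full-size frame from that vmkernel interface; anything listed is a candidate for the mismatch.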

The next thing I would check is multicast, since vSAN uses multicast to communicate with the other nodes as well.
Do this on all 3 hosts... typically, if there's a multicast issue, it's in the physical switch.

Check your vSAN multicast settings:
esxcli vsan network list

Check whether your switch has IGMP Querier turned on for the VLAN/ports you are using. See for example:
Quanta LB6M (10GbE) -- Discussion

alex1002 · Mar 27, 2017

(FASTPATH Routing) #show run
!Current Configuration:
!
!System Description "Quanta LB6M, 1.2.0.14, Linux 2.6.21.7"
!System Software Version "1.2.0.14"
!System Up Time "0 days 0 hrs 12 mins 7 secs"
!Additional Packages FASTPATH QOS
!Current SNTP Synchronized Time: Not Synchronized
!
vlan database
vlan 20
vlan routing 20 20
exit
configure
ip routing
aaa authentication enable "enableList" enable
line console
exit
line telnet
exit
line ssh
exit
spanning-tree configuration name "60-EB-69-BA-BF-72"
!
interface 0/2

set igmp
vlan pvid 20
vlan participation include 20
exit
interface 0/3
vlan pvid 20
vlan participation include 20
exit
interface 0/4
vlan pvid 20
vlan participation include 20
exit
interface 0/5
vlan pvid 20
vlan participation include 20
exit
interface 2/20
routing
ip address 192.168.5.5 255.255.255.248
exit
router rip
exit
router ospf
exit

exit

alex1002 · Mar 27, 2017

Update: I took down one of the hosts, and now it passes almost everything for host connectivity. I tried to ping this host via the VSAN interface from the switch, and this is the only host that fails ping...

(FASTPATH Routing) #ping 10.10.10.2
Pinging 10.10.10.2 with 0 bytes of data:
(FASTPATH Routing) #ping 10.10.10.3
Pinging 10.10.10.3 with 0 bytes of data:

Reply From 10.10.10.3: icmp_seq = 0. time= 4772 usec.

----10.10.10.3 PING statistics----
1 packets transmitted, 1 packets received, 0% packet loss
round-trip (msec) min/avg/max = 4/4/4

(FASTPATH Routing) #ping 10.10.10.4
Pinging 10.10.10.4 with 0 bytes of data:

Reply From 10.10.10.4: icmp_seq = 0. time= 4785 usec.

----10.10.10.4 PING statistics----
1 packets transmitted, 1 packets received, 0% packet loss
round-trip (msec) min/avg/max = 4/4/4

(FASTPATH Routing) #ping 10.10.10.5
Pinging 10.10.10.5 with 0 bytes of data:

Reply From 10.10.10.5: icmp_seq = 0. time= 4773 usec.

----10.10.10.5 PING statistics----
1 packets transmitted, 1 packets received, 0% packet loss
round-trip (msec) min/avg/max = 4/4/4

I don't understand why one host fails. I tried removing the config and re-adding it; same issue.
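
One thing that can be checked offline is whether a target address sits in the same subnet as the switch's routed interface. A small sketch using Python's ipaddress module, assuming the 192.168.5.5 255.255.255.248 interface from the posted config; the thread does not say whether that addressing was still in place when the 10.10.10.x pings were run, and the helper name is made up for illustration:

```python
import ipaddress


def on_same_subnet(interface: str, address: str) -> bool:
    """True if address falls inside the network of an 'ip/netmask' interface."""
    network = ipaddress.ip_interface(interface).network
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    switch_if = "192.168.5.5/255.255.255.248"  # from the posted switch config
    network = ipaddress.ip_interface(switch_if).network
    print("network:", network, "broadcast:", network.broadcast_address)

    # 192.168.5.2 is the example vmk1 address; the rest are the ping targets.
    for target in ("192.168.5.2", "10.10.10.2", "10.10.10.3"):
        print(target, on_same_subnet(switch_if, target))
```

A target outside the interface's subnet is only reachable if the switch has another interface or route for it, so a result of False is a prompt to compare against the running config rather than a diagnosis.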

Emulsifide · Mar 27, 2017

Try pulling the 4th host out completely (re-add it, destroy the disk groups on it, and then move it out of the cluster) and see if things stabilize. If they do, you obviously have an issue with a network port (if you've kept the host on the same port since starting this), a cable/transceiver issue, or a NIC problem.

Naim · Oct 10, 2022

What is the command to show the mac address?
