Quanta LB6M 10GBE Switch and VSAN Setup

alex1002

Member
Apr 9, 2013
519
19
18
Good day,
I need some help with this setup and VSAN. I am testing a 4-node VSAN cluster. I have one dedicated LB6M switch, and each server has its own 10GbE port for VSAN.

For some reason, every 2-3 minutes one of the nodes becomes part of network partition 2 and a new group 2 is created.

Under network partition there's group 1 with all four hosts in it; every 2-3 minutes one host moves itself to network partition 2, and I see a new group 2 is created.

Can anyone please help me with this switch and VSAN?

Thank you
 

alex1002

No advice whatsoever? The VSAN stays fine for a few moments, or even hours; then when I try to add VMs to the VSAN datastore it times out, and I see the host has been moved to another network partition, group 2.
 

Emulsifide

Active Member
Dec 1, 2014
212
93
28
I'm not familiar with the LB6M, but I have used an LB4M, so I'm assuming the configuration environment is the same (except for the fact that the interconnects themselves are significantly different).

I'm also not familiar with network partitioning. What is the purpose of it?

VSAN works great on a flat network with all VMware functionality (VSAN, vMotion, Management, VM Data) separated into VLANs. Use a vDs switch across the nodes with network resource pools and NIOC to shape your traffic per the VSAN guidelines white paper:

https://www.vmware.com/content/dam/...n/virtual-san-6.2-design-and-sizing-guide.pdf
 

alex1002

I think my switch config is messed up. VMware requires multicast on the switch for VSAN traffic. I'm not sure what to use on the LB6M to do this, but looking at the manual, it looks the same as the LB4M.


Multicast is a network requirement for Virtual SAN. Multicast is used to
discover ESXi hosts participating in the cluster as well as to keep track
of changes within the cluster. It is mandatory to ensure that multicast
traffic is allowed between all the nodes participating in a Virtual SAN
cluster.
Multicast performance is also important, so one should ensure a high
quality enterprise switch is used. If a lower-end switch is used for Virtual
SAN, it should be explicitly tested for multicast performance, as unicast
performance is not an indicator of multicast performance. Multicast
performance can be tested by the Virtual SAN Health Service. While
IPv6 is supported verify multicast performance as older networking
gear may struggle with IPv6 multicast performance.
 

Emulsifide

If you suspect multicast is not working, do the multicast performance test under Monitor, Virtual SAN, Proactive tests. Your VSAN health check should also be screaming at you about multicast with some major alarms.
 

Emulsifide

That just shows the alarm and how it's triggered. Let's see a screenshot of your overall VSAN health status. Expand anything that has a major alarm, like this (it is available when you select your VSAN cluster on the left-hand side of the vCenter client):

upload_2017-3-8_10-18-54.png
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
"If you suspect multicast is not working, do the multicast performance test under Monitor, Virtual SAN, Proactive tests."
In your screenshot, it's way down within the proactive tests ;)

And is it always the same host or different ones?
Have you checked whether the multicast config is the same on all nodes?
Have you run large pings on your vsan vmk? There were some MTU errors too in your screenshot
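
For the large-ping test, here is a rough sketch that works out the largest ICMP payload that fits in one jumbo frame and builds the matching vmkping command. The interface name vmk1 and the peer address 192.168.2.12 are placeholders, not values from this thread, and MTU 9000 is an assumption. The -d flag sets don't-fragment and -s sets the payload size.

```shell
#!/bin/sh
# Placeholders, not from the thread: use your own VSAN vmk and a peer host's VSAN IP.
VMK="vmk1"
PEER_IP="192.168.2.12"
MTU=9000

# The IPv4 header (20 bytes) and ICMP header (8 bytes) must fit inside the MTU.
PAYLOAD=$((MTU - 28))

# -I picks the VMkernel interface, -d sets don't-fragment, -s sets the payload size.
PING_CMD="vmkping -I $VMK -d -s $PAYLOAD $PEER_IP"

# Print the command so it can be pasted into the ESXi shell on each host.
echo "$PING_CMD"
```

If the full-size ping fails between two hosts while a default-size vmkping succeeds, something in that path is probably not passing jumbo frames.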
 

Emulsifide

Thanks for the screenshot. I understand now!! By network partition, you mean the VSAN cluster itself split off into two separate partitions because the error timeout has elapsed. This means, the VSAN is treating one or more nodes as failed and is now ignoring it completely. In your left-hand view in vCenter, do you have one or more nodes that are failing to communicate with another node?

Are you using jumbo frames? The MTU issue that @Rand__ has brought up is definitely of concern. What does your virtual network infrastructure look like? Are you using vSwitches or vDs?
 

alex1002

Hi guys,
This is the issue: all of a sudden the network health is back to normal. Then after 5-10 minutes, or even 4 hours, it goes back to failed.
Network.JPG
 

alex1002

I have a dedicated network on each host with its own controller for VSAN, with a static IP on each. MTU is 9000 on the host ports and also on the switch ports.
 

Rand__

I had a similar behavior when I was using the witness appliance a while back but you said you are on 4 phys hosts...
All have the same patch level?

Still the question of whether it's always the same host or whether they are moving/random.
 

DaSaint

Active Member
Oct 3, 2015
282
79
28
Colorado
Yeah, this sounds like a jumbo frames type of issue... Are you on 1500 MTU? Ensure that all VMkernel ports used for vSAN have 1500 MTU, because a mismatch could cause what you are seeing. An easy way to check is from the command line.

Start with these to see how they are configured.

Check the adapters and what MTU they are reading:
esxcli network nic list
or
esxcfg-nics -l

Check the VMkernel interfaces on all hosts and see what MTU they are reading:
esxcli network ip interface list
or
esxcfg-vmknic -l
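
Once you have the MTU from each host, comparing them is mechanical. A small sketch, assuming you paste the values in by hand; the host names and MTU numbers below are made up for illustration and are not output from this cluster:

```shell
#!/bin/sh
# Hypothetical sample data: "host:mtu" pairs copied by hand from
# `esxcfg-vmknic -l` on each node. Replace with your real values.
HOST_MTUS="esx1:9000 esx2:9000 esx3:1500 esx4:9000"

# The MTU every VSAN VMkernel port is supposed to use (assumed here).
EXPECTED=9000
MISMATCHED=""

for entry in $HOST_MTUS; do
    host=${entry%%:*}   # text before the colon
    mtu=${entry##*:}    # text after the colon
    if [ "$mtu" -ne "$EXPECTED" ]; then
        MISMATCHED="$MISMATCHED $host"
    fi
done

if [ -n "$MISMATCHED" ]; then
    echo "MTU mismatch on:$MISMATCHED"
else
    echo "All hosts use MTU $EXPECTED"
fi
```

Any host listed as mismatched is the first place to look before moving on to multicast.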

The next thing I would check is that multicast isn't getting blocked, as vSAN uses multicast to communicate with the other nodes as well. Do this on all hosts... typically, if there's a multicast type of issue, it's in the physical switch.

Check your vSAN multicast settings:
esxcli vsan network list

Also check whether your switch has IGMP Querier turned on for the VLAN/ports you are using. See for example:
Quanta LB6M (10GbE) -- Discussion
 

alex1002

(FASTPATH Routing) #show igmpsnooping

Admin Mode..................................... Disable
Multicast Control Frame Count.................. 0
IGMP Router-Alert check........................ Disabled
Interfaces Enabled for IGMP Snooping........... None
VLANs enabled for IGMP snooping................ None


I can't find a way to enable it. Please advise.
I tried: ip igmpsnooping interfacemode
 

alex1002

Here's my switch config:

!System Description "Quanta LB6M, 1.2.0.14, Linux 2.6.21.7"

!System Software Version "1.2.0.14"

!System Up Time "7 days 6 hrs 41 mins 55 secs"

!Additional Packages FASTPATH QOS

!Current SNTP Synchronized Time: Not Synchronized

!

network protocol none

vlan database

vlan 2

vlan routing 2 1

exit

configure

ip routing

aaa authentication enable "enableList" enable

line console

exit

line telnet

exit

line ssh

exit

spanning-tree configuration name "60-EB-69-BA-BF-72"

!


set igmp

interface 0/1

mtu 9216

vlan pvid 2

vlan participation include 2

vlan tagging 2

exit

interface 0/2

mtu 9216

vlan pvid 2

vlan participation include 2

vlan tagging 2

exit

interface 0/3

mtu 9216

vlan pvid 2

vlan participation include 2

vlan tagging 2

exit

interface 0/4

mtu 9216

vlan pvid 2

vlan participation include 2

vlan tagging 2


exit

interface 0/5

mtu 9216

exit

interface 0/6

mtu 9216

exit

interface 0/7

mtu 9216

exit

interface 0/8

mtu 9216

exit

interface 0/9

mtu 9216

exit

interface 0/10

mtu 9216

exit

interface 0/11

mtu 9216

exit

interface 0/12

mtu 9216


exit

interface 0/13

mtu 9216

exit

interface 0/14

mtu 9216

exit

interface 0/15

mtu 9216

exit

interface 0/16

mtu 9216

exit

interface 0/17

mtu 9216

exit

interface 0/18

mtu 9216

exit

interface 0/19

mtu 9216

exit

interface 0/20

mtu 9216


exit

interface 0/21

mtu 9216

exit

interface 0/22

mtu 9216

exit

interface 0/23

mtu 9216

exit

interface 0/24

mtu 9216

exit

interface 0/25

mtu 9216

exit

interface 0/26

mtu 9216

exit

interface 0/27

mtu 9216

exit

interface 0/28

mtu 9216


exit

interface 2/1

routing

exit

router rip

exit

router ospf

exit

exit
 

alex1002

Age Time (seconds)............................. 1200
Response Time (seconds)........................ 1
Retries........................................ 4
Cache Size..................................... 6144
Dynamic Renew Mode ............................ Disable
Total Entry Count Current / Peak .............. 0 / 0
Static Entry Count Configured / Active / Max .. 0 / 0 / 128

IP Address MAC Address Interface Type Age
--------------- ----------------- -------------- -------- -----------
 

alex1002

Ouch :( I'm a failure. I wish I could figure this one out. It looks like the switch cannot ping any of the hosts via their VSAN IPs.

