Cisco SG550XG-24F stacked => vLAG => packet drops on Ceph cluster network

hanshans

Dec 14, 2013
I bought two Cisco SG550XG-24F switches for our new Ceph cluster.

The cluster was set up in the lab with two of our old Blade G8124 24x10G switches and worked seamlessly with good performance. For simplicity, no VLANs were configured in the lab setup.

We have now moved to the SG550XG (and even switched from DAC cables to fiber). The two switches are stacked over three fiber links (configured via the GUI). Every server connects to each of the switches with one 10G link for redundancy and performance. On the Linux side, the two links are bonded in LACP mode (802.3ad); on the switch side, the corresponding ports of switch 1 and switch 2 are aggregated into one LAG with LACP enabled. With this setup I see many dropped packets on the cluster interfaces and extreme latency on the Ceph network. Without bonding/LAG, performance is well below that of our Blade switches, but still acceptable. In LAG mode, I see delays of up to 10 seconds and thousands of dropped packets.
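One way to check whether the stacked pair actually presents itself as a single LACP partner is the Linux bonding driver's status file. In 802.3ad mode, each slave section lists an "Aggregator ID"; if both slaves report the same ID, one LAG has formed across both switches, and if they report different IDs, only one aggregator (and so only one link) is active. The sketch below assumes the `/proc/net/bonding/bond0` layout of the Linux bonding driver; the interface names `ens1f0`/`ens1f1` and the sample text are hypothetical, not taken from this cluster.

```python
from __future__ import annotations

# Hypothetical excerpt of /proc/net/bonding/bond0 in 802.3ad (LACP) mode.
# Interface names and values are illustrative assumptions only.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up

Slave Interface: ens1f0
MII Status: up
Speed: 10000 Mbps
Aggregator ID: 1

Slave Interface: ens1f1
MII Status: up
Speed: 10000 Mbps
Aggregator ID: 2
"""


def slave_aggregators(text: str) -> dict[str, str]:
    """Map each slave interface to the Aggregator ID listed in its section."""
    result: dict[str, str] = {}
    current = None  # no slave seen yet; skips the bond-wide "Active Aggregator Info"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Aggregator ID:"):
            # Keep only the first Aggregator ID that follows a slave header.
            result.setdefault(current, line.split(":", 1)[1].strip())
    return result


def single_aggregator(text: str) -> bool:
    """True if every slave joined the same LACP aggregator (one LAG formed)."""
    return len(set(slave_aggregators(text).values())) == 1


if __name__ == "__main__":
    print(slave_aggregators(SAMPLE))
    print("single LAG formed:", single_aggregator(SAMPLE))
```

On a real host, the string would come from reading `/proc/net/bonding/bond0` (or whatever the bond is named). With the sample above, the two slaves land in different aggregators, which is the pattern to look for if the stack is not negotiating the cross-switch LAG as one partner.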

I am not a network specialist and not familiar with Cisco, so I was wondering whether anyone has an idea.
Can a LAG spanning the two stacked switches cause delays this large, or is there something else to take care of when using a vLAG?