SONiC - MLAG & management network issues

Sjhwilkes (New Member, joined Oct 17, 2020)
Using several Celestica DX010s as an interim platform until Arista/Mellanox/Dell/Cisco can actually deliver anything. It's mostly working well (this is a layer 2-only config). The remaining issues are MLAG, for which a dev at Broadcom is working on a fix, and intermittent management access.

I suspect the management issue is ARP confusion, as the mgmt subnet is both physically connected to Eth0 and present on a VLAN interface. I've just removed it from the VLAN interface, so hopefully it stabilizes.
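One way to confirm this kind of overlap is to look for the management subnet appearing in both the management and VLAN interface config. The sketch below is a minimal Python check, assuming SONiC's /etc/sonic/config_db.json layout, where MGMT_INTERFACE and VLAN_INTERFACE keys look like "eth0|10.0.0.10/24" or "Vlan100|10.0.0.20/24". The helper names and the example addresses are hypothetical, not part of SONiC.

```python
import ipaddress
import json


def load_config_db(path="/etc/sonic/config_db.json"):
    """Load a SONiC-style config_db.json file into a dict (assumed layout)."""
    with open(path) as f:
        return json.load(f)


def interface_networks(config_db, table):
    """Map (ifname, address) -> IP network for one config_db table.

    Assumes keys shaped like "eth0|10.0.0.10/24"; keys without "|"
    (e.g. a bare "Vlan100" entry) carry no address and are skipped.
    """
    networks = {}
    for key in config_db.get(table, {}):
        ifname, sep, addr = key.partition("|")
        if not sep:
            continue
        # ip_interface keeps the host bits and derives the containing network.
        networks[(ifname, addr)] = ipaddress.ip_interface(addr).network
    return networks


def overlapping_mgmt_subnets(config_db):
    """Return (mgmt_if, mgmt_addr, vlan_if, vlan_addr) for each overlapping pair."""
    mgmt = interface_networks(config_db, "MGMT_INTERFACE")
    vlans = interface_networks(config_db, "VLAN_INTERFACE")
    clashes = []
    for (m_if, m_addr), m_net in mgmt.items():
        for (v_if, v_addr), v_net in vlans.items():
            # Only compare like with like (IPv4 vs IPv4, IPv6 vs IPv6).
            if m_net.version == v_net.version and m_net.overlaps(v_net):
                clashes.append((m_if, m_addr, v_if, v_addr))
    return clashes


if __name__ == "__main__":
    # Illustrative data only; on a switch, pass load_config_db() instead.
    example = {
        "MGMT_INTERFACE": {"eth0|10.0.0.10/24": {"gwaddr": "10.0.0.1"}},
        "VLAN_INTERFACE": {
            "Vlan100": {},
            "Vlan100|10.0.0.20/24": {},
            "Vlan200|192.168.1.1/24": {},
        },
    }
    for clash in overlapping_mgmt_subnets(example):
        print("overlap:", clash)
    # prints: overlap: ('eth0', '10.0.0.10/24', 'Vlan100', '10.0.0.20/24')
```

An empty result means no VLAN interface shares an address range with the management port; anything listed is a candidate for the ARP confusion described above.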

How are other people managing SONiC: via the OOB Eth0 / Ethernet0 port, or via VLAN interfaces? Is it stable?

Is anyone else out there using MLAG with any success? For us it works for 6 hours, then the daemons fail, which causes the data plane to soft reset; then it works for another 6 hours...

It's a pain that ICCPD is not enabled in the official builds, because my own build had a memory leak, which I discovered the hard way after 60 days.
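Since a leak like this can take weeks to bite, one way to catch it earlier is to sample a suspect daemon's resident memory (the VmRSS line in /proc/<pid>/status on Linux) on a schedule and fit a trend line. A minimal sketch, assuming you pick the PID yourself (iccpd, for example) and collect the samples however suits you; the function names are hypothetical, and the projection is a naive straight-line estimate, not a prediction.

```python
import re
from pathlib import Path


def read_rss_kib(pid):
    """Return VmRSS in KiB for a process, read from /proc/<pid>/status (Linux only)."""
    status = Path(f"/proc/{pid}/status").read_text()
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status, re.MULTILINE)
    if match is None:
        # Kernel threads, for example, have no VmRSS line.
        raise ValueError(f"no VmRSS line for pid {pid}")
    return int(match.group(1))


def leak_rate_kib_per_hour(samples):
    """Least-squares slope of RSS over time.

    samples: list of (hours_since_start, rss_kib) pairs with at least two
    distinct times. A steady positive slope suggests a leak.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples need distinct timestamps")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def hours_until(limit_kib, current_kib, rate_kib_per_hour):
    """Linear projection of hours until RSS reaches limit_kib; None if not growing."""
    if rate_kib_per_hour <= 0:
        return None
    return max(0.0, (limit_kib - current_kib) / rate_kib_per_hour)
```

For example, samples of 100000, 124000 and 148000 KiB taken 24 hours apart give a slope of 1000 KiB/hour; against a hypothetical 1,000,000 KiB budget, hours_until() projects 852 hours (about 35 days) of headroom.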
 

necr (Active Member, joined Dec 27, 2017)
Have you tried pure L3 mode with VXLAN on top? Or do you rely on bonds and MLAG?
 

Sjhwilkes
No, because we have a dozen Brocade switches hanging off two pairs of these in the design. Without MLAG we're currently running with just a LAG from each Brocade into a single DX010, and a LAG between the two live ones. But yes, I did think about rebuilding it as VXLAN; it just doesn't solve the issue downstream.
 

Sjhwilkes
Tried a couple of different builds of 202205. There must be working builds out there, with the LEDs, fans and so on working, but with a new build every day on each release train it's impossible to pick one and test it. It's both interesting and terrifying to see all the issues and merges on GitHub; I'm sure that's normal for switch software, but you're not usually able to see it.
 

salvadorb (Member, joined Jul 14, 2021)
Hi,
Unfortunately, in my experience newer versions (202111 and 202205) don't work well with the Celestica DX010; several commands stopped working.

I'm currently running 202106 with MCLAG and it's working OK. I have faced some issues with the ICCPd modules not starting properly after a reboot, but after rebooting the machines twice it works fine again. I'm currently evaluating enterprise SONiC from some vendors who claim to have a much more stable version.
 

Sjhwilkes
Are you building 202106 with ICCPD turned on yourself, or have you found builds somewhere? And how many DX010s do you have?
 

salvadorb
Yes, we had to build 202106 with ICCPD turned on ourselves. We have several dozen in production so far.
 

DSpazman (New Member, joined Jan 11, 2023)
Would you be able to share the build output for your 202106 ICCPD build by any chance? I understand it would be a use-at-your-own-risk kind of output, but this ICCPD/MCLAG issue on the Seastone seems to be a very common problem.
 

Sjhwilkes
Yes, the Seastone support is highly imperfect as it is, with the environmentals/LEDs only working on the 202012 build and earlier. It's not easy to see what fixes are missing from that build. I had bad luck with memory leaks causing crashes after 60 days on recent builds. It's a shame, as this would be a killer ROI for lab use at least, if it were stable enough.
 

applepi (Member, joined Jun 15, 2013)
Have you had any luck with the memory leaks? I'm still having issues after a few days. It's a shame it doesn't auto-reboot on panic.
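On the auto-reboot point: if the crash is a genuine kernel panic, whether Linux reboots is controlled by the standard kernel.panic sysctl (0 means hang forever, a positive N means reboot after N seconds, a negative value means reboot immediately). A minimal sketch for setting it, assuming root access and that your build doesn't already configure it; the function name is hypothetical. Note that a userspace leak usually ends with the OOM killer terminating processes rather than a panic, unless vm.panic_on_oom is also set.

```python
from pathlib import Path

# Standard Linux sysctl file for kernel.panic.
PANIC_SYSCTL = Path("/proc/sys/kernel/panic")


def set_panic_reboot(seconds, path=PANIC_SYSCTL):
    """Set the kernel.panic timeout and return the previous value.

    Writing the real /proc file needs root, and the change is lost on
    reboot unless it is also added to the sysctl configuration.
    """
    if not isinstance(seconds, int):
        raise TypeError("seconds must be an int")
    previous = int(path.read_text().strip())
    path.write_text(f"{seconds}\n")
    return previous
```

Run as root, set_panic_reboot(10) would make the box reboot ten seconds after a panic instead of sitting there; adding a kernel.panic line to a sysctl config file makes the setting persistent.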