
Brocade ICX Series (cheap & powerful 10gbE/40gbE switching)


AndroidCat

Member
Mar 3, 2015
32
26
8
Does anyone have any pointers on debugging / profiling an ICX switch?

I’ve had an ICX6450-48P that’s been fairly bulletproof running 8.0.30u, but I recently moved the core to an ICX7150-C12 on 8.0.95g to squeeze some more runtime out of my UPS. The 7150 worked well for the first few days but locked up last night: L2 traffic was still flowing through the switch, but no L3 traffic was passing across VEs and there was no response on either the USB-C or the RJ45 serial console.

With no ability to access the switch via SSH or over serial, I pulled power and everything came back up just fine, but I’d like to figure out how to diagnose this if it happens again.

After reboot, I checked
Code:
sh log
but it looks as if the log was cleared on reboot.
You need to stream your logs to an external syslog server.
For instance:
Code:
logging host <IP> udp-port 1514
logging enable rfc5424
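
On the receiving side, anything that listens on the chosen UDP port will do. Below is a minimal Python sketch, assuming the switch sends plain syslog datagrams to port 1514 as in the config above. `parse_pri` and `serve` are hypothetical helper names, and a real setup would more likely use rsyslog or syslog-ng:

```python
import socket

# Severity names indexed by the low three bits of the syslog PRI value.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_pri(message: str):
    """Split the leading <PRI> of a syslog line into (facility, severity, rest)."""
    if not message.startswith("<"):
        raise ValueError("no <PRI> header")
    end = message.index(">")
    pri = int(message[1:end])
    return pri // 8, SEVERITIES[pri % 8], message[end + 1:]


def serve(bind_addr: str = "0.0.0.0", port: int = 1514, logfile: str = "icx.log") -> None:
    """Append every datagram received on the UDP port to a local file."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, open(logfile, "a") as out:
        sock.bind((bind_addr, port))
        while True:
            data, (src, _) = sock.recvfrom(8192)
            line = data.decode("utf-8", errors="replace").rstrip()
            out.write(f"{src} {line}\n")
            out.flush()  # keep the file current in case the switch hangs again
```

Calling `serve()` on a box that stays up means the last messages before a lockup survive the power cycle.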
 

LodeRunner

Active Member
Apr 27, 2019
557
237
43
Code:
SSH@ICX6450-24P#sh med e 1/2/3
Port   1/2/3: Type  : 10GE SR 300m ((SFP+))
             Vendor: BROCADE            Version: A
             Part# : 57-0000075-01      Serial#: AAF210210000E3G
SSH@ICX6450-24P#sh med e 1/2/1
Port   1/2/1: Type  : 10GE SR 300m ((SFP+))
             Vendor: OEM                Version: 02
             Part# : SFP-10G-SR         Serial#: CSF101L34485
SSH@ICX6450-24P#sh med e 1/2/4
Port   1/2/4: Type  : 10GE SR 300m ((SFP+))
             Vendor: OEM                Version: 02
             Part# : SFP-10G-SR         Serial#: CSF101L34484
SSH@ICX6450-24P#show optic 1/2/3
Port  Temperature   Tx Power     Rx Power       Tx Bias Current
+----+-----------+--------------+--------------+---------------+
1/2/3   49.0507 C  -002.6114 dBm -002.3837 dBm    8.366 mA
        Normal      Normal        Normal         Normal

SSH@ICX6450-24P#show optic 1/2/1
SSH@ICX6450-24P#show optic 1/2/4
1/2/3 is Brocade SFP+ fiber. 1/2/1 and 1/2/4 are ipolex 10GBase-T RJ45 SFP+ copper. One copper soon to be removed for a passive DAC since I moved my OPNsense box right next to the switch.
Since 1/2/1 and 1/2/4 are RJ45 SFP+ modules, there will be no optic monitoring data for them, Brocade or not; they're not optics. It's also odd that 1/2/1 and 1/2/4 report themselves as 10G-SR, which is an optical designation, IIRC. And no 10G over twisted pair is going 300m; the 10GBASE-T spec is 100m max.

I suspect if you go into conf t > int e 1/2/1 and issue whatever command enables optical monitoring, you'd get an error similar to my 7450:
Code:
SSH@core(config)#int e 1/3/1
SSH@core(config-lag-if-lg4)#optical-monitor 8
Port lg4 is 40G copper and cannot support Optical Monitoring feature.
 

grenskul

Active Member
Nov 8, 2020
181
85
28
Just got a 6450. I'm trying to set up 2 LAGs of 2 10G ports each. Is there any way I can see the speed of the LAG (like on most other switches)? I ran "show lag" and this is what I get; I was expecting to see 20G somewhere.
Code:
Total number of LAGs:          3
Total number of deployed LAGs: 2
Total number of trunks created:2 (122 available)
LACP System Priority / ID:     1 / 748e.f8b8.86e0
LACP Long timeout:             120, default: 120
LACP Short timeout:            3, default: 3

=== LAG "desktop" ID 2 (dynamic Deployed) ===
LAG Configuration:
   Ports:         e 1/2/3 to 1/2/4
   Port Count:    2
   Primary Port:  1/2/4
   Trunk Type:    hash-based
   LACP Key:      20002
Deployment: HW Trunk ID 2
Port       Link    State   Dupl Speed Trunk Tag Pvid Pri MAC             Name
1/2/3      Down    None    None None  2     No  20   0   748e.f8b8.86e0
1/2/4      Down    None    None None  2     No  20   0   748e.f8b8.86e0

Port       [Sys P] [Port P] [ Key ] [Act][Tio][Agg][Syn][Col][Dis][Def][Exp][Ope]
1/2/3           1        1   20002   Yes   S   Agg  Syn  No   No   Def  No   Dwn
1/2/4           1        1   20002   Yes   S   Agg  Syn  No   No   Def  No   Dwn


Partner Info and PDU Statistics
Port          Partner         Partner     LACP      LACP
             System ID         Key     Rx Count  Tx Count
1/2/3    1-0000.0000.0000       66       55      7079
1/2/4    1-0000.0000.0000       67       54      7078

=== LAG "unraid" ID 1 (dynamic Deployed) ===
LAG Configuration:
   Ports:         e 1/2/1 to 1/2/2
   Port Count:    2
   Primary Port:  1/2/1
   Trunk Type:    hash-based
   LACP Key:      20001
Deployment: HW Trunk ID 1
Port       Link    State   Dupl Speed Trunk Tag Pvid Pri MAC             Name
1/2/1      Up      Forward Full 10G   1     No  20   0   748e.f8b8.86e0
1/2/2      Up      Forward Full 10G   1     No  20   0   748e.f8b8.86e0

Port       [Sys P] [Port P] [ Key ] [Act][Tio][Agg][Syn][Col][Dis][Def][Exp][Ope]
1/2/1           1        1   20001   Yes   L   Agg  Syn  Col  Dis  No   No   Ope
1/2/2           1        1   20001   Yes   L   Agg  Syn  Col  Dis  No   No   Ope


Partner Info and PDU Statistics
Port          Partner         Partner     LACP      LACP
             System ID         Key     Rx Count  Tx Count
1/2/1    65535-e435.c87e.3549       15      377      3417
1/2/2    65535-e435.c87e.3549       15      235      2910
 

klui

༺༻
Feb 3, 2019
992
582
93
Newer versions show that, like 8.0.90. Maybe the feature you're looking for was added in 8.0.61 when they enhanced LAGs. See Terry Henry's YT channel for that.

The 6000 series have EOLed at 8.0.30.
 

LodeRunner

Active Member
Apr 27, 2019
557
237
43
“sh int br” shows link speeds I believe?
 

jayb998

New Member
Oct 31, 2022
9
5
3
Anyone running a 6450-24P with Noctuas? If so, what kind of temps are you getting? Mine seems to constantly volley between ~58-64C: it runs at fan speed 1, heats up, fan speed 2 cools it down, back to 1, repeat. Like every 15 minutes or so. I only have 40W of PoE draw, which I consider a pretty light load given this switch is rated for 370W.

I really like the noise level of the Noctuas (and was not thrilled with the Sunon KDEs I ordered previously) so I would like to keep the Noctuas if possible...

EDIT: Getting much better temps now with some network cabinet ventilation and rearranging devices in my rack. Still experimenting a bit.
 

LodeRunner

Active Member
Apr 27, 2019
557
237
43
Per port, yes, and I can see 10G on the ports that make up the LAG, but I haven't found the combined speed of the LAG itself anywhere.
Maybe it's a newer thing then. When I do sh int br I get a list of all interfaces, including the LAGs:
Code:
lg4        Up      Forward Full 80G   4     Yes 1    0   cc4e.248b.3270
 

entertwined

New Member
Nov 12, 2022
1
1
1
I'm having issues updating my new 7250-24P to the latest 08095hufi.bin image in the download bundle. I followed the set up directions and got it running the non-UFI 08090mc.bin image, but when I try to run:

Code:
copy tftp flash <tftp server ip> ICX7xxx/SPR08095hufi.bin primary
I get the following output:

Code:
Load to buffer (8192 bytes per dot)
............................................
............................................
............................................
.............................................
<etc......>
...............................abort called
TFTP session timed out

Error in downloading bundle image

Error in processing bundle image
Oddly enough when I try to reflash the 08090mc.bin file via TFTP I get a slightly different error:

Code:
 !!! Downloading this application image can result in application-boot image mismatch. Please use UFI image.
Load to buffer (8192 bytes per dot)
.....................................
.....................................
......................................
etc......................abort called

TFTP session timed out

TFTP to Flash Error - code 5
My TFTP server logs show the correct file is being requested and at one point it worked to flash the original 08090mc.bin image, so I assume there is no configuration problem with the TFTP server or network. I've tried redownloading the firmware image a few times just in case it's some kind of file corruption. I've been running all this over serial, but I've also tried using SCP both to and from the switch per the instructions in the Ruckus 08.0.95 upgrade guide, but despite ironing out various issues with legacy SSH ciphers/options (I think) it still doesn't seem to work either. Very much appreciate any troubleshooting suggestions, I have a feeling I'm messing up something very simple but I can't figure out what it could be.
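
To rule out file corruption without re-downloading, the image can be hashed locally and compared against a published checksum. Below is a minimal Python sketch. It assumes a checksum for the image is available from the download source (the post does not say one is), and `file_digest` is a hypothetical helper name:

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large firmware image is never fully in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, `file_digest("SPR08095hufi.bin")` on the TFTP server's copy gives a value to compare against the published one. A match points away from corruption and towards the transfer itself.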


UPDATE: In case anyone else finds this, I ended up flashing the firmware via USB. I'm still confused about why TFTP flashing worked in the boot environment but not when flashing a UFI image from within FastIron; I guess some network configuration changed once the application was up and running.
 

Craig Curtin

Member
Jun 18, 2017
103
20
18
60
NOW... to build on this question a bit, two things:
  1. If I want to add a different VLAN for a different device port, do I essentially repeat the same steps? Do I need to re-add 1/2/1 to dual-mode after tagging it to a new VLAN, or is that part done?
  2. What if I want a hypervisor or a downstream switch or a WAP that passes traffic on multiple VLANs? (I have a Proxmox server and a couple of small Unifi switches.) Do I tag the device port with every VLAN it might conceivably use and put it in dual-mode/untagged on VLAN 1?
Thanks so much for the help!
1) Yes, just repeat the steps I listed for additional ports, and leave out the dual-mode unless you need the VLAN 1 support.
2) Yes, if you have a "trunk" (Cisco naming) style device carrying multiple VLANs, just treat it like the Unifi as well. Remember that, the opposite of Cisco, you add ports to VLANs, not VLANs to ports.
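
The "ports into VLANs" direction can be sketched by inverting a port-centric plan into per-VLAN stanzas. Below is a minimal Python sketch. The port names and VLAN IDs in the example are hypothetical and `vlan_stanzas` is an illustrative helper. Check the emitted `vlan` / `tagged ethernet` lines against the command reference for your firmware before pasting:

```python
from collections import defaultdict


def vlan_stanzas(port_vlans: dict) -> list:
    """Turn {port: [tagged VLAN IDs]} into per-VLAN config lines."""
    members = defaultdict(list)
    for port, vlans in port_vlans.items():
        for vlan in vlans:
            members[vlan].append(port)

    lines = []
    for vlan in sorted(members):
        lines.append(f"vlan {vlan}")
        for port in members[vlan]:
            lines.append(f" tagged ethernet {port}")
    return lines


# Hypothetical plan: an uplink carrying two VLANs, an AP port carrying one.
for line in vlan_stanzas({"1/2/1": [20, 30], "1/1/5": [30]}):
    print(line)
```

The plan is written Cisco-style (VLANs per port), and the output groups it the way the ICX expects: one `vlan` block per ID with its member ports listed inside.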

Craig
 

Craig Curtin

Member
Jun 18, 2017
103
20
18
60
I had exactly the same problem as you.
Try to add the line to each physical interface:
Code:
no spanning-tree
e.g.
interface ethernet 1/2/1
port-name xxxx
no spanning-tree
Let me know if that helps.

The explanation is somewhere within this thread. I think it was about spanning tree being active per port even though it is globally disabled.
At least it helped in my case.
Just trying this now. At first glance, adding it to each of the 1/2/x ports does not appear to have made any difference. I'm migrating some VMs now so I can restart the host.

Craig
 

jayb998

New Member
Oct 31, 2022
9
5
3
Kudos for the assistance Craig. Everything you outlined above has worked like a charm. I was able to add 1/2/1 to the "new" VLANs and didn't need to re-add to dual-mode nor did I lose connection via SSH during the process. Setting up new downstream ports to my WAPs, switches and hypervisor was even easier.

Once I'm feeling brave enough I may dabble into L3 routing, but the UDM SE is working well enough right now as router-on-a-stick so no urgent need. Nice to know that the L3 is available if I grow into it in the future.
 

Craig Curtin

Member
Jun 18, 2017
103
20
18
60
No worries - I too am new on the ICX bandwagon, so glad I could help.

Remember, if you move routing/filtering off the UDM you will get more speed - BUT you will have to learn how to write ACLs etc. for the switch to perform the necessary filtering between VLANs - so that's a whole other can of worms!

Once I get my ESXi hosts stable on the ICXs, that will be my next rabbit hole to go down!

Craig
 

Craig Curtin

Member
Jun 18, 2017
103
20
18
60
OK been through it and did the following

conf t
int e 1/2/1 to 1/2/10
no spanning-tree

wr mem

No change at the ESXi host

Rebooted the Host - no difference

logged back into the switch and did a disable and enable on the ports

No change at the ESXI host

Pretty much at my wits end now

Got one of the 3 hosts running on one switch and another on a different switch - one using Arista breakout cables and the other using FS.com breakouts

Tried with a dual-port Intel X520-DA2 card (Intel 82599, ixgben driver in ESXi)
Tried with 2 x single-port Intel X520 cards
Tried with Intel dual-port X540-T cards and twisted-pair transceivers in the 1/3/x slots

I am waiting for some dual-port Mellanox ConnectX-3 adapters to arrive (CX312A)

Not sure what else i can do

Anyone ?

Craig
 

itronin

Well-Known Member
Nov 24, 2018
1,353
896
113
Denver, Colorado
Yeah i might move it back to street power for a little while and see what happens - hopefully it is not that finicky with power requirements !!

...

Craig
Did you end up getting your switches to stop spontaneously rebooting? On street power? Half and half?

BTW, I did not go back and find the last copy of your config, so would you clarify: are you LAGging the ports to your ESXi hosts?

FWIW, I have not had issues with the SM dual 10GbE SFP+ (which is a 520 card), nor Chelsio 520s, nor CX312As, nor the 40GbE version (using 40GbE, or 10GbE with a step-down adapter).
 

Craig Curtin

Member
Jun 18, 2017
103
20
18
60
Hey thanks for the follow up

Nope, I have removed the 6610-POE for the moment, as that was a bridge too far with all the other issues I am having with my hosts. I have ordered the replacement memory module that fohdeesha recommended and will swap that in when it arrives, then set the 6610-POE back up in my testing system.

No, I do not have any LAGs to the hosts - I am trying to get two separate 10G links to each host, one of which will go to a dedicated vSwitch for NFS and vMotion traffic and the other for VM communications. The onboard 1Gb NIC is then used for management.

It has got me baffled - definitely something weird happening - it's like the slightest change of config sets something in the switch to tell it not to enable the port again.

I am trying to methodically step through and test everything, but it is doing my head in!

I could blame the non-certified Optiplex PCs, but it was also happening with (non-certified) HP units.

I have just retired an R710 with ESXi 5.5 on it from a customer - my plan tomorrow is to fire it up and see if I can make any more progress

Craig
 

AndroidCat

Member
Mar 3, 2015
32
26
8
Sorry to hear that. It definitely helped in my case, even though the logic behind it was unclear to me. I also tried to find any log entry relevant to that port blocking and wasn't able to spot anything.
If it helps any, I've been using Mellanox ConnectX-3 cards (flashed to Ethernet) with both 10G and 40G links towards a 6610. I don't have any LAG towards ESXi; it's all an active+spare configuration (40G active + 10G spare).
Also, ESXi 6.7 and 7.0 behaved exactly the same, just reporting the physical port down.

I'd ask @fohdeesha for any hints if you can 100% rule out HW/optics/DACs.
 

Craig Curtin

Member
Jun 18, 2017
103
20
18
60
Thanks for the follow up.

Yep i am pretty sure that i have ruled out almost everything

Started on switch #1 (6610) with FS.COM QSFP to SFP+ breakouts - that's where the problems started. I first thought it was the FS cables (had 3 of them), so I tried each one and got the same problems, which started me down the path that it was a cable issue. Then I purchased some working Arista cables (same QSFP to SFP+ breakouts) and got the same sorts of intermittent issues.

So decided it must be a switch problem (even though it felt very much like a Spanning tree issue)

So purchased a 6610POE and cabled it to the 6610 with a Dell SFP+ DAC 3 metre cable (one that i had on hand that i have had for about 6 years)

That's when I started having the problems with the POE switch rebooting, so I gave up on that and swapped in a spare 6610. I now have the first 6610 with 2 x 1Gb copper links in a trunk/channel to my Cisco 4948 (which I am trying to retire), which is where nearly all my devices are connected.

On this 6610 i have one FS.COM breakout for QSFP to SFP+ and have two of the ports on there connected to a single Dell host on a dual port card and that is working and is solid - as long as i do not make any VLAN changes or other config changes on the ports.

This 6610 also has a fibre SFP+ module going to another Linux host that has not missed a beat at any point.

The 2nd 6610 is now mounted in my rack (and will become the permanent one) and is attached through a single copper DAC cable to one of the 1/3/x ports on the first 6610.

This is the one i am doing all the testing on at the moment and can not nail down.

So i am pretty confident it is not

a cable problem
a switch problem (as in faulty switch)
a card problem (although all of them i have tested have been Intel 82599 based)
a transceiver problem

Last night, after a switch restart with the no spanning-tree lines on each of the 1/2/x ports and a host restart, I now have two Intel cards in the one host talking to that switch. I will do more VLAN changes and updates tonight and see if it breaks again.

The other thing that points to some form of spanning tree issue is that the port/cable remains blocked, rather than it being a card issue on the host. I.e. if I start plugged into (say) 1/2/2 and it has a problem and drops out, then I cannot connect that cable to anything else and get it to come back up - but I can take the cable for, say, 1/2/3 and connect that to the same port on the same host, and it comes back up at both the ESXi level and the switch Int Brief level.

There must be a table somewhere on the switch of ports that are blocked for whatever reason, but it is not being reported in the logs (or to the syslog server I have set up), nor anywhere else I can find
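
Short of finding such a table, one way to at least spot affected ports is to filter saved `sh int br` output for ports whose link is up but whose state is anything other than Forward. Below is a minimal Python sketch, assuming the Port / Link / State column order seen in the outputs earlier in this thread. `up_but_not_forwarding` is a hypothetical helper, not a switch feature:

```python
import re

# First three columns of an interface-brief row: Port, Link, State.
ROW = re.compile(r'^(?P<port>\S+)\s+(?P<link>Up|Down)\s+(?P<state>\S+)\s')


def up_but_not_forwarding(int_brief_output: str) -> list:
    """Return (port, state) pairs for ports with link Up and state not Forward."""
    suspects = []
    for line in int_brief_output.splitlines():
        row = ROW.match(line.strip())
        if row and row.group("link") == "Up" and row.group("state") != "Forward":
            suspects.append((row.group("port"), row.group("state")))
    return suspects
```

Running it on a capture taken before and after a VLAN change would show which ports changed state, even if nothing is logged.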



Craig
 

Craig Curtin

Member
Jun 18, 2017
103
20
18
60
Still not making a lot of progress

I took a port that was fine 1/2/10 cabled it into a host that had previously been connected on another 6610 - port came up immediately on both the Host and Switch and looked fine

Then I added it to a single VLAN as tagged, and the port immediately dropped at the ESXi host and the switch shows it in blocking mode

1668467408860.png

1668467499303.png

Any ideas from the brains trust on this? @fohdeesha - I have a kidney I can donate if you can work this out or even give me a hint

Craig