Turbocharge your Quanta LB6M - Flash To Brocade TurboIron - Big Update!

TheBloke

Active Member
Feb 23, 2017
200
40
28
42
Brighton, UK
Guys, got another issue.. I just turned off LACP on both server and desktop, and disabled it on the switch. I wanted to go back to a simple 2 x separate links setup.

The problem is that the switch is now blocking two of the ports, the first port in each of the old LACP groups:
Code:
switch10g#show int br eth 1 to 2 eth 23 to 24
Port    Link    State   Dupl Speed Trunk Tag Pvid Pri MAC            Name
1       Up      Blocked Full 10G   None  No  1    0   089e.0193.0832  10Gserve
2       Up      Forward Full 10G   None  No  1    0   089e.0193.0832  10Gserve
23      Up      Blocked Full 10G   None  No  1    0   089e.0193.0832  10Gdeskt
24      Up      Forward Full 10G   None  No  1    0   089e.0193.0832  10Gdeskt
I removed all LACP config from all the individual ports, eg:
Code:
interface ethernet 1
 port-name 10Gserver1
!
interface ethernet 2
 port-name 10Gserver2
No trunks exist either. Looking through the full details via "show int", I can't see any difference between a non-working Blocked port and a working Forwarding port, other than the Blocked/Forwarding state itself.

I've just been trawling the docs, and I see that this Blocked is part of STP. But I don't know why disabling LACP has caused this. I tried disabling spanning-tree globally on the switch, which made no difference, nor did re-enabling it (besides kicking me out of all TCP connections while it initially put all ports into 'Learn'!)

I've tried bringing the NICs on both server and client up and down multiple times, and pulling cables. I even power cycled the server completely, but as soon as it powered on, the switch port changed from Down back to Blocked. So it can't be any OS config, as it showed Blocked before the server OS had even booted.

My next step is just to power-cycle the switch, but it'd be good to learn the proper way. I can't find anything in the docs that indicates how to clear this out - I guess it's happening because some state exists that the switch doesn't like, but I have no idea what that state is.

EDIT: OK I power cycled it and the ports are working fine now. Would still be grateful if anyone can figure out what happened and how I could resolve it without a cold boot.
 
Last edited:

TheBloke

Active Member
One that just crossed my mind that I had not considered.... what if RDMA is coming into play? RDMA does not play nice with nic teaming because the packets go directly to the adapter and bypass the network stack.
Good thoughts, thanks, but my X520s don't have RDMA, only RSS
 
Interesting stuff. With what my company does, single threaded performance is of greater importance than a high number of cores (even though many of the applications we use are multi threaded).
I like this talk:

Get 8 cores, disable all but one and use only 1 core. More L3 cache available :)
 
Last edited:
One that just crossed my mind that I had not considered.... what if RDMA is coming into play? RDMA does not play nice with nic teaming because the packets go directly to the adapter and bypass the network stack.
Mellanox is weird: some of their cards have one controller and two media ports, precisely for this purpose. Most dual-port cards have two controllers. When you use software like DPDK with Mellanox, you have to utilize both ports with it; you can't use user-space networking on one port and have a normal kernel TCP stack on the other.

The ConnectX5 cards also have embedded PCI-E hardware, so that if you connect a ring of them together, they will forward packets to each other without going through the host. This allows more than 3 computers to be connected without a switch in the middle. This article talks about that: https://www.nextplatform.com/2016/06/15/next-gen-network-adapters-oomph-switchless-clusters/
 
  • Like
Reactions: mixmansc
Guys, got another issue.. I just turned off LACP on both server and desktop, and disabled it on the switch. I wanted to go back a simple 2 x separate links.

EDIT: OK I power cycled it and the ports are working fine now. Would still be grateful if anyone can figure out what happened and how I could resolve it without a cold boot.
You may need to wait for the MAC addrs to age out of the port. I think that trunking ports causes the hosts to use the same MAC on both ports. If you turn off the trunk and a different MAC appears on one port, the old entry will eventually die, but it's a timed event. I'm just guessing, it could be something else.
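The aging behaviour being guessed at here can be sketched in a few lines of Python. This is purely a conceptual illustration - the real table lives in the switch ASIC, and 300 seconds is just a typical default aging time, not necessarily what the LB6M uses:

```python
# Conceptual sketch of a switch MAC table with timed aging.
# Illustration only - real aging happens in hardware.
class MacTable:
    def __init__(self, age_seconds=300):
        self.age_seconds = age_seconds
        self.entries = {}  # mac -> (port, last_seen)

    def learn(self, mac, port, now):
        # (Re-)learning a MAC refreshes its timestamp
        self.entries[mac] = (port, now)

    def lookup(self, mac, now):
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen > self.age_seconds:
            # entry has aged out - forget it
            del self.entries[mac]
            return None
        return port

table = MacTable(age_seconds=300)
table.learn("089e.0193.0832", port=1, now=0)
assert table.lookup("089e.0193.0832", now=10) == 1
# After the aging timer expires, the stale entry disappears on its own:
assert table.lookup("089e.0193.0832", now=400) is None
```

The point being: until that timer fires (or something flushes the table), the switch keeps acting on the stale entry - which would match the "fixed itself after a reboot" behaviour.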
 

fohdeesha

Kaini Industries
Nov 20, 2016
2,333
2,475
113
31
fohdeesha.com
yeah that would make sense. I know there's a "clear mac-address vlan 1" to clear all macs in a vlan, you can also do it by port, or just clear a specific mac

next time it happens and the ports get blocked do a "sh mac-address all" and it should show the macs of your intel card and the state (blocking)
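Putting those together, the next-time checklist might look roughly like this (the per-port variant is from memory, so double-check the exact syntax against the docs):

```
! check which MACs are learned and whether they show as blocked
switch10g#show mac-address all
! flush every learned MAC in VLAN 1
switch10g#clear mac-address vlan 1
! or flush just the affected port (syntax may vary - check the docs)
switch10g#clear mac-address ethernet 1
```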
 
  • Like
Reactions: TheBloke

TheBloke

Active Member
You may need to wait for the MAC addrs to age out of the port. I think that trunking ports causes the hosts to use the same MAC on both ports. If you turn off the trunk and a different MAC appears on one port, the old entry will eventually die, but it's a timed event. I'm just guessing, it could be something else.
yeah that would make sense. I know there's a "clear mac-address vlan 1" to clear all macs in a vlan, you can also do it by port, or just clear a specific mac

next time it happens and the ports get blocked do a "sh mac-address all" and it should show the macs of your intel card and the state (blocking)
Thanks guys. I'm dumb - I did think of MAC addresses, but I kept clearing the ARP cache! I'm too used to L3 stuff on servers, not L2 switching. Or switching at all really :)

I'm sure it must have been that.

And actually that now makes me wonder if that was also the problem with the workstation LACP not working without spanning-tree disabled. Maybe if I had first cleared the MAC cache, then tried to enable the workstation into LACP, it wouldn't have kept bouncing the connections with (presumably) STP failures.

Or I could have pulled the cables between workstation and switch, enabled LACP on two different switch ports, then plugged the workstation links into those new ports - fresh ports with no MAC address registrations, which would only ever have been used with LACP. I might have to try that, just to see if it works. I don't think I'll go with LACP as a long-term config, but it's always good to understand what's going on.
 

TheBloke

Active Member
Interested in seeing all of these results and tests though.
And I'm now curious to see how many TCP connections are opened by SMB MultiChannel when I don't use LACP. It must be at least two per server, which is more than I'm getting when LACP is on.
OK, to complete the story: I've tested again and Windows only ever opens one connection per remote IP.

The Microsoft doc clearly states that it will open more connections if RSS is available ("With SMB Multichannel, if the NIC is RSS-capable, SMB will create multiple TCP/IP connections for that single session, avoiding a potential bottleneck on a single CPU core when lots of small IOs are required"). But it doesn't do so for me, because the SMB connection thinks the NICs are not RSS capable.

I have 2 x 10G links on each of the server and desktop. For the following test I am running a single instance of Samba's smbd on the server, configured with two IPs: 192.168.100.54 (on 10G NIC #1) and 192.168.200.54 (NIC #2). The workstation NICs are configured across the same two subnets: 192.168.100.20 (NIC #1) and 192.168.200.20 (NIC #2).
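For anyone wanting to replicate the dual-IP single-instance setup, it can be sketched in smb.conf roughly like this. These are standard smb.conf option names, but treat it as a sketch and check them against your Samba version; the addresses are the ones from my setup:

```
# /etc/samba/smb.conf (sketch)
[global]
    interfaces = 192.168.100.54 192.168.200.54
    bind interfaces only = yes
    # MultiChannel support is still marked experimental in Samba
    server multi channel support = yes
```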

I'm running benchmarks from the Windows client using the iozone disk benchmark, with two to four concurrent instances. On the server I have enabled ZFS filesystem compression, which almost completely removes the disk subsystem from the equation, as iozone data is 100% compressible. I just get activity on metadata writes; reads are all from cache. I have also tested with compression off, and the figures are basically the same, as I'm well below the max throughput of the PCIe SSD array I'm testing with.

I first establish an SMB connection to 192.168.100.54:
net use z: \\192.168.100.54\tomj\copytest

Through the SMB protocol handshaking, Windows finds out about the other IP address(es):
Code:
C:\WINDOWS\system32> Get-SmbMultichannelConnection
Server Name    Selected Client IP      Server IP      Client Interface Index Server Interface Index Client RSS Capable
-----------    -------- ---------      ---------      ---------------------- ---------------------- ------------------
192.168.100.54 True     192.168.200.20 192.168.200.54 7                      3                      False
192.168.100.54 True     192.168.100.20 192.168.100.54 6                      2                      False
192.168.100.54 True     192.168.1.20   192.168.100.54 2                      2                      False
It shows one row per workstation NIC, and of these, two are able to connect to the server. As those are on separate NICs, I achieve one TCP connection per NIC on both server and client.

Importantly, it always shows "Client RSS Capable = False", even though I have RSS enabled on both Intel 10G NICs. This must be why I don't get more than one connection per IP - it relies on RSS being detected. RSS definitely is on, and that's confirmed by both Get-NetAdapterRSS and Get-SmbClientNetworkInterface. The latter is interesting and odd, because it's part of the SMB software itself:
Code:
C:\WINDOWS\system32> Get-SmbClientNetworkInterface | findstr "Interface True"
Interface Index RSS Capable RDMA Capable Speed    IpAddresses                                                           Friendly Name
6               True        False        10 Gbps  {fe80::65f0:5b9f:2376:8deb, 192.168.100.20}                           Ethernet 5
7               True        False        10 Gbps  {fe80::3907:8867:ca06:31c3, 192.168.200.20}                           Ethernet 6
At first I thought the lack of RSS detection might be a deficiency of the consumer code versus Server, but the Microsoft doc specifically mentions support in Windows 8. And I've since seen several discussions from people having the same issue on Server 2012. Here's an example: an unsolved post on Microsoft TechNet, posted by a user on Windows Server 2012, with other users reporting the same issue as recently as October 2017.

It's possible it's NIC or driver related. Maybe it works with some NICs and not others. But that's not something I can test anytime soon.

However, the main stated benefit of having more connections is to avoid a bottleneck on a single workstation CPU/thread, and luckily I don't seem to have that constraint; it's the server/Samba end where I hit a CPU bottleneck. My desktop CPU is the same six-core/twelve-thread Westmere Xeon X5670 that my server has two of, but in the workstation it's overclocked to 4.4GHz and has much faster RAM. I'm quite impressed with the client load: on write tests I see overall utilisation of 20-25%, with no single core above 50%. On reads, client utilisation totals 25%, with two cores at up to 60%. Total utilisation figures include all the usual background desktop stuff, including dozens of open browser tabs, and usually YouTube or Netflix running at the same time as the tests :)

This test gives me a write speed averaging 12Gb/s with reads around 9.5Gb/s. The read test is where I most clearly hit the Samba CPU limit - I can see that the single smbd process is pegged at 100% of one core, as it's single threaded. The write test doesn't go as high, but must also be bottlenecked because if I perform the same test across two Samba instances (on the same physical server), writes go up to 15Gb/s. I don't know why reads need more CPU than writes.

Reads spread across two (or more) Samba instances average around 17Gb/s, with 17.8 being my fastest ever result. There's a lot more workstation CPU load in this case: about 45% total utilisation, and two cores up at 90-95%. All those figures are with iozone configured with a record size of 1MB. With 8k - probably more realistic of real filesystem usage - write performance drops by 5%, and reads by 20%, but that's not caused by increased CPU usage, as that is also proportionally reduced. I'm not completely sure what bottleneck(s) keep me from getting closer to the max 20Gb/s. But again it seems like I'm not being disadvantaged by missing out on the extra threads and connections promised by SMB MultiChannel when RSS is detected.

On the server side, which is definitely a bottleneck when using a single instance (as I mostly will in real usage), Samba's smbd only uses a single process even though I'm connecting twice or more across multiple IPs. I suppose that's reasonable - it's considered a single connection, and SMB MultiChannel has to ensure synchronisation in order to write the data into a single file (or files) with no corruption. I guess that's hard to do across processes, or even across threads - though Windows can do it OK :) But Samba still lists MultiChannel as experimental, so it's probably not had much or any optimisation work. Maybe it will eventually become multi-threaded.

I do plan to ask the Samba guys about it, and about whether I can do any further tuning. I've already done the obvious things like enabling Jumbo frames and tuning TCP SO_SNDBUF/SO_RCVBUF.
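For reference, that buffer tuning goes in smb.conf's socket options; the values here are just example numbers, not recommendations:

```
[global]
    # example values only - tune for your own link and RAM
    socket options = TCP_NODELAY SO_SNDBUF=4194304 SO_RCVBUF=4194304
```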

Finally, I wondered what would happen if I assigned multiple IPs per NIC on both ends. So on the server I added 192.168.110.54 on NIC1 and 192.168.210.54 on NIC2, and 110.20 and 210.20 on the workstation. Sure enough, the SMB connection now got four TCP connections to a single instance, all of which transferred data equally. Of course there were no speed improvements, as I was already bottlenecked on the server side - in fact speed drops slightly, likely due to extra synchronisation work. But it's good to know it scales by IP and not by NIC, allowing one to work around the lack of RSS detection if it would be beneficial to do so.
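Adding the extra per-NIC IPs on the Linux server side is a one-liner per address; a sketch, with made-up interface names (substitute your actual ones):

```
# second IP on each 10G NIC - interface names are examples
ip addr add 192.168.110.54/24 dev eth10g0
ip addr add 192.168.210.54/24 dev eth10g1
```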

All this made me realise that when I first tested LACP, I didn't do so properly. I assumed it would open an extra connection because of the team, which of course it does not without RSS being detected. In my LACP test I configured a single IP per server instance. I did use two servers, which gave me two connections in total, but if I'd configured two IPs per server as in my non-LACP tests, I would have got four connections in total and LACP would have balanced a bit better than in my initial test.

But I later tested LACP with four total connections anyway (one each to four Samba instances), and it was still slower than non-LACP with two connections to one server, and much slower than non-LACP with four connections to two servers. So I think LACP is a dead end for my use case.
 
Last edited:
  • Like
Reactions: fohdeesha

fohdeesha

Kaini Industries
Finally got a couple of spare minutes and updated the firmware zip - added the better documentation linked earlier, the L2-switching-only firmware in case anyone wants it, and the latest Quanta/FASTPATH image and bootloader for when the revert guide goes up.

Also updated a couple of pieces of text in the guide, mainly adding an even more explicit warning NOT to try to manually type commands, after we got our first brick (check the other LB6M thread). Hopefully the more explicit warning to only copy-paste will help prevent any more.

Hopefully tomorrow I'll get some time to do the most important ones - adding the MAC reset guide, the management port warning, and the quanta revert guide. Oh, and the fan speed commands. Anything I'm forgetting?

Also worth mentioning: my JTAG unit gets here in a couple of days. When it does, I'm going to attempt to flash the Dell 8024 firmware. The 8024 is identical to the Quanta/Brocade except the PowerPC management CPU is very slightly newer, albeit in the same e500 line. I give it a ~20% chance of working. If it does, that will add IPv6 routing to this platform. The Broadcom ASIC has always supported it; it's just up to the CPU and OS to implement it properly.
 
Last edited:

TheBloke

Active Member
Finally got a couple spare minutes and updated the firmware zip - added the better documentation linked earlier, the l2 switching only firmware in case anyone wants it, and the latest quanta/fastpath image and bootloader for when the revert guide is put up.
Great stuff foh!

One question: I recall from your FastPath revert pastebin that you noted that one should initially flash to 1.0.0.10, to set up flash properly:

tftpboot 192.168.1.8:lb6m.1.0.0.10.bin (later images will not update the flash FS properly from scratch)

You've not included that file in the ZIP's "Fastpath Revert" folder, only 1.2.0.18. So is the 1.0.0.10 intermediate step no longer needed - you found a fix for that?

Also worth mentioning: My JTAG unit gets here in a couple days, when it does I'm going to attempt to flash the Dell 8024 firmware. The 8024 is identical to the quanta/brocade except the powerPC management CPU is very slightly newer, albeit in the same e500 line. I give it a ~20% chance of working. If it does, that will add ipv6 routing to this platform. The Broadcom ASIC has always supported it, just up to the CPU and OS to implement it properly
Wow, so it is theoretically possible to have ipv6 routing on this LB6M if the right software can be found? That would make a lot of folks happy. Good luck!
 

mixmansc

Member
Feb 15, 2016
45
26
18
Hope that works out with the Dell firmware! That would put these right up there with current enterprise-class switch offerings! It would also make them pair up even better with the H3C S5800 that I just got and converted to the latest HP A5800 software. :) If you are looking for a top-shelf 48-port copper gigabit switch with 4 10GbE SFP+ ports, find one.

The one I got had some very old 2011 firmware on it, and there are a number of steps needed to fully default it, then upgrade it to an intermediate firmware version (I found out the hard way that you cannot upgrade directly, but it's easy enough to revert) that lets you change the branding, so that you can then upgrade to the newest (late 2017) firmware from HP.

I wanted to upgrade my gigabit switch for all the copper 10/100/1000 devices on the network and have something with an easy connection to the new LB6M. The HP fits the bill perfectly and is a fully featured L3 switch (with a nice web GUI too, once you get a bunch of things configured). It's also surprisingly quiet and has some nice power-saving features, like being able to power down specific ports between certain hours, etc. There is also a PoE version that uses a LOT more power. This is the specific one I bought - the thing looks brand new inside and out too. Genuine HP JC105A A5800-48G 48GE 4 10GE SFP+ 885631166109 | eBay

Edit to add: if anyone is interested, I'll document all the steps and make an easy guide like we now have for the LB6M. It will make things a lot faster and easier if the steps are actually documented, with some quick config tips and whatnot. The only gotcha is if some prior owner configured a bootloader password. There's no real need to set one, but if it is set and not known, then it's pretty much over before you start. Any other passwords, for the CLI or anything else, can be removed easily as long as you can get into the bootloader.
 
Last edited:
  • Like
Reactions: fohdeesha

fohdeesha

Kaini Industries
Great stuff foh!

One question: I recall from your FastPath revert pastebin that you noted that one should initially flash to 1.0.0.10, to set up flash properly:

tftpboot 192.168.1.8:lb6m.1.0.0.10.bin (later images will not update the flash FS properly from scratch)


You've not included that file in the ZIP's "Fastpath Revert" folder, only 1.2.0.18. So is the 1.0.0.10 intermediate step no longer needed - you found a fix for that?

I was only having that weird symptom because I was missing a ton of U-Boot environment arguments that properly set up the JFFS2 file system for FASTPATH. @AT S37=0 provided the proper args from a stock LB6M, so with those the intermediate step should no longer be needed.

Wow, so it is theoretically possible to have ipv6 routing on this LB6M if the right software can be found? That would make a lot of folks happy. Good luck!
I'm not an architecture guy, but I'm ~80% sure that to properly support IPv6 routing, and the creation and loading of IPv6 routing tables into the ASIC, the processor needs support for 64-bit registers (at least with the Dell firmware), which our MPC8541E does not have. This would explain why all the IPv6-capable models use the 64-bit-register variant in the e500 line. There's still a small chance it's not needed, so it's worth a try. I know some older Brocades (FESX6) support IPv6 routing with an even older processor than we have, so it's possible to create IPv6 tables without it; I just have a feeling the way the Dell firmware is written depends on them.
 

fohdeesha

Kaini Industries
fohdeesha.com
JTAG unit finally arrived and is working great - thanks to @Foray for the unit suggestion. First thing I tried was the Dell 8024 software. Glad I waited until I had a JTAG unit standing by to do this, because it bricked the switch right away; the bootloader wouldn't even come up. Seems like even the VxWorks bootloader they use relies on the 64-bit floating point registers that the 8024 CPU has. So it's still just Brocade or FASTPATH for now.

Since our success rate has been so high with flashing, I'll offer this: any existing STH member who flashes their LB6M and bricks it, I'll unbrick it for you if you pay shipping both ways. If you take the PSUs out, it should be about 20 dollars or less each way, unless you live in the middle of nowhere.

 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,082
532
113
New York City
www.glaver.org
JTAG unit finally arrived and is working great. Thanks to @Foray for the unit suggestion. First thing I tried was the dell 8024 software - glad I waited until I had a JTAG unit standing by to do this, because it bricked it right away, the bootloader wouldn't even come up. Seems like even the vxworks bootloader they use relies on the 64-bit floating point registers that the 8024 CPU has. So still just brocade or fastpath for now.
When these companies use the toolchain for building, they generally use the default code generator for the CPU in their design, as that should generate the tightest/fastest code. The only exception is sometimes for things like the very first stuff that happens at power-on. As an example, most of the hardware + firmware I designed would output something like "No memory?" if it powered up and found no RAM. It's a bit hard to do that in C + libraries, so that is an assembler stub which then transfers control to the real bootloader once it has determined that everything is "present and accounted for".

BTW, I should point out that the 8024 is also a FASTPATH switch. It just happens to have a newer FASTPATH and more features enabled than the LB6M FASTPATH:
Code:
 ***************** Show Version ******************

Switch: 1

System Description............................. Powerconnect 8024, 5.1.12.2,
                                                VxWorks 6.6
Machine Description............................ Dell Ethernet Switch
Machine Type................................... Powerconnect 8024
Machine Model.................................. PC8024
Serial Number.................................. CN0Y295K28298xxxxxxxxxx
FRU Number.....................................
Part Number.................................... BCM56820
Maintenance Level.............................. A
Manufacturer................................... 0xbc00
Burned In MAC Address.......................... D067.E5xx.xxxx
Software Version............................... 5.1.12.2
Operating System............................... VxWorks 6.6
Network Processing Device...................... BCM56820_B0
Additional Packages............................ FASTPATH QOS
                                                FASTPATH Multicast
                                                FASTPATH Stacking
                                                FASTPATH Routing
 
  • Like
Reactions: fohdeesha

mangodoc

Member
Apr 26, 2017
32
21
8
@fohdeesha ... Looks like you bought a BDI2000 - any links, and what price should I expect? I would like to purchase one to play around with. Have you figured out what went wrong with the 8024f JTAG attempt?
 

fohdeesha

Kaini Industries
@fohdeesha ... Looks like you bought a BDI2000, any links and price to expect ? I would like to purchase one to play around with. Have you figured what went wrong with the 8024f jtag attempt ?

Didn't try an 8024f JTAG; I just used the built-in bootloader copy commands (like we use for the Brocade flash) to flash the Dell firmware onto an LB6M. The LB6M has a different CPU though, so it didn't work, and I had to use the JTAG unit to recover it.

The bootloader copy commands should be able to flash a Delta 7024 into a Dell (I'd imagine this is what the Chinese eBay sellers were doing), but I need someone to test it, and there's a slight bricking risk.

As for the BDI, the average price seems to be about $100 on eBay, but they're useless unless you can find a listing that states it includes the right firmware (you need the MPC85xx firmware). Abatron is supposedly going out of business in a few months though, so after that they might send it to you for free.
 

fohdeesha

Kaini Industries
One tiny correction on the docs, for the sake of completeness: in the output paste following md 0xfff80000 20, you're missing a space at the end of line 2, ie after 06 (Apr 19 2011.
Finally got around to fixing this :) I completely forgot after we immediately dove into the management port insanity
 
  • Like
Reactions: TheBloke

ASG16_4

New Member
May 9, 2017
12
1
3
22
Firstly, thank you @fohdeesha for this amazing work! I've flashed my LB6M to the Brocade firmware in the jankiest way possible (the power almost went out because of a thunderstorm, I wasn't plugged into a UPS, and I hand-typed the commands lol!!!! But I still got it working and didn't brick anything!)

Okay, I know I sound like a super noob for asking, but how the hell do I actually create an OSPF route? Page 244 of the document you provided, titled L3_guide, says that you need to:
  1. Enable OSPF on the router
  2. Assign the areas to which the router will be attached.
  3. Assign individual interfaces to the OSPF areas.
  4. Configure route map for route redistribution, if desired.
  5. Enable redistribution, if desired.
  6. Modify default global and port parameters as required.
  7. Modify OSPF standard compliance, if desired.
I get up to step 3 and theoretically I'm ready to start making routes, yet the documentation doesn't actually list any way to create a route! I've gone through a good 75% of the material, and nowhere does it explicitly say how to create an OSPF route. I've come across adding static routes to a route map that can then be referenced by OSPF (redistributing static routes into OSPF), and route summarization with OSPF. Side note: I have assigned the port an IP address.

I can't find anything in the port config (I'm using port 26 for now...) or the (config-ospf-router) menus. I'm really drawing a blank right now!
 

fohdeesha

Kaini Industries
Nov 20, 2016
2,333
2,475
113
31
fohdeesha.com
That would be because you're trying to add IPs/routes to PORTS, which is what you would do with the layer 2 firmware. With a layer 3 router, you need to get rid of any port IP addresses and configure a virtual interface for your desired VLAN (probably VLAN 1 in your case, which is what all the ports are in by default). Then give that virtual interface an IP, and then set up OSPF routing. For more/separate networks, you would set up another VLAN and do the same again in a different subnet, etc.; then it will also route between them.
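To make that concrete, a minimal sketch in IronWare-style syntax. The VLAN/VE numbers, subnet, and area are just examples - adjust to your network, and verify the exact commands against the L3 guide:

```
! enable OSPF and define the backbone area
router ospf
 area 0
!
! attach a virtual routing interface to VLAN 1
vlan 1
 router-interface ve 1
!
! give the VE an IP and put it in area 0
interface ve 1
 ip address 192.168.1.1/24
 ip ospf area 0
```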

Doing that is covered in the l3 quick guide on the flash site, follow that to get your VE set up, then follow this:
 
  • Like
Reactions: ASG16_4