SMB, LACP, NIC Teaming, & You.


Dajinn

Active Member
I'm having a particular issue with NIC Teaming, LACP, and SMB.

At first I was getting ready to team all the NICs across my server for increased throughput to/from nodes/storage, only to find out that LACP only increases aggregate bandwidth across multiple TCP/IP sessions, and what I actually wanted was something similar to MPIO, but not literally MPIO.

I learned that SMB 3.0 supports Multichannel out of the box under a handful of configurations: the NICs have to support RSS, or support RDMA, or be teamed, and that's pretty much it.

So I've been testing this all night trying to get the magic to happen, and I'm completely stumped and seeking y'all's advice. I'll go over all of the scenarios I've tested:

I've teamed all available NICs on each of my test beds using combinations of all of these settings:

At the OS level: "Switch Independent / Dynamic", "Switch Independent / Address Hash", "Static Teaming / Dynamic", "Static Teaming / Address Hash", "LACP / Dynamic", "LACP / Address Hash"

Each pair of options corresponds to a teaming method I've attempted on both the client and the server, but nothing seems to work. LACP itself is definitely functioning: Windows Server reports the LACP team as failed until I go into my switch and enable the LAG group for those ports, so I know that's not the issue.

With every one of these combinations I still only see a single gigabit of throughput when copying files.

Now here is what DOES work:

Leaving both sets of NICs un-teamed and otherwise unconfigured. As soon as I start copying a large file from the SMB share, I immediately see it transferring at over 2 Gbps.

What ALSO works is teaming ONLY the client set of NICs and leaving the server NICs un-teamed. I get combined interface throughput that way.

Nothing else works at all and I've been researching this for hours now. Can anyone chime in on their experience with this?
 

TuxDude

Well-Known Member
You are correct that LACP (and pretty much every other method of NIC teaming) will not help you much - a single connection (source/destination IP) will use a single NIC from the team.
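To make that concrete, here's a minimal sketch (Python, with a generic hash and made-up NIC names, not any vendor's actual LACP distribution algorithm) of how a hash-based team or LAG picks one physical NIC per flow. A single SMB copy is a single TCP connection, so it always lands on the same member:

```python
# Conceptual sketch only -- a generic hash, not any vendor's actual LACP
# distribution algorithm. It shows why one flow sticks to one team member.
import hashlib

TEAM = ["nic1", "nic2"]  # hypothetical two-port team

def pick_member(src_ip, dst_ip, src_port, dst_port):
    """Hash the flow's addressing fields and map the result to one NIC."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = int(hashlib.sha1(key).hexdigest(), 16)
    return TEAM[digest % len(TEAM)]

# A single SMB file copy is one TCP connection, so every packet of it
# hashes to the same member and tops out at that one link's speed.
print(pick_member("10.0.0.5", "10.0.0.10", 50123, 445))
print(pick_member("10.0.0.5", "10.0.0.10", 50123, 445))  # same NIC again

# Many different sessions spread across the team, which is where LACP
# actually buys you aggregate bandwidth.
for port in (50124, 50125, 50126, 50127):
    print(port, "->", pick_member("10.0.0.5", "10.0.0.10", port, 445))
```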

SMB Multichannel is probably what you want - it is like MPIO for SMB, and allows a single SMB session (e.g. a file transfer) to use multiple connections. It does NOT require NIC teaming or RDMA - teaming turns the multiple NICs into a single logical interface where Multichannel needs multiple connections, while RDMA is needed for SMB Direct, which is an entirely different thing.
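As a rough illustration of the Multichannel idea (this is a loopback toy under assumed port numbers, not the SMB protocol itself), here's what striping one logical transfer across several TCP connections looks like; with real hardware each connection could ride a different NIC or RSS queue:

```python
# Loopback toy, not the SMB protocol: stripe one logical transfer across
# several TCP connections, the way Multichannel uses one connection per
# NIC/RSS queue. Ports and channel count below are made up.
import socket
import threading

HOST, BASE_PORT, CHANNELS = "127.0.0.1", 55000, 4
payload = b"x" * (4 * 1024 * 1024)                        # 4 MiB "file"
chunks = [payload[i::CHANNELS] for i in range(CHANNELS)]  # stripe it

# Bind one listener per channel up front (a real server would listen on
# each NIC's address; here everything is loopback).
listeners = []
for i in range(CHANNELS):
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, BASE_PORT + i))
    srv.listen(1)
    listeners.append(srv)

received = [b""] * CHANNELS

def receive(idx):
    """Accept one connection and drain that channel's stripe."""
    conn, _ = listeners[idx].accept()
    with conn:
        buf = bytearray()
        while True:
            data = conn.recv(65536)
            if not data:
                break
            buf.extend(data)
    received[idx] = bytes(buf)
    listeners[idx].close()

def send(idx):
    """Each channel pushes its stripe independently, in parallel."""
    with socket.socket() as cli:
        cli.connect((HOST, BASE_PORT + idx))
        cli.sendall(chunks[idx])

threads = [threading.Thread(target=fn, args=(i,))
           for i in range(CHANNELS) for fn in (receive, send)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Re-interleave the stripes and confirm the "file" arrived intact.
out = bytearray(len(payload))
for i in range(CHANNELS):
    out[i::CHANNELS] = received[i]
assert bytes(out) == payload
print(f"moved {len(payload)} bytes over {CHANNELS} parallel connections")
```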

Unless you have very specific hardware for RDMA support (either InfiniBand or certain 10GbE NICs with RoCE, plus a DCB-enabled 10GbE switch) you can forget about SMB Direct.

If file copies over SMB are the only thing you are doing that needs the bandwidth, I would say just stick with SMB Multichannel over un-teamed NICs - keep it simple. If other protocols will also be using a lot of bandwidth, then look into teaming for the server and leave the SMB clients un-teamed - that should still get you SMB Multichannel to those clients, while the server can do LACP load-balancing of the other traffic (assuming, again, that it is going to multiple destinations).
 

DavidRa

Infrastructure Architect
A couple of other caveats to all this.

First, by default, SMB prefers NICs that have a default gateway assigned and working (which really sucks if you wanted to confine the traffic to a dedicated, non-routed VLAN).

Secondly, even if you use SMB Multichannel constraints to force the traffic onto the "storage VLAN", I've found SMB Multichannel will often use the public NIC instead, because (in my case) it's RSS-capable and the NICs on the storage network aren't. Yeah, that's probably uncommon - they're old HP NICs I got cheap. (There's a rough sketch of this preference ordering at the end of this post.)

Third, clusters impose further constraints - you need multiple VLANs/subnets for this to work properly.

Also, I've had weird connectivity problems when I define multiple constraints on a single VLAN/subnet - almost as if the host decides that even though I've constrained to sNIC1 and sNIC2, and the conversation is on sNIC1, the constraint to use sNIC2 forces the traffic to switch NICs, and access dies. Then it wants to switch back again. I'm hoping this is just the way I've configured the constraints; I'm still fiddling and testing (it's tough when your only Hyper-V cluster/file server is production and breaking stuff breaks your world).
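For what it's worth, here's a loose conceptual model (Python, with invented names and an assumed ordering, not Microsoft's actual SMB client selection code) of the kind of interface ranking described above, showing why an RSS-capable public NIC can outrank a non-RSS NIC on the dedicated storage network:

```python
# Loose conceptual model -- invented names and ordering, not Microsoft's
# actual SMB client code -- of why an RSS-capable "public" NIC can win
# over a non-RSS NIC on the dedicated storage network.
from dataclasses import dataclass

@dataclass
class Nic:
    name: str
    speed_gbps: float
    has_default_gateway: bool
    rss_capable: bool
    rdma_capable: bool

def preference_key(nic: Nic):
    """Higher tuple wins with max(): RDMA beats RSS beats plain, then
    link speed, then whether a default gateway is present."""
    return (nic.rdma_capable, nic.rss_capable, nic.speed_gbps,
            nic.has_default_gateway)

nics = [
    Nic("public-1GbE",  1.0, has_default_gateway=True,  rss_capable=True,  rdma_capable=False),
    Nic("storage-1GbE", 1.0, has_default_gateway=False, rss_capable=False, rdma_capable=False),
]

best = max(nics, key=preference_key)
print("traffic lands on:", best.name)   # the routed, RSS-capable public NIC
```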
 