NAS Speed Tests - 10GE vs 4GE LACP [Crystaldisk Performance] 4K confusion?


dyseac

New Member
Mar 26, 2016
4
0
1
38
Hi all, first time posting here. Hopefully this is the correct area and I'm not asking something too abstract.

So for the past couple of weeks I've been on a mission to increase NAS speeds.
As we all know, 10GE gear is still expensive as hell, so short of buying it I came up with the following solution (I only have 2-3 clients connecting to one storage area).

First attempt was a $20 4x 1G Intel Pro/1000 NIC. After a lot of issues, unknown to me at the time, caused by Teaming being disabled in current builds of Windows 8 & 10, along with no SMB3 support in Windows 7, I ended up doing my test by installing a trial of Windows Server. As you can see, the results were great.


This gave me the confidence to then go and buy 3x X540 10GE NICs. I set up a simple ring of direct connections and most of the results are good; however, for whatever reason the 4K write speeds are not really any different from the 1GE speeds previously. Interestingly though, if we go back via the switch, the 4K speeds double. And just as interestingly, the 4x 1GE LACP was around 3x faster than a single 10GE-to-10GE connection.

I'm struggling to understand the relationships here. I would have thought the switch would add an unnecessary layer; and likewise, why would the LACP be faster than the 10GE connection if they both effectively use similar stacks / buses etc.?

Can anyone share some insight? :)

Here are some images to help explain the above scenario.

Some hardware details: all systems are X79, using either the Intel Pro/1000 for LACP or the X540-T2, with CAT6 or CAT6e cabling. The switch is just a layer-2 Netgear something... All mapped drives were RAM drives, with local results orders of magnitude greater than what you see here. I did this to eliminate any hard drive / controller speed limitations; my focus is on network connection capabilities only. (Or could this have something to do with it?)

Thanks for reading!

lacpvs10ge.jpg
 

Pete L.

Member
Nov 8, 2015
133
23
18
56
Beantown, MA
Try removing the LAG / LACP group and just using them as individual links with their own IPs. SMB Multichannel in Windows 8, 10 and Server 2012 works by "knowing" the various paths available between the two systems; LACP kind of takes that ability away. I am thinking that for this to work with LACP / LAG / bonds in the future, there might need to be some new implementation of LACP?

LACP uses a single IP address and then load balances across the different "lanes", if you will. I have messed with this in our lab at work, and going with individual IPs (they don't have to be in the same subnet, as long as they are reachable by each device) worked impressively well. I tried to get it to work with my Synology, only to find out that it does support SMB3 (not by default, you have to enable it) but does NOT yet support Multichannel. Apparently not many NASes do as of yet. If / when they do, 10G will become a lot less desirable for us geeks who can easily saturate a single gig link and want more speed.

Of course the problem with this is that by going with individual IPs you are taking away the redundancy, which is why I was going with two different bonds instead. That does seem to work, but you will only be using the speed of two links.
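To illustrate the flow-pinning point above: a minimal sketch of why a single SMB connection over an LACP bond never exceeds one member link's speed. The hash below (src MAC XOR dst MAC, modulo link count) is just one typical layer-2 policy, assumed for illustration; real switches offer several, but all of them pin a given flow to a single link.

```python
def lacp_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Pick the bond member for a flow using a simple L2 hash
    (assumed policy: src-MAC XOR dst-MAC, modulo link count)."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % num_links

# One client/server MAC pair always hashes to the same member link,
# so every packet of that flow rides one 1G lane of a 4x1G bond:
links = {lacp_link("00:1b:21:aa:bb:01", "00:1b:21:cc:dd:02", 4)
         for _ in range(1000)}
print(len(links))  # prints 1
```

SMB Multichannel sidesteps this by opening one TCP connection per NIC/IP pair, so the flows land on different links by construction rather than by hash luck.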
 

dyseac

New Member
Mar 26, 2016
4
0
1
38
Hey, thanks for the reply, but I'm not sure that answers my question. In both cases the LACP group exists; however, when using the switch the 4K speeds are double what they are when bypassing the switch... What is the switch doing that increases the speed so much that going direct can't? Or are you suggesting moving away from 802.3ad?

Cheers
 

Pete L.

Member
Nov 8, 2015
133
23
18
56
Beantown, MA
So to be clear: you are trying to go from a single 10G connection to a system with four 1G connections and you are only really seeing about 1 Gb/s of transfer, but when you go from four 1G to another four 1G between machines you are getting much better throughput?

Is the better throughput through a switch, or just LACP set up between the cards directly connected?

If they are directly connected, then yes, for the sake of testing try taking LACP / LAG / 802.3ad out of the equation completely (switch / cards) and let them get individual IPs (or assign them). From there you should be able to start a single file copy, and it should use all 4 connections (more like 3+), which you will be able to see in Task Manager.

I am not sure how this will behave between a 10G connection and another 10G connection; that is one I haven't tried. I've just been doing the multiple 1G connections. From what I've seen / observed, LACP does not allow it to pass through the switch.
 

Pete L.

Member
Nov 8, 2015
133
23
18
56
Beantown, MA
You have me curious. I have a complete 10G network, so I am building up a system with Server 2012 with dual NICs to see how it will work =)
 

Pete L.

Member
Nov 8, 2015
133
23
18
56
Beantown, MA
So this is interesting and I will have to mess with it a little more. I built up a Server 2012 R2 system (base install, no updates) with just a dual-port Intel Pro/1000 card in it and let both ports get IPs from DHCP. The server is connected to a Cisco switch that has dual 1G links (LACP) up to my 10G switch. I did a file copy from one of my 10G Windows 2012 R2 servers, and it actually does use both NICs, and it was also using both connections in the LAG / LACP group; HOWEVER, they only added up to about 1 Gb/s total, as seen below.

File Copy Test-PRTG.jpg File Copy Test.jpg
 


dyseac

New Member
Mar 26, 2016
4
0
1
38
So to be clear: you are trying to go from a single 10G connection to a system with four 1G connections and you are only really seeing about 1 Gb/s of transfer, but when you go from four 1G to another four 1G between machines you are getting much better throughput?
No and yes. The purpose of this project is to increase 4K read/write speeds. I have tried numerous combinations of 4x 1GE LACP and 1x/2x 10GE LACP, both to the same target and combined. My underlying question remains: why do the 4K speeds double when going via the switch rather than direct? As per my images above, if we just look at 4K read speeds:

10GE -> 10GE direct = 7 MB/s
10GE -> Switch -> 10GE = 12 MB/s *
* Note: I have only shown a screenshot of 1GE -> Switch -> 10GE; however, the 10GE -> Switch -> 10GE result was the same. I did the 1GE test out of curiosity, to see if it was a lot slower than the 10GE test, but the result was the same.

In all other scenarios the direct connection appears to be the winner. I'm trying to ascertain why the 4K test performs better going via the switch rather than directly paired.

If they are directly connected, then yes, for the sake of testing try taking LACP / LAG / 802.3ad out of the equation completely (switch / cards) and let them get individual IPs (or assign them). From there you should be able to start a single file copy, and it should use all 4 connections (more like 3+), which you will be able to see in Task Manager.

I am not sure how this will behave between a 10G connection and another 10G connection; that is one I haven't tried. I've just been doing the multiple 1G connections. From what I've seen / observed, LACP does not allow it to pass through the switch.
I've had no problems with various LACP-capable layer-2+ switches. Again, the aim here is not increased single-file transfer speed but increased IOPS on the 4K / SEQ tests.

Oh, and if you are curious what 2x 10GE -> 2x 10GE with jumbo frames looks like, here it is :)

10gelacp.jpg
 

bds1904

Active Member
Aug 30, 2013
271
76
28
It seems like some kind of weird latency issue.

Try using a crossover cable NIC-to-NIC to see if it's the auto-MDI-X causing the issue.

This exact issue is why we are still using fiber-based 10Gb for our storage network at work. During our performance testing we encountered similar issues with low performance and high latency on 10GBASE-T, even with short 6 ft patch cords. Even using DACs on 10Gb SFP+ cards added enough latency to reduce the speed of multiple small reads/writes by a third, especially on Windows.
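To put a rough number on that "by a third" figure: if small-I/O throughput is latency-bound, adding per-op latency scales throughput by base / (base + added). The 300 µs base below is my illustrative assumption, not a measured value:

```python
def scaled_throughput(base_us: float, added_us: float) -> float:
    """Fraction of baseline small-I/O throughput remaining after
    adding per-operation latency (latency-bound model)."""
    return base_us / (base_us + added_us)

# A one-third drop implies the medium added about half the base
# per-op latency: base / (base + base/2) = 2/3.
print(round(scaled_throughput(300, 150), 3))  # prints 0.667
```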

We ended up going with regular 10Gb SFP+ because it just worked, and worked on any OS.