Mellanox ConnectX-2 EN and Windows 10?

Layla

Game Engine Developer
Jun 21, 2016
173
129
43
38
@fmatthew5876, did you get anywhere with this? I'm having the same issues with the 40 Gbps adapters on Windows 10. It seems like something is preventing the hardware offload from working on Windows 10, though I haven't figured out what yet.
 

saivert

Member
Nov 2, 2015
124
13
18
38
Norway
I have done lots of testing and still only see between 5 and 6.5 Gbps downstream. Upstream is never an issue and always runs at the full 9.8 Gbps.
I have a Core i7-2600K on the client side and see one CPU core fully loaded when I run the iperf3 test, so I suspect this is CPU-limited because some HW offloading isn't working. Receive Side Scaling is turned on according to the Get-NetAdapterRss PowerShell cmdlet.

When running iperf3 with the -P 2 option for two streams, I can reach at most 7.2 Gbps. Increasing the number of parallel streams doesn't help.

If I boot Linux on the client PC it always runs at full speed in both directions, no matter how many tests I do.
I also installed fans on the NICs because they ran very hot (the heatsink was uncomfortable to touch). They now stay around 41 °C, which is within the manufacturer-specified limits.
I guess the only solution is to get a different NIC for proper support under Windows 10.

Firmware version: 2.9.1200
Driver version: 5.35.12978.0
 
Last edited:

i386

Well-Known Member
Mar 18, 2016
3,400
1,141
113
33
Germany
Hmmm...
I never got more than 7.2 Gbit/s in iperf3 (Windows 10), but when I copy files from my workstation (Mellanox CX-2 10GbE NIC, ioDrive2 1.2 TB) to my file server (FYI: 2-core Pentium D, 10GbE SFP+ onboard, RAID 6 + MaxCache) I get writes over 800 MB/s.

Also, iperf3 wasn't designed as a multithreaded bandwidth benchmark: Multithreaded iperf3 · Issue #289 · esnet/iperf · GitHub

About the temperatures: vendors usually specify the ambient temperature, not the temperature of the controller itself. The ConnectX cards are specified for up to 55 °C ambient; the controller itself can get much hotter (my CX-2 is at about 79 °C right now :D)
 

fmatthew5876

Member
Mar 20, 2017
80
18
8
37
I was never able to get good speeds with the Mellanox ConnectX-2 in Windows 10.

I just got an Intel X710-DA2 off eBay and installed it. I ran iperf between my FreeBSD box (also an Intel 10Gb NIC, client side) and this Windows 10 (Cygwin) machine and got 9.19 Gbits/sec right off the bat. No configuration or tweaking of anything.

At this point I really would not recommend using the ConnectX-2 with Windows 10.
 

saivert

Member
Nov 2, 2015
124
13
18
38
Norway
fmatthew5876:
Intel has some conservative interrupt-throttle defaults that kick in to limit CPU usage, though. Did you tweak that on your X710 card? I tried it with my X520 card.

Since I last posted in this thread, I actually get 9 Gbps with my ConnectX-2, but only if the CPU is idle. Just closing software like MPC-HC helps for some reason, even when the video I'm streaming from the LAN is paused.
Firing up something really CPU-demanding, like a video game, makes throughput drop as low as 3 Gbps.

Now I use an Intel X520 card in the server (due to its out-of-the-box SR-IOV support) and the ConnectX-2 in my desktop PC.
 
Last edited:

fmatthew5876

Member
Mar 20, 2017
80
18
8
37
fmatthew5876:
Intel has some conservative interrupt-throttle defaults that kick in to limit CPU usage, though. Did you tweak that on your X710 card? I tried it with my X520 card.

Since I last posted in this thread, I actually get 9 Gbps with my ConnectX-2, but only if the CPU is idle. Just closing software like MPC-HC helps for some reason, even when the video I'm streaming from the LAN is paused.
Firing up something really CPU-demanding, like a video game, makes throughput drop as low as 3 Gbps.

Now I use an Intel X520 card in the server (due to its out-of-the-box SR-IOV support) and the ConnectX-2 in my desktop PC.
Hey, thanks for this. As mentioned before, I did not tweak anything. I just plugged it into a PCIe slot and rebooted the box. Windows 10 already had drivers that just worked.

I have a quad-core Skylake Xeon E3 CPU with Hyper-Threading enabled.

I did some more tests:

Prime95 w/ 8 threads: 3.52 Gbps
Prime95 w/ 6 threads: 3.39 Gbps
Prime95 w/ 4 threads: 3.88 Gbps
Prime95 w/ 3 threads: 6.93 Gbps
Prime95 w/ 2 threads: 7.28 Gbps
Prime95 w/ 1 thread: 8.61 Gbps

While watching a 1080p movie (VLC) from a Samba share (same network path as iperf): 9.16 Gbps
While Mortal Kombat X (Steam) is running windowed: 8.75 Gbps


10GbE can be CPU-heavy, so I think these numbers make sense. But it looks like unless you are doing some serious computation that stresses your CPU to the max, the speeds should not be affected too much. Given these numbers and my use cases, I see no reason to tweak anything.

Just to clarify for everyone else: these numbers are for the Intel X710-DA2, not the Mellanox.
 

saivert

Member
Nov 2, 2015
124
13
18
38
Norway
Yes, you do have a much more powerful CPU than I do. But I just ordered parts for a Coffee Lake build, so I will do some more benchmarks then.
 

ISRV

Member
Jul 11, 2015
69
8
8
41
Sorry, I have to bump this thread because I can't google any answer.

What about PCIe lanes?
Maybe those who can't reach full speed just aren't providing enough lanes to the cards?

What I know currently:
the MNPA19-XTR is a PCIe 2.0 x8 card
my mainboard has 2 free x16 slots:
1) x8 PCIe 3.0 to the CPU (i7-7700K, with only 16 lanes total)
2) x4 PCIe 3.0 to the PCH (Z270)

Now the question is: what if I put this Mellanox card into the last x16 slot (which is actually only x4 via the PCH)?
Will I not get the full 10G speed in that case?
I really want to keep the middle x16 slot (x8 to the CPU) for a PCIe SSD (Micron P420m), which is also x8 :(

So will I either slow down the SSD or the 10G network?
Or am I missing something, and will both be fine even with only x4 PCIe PCH lanes?
 
Last edited:

i386

Well-Known Member
Mar 18, 2016
3,400
1,141
113
33
Germany
PCIe 2.0 x8: ~4 GByte/s
PCIe 2.0 x4: ~2 GByte/s (~16 Gbit/s effective; 20 Gbit/s raw)

PCIe 2.0 x4 is fast enough for 10Gbit Ethernet.
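Those figures are easy to sanity-check. A minimal sketch of the PCIe 2.0 arithmetic (assuming the standard 5 GT/s per lane with 8b/10b encoding, and ignoring packet/protocol overhead):

```python
# PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding,
# so only 8 of every 10 bits on the wire carry payload.
GTS_PER_LANE = 5.0         # gigatransfers/s (one bit per transfer)
ENCODING_EFFICIENCY = 0.8  # 8b/10b

def effective_gbit_per_s(lanes: int) -> float:
    """Effective one-direction PCIe 2.0 bandwidth in Gbit/s."""
    return GTS_PER_LANE * lanes * ENCODING_EFFICIENCY

def effective_gbyte_per_s(lanes: int) -> float:
    return effective_gbit_per_s(lanes) / 8

print(effective_gbyte_per_s(8))  # 4.0 GByte/s for x8
print(effective_gbit_per_s(4))   # 16.0 Gbit/s for x4 -- above the 10GbE line rate
```

So even an x4 PCIe 2.0 link leaves comfortable headroom above a single 10GbE port.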
 

ISRV

Member
Jul 11, 2015
69
8
8
41
In theory.
But then why do they make these single-port 10G cards PCIe x8?
I still don't really understand how a 2.0 card works in a 3.0 slot. At 2.0 speed?

I mean, the P420m's max speed is 3.3 GByte/s, so x4 PCIe 2.0 is just not enough.
In that case I understand why they made it x8.
But what if I put it in an x4 3.0 slot? In theory that should provide enough speed (4 GByte/s max), but the SSD is only PCIe 2.0.
 

i386

Well-Known Member
Mar 18, 2016
3,400
1,141
113
33
Germany
but why they making this single port 10g cards in pci-e 8x ?
i'm still not really understand how 2.0 card will work in a 3.0 slot?
at 2.0 speed?
I think that with more PCIe lanes the CPU works faster (lower latency), but I'm not sure if this is 100% right (TX/RX processing).
PCIe is backwards compatible. If you put a PCIe device in a PCIe slot, it will run at the fastest speed both ends support and use as many lanes as both the card and the slot provide.
 

saivert

Member
Nov 2, 2015
124
13
18
38
Norway
Yes, PCI Express is backwards compatible. That also means you can put a PCIe 2.0 card in a PCIe 1.1 slot, which is likely why they went with an x8 connector on the single-port card (it is obviously a single-port design, as there is no unpopulated second SFP cage): you can use it in an older server with only PCIe 1.1 slots and still get full speed.

Also, since upgrading to Coffee Lake I have no slowdown issues with the ConnectX-2. It just takes a lot of CPU power to run 10 Gbps at the same time as other CPU-demanding tasks. On my last build (i7-2600K), just firing up MPC-HC was enough to drop the speed a lot, even with the video paused. That seemed a bit weird, but I guess it doesn't take much to make NIC speeds dip.
 

ISRV

Member
Jul 11, 2015
69
8
8
41
Alright, I guess it will be fine in the x4 slot.

That it takes a lot of CPU power is also a surprise to me :(
Is this specific to the ConnectX-2, or will any 10G NIC have the same "issue"?
I bought a pair of these (like everyone else :) ) because of the price.

But maybe there's a reason to pay a bit more (under $70 on eBay) for an X520-DA1?
Or is it just as CPU-hungry?
 

saivert

Member
Nov 2, 2015
124
13
18
38
Norway
10Gb+ networking requires more CPU power than gigabit networking; that much is obvious. I also had issues with the X520 card, but it has a setting for Interrupt Throttle Rate which you can tweak to work around this. I couldn't find a similar setting on the ConnectX-2.

I never had issues on Linux, but Linux (or its drivers) manages network traffic a lot better and doesn't throttle it as heavily as Windows does by default when CPU resources run low.
 

ISRV

Member
Jul 11, 2015
69
8
8
41
OK, thanks.
I guess I should just try it myself later and then decide whether it's acceptable.
 

nikey22

New Member
Feb 19, 2018
14
0
1
50
re: Mellanox cards,

Have you guys tried turning off network throttling in Windows 10? Set the DWORD to ffffffff in the registry (HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile).
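For reference, the value under that key is commonly cited as NetworkThrottlingIndex (the post above doesn't name it, so treat the value name as an assumption and verify it on your machine). A config fragment for an elevated command prompt:

```
reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile" /v NetworkThrottlingIndex /t REG_DWORD /d 0xffffffff /f
```

A reboot is typically needed before the change takes effect.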

If peer-to-peer, enable MTU 9000 on both machines
Interrupt moderation: disabled
Receive buffers: max
Send buffers: max
Offload settings: enabled (all of them)

Make sure you are using an x8 PCIe slot on both machines.

x8 is more than enough: the raw bandwidth here is 40 Gb/s on PCIe 2.0. After encoding overhead it's about 32 Gb/s, which translates to 4000 MB/s; our goal is 1000 MB/s. So this is clearly more than enough. In fact, a single-port 40GbE card should just about saturate a PCIe 2.0 x8 link. Those of you who bought a dual-port 40GbE card will need an x16 PCIe 2.0 slot, because such a card maxes out at 8000 MB/s.
 

mfgmfg

New Member
Apr 11, 2018
1
0
1
52
Short version: If you're unable to get more than ~7Gbps with a ConnectX-2, ensure your card is running at PCIe x4 and enable jumbo frames.
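On the jumbo-frames point: the win at 10GbE is less about header overhead and more about packet rate, since each frame costs the CPU roughly the same to process. A rough sketch of both effects (assumes plain IPv4 + TCP with no options and standard Ethernet framing):

```python
WIRE_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, Ethernet header, FCS, interframe gap (bytes)
IP_TCP_HEADERS = 20 + 20         # IPv4 + TCP headers inside the MTU (bytes)
LINE_RATE = 10e9                 # 10GbE, bits/s

def max_goodput_gbit(mtu: int) -> float:
    """TCP payload throughput at line rate, in Gbit/s."""
    return 10 * (mtu - IP_TCP_HEADERS) / (mtu + WIRE_OVERHEAD)

def packets_per_sec(mtu: int) -> float:
    """Frames per second needed to sustain line rate."""
    return LINE_RATE / ((mtu + WIRE_OVERHEAD) * 8)

print(round(max_goodput_gbit(1500), 2))   # ~9.49 Gbit/s
print(round(max_goodput_gbit(9000), 2))   # ~9.91 Gbit/s
print(round(packets_per_sec(1500) / packets_per_sec(9000), 1))  # ~5.9x
```

So jumbo frames barely change the theoretical ceiling, but they cut the packet-processing rate by roughly 6x, which is exactly what helps a CPU-bound Windows receiver.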

I wanted to share my experience with Mellanox ConnectX-2 EN MNPA19-XTR cards and Windows 10. I got them off eBay and put one card into my Windows 10 machine running a 6700K on an ASUS Maximus VIII Gene Z170 motherboard. The other card went into a Fedora 27 system and worked without issue (2.9.1000 firmware). They were connected with a Cisco CU3M direct-attach cable.

Here is the motherboard slot layout.

Since my video card is in the top slot and covers the middle PCIe slot, I could not put the NIC there, so I put it in the open-ended x4 slot even though it is physically an x8 card.

I followed directions from another thread here to update the firmware in the card to 2.10.720. The Windows 2016 WinOF 5.40 drivers from Mellanox appear to work fine (5.4.14004.0).

I was perplexed as to why I was only getting about 3 Gbps with a single iperf thread to the Linux box; it got up to 5 Gbps when I upped the MTU to 9000, and hit about 7 Gbps with parallel threads (the -P switch in iperf). I then realized the PCIe slot was only running at x2, limiting the maximum theoretical bandwidth to 8 Gbps. This was verified with Get-NetAdapterHardwareInfo and the Information tab in the adapter properties.

Code:
PS C:\WINDOWS\system32> Get-NetAdapterHardwareInfo

Name                           Segment Bus Device Function Slot NumaNode PcieLinkSpeed PcieLinkWidth Version
----                           ------- --- ------ -------- ---- -------- ------------- ------------- -------
Ethernet 5                           0   2      0        0                    5.0 GT/s             2 1.1
I moved the video card to the middle slot and the NIC to the top x16 slot, and the NIC ran at x4, so I knew it wasn't some kind of compatibility issue. Benchmarks showed no issue running the 980 Ti at x8, but cooling was suboptimal in that arrangement, so I really wanted the NIC in the bottom slot.

Eventually, I figured out that there is a BIOS setting that trades two of the bottom slot's lanes for the SATA 5/6/Express ports. Since I was not using those ports, I toggled the setting (I can't remember exactly what it was called) to enable x4 on the bottom slot. Success! With jumbo frames enabled (MTU 9000) I was able to hit 9 Gbps with a single iperf thread, and up to 9.8 Gbps when tweaking various iperf parameters. SSH copies are currently limited by CPU throughput (1.6 Gbps) and disk bottlenecks (300-400 MB/s read/write), but I was able to get up to 6 Gbps to/from my Windows SSD RAID with a ramdisk on the Linux box. The card shows as RDMA capable, but I have not tested that yet.

Code:
PS C:\WINDOWS\system32> Get-NetAdapterHardwareInfo

Name                           Segment Bus Device Function Slot NumaNode PcieLinkSpeed PcieLinkWidth Version
----                           ------- --- ------ -------- ---- -------- ------------- ------------- -------
Ethernet 5                           0   2      0        0                    5.0 GT/s             4 1.1

PS C:\WINDOWS\system32> Get-SmbServerNetworkInterface

Scope Name Interface Index RSS Capable RDMA Capable Speed   IpAddress
---------- --------------- ----------- ------------ -----   ---------
*          22              True        True         10 Gbps 192.168.2.2
Note: According to this, Get-NetAdapterHardwareInfo won't return anything higher than 1.1 for the version, but since the link speed is 5.0 GT/s it must be PCIe 2.0.
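The link-speed-to-generation mapping that note relies on can be written down explicitly (a small sketch; the table entries are the standard per-lane signaling rates):

```python
# Per-lane signaling rate (GT/s) -> PCIe generation, as shown in the
# PcieLinkSpeed column of Get-NetAdapterHardwareInfo.
PCIE_GEN_BY_GTS = {2.5: "1.x", 5.0: "2.0", 8.0: "3.0", 16.0: "4.0"}

def pcie_generation(link_speed_gts: float) -> str:
    return PCIE_GEN_BY_GTS.get(link_speed_gts, "unknown")

print(pcie_generation(5.0))  # "2.0" -- the 5.0 GT/s above means the card trained at PCIe 2.0
```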

 

shrike0064

New Member
Jan 14, 2019
12
3
3
Experiencing the same thing. 1-14-19

The card will not connect to my network. It will not acquire an IP, nor connect using a static one.
Windows 10 v1809 client
Mellanox driver: 5.50.14643.1
ConnectX-2 MNPA19-XSR/XTR, SP# 671798-001
FW: 2.9.1000 (tried to update the FW to 2.9.1200, but to no avail)
Tried deleting the Windows 5.50 driver and installing an earlier version, without success; no matter what, 5.50 is always in use.
Windows shows the card as operational, working normally.
I have 4 of these and none will connect, including one from an operational computer which did connect before.
Modules: 850 nm, 10 Gb and 8 Gb, all confirmed operational using two routers and the same fiber jumper cable.
Lasers are ON on both modules.

Ideas?