Mellanox ConnectX-3 help


Mkvmike

New Member
Feb 6, 2023
24
2
3
The Windows page shows no issues (8 GT/s = PCIe 3.0, and x8).

We have some issues in Unraid ...
Code:
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited   <-------------------------------------------
                       ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
               LnkSta: Speed 2.5GT/s (downgraded), Width x8                  <--------------------------------------------
                       TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
               DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
The device is capable of 8GT/s but is running in degraded mode at 2.5GT/s. 2.5GT/s across x8 lanes adds up to 16Gb/s max once the 8b/10b encoding overhead is taken off. That explains the bottleneck.
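The arithmetic above can be reproduced with a minimal sketch, assuming the `LnkCap`/`LnkSta` line format shown in the lspci output in this post. The helper names (`parse_links`, `usable_gbps`) are made up for illustration, and the result ignores PCIe protocol overhead, so real throughput will be somewhat lower.

```python
import re

# Line-encoding efficiency per PCIe signalling rate (GT/s).
# 2.5 and 5 GT/s use 8b/10b; 8 GT/s and above use 128b/130b.
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130, 16.0: 128 / 130}

# Matches e.g. "LnkSta: Speed 2.5GT/s (downgraded), Width x8"
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def parse_links(lspci_text):
    """Return {'LnkCap': (speed, width), 'LnkSta': (speed, width)} for one device."""
    links = {}
    for kind, speed, width in LINK_RE.findall(lspci_text):
        links[kind] = (float(speed), int(width))
    return links


def usable_gbps(speed_gts, width):
    """Approximate one-direction data rate in Gb/s after line encoding."""
    return speed_gts * width * ENCODING[speed_gts]


# The two relevant lines from the lspci output in this thread.
sample = """
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
LnkSta: Speed 2.5GT/s (downgraded), Width x8
"""

links = parse_links(sample)
for kind, (speed, width) in links.items():
    print(f"{kind}: {speed} GT/s x{width} -> {usable_gbps(speed, width):.1f} Gb/s")
```

With the values from this thread, the negotiated link works out to 16 Gb/s against roughly 63 Gb/s for what the card is capable of, which is why a 40GbE port cannot be filled in the downgraded state.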
I have just double-checked the BIOS and it's set to Gen 3, and I also tried 'Auto'. lspci -vvv gave the same output for both, so I'm not sure why it is being downgraded. I do have a PCIe 2.0 SAS card in the second slot (@x8); I will remove this and see if that is the culprit.

Thank you
 

Mkvmike

Some weird stuff going on... No PCIe devices connected to the mobo, set to GEN 3 in the BIOS, and the card is still saying downgraded. Thanks for the help though, as at least I know what the problem is now. :cool:
 

kapone

Well-Known Member
May 23, 2015
1,095
642
113
Turn off ASPM/L0s/L1 type of stuff, i.e. PCIe power saving, and see if that makes a difference.
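To see what the OS side thinks about ASPM, here is a minimal sketch, assuming a Linux box that exposes the `pcie_aspm` module parameter in sysfs. It only reads the policy; the function names are made up for illustration, and the BIOS setting can still override or differ from what the kernel reports.

```python
from pathlib import Path

# Linux exposes the ASPM policy here, e.g. "default [performance] powersave".
ASPM_POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")


def active_policy(policy_text):
    """Return the bracketed (active) entry from the sysfs policy string."""
    for word in policy_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_active_policy(path=ASPM_POLICY_FILE):
    """Read the policy file; None when the kernel does not expose it."""
    try:
        return active_policy(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("Active ASPM policy:", read_active_policy())
```

If the policy is not `performance`, booting with the `pcie_aspm=off` kernel parameter is one way to test whether power saving is involved, in addition to the BIOS switches.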
 

Mkvmike

That was it!! I was just testing that as you wrote. I don't remember ever touching these settings at all :oops:


Just did a quick test: iperf3 -c 192.168.10.3 -P 4


[Attached screenshots: 2023-02-14 20_43_21-.png, 2023-02-14 20_44_31-.png]




A BIG THANK YOU to everyone who took the time and had the patience to deal with an idiot like me. I'm truly humbled! :)
 

Mkvmike

Still you should be able to get more than 31 Gbits/s - with tweaks to the settings on the driver and possibly jumbo frames.

Absolutely, but it's Valentine's Day here, and my wife may now need some attention, or I may live to regret it! :D
 

Stephan

Well-Known Member
Apr 21, 2017
947
715
93
Germany
Morale is very high on STH, because we just poured around 2k in real-world consulting fees into a "New Member" guy that we have never met.

:D

Heard on Level1 Show that boxes of Ferrero Rocher work good. Asked mine, confirmed. Will try a 2-pack for good measure.
 

Bjorn Smith

Well-Known Member
Sep 3, 2019
883
487
63
49
r00t.dk
I don't believe in Valentine's Day. If I lived in the US I would, obviously, but in other places it's just a holiday appropriation so the stores can sell more shit we do not need :)
 

Mkvmike

Survived Valentine's Day. I pretended to play dead, but it didn't work :D

After a couple of days' recovery, my back is a bit better now, so I did a bit more testing and I'm pulling a consistent 32-34Gbps.

MTU size is 9014, but if there is anything else I need to adjust, I'm all ears :)
 

prdtabim

Active Member
Jan 29, 2022
175
67
28
As you have already configured: MTU 9000+.
txqueuelen could be 5000-10000 (default is 1000).
You could also try irqbalance.
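A quick way to check the first two points is to read what the kernel currently reports. This is a minimal sketch, assuming Linux sysfs; the function names and the thresholds (taken from the numbers in this post) are illustrative, and `lo` is only used so the example runs on any Linux box.

```python
from pathlib import Path


def read_link_settings(iface, sysfs_root="/sys/class/net"):
    """Return the MTU and transmit queue length the kernel reports for iface."""
    base = Path(sysfs_root) / iface
    return {
        "mtu": int((base / "mtu").read_text()),
        "txqueuelen": int((base / "tx_queue_len").read_text()),
    }


def suggestions(settings, min_mtu=9000, min_txqueuelen=5000):
    """List the settings that fall below the values suggested in this post."""
    notes = []
    if settings["mtu"] < min_mtu:
        notes.append(f"MTU is {settings['mtu']}, jumbo frames need about {min_mtu}")
    if settings["txqueuelen"] < min_txqueuelen:
        notes.append(f"txqueuelen is {settings['txqueuelen']}, try {min_txqueuelen} or more")
    return notes


if __name__ == "__main__":
    # Swap "lo" for the name of the ConnectX-3 interface.
    if Path("/sys/class/net/lo").exists():
        current = read_link_settings("lo")
        print(current, suggestions(current))
```

The values themselves can be changed with `ip link set dev <iface> mtu 9000` and `ip link set dev <iface> txqueuelen 10000`, where `<iface>` is a placeholder for the real interface name.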
 

Bjorn Smith

There are a lot of kernel options you can set, buffer sizes etc.

These are what I used on FreeBSD; there are probably similar ones on Linux.

Code:
kern.ipc.soacceptqueue=2048
kern.ipc.somaxconn=2048
kern.ipc.maxsockbuf=33554432
net.inet.tcp.recvbuf_inc=2097152    # (default 16384)
net.inet.tcp.recvbuf_max=16777216  # (default 2097152)
net.inet.tcp.recvspace=4194304      # (default 65536)
net.inet.tcp.sendbuf_inc=2097152    # (default 8192)
net.inet.tcp.sendbuf_max=16777216  # (default 2097152)
net.inet.tcp.sendspace=4194304      # (default 32768)
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1


Add these lines to /boot/loader.conf

net.isr.maxthreads="-1"
net.isr.bindthreads="1"
Tweak the numbers according to how much memory you have, and you can get more oomph.
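For Linux, a rough sketch of the closest knobs follows. This is an assumption on my part rather than a tested one-to-one translation of the FreeBSD list: the sysctl names are standard Linux ones, the sizes simply reuse the numbers above, and the function name is made up for illustration.

```python
def linux_tcp_buffer_sysctls(max_bytes=16 * 1024 * 1024, default_bytes=4 * 1024 * 1024):
    """Render sysctl.conf lines that raise Linux socket buffer ceilings."""
    settings = {
        # Hard caps for socket receive/send buffers.
        "net.core.rmem_max": str(max_bytes),
        "net.core.wmem_max": str(max_bytes),
        # tcp_rmem / tcp_wmem take three values: min, default, max (bytes).
        "net.ipv4.tcp_rmem": f"4096 {default_bytes} {max_bytes}",
        "net.ipv4.tcp_wmem": f"4096 {default_bytes} {max_bytes}",
        # Listen backlog, loosely matching kern.ipc.somaxconn above.
        "net.core.somaxconn": "2048",
    }
    return [f"{key} = {value}" for key, value in settings.items()]


for line in linux_tcp_buffer_sysctls():
    print(line)
```

The printed lines can go into a file under `/etc/sysctl.d/` and be loaded with `sysctl --system`. As with the FreeBSD values, scale the sizes to the memory available.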
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
Sounds like you have the proper cable to do 56GbE instead of 40GbE.
When you turn off autoneg on Linux and set the speed to 56000 (and the advertised speed too), Windows should connect at 56GbE and display the speed in the driver and connection window.

Btw, my max file transfer speed on a ConnectX-3 using 40GbE was 4.42GB/s (on a Xeon 1650 v2 or 1620, I don't remember) under Windows. 10th gen is slightly faster, so it should be no problem to max out even 56GbE.
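To put the 4.42GB/s figure next to the link speeds, here is a small sketch of the unit conversion. It assumes decimal gigabytes (10^9 bytes) and ignores Ethernet and TCP overhead; the function names are illustrative.

```python
def gbytes_to_gbits(gigabytes_per_second):
    """Convert GB/s to Gb/s (8 bits per byte, decimal units assumed)."""
    return gigabytes_per_second * 8


def line_rate_fraction(gbits_per_second, link_gbits):
    """Fraction of the nominal link speed that a measured rate represents."""
    return gbits_per_second / link_gbits


measured = gbytes_to_gbits(4.42)
for link in (40, 56):
    print(f"{measured:.2f} Gb/s is {line_rate_fraction(measured, link):.0%} of {link}GbE")
```

That puts the transfer at about 35.4 Gb/s, in the same range as the 32-34Gbps iperf3 numbers reported earlier in the thread.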
 

tinfoil3d

QSFP28
May 11, 2020
883
409
63
Japan
I actually had exactly the same problem when I asked my first question here. All I needed to do was look at the actual PCIe link status (grep for "limited" in dmesg) to figure out what was up.
If I have any speed issues on any system, the first thing I do these days is check the established PCIe width and GT/s, then CPU usage, frequencies, etc., to find the bottleneck.
 

UhClem

just another Bozo on the bus
Jun 26, 2012
440
253
63
NH, USA
... All i needed to do is to look for actual pcie link status(grep for "limited" in dmesg) ...
Thanks for this (much better than lspci -vv | grep downgrade; ...).
If you add -B4 to your grep, you might also pick up the [Vendor-Device] of the cripple.
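For anyone who wants the same filter inside a script, this is a minimal sketch of what `grep -B4` does: print each matching line together with the lines just before it. The function name and the sample lines are made up for illustration, and the sample wording is hypothetical rather than real kernel output.

```python
from collections import deque


def grep_before(lines, needle, before=4):
    """Yield each matching line preceded by up to `before` lines of context."""
    context = deque(maxlen=before)
    for line in lines:
        if needle in line:
            yield from context   # context lines not already printed
            yield line
            context.clear()
        else:
            context.append(line)


# Hypothetical log lines, only to show the behaviour.
sample = [
    "line about device A",
    "line about device B",
    "bandwidth limited by link",
    "unrelated line",
]

for hit in grep_before(sample, "limited", before=1):
    print(hit)
```

`lines` can be any iterable of strings, so the captured output of `dmesg` split into lines works the same way.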
 

tinfoil3d

I've been through a lot of various issues. Sometimes this isn't caused by anything obvious, but rather by a speck of dust on one of the CPU contact pads that happens to be responsible for a lane of that particular slot. Reseating the CPU then solves the issue after every other reasonable and unreasonable approach has been attempted. I had this weird magic happen to me after upgrading the CPUs in a dual-socket system. I reseated just the CPU that was controlling the slot in question, of course, and that helped after no amount of googling did.
 

Stephan

Yes, LGA 3647 and later really do need a pretty clean environment for assembly. RAM, NVMe and PCIe slots are not much better. I always keep a couple of cans of electronics-grade dust spray handy and use it liberally. Sometimes dirt is stuck between the solder balls of BGA chips on DIMMs, causing problems. I have cleaned my share of contacts on refurbished gear with isopropanol and a microfiber cloth too. I can also recommend getting one of those rootable robot vacuums and running the privacy-friendly Valetudo from GitHub on it. Running that once a day really lowers overall dust in the house. It is also beneficial if you suffer from plant pollen. Some can even wet mop.

Being the resident STH reliability nut, test thoroughly with Memtest86+ and/or stress-ng VM tests to exercise board, RAM, CPU. For dual port Mellanox cards I just loop the connections and use a script to set up network namespaces to make traffic go truly over the connectors, not loopback in the Linux kernel. Let that run for 24 hours to see if anything turns up.
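A minimal sketch of that kind of namespace setup is below. It is not the actual script referred to above, only an illustration: the interface names `eth2`/`eth3`, the namespace names and the `10.99.0.x` addresses are assumptions, and the function only builds the command list so it can be reviewed before anything is run as root.

```python
def loop_test_commands(port_a, port_b, subnet="10.99.0"):
    """Build (not run) the commands for a two-namespace loop test."""
    plan = []
    for ns, port, host in (("ns_a", port_a, 1), ("ns_b", port_b, 2)):
        addr = f"{subnet}.{host}/24"
        plan += [
            ["ip", "netns", "add", ns],
            # Moving the port into its own namespace keeps the kernel
            # from short-circuiting traffic between the two local addresses.
            ["ip", "link", "set", "dev", port, "netns", ns],
            ["ip", "netns", "exec", ns, "ip", "addr", "add", addr, "dev", port],
            ["ip", "netns", "exec", ns, "ip", "link", "set", "dev", port, "up"],
        ]
    # Server in one namespace, client in the other: packets must cross the cable.
    plan.append(["ip", "netns", "exec", "ns_b", "iperf3", "-s", "-D"])
    plan.append(["ip", "netns", "exec", "ns_a", "iperf3", "-c", f"{subnet}.2", "-P", "4"])
    return plan


for cmd in loop_test_commands("eth2", "eth3"):
    print(" ".join(cmd))
```

Each entry is an argument list, so the plan could be handed to `subprocess.run` one command at a time once the names match the real ports. Deleting the namespaces with `ip netns del` afterwards returns the ports to the default namespace.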
 

tinfoil3d

set up network namespaces to make traffic go truly over the connectors
Yeah, that's what I do to test dual-port NICs. Although not for 24 hours :)
One "dusty" RAM stick caused 128GB (in 32GB sticks) out of 512GB to not be detected at POST. I guessed which one on the first try, and reseating it solved the issue. And that was with DDR3.
 