Mellanox ConnectX-3 help


Stephan

Well-Known Member
Apr 21, 2017
Germany
Look in your screenshot: link speed is 40 Gbps full duplex, so that is not it. FDR10 is a Mellanox InfiniBand-ism for their signal modulation at 40 Gbps. Did you already try setting the link type to pure Ethernet?

mlxconfig -d mt4099_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

On both cards + power off + power on. Aside from that, running out of ideas.
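To double-check that the setting took effect, a quick sketch (assumes the same mst device name as in the command above; query lists the current LINK_TYPE values, where 2 = Ethernet):

Code:
mlxconfig -d mt4099_pciconf0 query | grep LINK_TYPE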
 

Mkvmike

New Member
Feb 6, 2023
Thanks, I've checked lots of the threads regarding this and still can't see anything wrong with my setup.
I'm going to double-check both machines' BIOS settings and swap out the Unraid machine's CPU. It's currently an i3-10100, so I'll replace it with an i5-10600K. The W11 machine has a 10700K. Both have 32 GB RAM.
 

Mkvmike

New Member
Feb 6, 2023
Ubuntu Live is exactly the same :(

Just to confirm, the Unraid side also shows a 40G link speed:

(screenshot 2023-02-13 20_04_51-.png: Unraid reporting a 40G link)
 

kapone

Well-Known Member
May 23, 2015
Something to try? The -P flag may not be enough... you need multiple iperf3 servers and clients running on both machines, executed simultaneously.

iperf3 at 40Gbps and above
Achieving line rate on a 40G or 100G test host requires parallel streams. However, using iperf3, it isn't as simple as just adding a -P flag because each iperf3 process is single-threaded. This means all the parallel streams for one test use the same CPU core. If you are core limited, which is often the case for a 40G or 100G host, adding parallel streams won't help you unless you add additional iperf3 processes which can use additional cores.
Note that it is not possible to do this using pscheduler to manage iperf3 tests, so this is typically better suited to lab or testbed environments.
To run multiple iperf3 processes for testing a high-speed host, do the following:
Start multiple servers:
iperf3 -s -p 5101 & iperf3 -s -p 5102 & iperf3 -s -p 5103 &
and then run multiple clients, using the "-T" flag to label the output:
iperf3 -c hostname -T s1 -p 5101 &
iperf3 -c hostname -T s2 -p 5102 &
iperf3 -c hostname -T s3 -p 5103 &
Also, there are a number of additional host tuning settings needed for 40/100G hosts. The TCP autotuning settings may not be large enough for 40G, and you may want to try using the iperf3 -w option to set the window even larger (e.g.: -w 128M). Be sure to check your IRQ settings as well.
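If the client processes still pile onto one core, a variant of the above (just a sketch, assuming Linux with taskset available and the same ports/labels) pins each process to its own core:

Code:
taskset -c 1 iperf3 -c hostname -T s1 -p 5101 &
taskset -c 2 iperf3 -c hostname -T s2 -p 5102 &
taskset -c 3 iperf3 -c hostname -T s3 -p 5103 &
wait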
 

Mkvmike

New Member
Feb 6, 2023
Something to try? The -P flag may not be enough... you need multiple iperf3 servers and clients running on both machines, executed simultaneously.




With 5 instances open, only 15 Gbps :eek: using: iperf3 -c 192.168.10.3 -T s1 -w 32M -p 5101 -t 50 & etc.




(screenshots 2023-02-13 21_01_07-.png and 2023-02-13 21_03_11-.png: iperf3 output totalling ~15 Gbps)
 

scline

Member
Apr 7, 2016
What is your CPU usage on both sides during a long-running test (where you get 13-15 Gbps)? It sounds more like the boxes are unable to push that amount of traffic rather than a cable/NIC issue. Netdata is also a great app to install on Linux to view potential bottlenecks in real time.
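For a quick per-core view while the test is running, something like this works (a sketch, assuming the sysstat package is installed):

Code:
mpstat -P ALL 1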
 

prdtabim

Active Member
Jan 29, 2022
With 5 instances open, only 15 Gbps :eek: using: iperf3 -c 192.168.10.3 -T s1 -w 32M -p 5101 -t 50 & etc.




Did you verify the active PCIe configuration of the cards (in both computers)?
To achieve 40 Gb/s the card must operate at PCIe 3.0 x8.
15 Gb/s sounds more like PCIe 2.0 x4. Does the slot you used have enough lanes? Does that slot share lanes with other devices or slots?

You can expect 35-37 Gb/s between the cards using a DAC or AOC cable.
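A quick way to check the negotiated link on the Linux side (a sketch; the 01:00.0 address is taken from the lspci output posted later in the thread, adjust it via lspci | grep Mellanox):

Code:
sudo lspci -s 01:00.0 -vvv | grep -E 'LnkCap:|LnkSta:'
# LnkCap = what the card supports, LnkSta = what was actually negotiated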
 

Mkvmike

New Member
Feb 6, 2023
Both motherboards are Z490, so PCIe 3.0.

Both slots are x8: one shared with a GPU @ x8, the other shared with a SAS 6 Gbps PCIe 2.0 card.
 

Mkvmike

New Member
Feb 6, 2023
I'm going to change the CPU to a 10600k tonight when I get home from work, then swap out the cable as a last resort.
 

kapone

Well-Known Member
May 23, 2015
I'm going to change the CPU to a 10600k tonight when I get home from work, then swap out the cable as a last resort.
Don't bother. It's not the CPU. I can saturate 40Gb links with 10-year-old CPUs. It's something else.
 

prdtabim

Active Member
Jan 29, 2022
Both motherboards are Z490, so PCIe 3.0.

Both slots are x8: one shared with a GPU @ x8, the other shared with a SAS 6 Gbps PCIe 2.0 card.
If you want 35-37 Gb/s you must have 8 PCIe 3.0 lanes dedicated to the card.
I use ConnectX Pro cards in an x8 slot that shares lanes with the x16 slot used by the VGA card, so both operate at PCIe x8.
 

Mkvmike

New Member
Feb 6, 2023
If you want 35-37 Gb/s you must have 8 PCIe 3.0 lanes dedicated to the card.
I use ConnectX Pro cards in an x8 slot that shares lanes with the x16 slot used by the VGA card, so both operate at PCIe x8.
One card is in the top x16 slot running at x8 speed, the other is in the second x16 slot running at x8 speed. Both are PCIe 3.0.
 

Mkvmike

New Member
Feb 6, 2023
I have just noticed that one of the cards is in fact Rev. A8 and one is Rev. A3. Does this matter? The firmware mentions A2-A5 and I cannot find any other version.

(photos photo_1_2023-02-14_17-12-26.jpg and photo_2_2023-02-14_17-12-26.jpg: card labels showing Rev. A8 and Rev. A3)
 

Mkvmike

New Member
Feb 6, 2023
If all are running PCIe 3.0 x8 that's no issue.
Did you verify this using lspci -vvv?

Output on Unraid:
01:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
IOMMU group: 1
Region 0: Memory at 90900000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at 6000000000 (64-bit, prefetchable) [size=8M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
Product Name: CX354A - ConnectX-3 QSFP
Read-only fields:
[PN] Part number: MCX354A-FCBT
[EC] Engineering changes: A8
[SN] Serial number: MT1450K00602
[V0] Vendor specific: PCIe Gen3 x8
[RV] Reserved: checksum good, 0 byte(s) reserved
Read/write fields:
[V1] Vendor specific: N/A
[YA] Asset tag: N/A
[RW] Read-write area: 105 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 252 byte(s) free
End
Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
Vector table: BAR=0 offset=0007c000
PBA: BAR=0 offset=0007d000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 116W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x8
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [148 v1] Device Serial Number f4-52-14-03-00-96-fb-00
Capabilities: [154 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [18c v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core

Info from W11:
(screenshot 2023-02-14 18_10_36-.png: Windows 11 showing the card at PCIe 3.0 x8)
 

prdtabim

Active Member
Jan 29, 2022
The Windows side has no issues (8 GT/s = PCIe 3.0, and x8).

We do have an issue in Unraid ...
Code:
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited   <-------------------------------------------
                       ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
               LnkSta: Speed 2.5GT/s (downgraded), Width x8                  <--------------------------------------------
                       TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
               DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
The device is capable of 8 GT/s but is running degraded at 2.5 GT/s. At 2.5 GT/s PCIe uses 8b/10b encoding, so each lane carries about 2 Gb/s of data; across x8 lanes that is roughly 16 Gb/s max. That explains the bottleneck.
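As a quick sanity check of that figure (just a sketch of the arithmetic; assumes 8b/10b encoding, i.e. 80% efficiency, at 2.5 GT/s):

Code:
# 2.5 GT/s per lane * 0.8 encoding efficiency * 8 lanes ≈ 16 Gb/s usable
echo "2.5 * 0.8 * 8" | bc -l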