Mellanox ConnectX-2 Setup

Paul Toone

Oct 12, 2018
I recently got two MNPA19-XTR Mellanox ConnectX-2 cards and installed them in my servers. Following another thread on here, both are now running firmware 2.10.0720; the vstat output is below:

Code:
        hca_idx=0
        uplink={BUS=PCI_E Gen2, SPEED=5.0 Gbps, WIDTH=x8, CAPS=5.0*x8}
        MSI-X={ENABLED=1, SUPPORTED=128, GRANTED=42, ALL_MASKED=N}
        vendor_id=0x02c9
        vendor_part_id=26448
        hw_ver=0xb0
        fw_ver=2.10.0720
        PSID=MT_0F60110010
        node_guid=0002:c903:0056:a256
        num_phys_ports=1
                port=1
                port_guid=0202:c9ff:fe56:a256
                port_state=PORT_ACTIVE (4)
                link_speed=NA
                link_width=NA
                rate=10.00 Gbps
                port_phys_state=LINK_UP (5)
                active_speed=10.00 Gbps
                sm_lid=0x0000
                port_lid=0x0000
                port_lmc=0x0
                transport=RoCE v1.25
                max_mtu=2048 (4)
                active_mtu=1024 (3)
                GID[0]=0000:0000:0000:0000:0000:ffff:c0a8:011e
                GID[1]=fe80:0000:0000:0000:489c:9283:78cb:162e
My question is how to properly test the speed, because transfers don't seem to be anywhere near 10 Gb/s.

I was unsure of the best way to test, so I followed a guide using NTttcp.exe and got the following:
Code:
C:\Program Files>NTttcp.exe -s -m 28,*,192.168.1.30 -l 512k -a 2 -t 5
Copyright Version 5.33
Network activity progressing...


Thread  Time(s) Throughput(KB/s) Avg B / Compl
======  ======= ================ =============
     0    4.909         6883.683    524288.000
     1    5.004        52693.845    524288.000
     2    4.988        38287.089    524288.000
     3    5.004        52796.163    524288.000
     4    5.003        16783.530    524288.000
     5    5.004        38266.986    524288.000
     6    5.004        38164.668    524288.000
     7    5.004        52796.163    524288.000
     8    4.988        28227.747    524288.000
     9    5.004        57093.525    524288.000
    10    5.004        57400.480    524288.000
    11    5.004        38266.986    524288.000
    12    5.004        25272.582    524288.000
    13    4.956         9814.366    524288.000
    14    5.004        52796.163    524288.000
    15    5.004        38266.986    524288.000
    16    4.988        11085.806    524288.000
    17    5.004        42768.985    524288.000
    18    4.941         8497.065    524288.000
    19    5.020         6731.474    524288.000
    20    5.004        42871.303    524288.000
    21    5.004       234308.553    524288.000
    22    5.004        51261.391    524288.000
    23    4.988        21658.380    524288.000
    24    5.004        42768.985    524288.000
    25    5.019        38152.620    524288.000
    26    5.004        42768.985    524288.000
    27    4.988        10367.281    524288.000


#####  Totals:  #####


   Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
     5651.500000       5.003       1456.532         1129.622


Throughput(Buffers/s) Cycles/Byte       Buffers
===================== =========== =============
             2259.244       1.366     11303.000


DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
============= ============= =============== ==============
    56292.824         0.765       73672.796          0.584


Packets Sent Packets Received Retransmits Errors Avg. CPU %
============ ================ =========== ====== ==========
     4068588           215310           4      0      6.741
The Throughput(MB/s) in the #####  Totals:  ##### section shows 1129.622, which works out to roughly 9 Gb/s and is close to the post that used NTttcp to test. But the per-thread speeds in the upper section mostly sit in the tens of thousands of KB/s, which is much lower than I expected, unless I am misunderstanding the test.
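For reference, NTttcp needs a matching receiver instance running on the other server before the sender starts; a sketch of the pairing I'd expect, with parameters assumed to mirror the sender invocation above (the IP in the -m mapping is the receiver's address on both sides):

```shell
# On the receiver (192.168.1.30) -- start this first:
NTttcp.exe -r -m 28,*,192.168.1.30 -l 512k -a 2 -t 5

# On the sender:
NTttcp.exe -s -m 28,*,192.168.1.30 -l 512k -a 2 -t 5
```

With 28 threads sharing the link, no single thread should be expected to reach 10 Gb/s on its own; the aggregate in the Totals section is the number to watch.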

I also tested via iperf and here are the results:
Code:
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.30, port 49810
[  5] local 192.168.1.10 port 5201 connected to 192.168.1.30 port 49817
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   390 MBytes  3.27 Gbits/sec                 
[  5]   1.00-2.00   sec   415 MBytes  3.48 Gbits/sec                 
[  5]   2.00-3.00   sec   406 MBytes  3.41 Gbits/sec                 
[  5]   3.00-4.00   sec   447 MBytes  3.75 Gbits/sec                 
[  5]   4.00-5.00   sec   418 MBytes  3.51 Gbits/sec                 
[  5]   5.00-6.00   sec   396 MBytes  3.32 Gbits/sec                 
[  5]   6.00-7.00   sec   390 MBytes  3.27 Gbits/sec                 
[  5]   7.00-8.00   sec   390 MBytes  3.27 Gbits/sec                 
[  5]   8.00-9.00   sec   387 MBytes  3.25 Gbits/sec                 
[  5]   9.00-10.00  sec   396 MBytes  3.32 Gbits/sec                 
[  5]  10.00-10.01  sec  3.02 MBytes  3.93 Gbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.01  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.01  sec  3.94 GBytes  3.39 Gbits/sec                  receiver
Is there another, better way to test the speed between the two cards? I have seen a few things mentioning Ethernet vs. InfiniBand and am wondering if I need to change the card's port type; I couldn't find much information on this.
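One thing worth trying (my own suggestion, not from the original post): a single iperf3 TCP stream often can't saturate a 10 GbE link, so running several parallel streams with -P may get closer to line rate. A sketch, with the stream count an assumption:

```shell
# Server side (192.168.1.10):
iperf3 -s

# Client side (192.168.1.30): 8 parallel streams for 30 seconds
iperf3 -c 192.168.1.10 -P 8 -t 30
```

If the aggregate of several streams approaches 9+ Gb/s, the link itself is fine and the single-stream result reflects per-connection TCP limits rather than the cards.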

INFO:
2 SuperMicro servers, running Windows Server 2016
Both servers are using the MNPA19-XTR Mellanox-ConnectX2 cards
These two servers are connected via a Ubiquiti ES-16-XG 10 Gb/s switch.

Thank you in advance for your input!
 

Paul Toone

Oct 12, 2018
As a side note, I just connected these two cards directly (removing the switch) and got virtually the same statistics:
Code:
C:\Program Files>NTttcp.exe -s -m 28,*,192.168.1.30 -l 512k -a 2 -t 5
Copyright Version 5.33
Network activity progressing...


Thread  Time(s) Throughput(KB/s) Avg B / Compl
======  ======= ================ =============
     0    5.022        11214.656    524288.000
     1    5.037         1728.013    524288.000
     2    5.100         1706.667    524288.000
     3    5.068         1717.443    524288.000
     4    5.006         1738.714    524288.000
     5    5.006        27819.417    524288.000
     6    5.037        11892.793    524288.000
     7    5.006        98288.454    524288.000
     8    5.006         7670.795    524288.000
     9    4.912         3335.505    524288.000
    10    5.006        53388.734    524288.000
    11    5.006        10432.281    524288.000
    12    5.006        27819.417    524288.000
    13    5.006        73332.801    524288.000
    14    5.006       169064.323    524288.000
    15    5.006        53491.011    524288.000
    16    5.038        11280.667    524288.000
    17    5.006        98493.008    524288.000
    18    5.006        53491.011    524288.000
    19    5.006        73435.078    524288.000
    20    5.006        73230.523    524288.000
    21    5.006        27819.417    524288.000
    22    4.990        27908.617    524288.000
    23    5.006        27819.417    524288.000
    24    5.006        53491.011    524288.000
    25    5.038        27845.971    524288.000
    26    5.006        27819.417    524288.000
    27    5.006        99311.227    524288.000


#####  Totals:  #####


   Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
     5655.500000       5.005       1456.630         1129.970


Throughput(Buffers/s) Cycles/Byte       Buffers
===================== =========== =============
             2259.940       1.333     11311.000


DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
============= ============= =============== ==============
    54794.206         0.745       73326.074          0.557


Packets Sent Packets Received Retransmits Errors Avg. CPU %
============ ================ =========== ====== ==========
     4071194           204389           0      0      6.582
 

Paul Toone

Oct 12, 2018
Each computer has 2 processors with 10 cores each, so 20 cores per server.

Is there a certain command I should use to make my Copy-Item command use more threads? Also, each computer has a 6 Gb/s SAS controller; will that slow things down?

I did a Copy-Item on a 10 GB file and it took 40 seconds.
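For what it's worth, a quick back-of-envelope check (my own arithmetic, not from the thread) suggests that copy is already near what spinning disks behind a 6 Gb/s SAS controller typically sustain, so the storage rather than the NICs may be the bottleneck:

```shell
# 10 GB copied in 40 s, in MB/s and (approximate) Gb/s:
echo "$(( 10 * 1024 / 40 )) MB/s"
echo "$(( 10 * 1024 * 8 / 40 / 1000 )) Gb/s approx"
```

That is about 256 MB/s, or roughly 2 Gb/s. On the threading question: Copy-Item is single-threaded, and robocopy's /MT switch (built into Windows) is the usual multithreaded alternative, but note that /MT parallelizes across multiple files, so it won't speed up a single large file much.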