Mellanox ConnectX-3 VPI RDMA: extremely intermittent connections and drops on both Ethernet and InfiniBand


churipputori

New Member
Jan 16, 2023
Hi, I've been trying to set up a DIY NAS box in which my main computer and the NAS box (running Windows Server) both have Mellanox ConnectX-3 VPI (CX354A-QCBT flashed to FCBT) cards, connected directly to each other with a Mellanox-branded FDR DAC cable. I've been trying to use StarWind VSAN plus the StarWind NVMe-oF initiator to leverage the RDMA support the ConnectX-3 offers, but so far any attempt to initialize and format the disk that appears after attaching the target results in Windows cancelling the format and marking the disk read-only.

Any other protocol that leverages RDMA has issues as well. SMB Direct will transfer at over 1 GB/s for maybe half a second, then halt completely for a few seconds and later continue at 10 MB/s or less, or stop altogether and write an error to the event log saying that an RDMA or WSK connection was either disconnected or hit a device timeout. Performance Monitor shows that it is in fact making active RDMA connections, but when a transfer begins the bytes/sec counters shoot up for just a moment before dropping to 0 bytes/sec for another 20-30 seconds. Responder CQE errors don't seem to go up until the connection is terminated, regardless of whether it's SMB Direct or NVMe-oF.

I've tried both InfiniBand and RoCE v1 with PFC configured. In either case I can get a connection with rping, but rperf fails, and nd_send_bw works with the default settings yet errors out when run with the -a flag (it returns error c000003e after the 64k test). Without RDMA I can get a pretty consistent 600-700 MB/s over iSCSI, but I know my Storage Spaces pool can do more than that, and I'm hoping I can figure out what's wrong with RDMA so I can get the maximum out of my setup.
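
For reference, this is roughly how I've been confirming that SMB Direct is actually negotiating RDMA on both ends (standard inbox cmdlets; the adapter and interface names will be whatever your ports are called):

Code:
# On both machines: is RDMA enabled on the ConnectX-3 ports?
Get-NetAdapterRdma

# On the client: the interfaces SMB sees should report RDMA Capable = True
Get-SmbClientNetworkInterface

# During a transfer: are the multichannel connections actually using RDMA?
Get-SmbMultichannelConnection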
 

Stephan

Well-Known Member
Apr 21, 2017
Germany
Leave RDMA out of it for the moment. Is either adapter reporting transmission or reception errors? Try dropping back to 40 Gbps. Benchmark the raw link with ntttcp before putting services over it (example below); a single stream should approach 10 Gbps at a 40 Gbps link speed, or 14 Gbps at 56 Gbps. Disable Windows Defender.
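
Something along these lines, from memory, so adjust the IP and thread count to your link (192.168.100.2 here is just a placeholder for the receiver's address):

Code:
# Receiver side (the NAS box), listening on the direct-attach link IP
ntttcp.exe -r -m 8,*,192.168.100.2 -t 30

# Sender side (the main PC), pointing at the same receiver IP
ntttcp.exe -s -m 8,*,192.168.100.2 -t 30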
 

churipputori

New Member
Jan 16, 2023
I forgot to mention that, due to running out of PCIe lanes on my main computer's motherboard (I intend to upgrade to a board that will let me use the full x8 PCIe 3.0 the card offers), the maximum bandwidth I'm able to get is about 12 Gb/s. Regardless, running ntttcp with RoCE disabled I get between 260 MB/s and 390 MB/s per stream, for a total of usually around 1200 MB/s. I'm not getting any TCP errors, nor any packet drops or CRC errors in Performance Monitor.
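
For reference, this is how I checked the negotiated PCIe link on the card ("Ethernet 3" is just what my port happens to be called):

Code:
# Shows the negotiated PCIe link speed and width, among other hardware details
Get-NetAdapterHardwareInfo -Name "Ethernet 3"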
 

churipputori

New Member
Jan 16, 2023
Bit of an update. I flashed the adapters back to their original firmware in the hope that the crossflash was what had been causing the issues; no dice, but at least I can rule that out now. I ran rperf in verbose mode a few times, and each time it ended with error C00000B5 (DEVICE_IO_TIMEOUT), though it could get a few pings out first. Sometimes it's as little as one; the most I've gotten is 13. How many it manages seems completely random, but one pattern I noticed is that the first ping has a relatively low time of 23 µs, while the subsequent ones are at minimum around 900 µs and at most around 1900 µs. Presumably it times out once the latency goes over 2000 µs.

I'll have to do more performance testing with RDMA disabled. I didn't see any TCP error counters go up in Performance Monitor when I tested it, though I'm not sure whether that's simply down to the nature of TCP.

I'm kind of stumped as to what the issue may be. Could it be a faulty cable? I don't have a spare on hand, so if there's some way to test it other than just swapping in a known-working cable, that would be great.


Code:
Client:

StarWind RDMA performance and latency test tool v1.0.2
verbose
RPingBindClient - NdResolveAddress successful
created cq 0000023A7D5F0860, len 2
created connector 0000023A7D5F0FF0
created endpoint 0000023A7D5F1070
Connecting ...
Client sends RDMA op type 1, qlen 1, size 64
Server's RDMA SEND, negotiated queue len 1, RDMA buf: size 0, addr 0x0000000000000000, rkey 0x0
INDEndpoint::CompleteConnect successful
RPingSetupBuffers called on cb 0000023A7D20CCA0
allocated & registered buffers...
Starting RDMA SEND test for 1000 iters with bufsize 64, queue 1...
post rdma req 0
rdma send 0 completion
1) time (us): time 23
rdma send 0 completion
2) time (us): time 914
rdma send 0 completion
3) time (us): time 914
rdma send 0 completion
4) time (us): time 1596
rdma send 0 completion
5) time (us): time 920
rdma send 0 completion
6) time (us): time 962
rdma send 0 completion
7) time (us): time 1077
rdma send 0 completion
8) time (us): time 1949
rdma send 0 completion
9) time (us): time 1698
rdma send 0 completion
10) time (us): time 923
rdma send 0 completion
11) time (us): time 947
rdma send 0 completion
12) time (us): time 949
rdma send 0 completion
13) time (us): time 926
GetResults returned result with c00000b5 (pending reqs: 0).
rping client failed: -1073741643
Waiting for disconnect...
RpingFreeBuffers called on cb 0000023A7D20CCA0
nd_ring end c00000b5

Server:

incoming request for 000001D3A5D30FD0
Client's RDMA op type 1, qlen 1, size 64
created connector 000001D3A5D31060
ServerThread for cb 000001D3A58CE580
created cq 000001D3A5D312D0, len 4
created endpoint 000001D3A5D31340
RPingSetupBuffers called on cb 000001D3A58CE580
allocated & registered buffers...
Server sends RDMA op type 1, qlen 2, rdma buf size 0, addr 0000000000000000, rkey 0x0
recv 0 completion
recv 1 completion
recv 0 completion
recv 1 completion
recv 0 completion
recv 1 completion
recv 0 completion
recv 1 completion
recv 0 completion
recv 1 completion
recv 0 completion
recv 1 completion
recv 0 completion
Server stopping hr 0x0
Waiting for disconnect...
RpingFreeBuffers called on cb 000001D3A58CE580
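
For completeness, this is roughly how I had PFC set up on the Windows side when testing RoCE (from memory, so treat it as a sketch; "Ethernet 3" is just my port's name):

Code:
# DCB is needed for PFC on Windows Server
Install-WindowsFeature Data-Center-Bridging

# Tag SMB Direct traffic (port 445) with priority 3 and enable PFC only for that priority
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Apply QoS/DCB on the ConnectX-3 port
Enable-NetAdapterQos -Name "Ethernet 3"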
 

fops

New Member
Jan 29, 2023
You can try testing the plain TCP connection with iperf (example below). If the results are fine, the problem is likely at the RDMA level; in that case, check the drivers and firmware. If the results are not fine, then look at the cable, the network card, or the transceiver.
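
For example, something like this, with the address and stream count adjusted to your setup:

Code:
# On one box
iperf3 -s

# On the other, 4 parallel streams for 30 seconds
iperf3 -c 192.168.100.2 -P 4 -t 30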
 

Connorise

Member
Mar 2, 2017
US. Cambridge
Another strong vote for drivers and firmware; a lot of the time things don't work quite right because of them. MTU settings, or a mismatch between the two ends, might also be the key (a quick check is below).
Speaking of RDMA, StarWind NVMe-oF, if I remember right, works over TCP. Give it a try.
Finally, consider reaching out to the support team.
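
For the MTU check, something like this (the advanced-property display name varies by driver, and swap in your own address):

Code:
# Don't-fragment ping sized for 9000-byte jumbo frames (8972 bytes of payload + 28 bytes of headers)
ping -f -l 8972 192.168.100.2

# Confirm both adapters agree on the jumbo packet setting
Get-NetAdapterAdvancedProperty -DisplayName "Jumbo Packet"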
 

churipputori

New Member
Jan 16, 2023
I decided to switch from the Windows Server + Storage Spaces setup to a ZFS zvol. Set up a TCP target and boom, it worked on the first try! I'm currently restoring about 2 TB of data without issue. The write speed leaves a bit to be desired, but unfortunately ZFS doesn't support using SSDs for tiered storage.

Does RDMA offer any significant performance boost below 10 Gbps? If not, I'd be happy sticking with TCP, though I've heard it does bring some latency improvements.