- Jan 16, 2023
Hi, I've been trying to set up a DIY NAS box. My main computer and the NAS box (running Windows Server) each have a Mellanox ConnectX-3 VPI card (CX354A-QCBT flashed to FCBT), connected directly to each other with a Mellanox-branded FDR DAC cable.

I've been trying to use StarWind VSAN plus the StarWind NVMe-oF initiator to leverage the RDMA support the ConnectX-3 has, but so far any attempt to initialize and format the disk that appears after attaching the target results in Windows cancelling the format and marking the disk read-only. Any other protocol that leverages RDMA has issues as well. SMB Direct will transfer at over 1 GB/s for maybe half a second before suddenly halting completely for a few seconds, then either continuing at 10 MB/s or less or stopping altogether and writing an error to the event log saying that an RDMA or WSK connection was either disconnected or had a device timeout.

Examining Performance Monitor reveals that it is in fact making active RDMA connections; however, when a transfer begins, the bytes/sec counters shoot up for just a moment before cutting out completely to 0 bytes/sec for another 20-30 seconds. Responder CQE errors don't seem to go up until the connection is terminated, regardless of whether it's SMB Direct or NVMe-oF.

I've tried both InfiniBand and RoCE v1 with PFC configured. In either case, I can get a connection with rping, but rperf fails, and nd_send_bw works with the default settings but errors out when run with the -a flag (returns error c000003e after the 64k test).

Without RDMA, I'm able to get a pretty consistent 600-700 MB/s over iSCSI, but I know my storage space can do more than that, and I'm hoping I can figure out what might be wrong with RDMA so that I can get the maximum out of my setup.
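For reference, here is the kind of DCB/PFC configuration I'd expect to need on both ends for RoCE on Windows Server. This is a sketch of the commonly documented setup, not necessarily exactly what I ran: the adapter name "Ethernet 2" is a placeholder, and tagging SMB (port 445) with priority 3 is the conventional choice, so adjust for your own interface names and priority scheme.

```powershell
# Install Data Center Bridging (needed for PFC) -- both machines
Install-WindowsFeature Data-Center-Bridging

# Classify SMB traffic (TCP/445) into 802.1p priority 3
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable PFC only on priority 3; disable it on all others
Enable-NetQosFlowControl  -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Apply QoS/DCB settings to the ConnectX-3 port ("Ethernet 2" is a placeholder)
Enable-NetAdapterQos -Name "Ethernet 2"

# Optional: reserve bandwidth for the SMB traffic class
New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
```

Afterwards, `Get-NetAdapterRdma` and `Get-SmbMultichannelConnection` should show RDMA capable/active on the link; if PFC is mismatched between the two cards, RoCE typically shows exactly the burst-then-stall pattern described above.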