So I'm bamboozled here. I've been debugging this for several hours now and still don't know if I'm doing something outrageously wrong or if there's something wrong with my cards/setup.
I have two PCs, an Ubuntu Linux machine with:
CPU: AMD Ryzen 3400G
RAM: 16GB DDR4 2933MHz
Disk: 1TB NVMe Samsung 970 Evo
And a Windows 10 Enterprise machine with:
CPU: AMD Ryzen Threadripper 1950x
RAM: 16GB DDR4 2933MHz
Disk: 2TB NVMe Samsung 970 Evo
Network Share: 1TB NVMe Samsung 970 Evo
The cards are attached directly to each other via an MC2207310-010 cable; there is no switch/router in the current setup. In Ubuntu the port is shown as type "eth" when using the Mellanox config tool. Both NICs were assigned an IP manually.
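For reference, the manual addressing looked roughly like the following sketch. The interface name and subnet are placeholders; substitute whatever your setup uses.

```shell
# Linux side: static point-to-point address, no gateway needed
# (enp1s0 and 10.0.0.0/24 are hypothetical)
sudo ip addr add 10.0.0.1/24 dev enp1s0
sudo ip link set enp1s0 up

# Windows side (elevated PowerShell), matching address on the same subnet:
#   New-NetIPAddress -InterfaceAlias "Ethernet 2" -IPAddress 10.0.0.2 -PrefixLength 24
```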
I've also updated the firmware of both cards to the latest version. Both cards are in a PCI-E 3.0 x8 slot in their respective motherboards (technically the Linux machine is PCI-E 4.0, but it's limited by the CPU, which is Zen+ rather than Zen 2).
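One thing worth double-checking with eBay cards is whether the slot actually negotiated x8 at 8GT/s. A sketch, assuming the card shows up under lspci (the bus address is a placeholder):

```shell
# Confirm the negotiated link: look for "Speed 8GT/s, Width x8" in LnkSta.
# The bus address 01:00.0 is hypothetical - find yours with: lspci | grep Mellanox
sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'

# Back-of-envelope: PCIe 3.0 moves ~985 MB/s per lane after 128b/130b encoding,
# so x8 gives roughly 63 Gb/s - comfortably above 40GbE line rate.
lanes=8; per_lane_mb=985
echo "$(( lanes * per_lane_mb * 8 / 1000 )) Gb/s"
```

If LnkSta reports a narrower width or lower speed than LnkCap, the slot (not the card) is the bottleneck.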
I've been attempting to debug the performance of these cards for quite some time, the cards themselves are from eBay so knowing that they can actually perform as claimed is kind of a necessity.
During the following tests I completely detached the outgoing network connections for these two machines; they shared only the 40GbE connection.
IPERF v2:
I've tried iperf v2 with -P set to variations of 1, 2, 4, and 8, and -w values of 1M, 2M, 4M, and 8M.
I only seem to get 40Gb/s speeds when the window size (-w) aligns with (-n) and with multiple threads (-P).
It doesn't seem to make any difference if the client is the Linux machine or if it is reversed.
I tried with various MTU values (1500, 9000, 9014) and have also used the mlnx_tune program ("HIGH_THROUGHPUT" profile), as well as tuning the card for Single Port in the device properties on Windows.
I followed all of the instructions in the "Performance Tuning Guidelines for Mellanox Network Adapters" for both the Linux and Windows machines. No virtual machines or virtualization is in use; both are bare metal.
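For anyone reproducing this, the runs looked roughly like the following. The peer address is a placeholder, and the window/stream values shown are just one of the combinations from the sweep above.

```shell
# Server side:
iperf -s -w 2M

# Client side: 8 parallel streams, 2M window, 30-second run
# (10.0.0.2 is a hypothetical peer address)
iperf -c 10.0.0.2 -P 8 -w 2M -t 30
```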
ntttcp + ntttcp-for-linux:
When using ntttcp on Windows and ntttcp-for-linux on Linux, I used 8 cores on both the receiver and sender; otherwise I left the options as they were in the performance guide under "Performance Testing". I did have to add the -ns flag to the Windows command and the -N flag to the Linux command in order to avoid an interop issue.
Rate: ~6Gb/s
I'm seeing 5-7 errors and 20-22 retransmits when performing this test. CPU usage is between 3-5% on the Windows side and between 12-15% on the Ubuntu side. The test sent 11GB of data and ran for 15 seconds, which seems to be about normal for this test.
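One way to see where those errors and retransmits come from on the Linux side is to compare counters before and after a run — a sketch using standard tooling (the interface name is a placeholder):

```shell
# NIC/driver-level counters: drops, errors, pause frames
# (enp1s0 is hypothetical)
ethtool -S enp1s0 | grep -iE 'drop|err|pause'

# Kernel-level TCP retransmit counters; snapshot before and after the test
netstat -s | grep -i retrans

# Per-connection view (cwnd, rtt, retransmits) while a transfer is running
ss -ti
```

If the drops show up in the NIC counters rather than the TCP counters, ring-buffer sizes (`ethtool -g`/`-G`) are the usual suspect.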
dd file copy benchmark to windows NVMe network share:
The dd file copy test used the following command:
dd if=/dev/zero of=~/nvme_share/test_file bs=10G count=1 oflag=dsync
Rate: ~100MB/s
I also tried oflag=direct rather than dsync, and I tried pinning the SMB version to 3.0 in /etc/fstab; neither had any impact.
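One caveat with that kind of dd invocation: bs=10G asks dd to allocate a single 10GiB buffer, which on a 16GB machine can swap or fail outright. A variant with many smaller blocks is usually a fairer test of the share (same total size; ~/nvme_share is the mount point from the test above):

```shell
# 10GiB total in 1MiB blocks; oflag=direct bypasses the page cache
dd if=/dev/zero of=~/nvme_share/test_file bs=1M count=10240 oflag=direct
```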
File copy from ramdisk to ramdisk:
I created a Linux RAM disk and a Windows RAM disk (for Windows I used AMD's utility; for Linux I used what's baked into the OS), and then I copied a 4GB file between the two.
Rate was ~1.35GB/s, which is about 10.8Gb/s.
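For the Linux side, "baked into the OS" means a tmpfs mount — a minimal sketch, with the size and path as placeholders:

```shell
# Create an 8GiB RAM-backed filesystem (path and size are hypothetical)
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=8G tmpfs /mnt/ramdisk

# Sanity check on the reported rate: 1.35 GB/s * 8 bits/byte
awk 'BEGIN{printf "%.1f Gb/s\n", 1.35 * 8}'
```

So the ramdisk-to-ramdisk copy rules out disk speed as the bottleneck, yet still lands at barely a quarter of line rate.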
Screenshots of the adapters in their respective operating systems:
I've been reading posts on this forum for the last couple of weeks leading up to the delivery, and it's been a huge help, but now I'm at the end of my rope as I have no idea what my next step should be.
Any help would be hugely appreciated!