Has anyone come across this issue?
I grabbed a cheap set of Mellanox MHES18-XTC InfiniHost III Lx adapters off eBay, and they appear to be working perfectly running point-to-point between two Windows 7 Professional computers.
I installed the MLNX WinOF VPI 2.1.2 OFED package, as it appears to be the last version that supported these adapters.
I've run ib_send_bw.exe bandwidth tests between the cards, and the connection seems to be reliable, consistent, and running full-rate.
I've set up IPoIB on them both and gotten it working... I'm actually quite surprised how well. It'll manage up to 500 MB/s SMB file transfers between RAM disks, and only seems to eat up about 10-15% of my 3770K Ivy Bridge CPU. Quite pleased so far.
The big problem, though, is with a test directory of 20 files of ~1.5 GB each, copied from one Samsung 830 SSD to another. Sometimes it works fine, but more often than not it gets partway through a file copy before the connection dies. It really messes up Windows too: it has caused Task Manager to hang and become unrecoverable, it makes the Network and Sharing Center take forever to even load, and it causes various other issues until I do a full reboot. As far as I can tell, this only happens on the receiving computer, not the sending one.
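In case anyone wants to reproduce this, here's a rough sketch of how the copy could be scripted so there's a record of exactly how many bytes went through before it stalls. It's plain Python, nothing Mellanox-specific; the names `copy_with_log` and `longest_gap` and the 4 MiB chunk size are just my own placeholders, and I'm assuming it would be run from the sending machine, since the receiver is the one that locks up.

```python
import os
import time

CHUNK = 4 * 1024 * 1024  # 4 MiB per read/write; an arbitrary choice


def copy_with_log(src_path, dst_path, log, chunk=CHUNK, clock=time.monotonic):
    """Copy one file in chunks, appending (elapsed_seconds, total_bytes) to log."""
    start = clock()
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(chunk)
            if not buf:
                break
            dst.write(buf)
            total += len(buf)
            log.append((clock() - start, total))
    return total


def longest_gap(log):
    """Return (gap_seconds, bytes_copied_before_gap) for the biggest pause, or None."""
    if len(log) < 2:
        return None
    best = None
    for (t0, b0), (t1, _b1) in zip(log, log[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, b0)
    return best


def copy_directory(src_dir, dst_dir):
    """Copy every regular file in src_dir to dst_dir, printing per-file totals."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, name)
        if not os.path.isfile(src_path):
            continue
        log = []
        copied = copy_with_log(src_path, os.path.join(dst_dir, name), log)
        print(name, copied, "bytes, longest pause:", longest_gap(log))
```

Pointing `copy_directory` at the test folder and the SMB share should show whether the throughput tails off before the connection dies or whether it just drops dead mid-chunk.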
I'm just about to start picking through everything to try and work out what's going on, but thought I'd check online first to see if someone knows the problem and the fix from past experience.
There's a whole bunch of IPoIB settings for me to play with. I'm wondering if Large Send Offload might help, and also whether there's some buffering issue at play, where a buffer can overrun once a transfer becomes disk-write-limited.
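To put the buffering theory into something testable, here's a back-of-envelope sketch. It assumes a single fixed-size receive-side buffer that fills at the network rate and drains at the disk write rate, which is purely my guess at the mechanism and not anything I know about the driver. The function names are made up for illustration, and the buffer size and rates would have to be measured rather than taken from here.

```python
def seconds_until_full(buffer_bytes, in_rate, out_rate):
    """Seconds until a buffer fills, given fill and drain rates in bytes/s.

    Returns None when the drain keeps up and the buffer never fills.
    """
    if in_rate <= out_rate:
        return None
    return buffer_bytes / (in_rate - out_rate)


def bytes_received_at_overflow(buffer_bytes, in_rate, out_rate):
    """Total bytes that arrive over the link before the buffer overflows."""
    t = seconds_until_full(buffer_bytes, in_rate, out_rate)
    if t is None:
        return None
    return in_rate * t
```

If the stall point moves when the incoming rate is throttled or the destination disk is swapped for a faster one, that would point towards a buffering problem; if it always dies at the same byte count regardless, probably not.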
Anyone have any suggestions?
Edit:
The adapters were running the 1.2.000 firmware, which appeared to be the latest, but I spotted a much more recent 1.2.940 version in the custom firmware section. I flashed it onto both cards without a problem, and even spotted a bug fix in the notes regarding "HCA might get stuck under stress conditions". Alas, after a reboot they did exactly the same thing and stalled the receiving machine after about 4 GB transferred... it mostly seems to go at around that point.