A few bits on this:
- Welcome to STH!
- On the Mellanox adapters, get ConnectX-3 as a minimum, CX-3 Pro if you can. With the right VPI cards you can have one port direct-connect via 56Gb InfiniBand while the other serves a 40GbE network (a rough port-mode sketch follows this list). On my phone, so I do not have the part numbers handy.
- You can get adapters to step down the second port to SFP+ if you only have 10GbE.
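For when you have a VPI card in hand: the per-port personality (InfiniBand vs. Ethernet) is set in software. Here is a minimal sketch of one way to flip it on Linux, assuming the ConnectX-3 mlx4_core driver (which exposes mlx4_portN files in sysfs) and root; the PCI address is a placeholder, and newer tooling does the same thing via mlxconfig:

```cpp
// portmode.cpp -- set a ConnectX-3 VPI card to IB on port 1, Ethernet on port 2,
// by writing the mlx4 sysfs port files (equivalent to `echo ib > .../mlx4_port1`).
// Build: g++ -o portmode portmode.cpp   (run as root)
#include <fstream>
#include <iostream>
#include <string>

// Write "ib", "eth", or "auto" to one port's sysfs control file.
static bool set_port(const std::string& dev, int port, const std::string& mode) {
    std::ofstream f(dev + "/mlx4_port" + std::to_string(port));
    f << mode;
    f.close();  // flush now so a failed write shows up in the stream state
    return !f.fail();
}

int main() {
    // Placeholder PCI slot; find yours with `lspci | grep Mellanox`.
    const std::string dev = "/sys/bus/pci/devices/0000:03:00.0";
    bool ok = set_port(dev, 1, "ib")    // port 1: 56Gb InfiniBand direct connect
           && set_port(dev, 2, "eth");  // port 2: 40GbE side
    if (!ok) std::cerr << "write failed; check driver, path, and permissions\n";
    return ok ? 0 : 1;
}
```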
Thank you. Yes, if possible I'd greatly appreciate it if you could quickly detail which InfiniBand-capable ConnectX-3 dual-port card you grabbed, the one I see wedged in between the video cards here:
Also, is there a reason you went with the Pro vs. the non-Pro?
For the non-Pro, I see:
MCX313A-BCBT Single 40/56GbE QSFP
MCX314A-BCBT Dual 40/56GbE QSFP
To lighten your load: correct me if I'm wrong, but it seems as though GPUDirect doesn't work on GTX cards.
Also, for the QSFP-to-SFP+ step-down, can I use a 40GbE-to-4x10GbE breakout DAC?
So it seems I need to look specifically for VPI Mellanox cards in order to get InfiniBand capability?
- You can use DACs.
- On 3, we use a bunch of different configurations, not just GTX cards. Double-check on the NVIDIA forums that the cards you want to use will do this feature; I know it is an area they are very picky on (a quick probe for the in-box peer-to-peer part is sketched after this list). Apparently, some of the big guys have made their own feature that functions similarly, but I am sparse on details there since it is their IP.
- If you need CX-3 adapters inexpensively, ask for help in the networking forum here after checking Great Deals. They can be had for well under $200, and CX-4 100Gb cards for under $500. Sometimes you need to hunt for inventory, but people here use them often.
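On the probe mentioned above: here is a minimal host-side sketch using the CUDA runtime API, assuming CUDA is installed and at least two GPUs are visible; device indices are whatever the runtime enumerates. Note this only answers the in-box P2P question, not the GPUDirect RDMA (NIC-to-GPU) one, which you still have to confirm against NVIDIA's support matrix for the exact card:

```cpp
// p2pcheck.cu -- probe GPUDirect P2P support between every pair of GPUs
// in one box. Build: nvcc -o p2pcheck p2pcheck.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("%d CUDA device(s) found\n", n);
    for (int a = 0; a < n; ++a) {
        for (int b = 0; b < n; ++b) {
            if (a == b) continue;
            int can = 0;
            // 1 if device a can directly map device b's memory (P2P)
            cudaDeviceCanAccessPeer(&can, a, b);
            printf("GPU %d -> GPU %d : P2P %s\n", a, b, can ? "yes" : "no");
        }
    }
    return 0;
}
```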
Excellent. I spent a good amount of time learning about DAC vs. fiber cables.
As for the different cards supporting GPUDirect, it seems as though NVIDIA gimped the consumer line. It is only supported on the professional cards, not the GTX line:
> NVIDIA GPUDirect™ for Video

> NVIDIA GPUDirect
So it doesn't seem InfiniBand will provide any GPUDirect benefit for a GTX-powered stack.
Indeed, it appears to be a software/driver lockout to drive sales of the more expensive cards:
> GPUDirect is only enabled on Fermi/Kepler

> GeForce GPUs do not support GPU-Direct RDMA. Although the MPI calls will still return successfully, the transfers will be performed through the standard memory-copy paths. The only form of GPU-Direct which is supported on the GeForce cards is GPU Direct Peer-to-Peer (P2P). This allows for fast transfers within a single computer, but does nothing for applications which run across multiple servers/compute nodes.
>
> Tesla GPUs have full support for GPU Direct RDMA and the various other GPU Direct capabilities. They are the primary target for these capabilities and thus have the most testing and use in the field.
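For what it's worth, the one path the quote says GeForce does keep (in-box P2P) looks like this with the CUDA runtime. A minimal sketch, assuming two GPUs; the buffer size is arbitrary:

```cpp
// p2pcopy.cu -- the GPU Direct P2P path that GeForce does support:
// a direct device-to-device copy inside one box.
// Build: nvcc -o p2pcopy p2pcopy.cu   (assumes at least 2 GPUs)
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;  // arbitrary 64 MiB test buffer
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaDeviceEnablePeerAccess(1, 0);  // let GPU 0 reach GPU 1 directly

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);
    cudaDeviceEnablePeerAccess(0, 0);

    // Direct GPU 0 -> GPU 1 copy; with P2P enabled this never bounces
    // through host memory.
    cudaError_t err = cudaMemcpyPeer(dst, 1, src, 0, bytes);
    printf("peer copy: %s\n", cudaGetErrorString(err));

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```

If peer access cannot be enabled, cudaMemcpyPeer still works but stages through host memory, which is exactly the "standard memory-copy paths" behavior the quote describes.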
So, since GTX cards apparently can't take advantage of the ConnectX-3 feature set, it seems I will work with the cheaper, less-featured Mellanox cards for now.
> Apparently, some of the big guys have made their own feature that functions similarly, but I am sparse on details there since it is their IP.
Since this looks like a software/driver disable and likely not a missing hardware feature, I am going to have to get creative and do some re-plumbing. I looked at the competition, which seems to market a similar tech:
GitHub - RadeonOpenCompute/ROCnRDMA: ROCm Driver RDMA Peer to Peer Support
But then I find:
AMD DirectGMA | BitFlow
which they likewise enable only on their professional line of cards.
So it seems like AMD and NVIDIA have both decided to make this a little fun for me.
Offhand, what do you think the savings will be from piping across InfiniBand vs. Ethernet?
And who might some of these big guys be who wrote custom workarounds?
I'm assuming you mean folks like this:
News?
And lastly, am I shut out of InfiniBand if I went with the MNPA19-XTR below?
Cheap 10Gb SFP+ $19 Mellanox ConnectX-2 EN Cards
Thank you so very much in advance!