Movin' on up...to 40GbE

talis

New Member
Sep 30, 2017
I'm using 2 Brocade "Fibre Channel" cables with QSFP connectors between two ConnectX-3 cards and it works flawlessly. Running ibdiag (a tool from Mellanox) shows the cable supports up to QDR InfiniBand: fantastic .. i bought 10 ..
Vendor: BROCADE
Length: 1 m
Type: Copper cable- unequalized
SupportedSpeed: SDR/DDR/QDR

I think the ConnectX-3 (and newer) don't work very well with 3rd-party active cables & transceivers for 40GbE/InfiniBand. So far I have tested FDR Finisar transceivers (all the same model) and various Cisco 40GbE transceivers + fibers.

No link comes up between the hosts (host<->host) ??? Does ibdiag work with OpenSM?

With the QSFP+ to SFP+ adapter (Using a 40GbE (QSFP+) NIC with a 10GbE Switch (SFP+)) I could get all 10GbE transceivers to work, no matter whether they were Mellanox, Finisar or Cisco branded.
This is my 1st small cluster project (without a great Unix background), hoping to break records with IB OFED support.. is it OK to ask about a range of accumulating issues, now that the Mellanox switches / ConnectX NICs I purchased are sitting in the US ..waiting for a bright Clojure / Go / node.js admin..

I expect IB switching to be dominant for (internal) host-to-host traffic .. i.e. dual-server HTTP socket management feeding IPoIB DB access (a small 6-node node.js cluster with a global in-RAM DB, aiming for 100% diskless requests). Sure, RoCE provides state-wide distributed datacenter file access, but what I want is fast datagram UDP/RTP access within an internal IB SDN .. and that appears to be restricted to X3-Pro cards in RoCE v2 mode. Can anyone point to UDP-style IB RDMA functionality using standard X3 NICs? I don't need the Ethernet encapsulation, as it's an internal (private) network .. or even via ConnectX-2?
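On the UDP question: once IPoIB is configured, the IB fabric shows up as an ordinary IP interface, so plain UDP datagram sockets work over it unchanged. A minimal sketch (addresses are hypothetical; on a real cluster you would bind to the IPoIB address of e.g. ib0 rather than loopback):

```python
import socket

def make_receiver(bind_addr):
    """Create a bound UDP socket. On an IPoIB fabric, bind_addr would be
    the IP assigned to the ib0 interface (hypothetical address)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(bind_addr)
    return s

def send_record(payload, dst):
    """Fire-and-forget send of one small record: no TCP handshake, no acks."""
    assert len(payload) <= 64, "keep each record within one 64-byte datagram"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, dst)

def recv_record(sock, bufsize=64):
    """Pull one datagram off the socket; returns the raw record bytes."""
    data, _peer = sock.recvfrom(bufsize)
    return data
```

Note this is not kernel-bypass RDMA: the kernel still handles the whole socket path. It is only the simplest way to move small datagrams across an IB fabric from unmodified applications.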

With mention of FDR and QSFP+: I read here (some time back) about a way to reduce switch count by splitting a 40Gb port into 4 x 10Gb lanes .. the cables are cheap enough. Even though an IS5030 is overkill for managing a 2 + 6 node setup (2 back-end session servers + 6 IB-isolated DB servers), it seems a 40Gb / 4 x 10Gb split from each IS5030 active port may work as a crossover into four 10Gb X2 / X3 NICs (with 8 nodes and 8 dual-port NICs, that is 16 x 10Gb/s IB ports). Is there something to gain by using a 40Gb / 4 x 10Gb split cable? A link is appreciated on how 4-lane 10Gb split links from (or into) a 40Gb switch port do or do not reduce switch count .. the right and wrong way : )

Also, the excellent IS5030 setup shown was just the trick to get me started (i bought 8 !!). Thanks, that was a booming post here on STH .. as are many rigs. How are we going to make this happen? An HTTP request loader (500,000 sessions with 4-packet inspection and return per client per second) .. let's see where the mem leaks are..
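The port-budget arithmetic for the 4 x 10Gb split idea can be sketched out (purely the counting, under the assumption the breakout is usable at all on the ports in question; replies further down the thread address where it is actually supported):

```python
# Hypothetical figures from the post: 8 nodes with dual-port 10Gb NICs.
NODES = 8
PORTS_PER_NODE = 2        # dual-port X2/X3 -> 16 x 10Gb node-side ports
LANES_PER_QSFP = 4        # one 40Gb QSFP port breaks out to 4 x 10Gb SFP+

node_ports = NODES * PORTS_PER_NODE
qsfp_ports_with_breakout = node_ports // LANES_PER_QSFP  # switch ports used
qsfp_ports_direct = node_ports                           # one link per port

print(qsfp_ports_with_breakout)  # 4 switch QSFP ports with breakouts
print(qsfp_ports_direct)         # 16 switch ports without
```

So the split only economizes switch ports (4 instead of 16 here); it does not add bandwidth, and each lane is still a fixed 10Gb link.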

Mellanox's excellent ASIC breakthroughs enhance the design of SDN access / load-balancing options .. but things move so fast, in a blur of updates, that it is difficult for less seasoned home-brewers to set a well defined course, now that OFED opens the doors. For example, sometimes the more basic drivers now indicate dependencies, such as OpenSM, where prior managed hardware worked without them.

"opensm is an InfiniBand compliant Subnet Manager and Administration, and runs on top of OpenIB.
.. what about IbDiag .. how does it fit into OFED ..

"opensm provides an implementation of an InfiniBand Subnet Manager and Administration. Such a software entity is required to run for in order to initialize the InfiniBand hardware (at least one per each InfiniBand subnet). " .. is that as separate daemon for each subnet / or as a single process ?? and what about for managed switches (they are redundant for RoCE ) but what about IPoIB / IB RDMA..

I read that OpenSM must run alongside OFED .. has anyone found a comprehensive dependency list, along with the expected overhead of binary library support? Does OFED on CentOS 7.4 have more relaxed requirements than FreeBSD 11? And what of the cached binaries and their dependencies? In short, ldconfig builds a library cache at boot time, and many older a.out binaries are part of FreeBSD; Netscape / SOCKS modules have a.out binaries. Where is a comprehensive list? What overhead does XFree86 compatibility bestow? Why do I need XFree86 to run a.out binaries alongside ELF, and what crazy load does the linker rtld impose on those 229 binaries? Are these added to by OFED, or bypassed completely?
The whole reason to go OFED is kernel bypass .. (at least reduced kernel microservices) ..

That's a bit far afield, yet how does XFree86 interoperate with OFED drivers / libraries, and do I want XFree86 and a.out binaries at all? ( NO !! ) How much a.out / ELF baggage rides along with OFED, and is there a way to replace the old binaries?? (like using DragonFly, or a CentOS 7+ distro?)

opensm also now contains an experimental version of a performance manager. OK, sounds good, but how much extra load does this add to the network stack? The whole idea of ASICs is to reduce stack chatter (I believe the counters in the X4 NICs go a long way .. pity they are $1000). Any advice on the above quandaries is keenly appreciated .. great stuff brewing..
 

whitey

Moderator
Jun 30, 2014
I am soo lost... not to offend, but I do believe you win for the most scattered and disconnected thread response in quite some time round these parts. Sorry, I can only piece together minute bits and pieces of your comments, and certainly not into any useful response. Try again, maybe with a more direct and cohesive reply; that's all I can suggest.
 
Reads like IB confusion, which is understandable since it is confusing.

OpenSM = you don't need to run that if you have an IB switch, since a subnet manager will run on the switch.

IPoIB = that's all you need for your cluster. XFree86, socks, node, web, UDP don't use IB at all, they use TCP/IP; thus the reason you need IPoIB. IPoIB makes your InfiniBand network look and act like an IP network.

10/40 splitters = I don't know. If you only want a 10Gb network, just get 10Gb IB gear; it's REALLY cheap (but really old), and (I think) it still works.
 

talis

New Member
Sep 30, 2017
Reads like IB confusion, which is understandable since it is confusing. .. and a brief patchwork of issues, no time to expand it into noodle soup .. for those insisting on a continuum .. Chinatown does long noodles

Thank you for the insightful response ... may I reply to the issues in-line? Pity about lazy readers; go award chuckles elsewhere


OpenSM = you don't need to run that if you have a IB switch, since a subnet manager will run on the switch. .. yes, but Mellanox insists OpenSM is installed with Linux OFED (maybe on all platforms).
Here is the quote: "OpenSM is an InfiniBand-compliant Subnet Manager, and it is installed as part of Mellanox OFED." It is also repeatedly described as mandatory for setting up IB paths in RDMA .. I am aware managed IB switches don't NEED to be software-managed .. what gives? Does OpenSM act as a framework?


IPoIB = that's all you need for your cluster. XFree86, socks, node, web, UDP don't use IB at all, they use TCP/IP, this the reason you need IPoIB. IPoIB makes your Infiniband network look and act like an IP network. Yes, that's why I'm excited .. an SDN gives the ultimate flexibility .. provided the OS kernel and system calls do not mess with the flow. What about node.js? How am I going to run Aerospike without node.js? Are you saying I can dump all the ELF binaries? What about the dependencies on stdio.h and a hundred other linked binaries? And UDP is not TCP; it is also part of RoCE (v2).

You are also saying I can dump a.out binaries and XFree86? FreeBSD cannot function without them.
My concern here is the collection of old libraries I may not need .. in OFED. Maybe I need to install OFED on DragonFly to avoid the a.out / XFree86 legacy??


10/40 splitters = I don't know. If you only want a 10gb network, just get 10gb IB gear, it's REALLY cheap (but really old), and (I think) it still works.
I bought and shall use IS5030 switches, maybe an IBM 1631 switch, if I can find anyone using these waywards .. both are 56Gb, but most of the NICs are 10Gb

I will try to find the reference to the 40Gb > 4 x 10Gb splitters said to reduce rack switch count by one .. the writer implied the splitters worked as a fixed subnet from a 4-lane 40Gb port when crossed over to 4 x 10Gb ports (no elaboration given). As dual-port X3 cards have only 20 + 20Gb in theory (typically ~10Gb per port in practice), it seems logical to aim at <10Gb per port, hence the question of aggregating 4 x 10Gb lanes into a 40Gb port of a managed switch .. of course it would be better to ask in that post, though it looked like common know-how .. as I said, it was months ago .. the writer may spot this rather vague question, or an experienced network engineer might .. I cannot clarify it further.

I'd rather use X2 cards if an IPoIB SDN can be built successfully from them .. a RAM-based, IPoIB header-addressable storage network .. that returns no more than 8 (64-byte) packets per client request .. no mindless SSL crap; I take care of authentication ninja style.

What may be the best client session management (HTTP is able to keep 600,000 open sessions)? Does anyone recommend nginx for low packet-rate inquiries? Every IB transfer example seems to use giant packets .. I have yet to see a minimal buffer of 4-8 packets per client. I don't need SSL (someone will complain again). Can anyone recommend a custom RTP server that keeps a barebones packet profile?
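The barebones request/response profile described above (at most 8 x 64-byte packets back per client request) is small enough to sketch. The record store and lookup key here are hypothetical stand-ins, and a production server would batch with recvmmsg/epoll or kernel bypass rather than serve one datagram at a time:

```python
import socket

RECORD_SIZE = 64   # fixed 64-byte records, as described above
MAX_RECORDS = 8    # cap the reply at 8 packets per client request

def lookup(key, store):
    """Fetch a client's records and enforce the 8 x 64-byte reply cap."""
    return [r[:RECORD_SIZE] for r in store.get(key, [])[:MAX_RECORDS]]

def serve_one(sock, store):
    """Answer a single datagram request with one datagram per record."""
    key, peer = sock.recvfrom(RECORD_SIZE)
    for record in lookup(key, store):
        sock.sendto(record, peer)
```

The point of the cap is that the reply always fits in a handful of minimum-size frames, so there is nothing for a congestion-controlled stream protocol to buffer.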
 

Rand__

Well-Known Member
Mar 6, 2014
Just make sure you get a managed switch - unmanaged ones might not contain a subnet manager. IIRC the 5030 should be managed (and the -5 series not).

40-10 splitter is a QSFP to SFP+ cable which allows you to connect 4 SFP+ devices to a (compatible) QSFP port. On 36-port switches not all ports are compatible. Not compatible with QSFP cards.
There is also a QSFP to SFP+ adapter available, to use existing SFP+ cables to connect to a QSFP switch or card. Depends on the use case.
 

talis

New Member
Sep 30, 2017
Just make sure you get a managed switch - unmanaged might not contain Subnet manager. IIRC the 5030 should be managed (and the -5 series not).

40-10 splitter is a QSFP to SFP+ cable which allows you to connect 4 sfp+ devices on a (compatible) QSFP port. On 36 port switches not all are compatible. Not compatible with QSFP cards.
There is also a QSFP to SFP+ adapter available to use existing SFP+ cables to connect a QSFP switch or card. Depends on use case.
Thanx Rand .. very annoying .. spent an hour only to lose my text .. is there any context where browsers don't screw users? We are in 2017 and still work in little text boxes that have no persistence .. please give us in-browser autosave, not this primitive sloth .. forget the quotes, here's the raw gut feeling..

the IS5030 is a managed switch .. haven't we jumped over this yet?? It has subnet-manager firmware on board. Hello..?
The whole issue is why TF Mellanox Linux OFED states the OpenSM baggage is loaded by default, with no option to offload it to hardware .. has no one seen this yet?

On the QSFP issue .. the IS5030 has QSFP ports (now, can someone shine a light on how many pins: is it 10 or 20, or does FDR have 20 and work as a 4-lane splitter?). No one mentions the most obvious thing, the pin count..
Now, the X2 has SFP+; does that let me plug 16 x 10Gb splitter ends into 4 QSFP ports on the IS5030?

I know I don't need X3 cards to run IPoIB in a cluster, but until I get confirmation, I keep buying X3s while X2s are lying around for $17 with a matching 3m cable .. not a 4 x 10Gb splitter, which so far evades resolution.. Of course Ofir will say the X2 is no longer supported (and rightfully so..), thus no point in asking the expert.. how else is progress made? Supposedly from a good gene pool ..

Depends on use case? What about the one that has been specifically asked? Dodgem cars..
What else.. oh, here's a quote from the forum: where are the right people when you need them?

To be clear, the X2 is 10Gb, not FDR. With a dual-port X2, does a 4 x 10Gb split from an IS5030 work (one lane to each of 4 nodes, of 8 nodes total, each with a dual-port X2; that's 16 x 10Gb ports going into 4 FDR ports on the IS5030)? May as well use a 1631 and not waste ports (apart from the 1631 being unmanaged). Is there any advantage in the 4x split?

" Asked friend who has couple Connectx-2 cards in the prod and dev networks about SFP+ gbic support.
Answer simply is: He has not found a SFP+ gbic that would NOT work on mellanox cards.
Afaik he got quite wide variety of gbics, Finisar, Avago, Force10, HP, Cisco. Everything works. "
 

i386

Well-Known Member
Mar 18, 2016
Germany
I'm still confused :D

I think you are confusing some things.
The IS5030 is a managed InfiniBand switch (it runs the subnet manager for InfiniBand fabrics) and supports up to QDR InfiniBand (40Gbit/s).
It doesn't support the newer FDR InfiniBand (56Gbit/s), and it doesn't support Ethernet either.

QSFP, QSFP+, QSFP14 and QSFP28 all use 38 pins.

X-2 cards with SFP+ are Ethernet-only cards (I don't know if there are any with VPI and SFP+ ports); they won't work with the IS5030 switch.
 

talis

New Member
Sep 30, 2017
OK, the IS5030 has QDR (40Gb) ports, which use QSFP .. agreed, not 56Gb (I don't care for 56).
I want an ALL-IB internal cluster (with packet-filter steering); only the web connects via HTTP (hoping to keep 100,000 client sessions open, serviced at a rate of 4-8 packets (64 bytes each) per second, many every 5 sec). As far as Ethernet goes, it's a cable going into 2 x 1Gb NICs that pass packets into a pair of session-manager servers that can keep 100,000 sessions serviced (128GB dual node, 4 x 2650 v1). From that 128GB, packet filters on GPUs or X3-Pro adapters (or both) shall address-translate IB access to an 8-node database held in RAM, which also runs kNN decision filters for packet ordering / client updates / responses back via IB to the front session-manager servers.
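The front-end numbers above can be sanity-checked with back-of-envelope arithmetic (assumed figures from the post: 100,000 sessions, worst case 8 x 64-byte packets per client per second, standard UDP/IPv4/Ethernet framing):

```python
SESSIONS = 100_000
PKTS_PER_CLIENT = 8              # upper end of the stated 4-8 packets/sec
PAYLOAD = 64                     # application bytes per packet

UDP_IP = 8 + 20                  # UDP + IPv4 headers
ETH = 14 + 4                     # Ethernet header + FCS
WIRE = 8 + 12                    # preamble + inter-frame gap

payload_gbps = SESSIONS * PKTS_PER_CLIENT * PAYLOAD * 8 / 1e9
wire_gbps = SESSIONS * PKTS_PER_CLIENT * (PAYLOAD + UDP_IP + ETH + WIRE) * 8 / 1e9

print(round(payload_gbps, 2))  # ~0.41 Gb/s of payload
print(round(wire_gbps, 2))     # ~0.83 Gb/s on the wire
```

So the stated 2 x 1Gb front-end NICs are roughly in the right range for this load, though with little headroom at the 8-packet worst case once framing overhead is counted.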
I may starve less if the X2 cards work as IB handlers to and from the session servers' X3 or X3-Pro, all via the IS5030 switch .. ?

Maybe no one has heard of such an arrangement, but everything I do is totally unique (including eating). Most opinions I read are shallow, pointless, and remind me of a Whacky Plak I used to collect as a 7-year-old. It said: IF u got nothing to do .. DONT do it here .. the highest form of American humor, along with National Lampoon. Not much changes.

Now, there are a lot of IBM / HP / Dell / EMC and China X2 cards that are useless for VPI (IB); they mostly have SFP+ Ethernet function only. But there are QDR X2 cards (rev A2) with QSFP ports that really ought to function well with an IS5030 IB fabric switch. I know the X3 VPI cards do .. only I can't afford 16 X3-Pro cards at this point in time (when the bank has foreclosed on a $6mil property).
Thus maybe someone can see the sense of economy where I spend my entire monthly check on servers and Mellanox cards, trying to restart what the GFC (bankers) took away ...

What's not VPI IB about this X2? Despite its June 2011 vintage, I realize it won't do current OFED, but I keep reading about great 40Gb IB fabrics using X2 cards .. OK, the 38-pin connectors are the double row shown in the QSFP cable pic; are these called 4x QSFP? Quad data rate (QDR) = 40Gb,
which is another way of saying 4 x 10Gb; SFP+ terminology would be less confusing than QSFP or 4xSFP.
Making things worse, there is QSFP // 4x QSFP // QDR, all 40Gb, then DDR = 20Gb, and the card below is usually marked as DDR or 20Gb // it looks like QSFP may be a fake offering ..
http://www.mellanox.com/related-docs/products/oem/RG_Dell_SKU.pdf

Here is a Mellanox cable, MCC4Q28C-003; I don't know if it works 1:1 as a 4x QSFP (QDR) cable.
I think the 4x QSFP is QDR, so it would connect the X2 QDR adapter to an IS5030 QDR IB port.
HP MELLANOX 4X QSFP INFINIBAND COPPER CABLE: usable 1:1 in a 40Gb IB cluster?

Now, why won't QDR X2 cards work with QDR IS5030 switches, forgetting the 4 x 10Gb splitter and on a port-for-port basis? That is another question (how do you think this line-up won't work in IB?)

The cable MCC4Q28C, as a 1:1 link between this X2 QDR adapter and the ports of a QDR IS5030?
My only reservation is buggy (dated) drivers (2011) vs. the VPI X3 adapters (which work in OFED).
I just don't know how buggy this combo (X2 / IS5030) may be compared to X3 / IS5030.

Mellanox ConnectX-2 QDR 40G IB Dual Port Network Adapter
Part Number: MHRH2A-XSR

ConnectX-2 VPI/InfiniBand Firmware Download Center

Version (current): 2.9.1000
OPNs: MHZH29B-XSR/XTR, MHZH29-XTR, MHZH29-XSR, MHRH2A-XTR, MHRH29C-XSR/XTR, MHRH29B-XTR, MHRH29B-XSR/XTR, MHRH29B-XSR, MHRH19C-XSR/XTR, MHRH19B-XTR, MHRA19-XTR, MHQH29C-XSR/XTR, MHQH29B-XTR, MHQH29B-XSR/XTR, MHQH29B-XSR, MHQH19C-XTR, MHQH19B-XTR, MHQH19B-XSR, MHQH19B-XNR, MHQA19-XTR, MHGH29B-XTR, MHGH29B-XSR
PSID: MT_0F90120008

Looking at the current driver selection across different OS/cards/medium (IB vs. ETH) it looks like the only space consistently supported by Mellanox is Linux. Indeed, for Linux you have everything:

  • every card is supported (all the way from Connect-X2 to Connect-X5)
  • IB and ETH and the possibility to switch from one to another for the cards that support it (VPI), and you can even use IB on one port and ETH on another
  • iSER initiator is supported across the board and iSER target is supported with both LIO and SCST
  • SRP initiator is supported across the board and SRP target is supported with SCST
I'M NOT USING A HYPERVISOR, but for completeness:

So, if you use KVM as your hypervisor, there is no problem.

However, if you want to use Mellanox IB technology in conjunction with currently the most popular hypervisor (VMware ESXi), you're in trouble:

  • there is no official support for ESXi 5.5 and up for any card older than Connect-X3
  • the only VPI cards supported in IB mode are Connect-X3/Pro
  • Connect-IB cards are not supported at all
  • Connect-X4 cards are supported only in ETH mode
  • dual-port VPI cards support only the same protocol (IB or ETH) on both ports, not a mix
  • SRP initiator is no longer available
  • iSER initiator is available only with 1.9.x.x drivers only over ETH and only for Connect-X3/Pro cards

ConnectX-2_VPI_Firmware: fw-ConnectX2-rel-2_9_1000-MHRH2A_A2

Release Date: 09-Jun-11
 

_alex

Active Member
Jan 28, 2016
Bavaria / Germany
sorry, but you really mix up everything possible, which makes it extremely hard to follow.

OFED is the software stack; QSFP is just the form factor of the plug. This has nothing to do with the actual speed or protocol.
You don't need OFED at all; nowadays the in-tree drivers usually work.

DDR, QDR, FDR, FDR10 are the speeds, and they depend on what the switch, HCA and cable are capable of. Usually the lowest common one works.

i still haven't figured out if you want to run IB or Ethernet, and at what speed.
this would maybe be the first thing to clarify.

for the browser issue, why not use notepad or something else to write a longer post? doing so can also help with structure ;)

edit: also, the speeds are not necessarily tied to a protocol; you could do Ethernet or InfiniBand with QDR or FDR. Forgot SDR, which equals 10Gbps.
 

0xbit

New Member
Aug 7, 2017
I have a simpler, packaged inquiry, and it relates to moving to 40GbE (ConnectX-3) cards while also having two 10GbE ConnectX-2 NICs in other computers. I will have zero router/switch hardware involved.

The plan is to put one ConnectX-3 40GbE (FDR) QSFP+ card (dual or single port, VPI or standard Eth) in one computer and use a breakout cable (40GbE QSFP+ to 4x 10GbE SFP+) to connect it to the other ConnectX-2 MNPA19-XTR cards.

My inquiry, after reading about this ad nauseam and all the little quirks/gimps across the cards, is:
  • Will I have zero problems doing so?
  • If I hope to experiment with RoCE, will I have issues? It is stated that RoCE v2 is only supported on ConnectX-3 PRO cards; is this only when routing/switching hardware is in the loop, or can I run hardware RoCE v2 on ConnectX-3 with direct attachment to ConnectX-2 cards running software RoCE?
  • If there really is a restriction that ConnectX-3 cards can't do RoCE v2 even with direct connections, am I stuck using SoftRoCE on all endpoints unless I buy expensive ConnectX-3 PRO hardware or ConnectX-4?
  • Is there any benefit to getting a VPI ConnectX-3 card over a standard Ethernet ConnectX-3 card?
 

0xbit

New Member
Aug 7, 2017
OK the 5030 has QDR (40Gb ports) that is identical to QSFP .. (talis's full post, quoted above)
I like your posts.. Spicy and scattered ;)
It takes after my style but is indeed far more scattered. That being said, I follow you and I like your highlights.
They are quite informative.

Things that are informing my purchase :
  • dual-port VPI cards support only the same protocol (IB or ETH) on both ports, not a mix
    • Wow, I was unaware of yet another bait and switch that you don't find until you go into the details. This makes me enthused about going for IB capable VPI cards vs just regular ETH.
  • Connectx-3 cards are equivalent to Connectx-2 cards in price when it comes to speeds above 10Gbps : 40/56
  • Pro cards are far more expensive than regular cards and it appears this is where all of the nonstandard features are baked.
  • Mellanox, like many other companies, creates artificial segmentation between pro/standard hardware and tries to take you to the cleaners on the yuge margin increase between the two. This artificial segmentation is often sloppy, so you find yourself needing a basic feature that you can only get on a Pro card and that isn't worth the price.
I'm trying to decide what ConnectX-3 card to get: VPI or just standard Eth. Pro cards are out of the question due to the absurd price hike. I wanted to stay on ConnectX-2 cards, but they're playing games w/ phase-out in the newer OS releases. I want to experiment with RoCE, but now I am coming to find out that there are two versions, RoCE v1 and RoCE v2. RoCE v1 seems like a beta-level implementation that everyone is moving away from. RoCE v2 is stated as only being supported on ConnectX-3 PRO cards. I have no clue what the support for RoCE is on ConnectX-2, because they're not clear on it. They market RDMA/RoCE and yadda yadda yadda on just about every card, and then they pull a bait and switch w.r.t. the details. So, it looks like I have to use SoftRoCE on both ConnectX-2/3 cards unless I want to pony up for ConnectX-3 Pro/4/5/6 Mellanox cards. I am reading that traffic on a 10Gbps line will drop down to 1Gbps using SoftRoCE. Now, I wonder what this will do to non-RoCE traffic flowing across? It will likely be subject to the same restrictions.

So, I feel I am centering on ConnectX-3 VPI(?) cards for beefy nodes w/ some software hacks,
and ConnectX-2 for lightweight nodes w/ software hacks.

I'm really getting tired of manufacturers doing this across the board. There isn't a piece of hardware I'm touching currently where a company hasn't played silly games in drivers/etc. to produce artificial segmentation. It's sloppy, and it makes the whole effort a frustrating mess.

I'm experimenting with what can be done with various combinations of hardware/software to root out some of this.
 

Rand__

Well-Known Member
Mar 6, 2014
The plan is to put one Connectx-3 40GbE (FDR) QSFP+ card (dual or single port) (VPI or standard eth) in one computer and use a breakout cable (40GbE QSFP+ to 4x10GbE SFP+) to connect it to the other ...
Breakout cables are only supported on switch ports.
 

0xbit

New Member
Aug 7, 2017
Breakout cables are only supported on switch ports.
Is there a particular technical reason why, that you can quickly link me to?
I'm speaking about copper DAC breakouts (QSFP+ 40GbE -> 4x 10GbE SFP+).
Can another user confirm the accuracy of Rand's reply? And thank you, Rand, for the reply.
I was also wondering:
  • Is this why the ConnectX-3 QSFP (40/56Gb) cards are frequently equal in price and more plentiful compared to 10GbE ConnectX-3 cards?
  • Can I use non-breakout QSFP 40GbE cables to connect two computers directly?
 

i386

Well-Known Member
Mar 18, 2016
Germany
Is this why the ConnectX-3 QSFP (40/56Gb) cards are frequently equal in price and more plentiful compared to 10GbE ConnectX-3 cards?
The ConnectX-3 cards are cheap and available in quantity because many enterprises are switching to 25/50/100GbE.
 

0xbit

New Member
Aug 7, 2017
The Connect-X 3 are cheap and available in quantities because many enterprises switch to 25/50/100GbE.
I was referring to the lack of a price difference between used ConnectX-3 (40GbE / 56Gb IB) QSFP+ and ConnectX-3 (10GbE) SFP+ cards, and wondering if it has to do with the expense/restrictions on QSFP+, namely that you can't use breakout cables for directly connecting two computers (QSFP to SFP+; still need another user to confirm this) and/or needing an adapter from QSFP+ to SFP+ if that's your intended endpoint.

At the same price point I can get either a dual-port 10GbE SFP+ card or a dual-port 40GbE/56Gbps QSFP+ VPI ConnectX-3 card, and I am wondering if it's worth the hassle. So I'm wondering why these two cards tend to be equal in price.
 

Rand__

Well-Known Member
Mar 6, 2014
You can directly connect two cards with a DAC cable, both SFP+ and QSFP.
Why get 10Gb if you can get 40? Unless you need to connect to a switch and don't have one with QSFP ports.

4:1 breakout: see the FAQ on the Mellanox website, IIRC.
 

talis

New Member
Sep 30, 2017
sorry, but you really mix-up everything possible, what makes it extremely hard to follow. Even harder if not replied to IN-LINE ..

ofed is the software-stack, QSFP is just the form factor of the plug, this has nothing to do with actual speed or protocol. ... you dont need ofed at all, nowadays in-tree usually works. Oh?

Please let me in on an in-tree framework that allows wire-speed R/W access to packet records in cluster RAM over IB, while providing byte inspection/decisions on stateless UDP at a leisurely 2 million packets/sec.
If I use MPI (preferred), then I'm tempted to go with Intel Cluster tooling; what impact does that have on IB? http://www.hpcadvisorycouncil.com/pdf/CP2K_Best_Practices.pdf
The average reader will ask why delve into Fortran libraries for simple file access. But what if I have kNN algos to run 10% of the time? I can use those stack-heavy CPUs or GPUs, thus I need SMP Fortran libraries.

Everything in computing is a software stack (or hundreds of them). I like the idea of microservices and virtual kernels; going to make kernels ephemeral .. the OS is outdated .. which is why FB conferences are taken over by immutable objects: only they change, but we keep the changes in a stack and can roll them back in a single mouse click. Ha ha, he he .. er, fine for simulations in MATLAB, but not very useful in facial recognition or counting fish. I know an illustrious Java guru who counts fish using drones.

I like OFED for RDMA within a cluster. in-tree is more automated routing , i need total control of packet filtering ) from the http// client input , as it formats into IB .. funny that RoCE is remote DMA IB encapsulated over Eth .. yet RoCE (2) deals with UDP IB encap over Eth.. I need the UDP IB part shipped across an internal IB fabric .. Ethernet is TCP and slows delivery by Ack or QP ,, its not realtime , it holds a buffer for delayed traffic , not efficient for 4 - 8 packet transfers (dont need buffer)
Of course, OFED may not give me byte-accurate filtering. In reality the filtering is done in the two X3 Pro cards sitting in the two http session servers; each of these accepts client sessions via standard 1Gb NICs, which have nothing to do with the X3 Pro adapters (those deal with packet filtering at the IB level). I haven't decided whether to change the X3 Pro firmware until I see the ASIC's math capability (subtract / add / compare); I really need wire-speed XOR. Apart from the X4 FPGA solution there are options to remove this from the CPU. I plan to turn the switch into a 16-level XOR stack, or build that into a Brocade.


DDR, QDR, FDR and FDR10 are the speeds, and which one you get depends on what the switch, HCA and cable are capable of. Usually the lowest common denominator works.

I still haven't figured out if you want to run IB or Ethernet, and at what speed.
That would maybe be the first thing to clarify.

Thank you, but I thought I clarified that I want to run IB past the WebSocket management (I haven't figured out what to use for incoming http requests yet, due to the Aopen cache binaries that old I/O seems to rely on).

(Very dense, admittedly.) Two http servers manage >100,000 open sessions, with 4-8 packets per client stored every few seconds, via IB: through an IS5030 switch into 6 to 16 nodes, each with a dual-port VPI X2 (both ports InfiniBand). IPoIB basic packets, non-TCP, since TCP has standoff congestion control. I need a UDP-style (unencrypted) packet handler, so the 2 http servers send that via ConnectX-3 Pro (one X3 Pro per http server, on 2 nodes).
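One practical point buried in the paragraph above: because IPoIB presents the IB port as an ordinary IP network device (ib0 and friends), a plain AF_INET UDP socket is enough to receive the small per-client records described; no OFED-specific API is required for this part. A minimal sketch, assuming the receiver binds to whatever address sits on the IPoIB interface (127.0.0.1 is used here only so the example is self-contained):

```python
import socket

def run_udp_sink(bind_addr, port, max_dgrams):
    """Receive small fixed-size datagrams (the 4-8 x 128-256 byte
    records described above) on any IP interface. IPoIB presents as
    a normal netdev, so bind_addr can simply be the ib0 address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    records = []
    for _ in range(max_dgrams):
        data, peer = sock.recvfrom(4096)  # one datagram per call
        records.append((peer, data))
    sock.close()
    return records
```

Note this goes through the kernel IP stack; it demonstrates the datagram framing, not kernel-bypass performance. For the latter you would move to verbs UD queue pairs, which is a different API.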

There are many ways to configure the fabric through the IS5030. It uses 4 QDR ports from the X3 Pro cards to achieve addressing of the correct client entries via the IB fabric (packet filter); then another 32 QDR ports of the 5030 connect to 16 nodes via 16 dual-port QDR 40Gb X2 cards. This is the central question: the best way to utilize the fabric to address each client database (held in RAM on 6-16 nodes). We start with 6 nodes + a 2-node http web interface.


for the browser issue, why not use notepad or something else to write a longer post? doing so can also help with structure ;) Why bother, when no one replies in-line .. I must be the only one.

Agree on structure, except I hate to work on a sheet that has no access to the forum; switching between them = errors. The structure improves as others reduce the chaos, like the learned next post I looked at before attending to what seems like frustration in comprehension. I understand speed is not protocol-centric. I'm a hardware engineer; I hate software stacks, but am prepared to embrace OFED if only to modify it to suit fast RDMA. I love hardware stacks (great shortage) and passing tokens or events in real time, not via mutex ops. I don't have data structures that lock CPUs, and I try to avoid libraries (impossible today, of course); hence my earlier fears about Aopen and ELF, which I was told do not enter into IB fabrics (but I disagree, they must enter when http sessions are employed; I'm hoping someone has worked around the old SOCKS by now). So, in dealing with IPoIB there are binary libraries: who has compiled a list of the dependencies?
I would imagine that needs no elaboration, just sensible answers.
A list of these would be greatly appreciated, in order to attempt changes in that dreaded kernel. The whole idea of accessing a DB via IB is to divorce the OS kernel; of course that means embracing another software stack, OFED or the IB tools. What other abstractions exist to enable the simplest internal forced management, as in RTP datagram-level?
I have been away from this mess for 20 years, so some clear pointers, please.


edit: also the speeds are not necessarily tied to a protocol; you could do Ethernet or InfiniBand with QDR or FDR. Forgot SDR, which equals 10Gbps.
Of all the X2 cards out there most are DDR or SFP (20 or 10Gb), but there are VPI X2 cards with QDR (40Gb). To my thinking a QSFP connector will mate best with another QSFP, cable-wise, 1:1, and such a cable is available. I agree: why connect a 40Gb switch to a 10Gb port, unless for the logical reason that only SFP adapters are available? If QSFP X2 IB cards sync up with IS5030 ports in time with client requests, with IB translated from a separate server (in want of more researched terms), that's a great start. Next will be internal IB UDP packet routing from the X3 Pro cards (only 2, in failsafe), one per http client server (this is asking for a crystal ball). Only half the packets need to arrive, so that's only 2-4 packets (128-256 bytes) per second per client (of >100,000 live). Only IB allows such tiny payloads efficiently. Correct this, please.

I'm fairly sure I can transfer 4 or 8 packets through IB out of an X3 Pro (if I want UDP) into a cluster database via a single 5030 switch that does the RDMA in IB, for 100,000 clients per second. The question was whether an X2 QDR (QSFP port) IB card can route each packet (at least half of them, i.e. 2-4) into the correct client DB position and update an index pointer for each client memory block. No, I don't want to use SQL or Postgres.
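The "update an index pointer for each client memory block" bookkeeping can be sketched independently of the transport. Everything below is hypothetical and illustrative (class names, slot count and record size are taken from the 4-8 packets of 128-256 bytes mentioned above); it is not Mellanox or OFED code, just the RAM-side structure the packets would land in:

```python
from dataclasses import dataclass, field

SLOTS_PER_CLIENT = 8   # illustrative: the 4-8 packets per client above
RECORD_SIZE = 256      # illustrative: upper bound on bytes per record

@dataclass
class ClientBlock:
    # fixed-size memory block per client; widx is the "index pointer"
    slots: list = field(default_factory=lambda: [b""] * SLOTS_PER_CLIENT)
    widx: int = 0      # next slot to overwrite (monotonic counter)

class RamDB:
    """Hypothetical in-RAM store: one fixed block per client id,
    newest records overwrite oldest (ring-buffer semantics)."""
    def __init__(self):
        self.blocks = {}
    def store(self, client_id, record):
        blk = self.blocks.setdefault(client_id, ClientBlock())
        blk.slots[blk.widx % SLOTS_PER_CLIENT] = record[:RECORD_SIZE]
        blk.widx += 1
    def latest(self, client_id, n):
        blk = self.blocks[client_id]
        start = blk.widx - min(n, blk.widx)
        return [blk.slots[i % SLOTS_PER_CLIENT] for i in range(start, blk.widx)]
```

The point of the fixed slot count is that a writer (whether a UDP handler or an RDMA-write target) never allocates on the hot path; it only bumps the per-client index.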

Correct me in understanding that IB can be regimented down to a packet-by-packet basis, unlike pfSense, which juggles timeslots among crowded client requests; packets are dropped when CPU resources run out. SMP in Windows assigns 50% to the desktop. Fine for amateurs watching Netflix while building websites and writing emails, and maybe background-debugging memory leaks, like most immutable-.js framework coders trying to impress with how important FIXED DATA can be. In our day we used EPROM @##!!! Then TI brought out adaptive processors (TMS7000). Today Xilinx fabrics allow 8-bit programmable delays at each pin down to 3ps resolution. I digress.

Oh, dont forget to look at most of this reply IN-LINE (in the quote box above :)

What can X2 QSFP-connector cards do when plugged into an IS5030 using 1:1 QSFP cables? So far I've heard that won't work, but then I've been shown X2 IB cards with QSFP.
Let's not worry about the QSFP > 4 x SFP+ 1:4 splitters; I'm not using Chinese X19 SFP+ Ethernet port cards (which won't work with 5030 switches) or any other IB switches.
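For what it's worth, on ConnectX-3 VPI cards the per-port protocol (IB vs Ethernet) is normally selected with Mellanox's mlxconfig rather than by the cable. A sketch, assuming the MFT tools are installed; the device path below is only an example and should be taken from `mst status` on your own machine:

```shell
# Values for LINK_TYPE_Px: 1=IB, 2=ETH, 3=VPI (auto-sense).
mst start
mst status                                   # find your device path
mlxconfig -d /dev/mst/mt4099_pciconf0 query | grep LINK_TYPE
mlxconfig -d /dev/mst/mt4099_pciconf0 set LINK_TYPE_P1=1 LINK_TYPE_P2=1
# reboot (or reload the mlx4 modules) for the new port type to take effect
```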
Damn flippant text box threw away another 40 minutes of input. What's the use? Mid-screen the thing rolled over and refreshed like a shot duck. A word processor only makes sense in private mode. How many posts get ruined by dumb JS event programmers? No wonder nothing progresses; the Java clowns are in town.
 

talis

New Member
Sep 30, 2017
I like your posts.. Spicy and scattered ;)
It takes after my style but is indeed far more scattered. That being said, I follow you and I like your highlights.
They are quite informative.

Things that are informing my purchase :
  • dual-port VPI cards support only the same protocol (IB or ETH) on both ports, not a mix
    • Wow, I was unaware of yet another bait-and-switch that you don't find until you dig into the details. This makes me enthused about going for IB-capable VPI cards vs just regular ETH.
  • ConnectX-3 cards are equivalent to ConnectX-2 cards in price when it comes to speeds above 10Gbps: 40/56
Best price on a new genuine Mellanox ConnectX-3 is $80, yet my issue is that it's not RoCEv2, which is IB over UDP, connectionless/stateless (though pseudo paths are possible). The main thing is that it's free of TCP buffering, meaning it's unreliable real-time (not as bad as it sounds). The other part of this is the Ethernet encapsulation (the "oCE"): I like the RDMA part of it, but don't need the overhead of the Ethernet encap. I'm not sending it remotely, and when I am, the 4 packets are going to a mobile client that can't deal with RoCE anyway. Besides, RoCE can't send just 4 packets; Ethernet can't send 4 packets, more like 300 just to get the ball rolling.
  • Pro cards are far more expensive than regular cards and it appears this is where all of the nonstandard features are baked.
If you want to play with X3 Pro mezzanine cards I can source 10 (one needs OpenSM nodes) or rewire PCIe; I keep some in NY state, and shipping from the US is way too dear.
  • Mellanox, like many other companies, creates an artificial segmentation between pro/standard hardware and tries to take you to the cleaners on the yuge margin between the two. This artificial segmentation is often sloppy, so you find yourself needing a basic feature that you can only get on a Pro card that isn't worth the price.
I'm trying to decide what ConnectX-3 card to get: VPI or just standard Eth. Pro cards are out of the question due to the absurd price hike. I wanted to stay on ConnectX-2 cards, but they're playing games with phase-out in the newer OS releases.
X2 cards are not supported except by the forum and limited but useful downloads; the real trick is to avoid 3rd-party X2 and X3, they must be genuine Mellanox. Otherwise it's hit or miss.
Besides, the firmware bugs are an obvious real issue that no one here has much experience with. And then Dell / HP / EMC / IBM all return nightmare issues that are all one-offs and thus useless, as binary drivers differ with each OS/hardware combo. So the advice is useless unless you clone that exact topology.


I want to experiment with RoCE, but now I'm finding out that there are two versions, RoCEv1 and RoCEv2. RoCEv1 seems like a beta-level implementation that everyone is moving away from. RoCEv2 is stated as only being supported on ConnectX-3 PRO cards. I have no clue what the support for RoCE is on ConnectX-2 because they're not clear on it. They market RDMA/RoCE and yadda yadda yadda on just about every card, and then they pull a bait-and-switch with respect to the details. So it looks like I have to use SoftRoCE on both ConnectX-2/3 cards unless I want to pony up for ConnectX-3 Pro/4/5/6 Mellanox cards. I'm reading that traffic on a 10Gbps line will drop down to 1Gbps using SoftRoCE. Now, I wonder what this will do to non-RoCE traffic that is flowing across? It will likely be subject to the same restrictions.
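A note on the SoftRoCE route mentioned above: on reasonably recent kernels the rxe driver can be attached to any Ethernet netdev with the iproute2 `rdma` tool, which is a cheap way to experiment with the verbs API before committing to Pro cards. A sketch, with `eth0` as a placeholder for your actual interface name:

```shell
# Attach a SoftRoCE (rxe) device to an existing Ethernet netdev.
# Assumes a kernel with rdma_rxe built, iproute2's `rdma` tool,
# and rdma-core installed for the verbs utilities.
modprobe rdma_rxe
rdma link add rxe0 type rxe netdev eth0
rdma link show            # should list rxe0 bound to eth0
ibv_devinfo -d rxe0       # confirm the verbs device is visible
```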

So, I feel I am centering on Connectx-3 VPI? cards for beefy nodes w/ some software hacks.
Connectx-2 for light weight nodes w/ software hacks.

If you get the right X2 (40Gb genuine QSFP) and cable, the only enemy is ASIC firmware bugs. I like the idea of rewriting the kernel, but for that you need a full BSP (the quad-port KU115 fabric BSP is $3000); if you can prove some kernel proficiency I'll buy the BSP, and the C tools in Vivado that you write to the fabric with. Not many out there. Meet on G+
talisin
Meanwhile I play with X3 Pro VPI and X3 VPI (who needs EN, not me), but to equip 256 cores with IB I think I can strip out the oCE from the RDMA. Keeping it local, I can stuff 16 X2 cards into the cluster for a pure IB experience via a 5030; that is, IF the X2 ASIC firmware is stable. Same with X3: they are all a work in progress. I'd rather throw away 16 X2 cards after burning out contention states on the QSFP with my 4-packet turning points, heh heh. Ports get hot as it is; they need fans, which is why they go into hot-swap fan crates. Rus


I'm really getting tired of manufacturers doing this across the board. There isn't a piece of hardware I'm touching currently where a company hasn't played silly games in drivers/etc to produce artificial segmentation. It's sloppy and it makes the whole effort a frustrating mess. It's buyer beware, and come-in-spinner for the dreamers. Let's burn some cable drivers. Anyone here at kernel level?

I'm experimenting with what can be done with various combinations of hardware/software to root out some of this.




RoCE is designed for distributed datacenters and databases like Hadoop & Elasticsearch, where all the statistics live on hundreds of remote servers that need secure Ethernet access (TCP) or https, banks for example. Why would any private researcher need this unless dealing with the genome pool or banking? Both of these are planned: a gene-transplant lab in Costa Rica and a deposit at a First bank in Durham City.

My interest is in cutting apart the CE and exposing the UDP/IB part for internal DB packet storage over IB. I was looking at FPGA boards to do this, like the X4 060 at $2800 each, but found quad-port cards with KU115 FPGA fabric. Once a service takes off, it's great to have a forward plan to expand past one datacenter.
 