Infiniband setup. Advice needed.

grandmasterneil

New Member
Jan 21, 2014
Hello all,

I'm interested in setting up an Infiniband network. While I know there are significant hurdles, I'm just so enamored by the possibilities.

I'm just getting around to understanding the basics, so I'm hoping STH will be kind. I know Infiniband is in decline and 10GbE is probably the better way to go for future-proofing, but from cursory pricing an Infiniband setup currently seems achievable for much less.

The setup will be intra-office connectivity to a main storage unit. It's for media processing, and I want file access to be as fast as possible since many people will be streaming really large files simultaneously.

My main question is, can you roll your own cables? I've never dealt with fiber so I'm wondering if I can run channels to a patch panel and then into a switch.

I'm sure I'll have more questions as the process continues. Thanks in advance for any suggestions!
 

cesmith9999

Well-Known Member
Mar 26, 2013
Infiniband uses two types of cables: CX4 (old 10 Gb IB) and QSFP. Neither of those lends itself well to DIY cabling.

With the Quantum switches so cheap and 10 GbE adapters also cheap, I would not even go Infiniband; I'd just do RoCE.

Chris
 

grandmasterneil

New Member
Jan 21, 2014
Chris,

Thanks for the advice! I'd be interested to see links for switches and adapters. Everything I've found is still several times as expensive as an IB setup if I can sort out the cabling.
 

frogtech

Well-Known Member
Jan 4, 2016
Chris,

What Quantum switches are you referring to? Did you mean the brand Quanta?
 

RobertFontaine

Active Member
Dec 17, 2015
Winterpeg, Canuckistan
I'm confused... not unusual... ;)

The LB4M is a 10GbE-uplink switch with 2 SFP+ ports and 48 1GbE ports (not terribly useful for a 10GbE network except as a bridge to a 1GbE network).

The LB6M is a 24-port 10GbE (SFP+) switch with additional 1GbE ports (this makes more sense).

But why would I want to buy 10GbE cards and a switch rather than purchase a QDR 4036 and Infiniband cards at 40Gb, or ConnectX-3s at 56Gb, for less money if I have control over my own network? Paying more for less confuses me.
 

markarr

Active Member
Oct 31, 2013
Yes, you would get more speed out of an Infiniband setup, but you start to fall outside the bounds of KISS for most people. Having to run two separate networks for the extra speed of Infiniband, versus just using Ethernet for everything, is a balancing act.

Yes, you can do IPoIB, but that adds yet another layer of complexity.
 

grandmasterneil

New Member
Jan 21, 2014
I was curious about the Quanta setup as well, since the LB4M has just 2 SFP+ ports. Using that config for just the server has some purpose, but I'm trying to maximize speed across all nodes, if possible.

The idea is to run two separate networks anyway. I don't really want the internal IB network to do anything but handle file transport.
 

frogtech

Well-Known Member
Jan 4, 2016
Let's be realistic here too, though. Unless the aforementioned main storage unit in the OP is either A) absolutely loaded with spinners or B) backed by solid-state storage (either lots of low-capacity drives in an array or a few high-capacity ones), the speed "gain" of 40 Gbps or even 56 Gbps is pretty negligible. Infiniband is actually pretty simple to set up in a Windows environment since the drivers mostly "just work", even more so when you have a switch with a subnet manager.

Even in Windows, the 40 Gbps links are negotiated at 32 Gbps after encoding overhead. And even THEN, after latency and other overhead, you're likely only going to see 20 Gbps tops. I'm lucky to even get 15-17 Gbps throughput with a simple iperf test using ConnectX-2 cards and a QDR switch. It's better than 10 GbE speeds, but the tweaking I had to go through to even achieve that was less than ideal.
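If anyone wants a quick sanity check of a link without reaching for iperf, here's a minimal Python sketch along the same lines; the port and transfer sizes are arbitrary, and a single Python stream will read lower than a tuned iperf run, so treat the number as a floor rather than the link's ceiling.

```python
# Rough single-stream TCP throughput check between two hosts, as a crude
# stand-in for iperf. Run "python3 tput.py server" on one box and
# "python3 tput.py client <server_ip>" on the other.
import socket
import sys
import time

PORT = 5201                # borrowed from iperf3's default; any free port works
CHUNK = 4 * 1024 * 1024    # 4 MiB per send/recv
TOTAL = 8 * 1024 ** 3      # client pushes 8 GiB in total

def server():
    with socket.create_server(("", PORT)) as srv:
        conn, addr = srv.accept()
        buf = bytearray(CHUNK)
        received, start = 0, time.time()
        while True:
            n = conn.recv_into(buf)
            if n == 0:          # client closed the connection
                break
            received += n
        secs = time.time() - start
        print(f"received {received / 1e9:.1f} GB from {addr[0]} "
              f"at {received * 8 / secs / 1e9:.2f} Gbps")

def client(host):
    payload = b"\0" * CHUNK
    sent, start = 0, time.time()
    with socket.create_connection((host, PORT)) as conn:
        while sent < TOTAL:
            conn.sendall(payload)
            sent += len(payload)
    secs = time.time() - start
    print(f"sent {sent / 1e9:.1f} GB at {sent * 8 / secs / 1e9:.2f} Gbps")

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])
```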

I'd agree, the LB4M isn't really a "10 gig switch"; it's honestly just an uplink/10 gig bridge. The LB6M is probably a solid buy. Honestly, setting up an IB fabric is probably as simple as setting up a 10 GbE fabric. If I had to guess, I'd say the purpose of even using a faster-than-gigabit network would be storage access and reading/writing to the storage server(s), which for all intents and purposes makes either an IB or 10GbE solution a storage area network (SAN), which isn't uncommon at all. You most likely aren't using a modem or ISP device with an SFP+ port (though I could be wrong), so there's also that to rule out.

If I had to make a recommendation, I'd say it's easier to integrate 10 GbE devices and switches (even when they use SFP+ ports) with other established devices in a stack. And it does Ethernet frames and IP natively.
 

grandmasterneil

New Member
Jan 21, 2014
I don't anticipate being able to make full use of the 40Gbps, but it's a nice idea. I'm building out my storage array with SSDs, and I want all the nodes to have access to the files on that array. Yes, it's basically a SAN.
 

Chuckleb

Moderator
Mar 5, 2013
Minnesota
It's great getting different ideas, but I'll focus on answering your original questions.

Most of the IB deployments were designed for within one rack or one datacenter, where short patch cables would work. You'd generally have a top-of-rack (TOR) switch and feed the compute nodes within the rack, connecting the TOR to core switches.

Now they do sell 40G QSFP+ transceivers and you can buy fiber patch cables to make your own length, but I wouldn't bother crimping my own cables. Going this route is expensive.

Let's map this out a bit. Say your file server and gear are in a different room: you could run a QSFP+ cable from the server into a switch in the other room and plug your computers into that switch. Mind you, the IB switches are loud. That would be the cheapest option, as you only need one long cable and multiple short cables. If you needed to keep the gear together and the noise down, you could run multiple long cables and leave the switch elsewhere, but that adds cost.
 

TuxDude

Well-Known Member
Sep 17, 2011
I'll be watching this thread for any interesting information, as I'm also planning to get a 40Gbps IB fabric setup at home sometime in the not-too-distant future (right around when I find a small switch at a price I like). My compute nodes at home (4 of them) already have a QDR-IB port onboard, so getting a switch and learning a new tech seems the obvious thing to do.
 

RobertFontaine

Active Member
Dec 17, 2015
Winterpeg, Canuckistan
I'm building out a series of compute nodes... I am going to start with locally duplicated data on NVMe on the first single node as it is cheaper, but once I get to nodes 2, 3, 4 and the data grows in size, it will become cheaper to aggregate SSDs in an SC216 and get a 4036 switch ($500 Canadian) than to duplicate the data locally on each compute node. I don't know where the break-even point is yet... will burn that bridge when I get to it.

The 4036E is sexiest as it has an Ethernet bridge ($1k Canadian), but I'm willing to go RDMA/NFS for the file system and run redundant Ethernet cables or IPoIB.

A DDR switch is about $100 Canadian... Voltaire 9024D...

Fan noise will be an issue in a residential environment... alternate cooling will be required without a soundproofed room... this has been done.

Bonding works with IB, so several cables running from the file server to the switch can potentially improve serial performance.

Depending on data size, though, it may be cheaper, faster, lower latency, etc. to RAID 0 some NVMe on the compute node, use redundant data, and point a firehose of data at your compute card.
Then just build a big spinny-disk ZFS array for the master database and don't worry about bandwidth/latency over the wire (less cool, but potentially a better solution depending on use case).
 

Chuckleb

Moderator
Mar 5, 2013
Minnesota
Yes, IB is infinitely easier to scale. You just add additional links and it will use them without worrying about LACP and other things. You also get to use larger MTUs... think 9k frames are large? 64k frames in IB. Etc.
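For reference, on Linux you can check what MTU each interface actually came up with by peeking at sysfs. A minimal sketch below; the per-interface "mode" file is an IPoIB driver detail (datagram mode tops out around a 2044/4092 MTU, connected mode around 65520) and may not be present on every setup.

```python
# Print the MTU of every network interface on a Linux host, flagging IPoIB
# interfaces (which expose a "mode" file: datagram vs. connected). Paths are
# standard Linux sysfs; the "mode" file is an IPoIB driver detail and may be
# absent on some setups.
import os

SYS_NET = "/sys/class/net"

for iface in sorted(os.listdir(SYS_NET)):
    mtu_path = os.path.join(SYS_NET, iface, "mtu")
    try:
        with open(mtu_path) as f:
            mtu = int(f.read().strip())
    except OSError:
        continue
    note = ""
    mode_path = os.path.join(SYS_NET, iface, "mode")
    if os.path.exists(mode_path):
        with open(mode_path) as f:
            note = f"  (IPoIB, {f.read().strip()} mode)"
    print(f"{iface}: MTU {mtu}{note}")
```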
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
My thought is, for the $$ (~$1,400 USD total), you can't really beat that setup (4036E + LB6M + LB4M).

ConnectX-2 & ConnectX-3 deals ($20-100 each)
Cables: $10-30 (fiber/DAC)

Obviously not everyone would want all three switches, so it could be even cheaper, but it's a really nice setup for a high-speed SAN.
 

RobertFontaine

Active Member
Dec 17, 2015
Winterpeg, Canuckistan
Bandwidth vs. latency... pushing data through even two layered switches likely triples latency (physics plus overhead). For a networking lab it makes sense to throw all sorts of things into the mix. For a mixed environment I would guess you would run your compute/rendering traffic on one pipe and your desktop clients/business services on another. I haven't been in the world of serving up VMs or transferring them to other machines fast, so I don't know where that might fit in; in the grand scheme of things, latency-wise, I would suspect jitter is bad when VMs are quasi-real-time or interactive, and less important if they are just running SMTP and that sort of thing.

I would think, again, that once you're out of a lab environment, a network with this level of complexity has SLAs, service contracts, and a required hardware specification. Anyone with a service provider isn't going to have this kind of freedom, and anyone who has a mission-critical network this complicated probably has a service provider.

Again, use case is important... the OP said intra-office and rendering, so the fastest possible file copy is a huge benefit for 4K editing. Users often have to plan lunches and coffee breaks around copying files to workstations in the YouTube 4K editing realm; 30 seconds is an eternity when you are waiting to do work. Latency is likely not important to the user in the grand scheme of things (big streaming file copies) versus database lookups. I don't know how video gets chunked out to the nodes of a render farm and whether network latency is an issue after the initial edit, but I suspect that if the render farm has adequate scratch disk, a big file copy and then letting it run locally makes as much sense as anything else (again, bandwidth rather than IOPS or latency). Or I could be completely wrong; I'm just guessing at intra-office/media processing. Assumptions: render farm (discrete or virtual across workstations), n workstations, shared file server.
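If you want to put a number on those lunch-break file copies, here's a rough Python sketch that times one big sequential copy from the share to local scratch; the paths are made up, so point them at a real file on your mounted array and a real scratch location.

```python
# Time one large sequential copy from the shared array to local scratch and
# report the effective rate. Both paths are placeholders; point SRC at a big
# file on the mounted share and DST at local scratch (NVMe, RAID 0, whatever).
import time
from pathlib import Path
from shutil import copyfile

SRC = Path("/mnt/san/footage/clip_4k.mov")   # hypothetical file on the share
DST = Path("/scratch/clip_4k.mov")           # hypothetical local scratch path

size_gb = SRC.stat().st_size / 1e9
start = time.time()
copyfile(SRC, DST)
secs = time.time() - start
print(f"copied {size_gb:.1f} GB in {secs:.1f} s "
      f"({size_gb * 8 / secs:.2f} Gbps effective)")
```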