$1300 to go 4x 40GbE w/ adapters and 48 1GbE Ports


T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Snagged 2 of those to flash... hoping the 2 ConnectX-3 cards I have work, so I can have 4x 40GbE ready for a switch; for now, just direct-connect playing around, I hope :)
 

i386

Well-Known Member
Mar 18, 2016
Germany
@T_Minus All Mellanox ConnectX-3 cards use the same silicon and support the same features. The only differences are the speeds (FDR, FDR10 or QDR) and the port counts.
I would say the most interesting feature is RDMA over Converged Ethernet (RoCE). With RoCE the NICs can write directly to RAM without involving the CPU or OS, freeing up those resources.
 
  • Like
Reactions: T_Minus

_alex

Active Member
Jan 28, 2016
Bavaria / Germany
i386 said:
@T_Minus All Mellanox ConnectX-3 cards use the same silicon and support the same features. The only differences are the speeds (FDR, FDR10 or QDR) and the port counts.
I would say the most interesting feature is RDMA over Converged Ethernet (RoCE). With RoCE the NICs can write directly to RAM without involving the CPU or OS, freeing up those resources.

I think they should all do RoCE, as long as they are capable of Ethernet at all and are not IB-only.
Personally, for storage I wouldn't put them in EN mode and then go RoCE for e.g. SRP unless there is a good reason. That could be that part of the bandwidth should be passed into VMs in a mixed-use system that does both storage and compute/hypervisor, as it's close to impossible to put a network bridge on IPoIB. Also, IB switches are much cheaper and often use less power than 40GbE switches, and staying on IB doesn't require ridiculously high-priced licences on them.
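For reference, a quick way to check which link layer each port is currently running (IB or Ethernet/RoCE) is to read it from sysfs. Just a rough sketch assuming the standard Linux /sys/class/infiniband layout; device names like mlx4_0 will differ from box to box:

```python
#!/usr/bin/env python3
# Rough sketch: list RDMA devices and show whether each port currently runs
# InfiniBand or Ethernet (RoCE) as its link layer. Assumes a Linux host with
# the mlx4/mlx5 drivers loaded and the standard /sys/class/infiniband layout.
import glob
import os

for port_dir in sorted(glob.glob("/sys/class/infiniband/*/ports/*")):
    parts = port_dir.split(os.sep)
    dev, port = parts[-3], parts[-1]
    try:
        with open(os.path.join(port_dir, "link_layer")) as f:
            link_layer = f.read().strip()   # "InfiniBand" or "Ethernet"
        with open(os.path.join(port_dir, "state")) as f:
            state = f.read().strip()        # e.g. "4: ACTIVE"
    except OSError:
        continue
    print(f"{dev} port {port}: link_layer={link_layer}, state={state}")
```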

I really like the idea with the 4300, but it also implies a limit on the number of nodes that can access the fast (storage) network that is hard to overcome. Furthermore, getting redundancy on the switch/network layer can become tricky; I'd say you lose at least one 40GbE port for an ISL. If one is (and will be) fine with 3x HV + 1x storage, or maybe even 4x HC storage/hypervisor nodes, this is certainly a good and reasonable way to go.
 

Rand__

Well-Known Member
Mar 6, 2014
Slightly heretical question - do you guys really push that much data, or is it mostly *want* (which I totally understand ;))?
 
  • Like
Reactions: _alex

_alex

Active Member
Jan 28, 2016
Bavaria / Germany
I have IB/SRP in production, running within my PVE nodes, assembled over 2 chassis. This backs a bunch of VMs where I'm not so much concerned about throughput as latency (databases). Also, with SRP I see close to zero load on the CPU for the storage, which helps to keep latency low and also saves some power compared to other solutions ;)
 
  • Like
Reactions: T_Minus

Rand__

Well-Known Member
Mar 6, 2014
Yes, IB & latency I totally get, but the switch here is an Ethernet switch, is it not?
 

_alex

Active Member
Jan 28, 2016
Bavaria / Germany
Yes, the 4300 is EN, so RoCE would be the way to go on these for RDMA. I haven't compared RoCE to IB yet, as EN switches with higher port counts are quite rare, but in theory RoCE should be offloaded to the NIC and be very close to IB in terms of latency, performance and overhead on the CPU.
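One crude way to see that CPU overhead on the host is to sample /proc/stat while a transfer is running. A rough sketch, nothing vendor-specific, and the 5-second window is an arbitrary choice:

```python
#!/usr/bin/env python3
# Rough sketch, not a proper benchmark: sample the aggregate "cpu" line of
# /proc/stat around a time window (e.g. while an RDMA or iperf test runs)
# and report how busy the host CPUs were. Field order is the standard Linux
# one: user nice system idle iowait irq softirq steal ...
import time

def cpu_times():
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]   # aggregate "cpu" line, drop label
    return [int(x) for x in fields]

def busy_fraction(seconds=5.0):
    a = cpu_times()
    time.sleep(seconds)
    b = cpu_times()
    delta = [y - x for x, y in zip(a, b)]
    idle = delta[3] + delta[4]              # idle + iowait
    total = sum(delta)
    return 1.0 - idle / total if total else 0.0

if __name__ == "__main__":
    print(f"CPU busy over sample window: {busy_fraction() * 100:.1f}%")
```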
 
  • Like
Reactions: T_Minus

fossxplorer

Active Member
Mar 17, 2016
Oslo, Norway
Yes, RoCE + SR-IOV make it awesome. With SR-IOV, the PF and each VF get their own hardware registers, so the VFs can access the hardware directly.
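For anyone who wants to play with that, a rough sketch of turning on VFs through the standard Linux PCI sysfs knobs. Needs root; the interface name and VF count are made-up placeholders, and on ConnectX-3 / mlx4 you may have to use the mlx4_core num_vfs module parameter instead:

```python
#!/usr/bin/env python3
# Rough sketch, not a definitive recipe: enable a few VFs on a NIC via the
# generic PCI sysfs interface (sriov_numvfs). Run as root. "eth2" and the VF
# count are hypothetical; check your driver docs if this interface is missing.
import os
import sys

IFACE = "eth2"        # hypothetical interface name
WANTED_VFS = 4        # hypothetical VF count

pci_dir = os.path.realpath(f"/sys/class/net/{IFACE}/device")
total_path = os.path.join(pci_dir, "sriov_totalvfs")
num_path = os.path.join(pci_dir, "sriov_numvfs")

if not os.path.exists(num_path):
    sys.exit(f"{IFACE}: driver does not expose sriov_numvfs, use module parameters")

with open(total_path) as f:
    total = int(f.read())
print(f"{IFACE}: device supports up to {total} VFs")

# Writing 0 first is required if VFs are already enabled with a different count.
with open(num_path, "w") as f:
    f.write("0")
with open(num_path, "w") as f:
    f.write(str(min(WANTED_VFS, total)))
print(f"{IFACE}: enabled {min(WANTED_VFS, total)} VFs")
```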

i386 said:
@T_Minus All Mellanox ConnectX-3 cards use the same silicon and support the same features. The only differences are the speeds (FDR, FDR10 or QDR) and the port counts.
I would say the most interesting feature is RDMA over Converged Ethernet (RoCE). With RoCE the NICs can write directly to RAM without involving the CPU or OS, freeing up those resources.
 

fossxplorer

Active Member
Mar 17, 2016
Oslo, Norway
With regard to "latency, performance and overhead on the CPU", I should be able to shed some light in about 7-8 months, as I'm writing a thesis on this topic. I have access to 2 such adapters in 2 servers that I'm using for practical experiments.
One important aspect with 40GbE RoCE is IRQ affinity. This can be tuned using the vendor's own utility.
Other things like NUMA, Intel's HT, Turbo etc. also affect the performance (to be confirmed).
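For what it's worth, a rough sketch of what that IRQ-to-NUMA-node pinning boils down to under the hood. This is my own throwaway script, not the vendor utility; it assumes the stock Linux sysfs/procfs paths and a made-up interface name:

```python
#!/usr/bin/env python3
# Rough sketch: spread a NIC's MSI-X interrupts round-robin over the CPUs of
# the NUMA node the adapter sits on. Run as root. "eth2" is hypothetical;
# Mellanox also ships set_irq_affinity scripts that do this more carefully.
import os

IFACE = "eth2"  # hypothetical interface name
dev = os.path.realpath(f"/sys/class/net/{IFACE}/device")

with open(os.path.join(dev, "numa_node")) as f:
    node = int(f.read())
if node < 0:
    node = 0  # numa_node reads -1 on boxes without NUMA locality info

# Expand the node's cpulist (e.g. "0-7,16-23") into individual CPU ids.
with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
    cpus = []
    for chunk in f.read().strip().split(","):
        if "-" in chunk:
            lo, hi = chunk.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(chunk))

irqs = sorted(int(i) for i in os.listdir(os.path.join(dev, "msi_irqs")))
for i, irq in enumerate(irqs):
    cpu = cpus[i % len(cpus)]
    try:
        with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as f:
            f.write(str(cpu))
        print(f"IRQ {irq} -> CPU {cpu}")
    except OSError as e:
        print(f"IRQ {irq}: could not set affinity ({e})")
```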


_alex said:
Yes, the 4300 is EN, so RoCE would be the way to go on these for RDMA. I haven't compared RoCE to IB yet, as EN switches with higher port counts are quite rare, but in theory RoCE should be offloaded to the NIC and be very close to IB in terms of latency, performance and overhead on the CPU.
 
  • Like
Reactions: _alex

_alex

Active Member
Jan 28, 2016
Bavaria / Germany
fossxplorer said:
With regard to "latency, performance and overhead on the CPU", I should be able to shed some light in about 7-8 months, as I'm writing a thesis on this topic. I have access to 2 such adapters in 2 servers that I'm using for practical experiments.
One important aspect with 40GbE RoCE is IRQ affinity. This can be tuned using the vendor's own utility.
Other things like NUMA, Intel's HT, Turbo etc. also affect the performance (to be confirmed).

Cool topic for a thesis!
Will you compare plain IB vs. RoCE?
For what type of application?
(I would be strongly interested in SRP and iSER.)

Also, it would be really cool to know how latency is affected when
a) two nodes / HCAs are connected directly
b) two nodes are connected via a switch, both IB and EN / RoCE

In the end, it's a shame that EoIB seems to be dead; it would have eliminated the IPoIB oddities regarding bridges...
 

fossxplorer

Active Member
Mar 17, 2016
Oslo, Norway
Sorry for the late reply!
No, I'm not going to compare RoCE with IB. The topic concentrates on analyzing CPU usage and interrupt distribution/affinity with a 40GbE RoCE SR-IOV-enabled network adapter in connection with KVM VMs (with OpenStack).
So it's generally about applications requiring high throughput and low latency running inside a VM.
I'm not going to go into SRP and iSER :(
a) Yes, that's actually the first part of my experiment!
b) For my thesis, there is no switch in the picture. It's bare metal to bare metal, then bare metal to VM, VM to VM and similar. Still all connected directly between 2 physical servers.

I'll let you know what I find out, but it takes some time. I've just started writing the background (it's not that fun and takes too much time).


_alex said:
Cool topic for a thesis!
Will you compare plain IB vs. RoCE?
For what type of application?
(I would be strongly interested in SRP and iSER.)

Also, it would be really cool to know how latency is affected when
a) two nodes / HCAs are connected directly
b) two nodes are connected via a switch, both IB and EN / RoCE

In the end, it's a shame that EoIB seems to be dead; it would have eliminated the IPoIB oddities regarding bridges...
 

Yarik Dot

Active Member
Apr 13, 2015
fossxplorer said:
With regard to "latency, performance and overhead on the CPU", I should be able to shed some light in about 7-8 months, as I'm writing a thesis on this topic. I have access to 2 such adapters in 2 servers that I'm using for practical experiments.
One important aspect with 40GbE RoCE is IRQ affinity. This can be tuned using the vendor's own utility.
Other things like NUMA, Intel's HT, Turbo etc. also affect the performance (to be confirmed).
Cool topic for a thesis. Don't hesitate to share it with us once it's done. We deal with high-traffic video servers pushing tens of Gbps each, and the ability to push several Gbps more from each of the servers is valuable for us. Of course, the biggest problem at the moment is IRQs.