Advice on vSAN networking: InfiniBand


XeonSam

Active Member
I'm setting up a vSAN network with 3 nodes.

I set up a vSAN cluster with 10G Ethernet (Base-T) two years ago and wanted to delve into InfiniBand for a new test environment. I've done some reading on this but wanted to get tips from people who have successfully implemented a similar lab.

I'm thinking about Mellanox ConnectX-3 40GbE HBAs and a Mellanox SX6025 36-port 56Gb/s unmanaged switch for the vSAN network.
  1. vSAN supports the hardware, but is that for IB or just for Ethernet? (A quick check of which mode the ports come up in is sketched below.)
  2. What is the performance like if limited to Ethernet?
  3. I read that IB switches are extremely noisy. Having seen them in IDCs, can anyone recommend a quiet one?
  4. Can anyone recommend a better HBA? If so, why?
  5. Being a homelab/development environment, I'm concerned about noise and also heat. Would it be better to just go with the proven 10GbE route?
Any other recommendations or advice would be great.
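
Regarding points 1 and 2: vSAN only talks over a VMkernel adapter, so in practice the ConnectX-3 has to present itself to ESXi as an Ethernet NIC. A minimal sketch of sanity-checking which mode and link speed the ports actually came up in is below; it assumes the stock esxcli tool and the Python interpreter that ships with ESXi, and the vmnic name is a placeholder.

```python
# Minimal sketch: confirm the ConnectX-3 ports show up as Ethernet uplinks in
# ESXi and report their negotiated link speed. Assumes it runs in the ESXi
# shell (ESXi ships a Python interpreter) with the stock esxcli on the PATH;
# the vmnic name below is a placeholder for whatever the card enumerates as.
import subprocess

def esxcli(*args):
    """Run an esxcli command and return its stdout as text."""
    return subprocess.check_output(["esxcli"] + list(args),
                                   universal_newlines=True)

# One line per physical NIC: name, driver, link state, speed, MTU, description.
print(esxcli("network", "nic", "list"))

# Detailed view of a single uplink; a ConnectX-3 port running in Ethernet mode
# should appear here with the mlx4/nmlx4 driver and a 40000 Mbps link speed.
print(esxcli("network", "nic", "get", "-n", "vmnic4"))
```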
 

XeonSam

Active Member
InfiniBand is ancient and dying; in fact, ESXi 6.7 doesn't even support it anymore - Ethernet only.
Right... so I would have to switch the HBAs to Ethernet mode, right?

Or is it not worth it... stick to 10GbE?
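
For what it's worth, the ConnectX-3 is a VPI card, so the port personality is a firmware setting rather than a different model. A minimal sketch of flipping both ports to Ethernet with Mellanox's mlxconfig (from the MFT package) is below; it assumes a Linux box with MFT installed and `mst start` already run, and the device path is only an example (check `mst status` for the real one).

```python
# Minimal sketch: switch both ports of a ConnectX-3 from InfiniBand to Ethernet
# with Mellanox's mlxconfig (MFT). Assumes a Linux host with MFT installed and
# `mst start` already run; the device path below is an example only.
import subprocess

DEV = "/dev/mst/mt4099_pciconf0"   # example ConnectX-3 device; check `mst status`

def mlxconfig(*args):
    subprocess.check_call(["mlxconfig", "-d", DEV] + list(args))

# Show current settings, including LINK_TYPE_P1 / LINK_TYPE_P2
# (on ConnectX-3 firmware: 1 = InfiniBand, 2 = Ethernet, 3 = VPI/auto).
mlxconfig("query")

# Set both ports to Ethernet; -y answers the confirmation prompt.
# The change takes effect after a reboot / driver reload.
mlxconfig("-y", "set", "LINK_TYPE_P1=2", "LINK_TYPE_P2=2")
```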
 

Rand__

Well-Known Member
The SX6025 is an unmanaged switch; I don't think it will do Ethernet...
 

i386

Well-Known Member
I read that IB switches are extremely noisy. Having seen them in IDCs, can anyone recommend a quiet one?
Recent InfiniBand switches (think FDR and newer generations) are very loud. There is an old, half-width IB switch that can be run fanless, but it's quite dated and I think it only supported up to SDR bandwidths.
 

XeonSam

Active Member
Recent InfiniBand switches (think FDR and newer generations) are very loud. There is an old, half-width IB switch that can be run fanless, but it's quite dated and I think it only supported up to SDR bandwidths.
I found this one on eBay. It's unmanaged, so I'm assuming it doesn't support Ethernet.
But it doesn't look like it would make much noise at all.

Is it safe to say that, at the performance level I'm considering (ConnectX-3 and above), all my options lead to switches with fans?
 


Rand__

Well-Known Member
Yes. The build is an all-flash setup. 10Gb/s would be a bottleneck.
That will actually depend on the number of nodes, disk groups and users - all-flash does not necessarily make it use >10Gb/s...

The eBay 5022 is quite old, and I think it was never able to do Ethernet in the first place.

But all in all, "extremely noisy" is relative - I have a 6036, and while I wouldn't want to have it next to me, it's fine in the basement. Of course it's not under heavy load, but it's no screamer compared to the servers next to it.

If we are talking about an environment which actually saturates >10Gb/s (by number of servers or users), you probably have a dedicated area; if we are talking about a lab, it won't be under heavy load most of the time anyway.
 

XeonSam

Active Member
That will actually depend on the number of nodes, disk groups and users - all-flash does not necessarily make it use >10Gb/s...

The eBay 5022 is quite old, and I think it was never able to do Ethernet in the first place.

But all in all, "extremely noisy" is relative - I have a 6036, and while I wouldn't want to have it next to me, it's fine in the basement. Of course it's not under heavy load, but it's no screamer compared to the servers next to it.

If we are talking about an environment which actually saturates >10Gb/s (by number of servers or users), you probably have a dedicated area; if we are talking about a lab, it won't be under heavy load most of the time anyway.
@Rand__ You seem to know your stuff! (I love this forum)
3 nodes, 8 capacity drives at 800GB~1.2TB, striped. The cache will be NVMe. You don't think 10G will be a bottleneck? I was looking into 25GbE, but all the switches I see are overkill.

No isolated soundproofed room, just a noise-dampening rack that will probably have to be open during the summer. I'm targeting 30dB, but most of the IB switches are 50dB or more.
 

Rand__

Well-Known Member
Performance will depend on the number of VMs and their workload. vSAN will *not* utilize its full theoretical capabilities for just a few VMs.
I run an all-NVMe vSAN (4 nodes, 1 cache (900p), 1 capacity (P3600/4510); 56G networking) with about 20 VMs, and I barely scratch 10G even when running benchmarks (high QD/T on a single VM).
If I were to run multi-VM benchmarks I probably could be limited by 10G, but it's just not there in my daily usage.

Noise - under normal operation they are not louder than my Supermicro 2U boxes. Of course boot-up is different, but after that...
So it depends on what else you run: if you have fairly silent servers it would be audible; if they are regular ones, it's just one more box, or slightly louder.
 

markpower28

Active Member
IB is not supported on vSphere 6.7; Mellanox stopped making IB drivers for vSphere. The Arista 7050QX is a very popular switch at STH for 40GbE.
 

XeonSam

Active Member
Performance will depend on the number of VMs and their workload. vSAN will *not* utilize its full theoretical capabilities for just a few VMs.
I run an all-NVMe vSAN (4 nodes, 1 cache (900p), 1 capacity (P3600/4510); 56G networking) with about 20 VMs, and I barely scratch 10G even when running benchmarks (high QD/T on a single VM).
If I were to run multi-VM benchmarks I probably could be limited by 10G, but it's just not there in my daily usage.

Noise - under normal operation they are not louder than my Supermicro 2U boxes. Of course boot-up is different, but after that...
So it depends on what else you run: if you have fairly silent servers it would be audible; if they are regular ones, it's just one more box, or slightly louder.
It's a test environment: at least 20-40 VMs, a Kubernetes cluster with containers, etc. Is NVMe better? I dropped the HGST 12G SAS SSDs in favor of NVMe for cache and reduced the disk groups from 3 to 2, but for capacity I was going to stick to either SAS3 or SATA SSDs. Is the performance difference noticeable? Is Optane really as amazing as people say? What about a 900P paired with SATA SSDs for capacity - big performance difference? As you recommended, I will stick with 10G... and try to LACP it to improve bandwidth.
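
One caveat on the LACP plan: a LAG balances per flow, not per packet, so the single long-lived vSAN stream between two hosts still rides one 10G member link; extra links mostly help when many hosts and VMs generate many distinct flows. Below is a toy illustration of that per-flow pinning - the hash function is made up for illustration only (ESXi and real switches use their own hashes), but the behaviour it models is the same.

```python
# Toy model of per-flow link selection in a LAG: every packet of a given
# (src IP, dst IP, src port, dst port) flow hashes to the same uplink, so a
# single vSAN stream between two hosts cannot exceed one member link.
# The hash below is illustrative only; real gear uses vendor-specific hashes.
import hashlib

UPLINKS = ["vmnic4", "vmnic5"]          # 2 x 10GbE in the LAG (placeholder names)

def pick_uplink(src_ip, dst_ip, src_port, dst_port):
    key = "%s-%s-%d-%d" % (src_ip, dst_ip, src_port, dst_port)
    digest = hashlib.md5(key.encode()).hexdigest()
    return UPLINKS[int(digest, 16) % len(UPLINKS)]

# The vSAN traffic between two hosts is one long-lived flow -> always one link
# (port 2233 is vSAN's RDT port):
print(pick_uplink("10.0.0.1", "10.0.0.2", 54321, 2233))
print(pick_uplink("10.0.0.1", "10.0.0.2", 54321, 2233))

# Many distinct flows (lots of hosts/VMs) do spread across both links:
for port in range(54321, 54329):
    print(port, pick_uplink("10.0.0.1", "10.0.0.3", port, 2233))
```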

I will be using R730XDs with Broadwell CPUs. My home lab used HP DL360 Gen9s, so the new lab will have larger fans. My homelab was quiet enough and didn't top 30dB even under heavy load. I assume the R730s will be quieter. I cannot afford a switch that is louder than my three servers combined. I have experience with cooling and have fans of all shapes, sizes and wattages, so I'm able to modify the switch, but I'm concerned about having to do so.

If it dies on me, the whole dev team will hate me.

IB is not supported on vSphere 6.7; Mellanox stopped making IB drivers for vSphere. The Arista 7050QX is a very popular switch at STH for 40GbE.
IB isn't the goal; Ethernet (VPI mode) would be the goal. What's the noise like on the Arista? It looks very interesting. Is there much configuration needed for the switch, or is it basically just an L2 switch?

My thanks to everyone giving me such valuable advice!
 

Rand__

Well-Known Member
Well, for all-flash you will write at, at most, the speed of a single cache drive (depending on disk groups and FTT settings) - and actually significantly less than that, due to the way vSAN holds performance back for other potential clients.
All reads will come from the capacity disks directly (not the cache), again depending on disk groups and the FTT setting.

Long story short: vSAN is optimized for larger numbers of VMs doing concurrent access (e.g. VDI). It reserves performance for non-active VMs, i.e. it does not take all the performance and divide it among the running VMs.
Faster hardware means each VM's "reserve" will be higher, and so better performance.

Don't expect blazingly fast individual VMs even with NVMe.

The pro for vSAN is its ease of use and (if run with 4+ hosts) its resilience - it really provides HA. (Of course you have to be prudent with ESXi updates, rejoin the dvSwitch every now and then and so on, but normally it just runs nearly invisibly in the background.)
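
To put a rough number on the earlier 10G question: with the default FTT=1 mirroring, every guest write is committed on two hosts, so between one and two copies of each write cross the vSAN network, depending on whether one replica happens to live on the host running the VM. A back-of-envelope sketch, ignoring resync and metadata traffic (the write rates are illustrative, not measured):

```python
# Rough back-of-envelope: how much vSAN network bandwidth does a given amount
# of guest write traffic generate with FTT=1 (RAID-1 mirroring)?
# Assumptions: two replicas per object, resync/metadata overhead ignored,
# and the example write rates are purely illustrative.

TEN_GBE_USABLE_GBPS = 9.4        # ~10GbE line rate minus framing overhead (approx.)

def vsan_network_gbps(guest_write_gbps, replicas=2, local_replicas=0):
    """Gb/s of vSAN traffic leaving the host for a given guest write rate.

    replicas:        copies written per object (2 for FTT=1 mirroring)
    local_replicas:  how many of those copies live on the host running the VM
    """
    remote_copies = replicas - local_replicas
    return guest_write_gbps * remote_copies

for guest_writes in (2, 4, 8):   # Gb/s of sustained guest writes (illustrative)
    worst = vsan_network_gbps(guest_writes, local_replicas=0)  # both replicas remote
    best = vsan_network_gbps(guest_writes, local_replicas=1)   # one replica local
    print("guest writes %d Gb/s -> vSAN traffic %d-%d Gb/s (10GbE usable ~%.1f Gb/s)"
          % (guest_writes, best, worst, TEN_GBE_USABLE_GBPS))
```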