Switchless 10GbE Point-to-Point Connection between ESXi Servers [how?]


Rand__

Well-Known Member
Mar 6, 2014
Ok, so you couldn't get passthrough to work with
"8086 2700 3900 false"?
Same error (crash as before)? If you have a FN account you might want to add that as info to the bug report :)


And since you couldn't get passthrough to work you used RDM? In the end the result is not much different than creating a second disk on it and passing that on o/c, and after all, if you only use it for SLOG there is little harm done if something goes amiss.

Btw, with the newest ESXi HW upgrade (v14 I think) FreeNAS U6 complains about the NVMe controller, so don't do that. Have not looked into it tbh since it does not seem to impact anything (and going back would mean a rebuild):

[screenshot of the FreeNAS complaint about the NVMe controller]
 

svtkobra7

Active Member
Jan 2, 2017
Ok, so you couldn't get passthrough to work with
"8086 2700 3900 false"?
  • I added "8086 2700 d3d0 false" to the passthru.map
  • You noted "8086 2700 3900 false"
  • Where the file format = vendor-id device-id resetMethod fptShareable
  • So you are using a reset method of 3900, which isn't a possible value (flr, d3d0, link, bridge, or default) per this document ... would you mind explaining, please?
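For reference, the exact line I appended to /etc/vmware/passthru.map (the stock location on ESXi); the comment lines are just my annotation of the column meanings per that document:

  # vendor-id  device-id  resetMethod  fptShareable
  # Intel (8086) Optane device 2700, force a D3 power-state reset, not shareable:
  8086  2700  d3d0  false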
Same error (crash as before)? If you have a FN account you might want to add that as info to the bug report :)
  • I'm not sure I follow; what do you mean by "same crash as before"? (no sarcasm)
  • My fiddling induces lots of crashes (sarcasm / honest, however)
  • Passthru.map updated = same crash reported in the bug report (if that is what you mean).
And since you couldn't get passthrough to work you used RDM? In the end the result is not much different than creating a second disk on it and passing that on o/c, and after all, if you only use it for SLOG there is little harm done if something goes amiss.
  • Correct, RDM + NVMe Controller worked without issue. VMware purports this controller to be superior, but I've seen differing real world results and plan to compare.
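For anyone following along, a rough sketch of how such an RDM can be created from the ESXi shell (the device identifier and datastore path are placeholders, not my actual ones):

  # find the local NVMe device identifier
  ls /vmfs/devices/disks/
  # create a physical-compatibility RDM pointer file for the raw device
  vmkfstools -z /vmfs/devices/disks/t10.NVMe____PLACEHOLDER_ID /vmfs/volumes/datastore1/FreeNAS/optane-rdm.vmdk

The resulting .vmdk is then added to the FreeNAS VM as an existing disk and attached to the (virtual) NVMe controller.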
Btw, with the newest ESXi HW upgrade (v14 I think) FreeNAS U6 complains about the NVMe controller, so don't do that. Have not looked into it tbh since it does not seem to impact anything (and going back would mean a rebuild):
  • This was tested on 6.5.0 Update 2 (Build 9298722) + FreeNAS 11.1 U6 ...
  • That is the latest ESXi patch and I tried to figure out what you meant by "esxi hw upgrade" looking at the ESXi 6.5 Matrix, but couldn't ...
  • Can you explain further, please? (as with the latest version of ESXi (6.5 at least) and FreeNAS 11.1 U6 as referenced, I didn't encounter any issues with NVMe controller?)
[apologies for the delayed reply and as always, thank you very much for your continued download of wisdom :) ]
 

Rand__

Well-Known Member
Mar 6, 2014
  • I added "8086 2700 d3d0 false" to the passthru.map
  • You noted "8086 2700 3900 false"
  • Where the file format = vendor-id device-id resetMethod fptShareable
  • So you are using a reset method of 3900, which isn't a possible value (flr, d3d0, link, bridge, or default) per this document ... would you mind explaining, please?
Hm, I did not delve into the logic any further; I wanted you to try another device ID (3900) instead, but it looks like I misunderstood that and it is the sub-ID, so that's likely not going to work.
But thinking about it again, it might indeed be related to you having a non-passed-through device as well.
Not sure it's worth the effort to change your setup just to find out whether that's the causing factor...

  • I'm not sure I follow, what do you mean by same crash as before? (no sarcasm)
  • Passthru.map updated = same crash reported in the bug report (if that is what you mean).
Yes, just thought it might be good to get feedback on that Bug report:)

Can you explain further, please? (as with the latest version of ESXi (6.5 at least) and FreeNAS 11.1 U6 as referenced, I didn't encounter any issues with NVMe controller?)
I was referencing the virtual machine hardware version, which should be 13 for ESXi 6.5 and changed to 14 for 6.7. I don't think my above error happened with HW version 13 either; that's why I wanted to mention it.
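If you want to check without the GUI, the hardware version is also stored in the VM's .vmx file, e.g. (path is just an example):

  # 13 = ESXi 6.5 default, 14 = new with 6.7
  grep virtualHW.version /vmfs/volumes/datastore1/FreeNAS/FreeNAS.vmx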
 

svtkobra7

Active Member
Jan 2, 2017
Hm, I did not delve into the logic any further; I wanted you to try another device ID (3900) instead, but it looks like I misunderstood that and it is the sub-ID, so that's likely not going to work.
But thinking about it again, it might indeed be related to you having a non-passed-through device as well.
Not sure it's worth the effort to change your setup just to find out whether that's the causing factor...
  • Appreciate the analysis and it was certainly worth looking at.
  • I too think it is the non-passed-through device; the fact that you suggest the same gives me some hope of my IT acumen increasing a couple of decades from now ...
That passthru.map workaround doesn't seem to work for me. As to why? Absolutely no clue, but perhaps because I have another NVMe connected and/or passed through to FreeNAS.
Yes, just thought it might be good to get feedback on that Bug report:)
  • Good call - I'm going to ensure I can replicate it on the second system, and then add the info accordingly.
  • My inference regarding that bug ticket is that it is being looked after by [pick one]: (a) a three-toed sloth, or (b) nobody at all.
I was referencing the virtual machine hardware version, which should be 13 for ESXi 6.5 and changed to 14 for 6.7. I don't think my above error happened with HW version 13 either; that's why I wanted to mention it.
  • Aha - thanks for the clarity. More importantly, thank you for the heads-up, as once I figure out which of the VMUG downloads to install, and in which order, I do plan to move from 6.5 to 6.7. [kidding]
On an unrelated, yet related, note that brings the discussion full circle - I managed to blow up my pool again when I was benching SLOGs the other day. I can't import from the GUI at all thanks to an unavailable SLOG (I swear I removed it before shutdown) and have to import using zpool import -m tank, but the GUI won't auto-mount ... time to start rsyncing to box #2 ... I'm going to put a lock on that darn closet and throw the key away. :rolleyes:
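In case anyone lands in the same spot, roughly what I'm doing from the shell; "tank" is my pool and the GUID below is a placeholder for whatever zpool status reports for the missing log vdev:

  zpool import -m tank                    # import despite the unavailable SLOG
  zpool status tank                       # the missing log device shows as UNAVAIL with a numeric GUID
  zpool remove tank 1234567890123456789   # placeholder GUID - drops the dead log vdev from the pool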

Have a good one and thanks again for the replies.
 

LaMerk

Member
Jun 13, 2017
"And now, I introduce the world's worst network diagram ..."

Back to the diagram, is it possible to install at least a 1 Gbps NIC in both servers? Then you would not need to connect the ISP directly to Server A and could use a switch instead.
 

svtkobra7

Active Member
Jan 2, 2017
"And now, I introduce the world's worst network diagram ..."
  • LOL (didn't feel like installing Visio for a single graph ;)
Back to the diagram, is it possible to install at least a 1 Gbps NIC in both servers? Then you would not need to connect the ISP directly to Server A and could use a switch instead.
  • It's somewhat ironic that you mention this option, as I thought it may be what is needed to achieve the desired end result. But ...
  • I'd prefer not to if there is another option: it's a PITA to service Server A, as it is "racked" vertically (I need to remove Server B first), and frankly, I like not having an HBA (I use the integrated RAID controller flashed to IT mode), i.e. it's clean and the PCIe slots are only occupied by NVMe. O/c, I'll ultimately do what I need to achieve the end result.
But as @nerdalertdk suggested, I think the only option to achieve the objective without additional hardware is as detailed below.
Prior State (really stating again for myself as I walk through this).
  • WAN vSwitch = GLAN0 NIC = Direct to ISP
  • LAN vSwitch = GLAN1 NIC => Switch
EDIT for clarity: I initially followed a guide to set up pfSense, and those guides suggested a "WAN" NIC and a "LAN" NIC (thus the naming convention), i.e. two physical NICs; however, I believe the "router on a stick" model can be used (I think that is what it is called when one NIC is used), and pfSense can work with two virtual NICs, i.e. a "WAN" portgroup and a "LAN" portgroup, which coexist on a "WAN / LAN" vSwitch dedicated to its own physical NIC (as I dive into below) ...
Suggested Solve
A Server
  • "WAN / LAN" vSwitch = GLAN0 NIC => Switch Port 2
  • "Point to point" vSwitch = GLAN1 NIC => B GLAN1 NIC
B Server
  1. "LAN" vSwitch = GLAN0 NIC => Switch Port 3
  2. "Point to point" vSwitch = GLAN1 NIC => A GLAN1 NIC
Switch
  • Port 1 = ISP
  • Port 2 = "WAN / LAN" vSwitch from Server A
  • Port 3 = "LAN" vSwitch from Server B
VLANs
  • 1 = Port 1 & 2
  • 2 = Port 2, 3, etc
I think this solution works conceptually; however, I'm having trouble thinking this through (and I know for most it is rather simple).

Your thoughts as to viability? Can you offer any guidance on how to deploy / resources for the creation of VLANs etc. on an L3 switch?
And I still cannot figure out how to do P2P precisely ...
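My best guess at that piece is something like the below on each host (the vSwitch/portgroup names, vmnic number, and IPs are made up; .1 on A, .2 on B), but please correct me if I'm off:

  # dedicated vSwitch on the 10 GbE port that is cabled straight to the other host
  esxcli network vswitch standard add --vswitch-name=vSwitch-P2P
  esxcli network vswitch standard uplink add --uplink-name=vmnic1 --vswitch-name=vSwitch-P2P
  # portgroup for the FreeNAS VM's second vNIC
  esxcli network vswitch standard portgroup add --portgroup-name=P2P-VM --vswitch-name=vSwitch-P2P
  # portgroup + vmkernel interface so the hosts themselves (NFS / vMotion) can use the link too
  esxcli network vswitch standard portgroup add --portgroup-name=P2P-vmk --vswitch-name=vSwitch-P2P
  esxcli network ip interface add --interface-name=vmk1 --portgroup-name=P2P-vmk
  esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=10.10.10.1 --netmask=255.255.255.252 --type=static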

And finally, a stupid question: while host names don't matter, I'm OCD and can't think of anything better than "ESXi-01, ESXi-02, FreeNAS-01, SMCI-01, etc." What naming convention do most people use at home for something like this?
 

Rand__

Well-Known Member
Mar 6, 2014
Why do you want to direct-connect GB LANs now? I thought you wanted to connect 10 GbE only?
 

svtkobra7

Active Member
Jan 2, 2017
Why do you want to direct-connect GB LANs now? I thought you wanted to connect 10 GbE only?
  • Yes, that is correct.
  • I don't want to direct-connect anything 1 GbE; rather: (a) GLAN0 on both servers to the switch (10 GbE NIC connected at 1 Gb due to the switch) + (b) GLAN1 on both servers to each other (10 GbE NIC connected at 10 Gb).
Edit for clarity: (a) being requisite for Internet connectivity and (b) being a "hack" to achieve a significantly faster link.

Maybe my post was not very clear? Or maybe I misunderstood the question? Either way, I'm happy to clarify.
 

Rand__

Well-Known Member
Mar 6, 2014
Nah, the GLAN label made me think it was Gigabit LAN.

So you don't have 2 x 1 GbE + 2 x 10 GbE in each box, but only 2 x 10 GbE which you call GLAN ... now I get it.
 

Rand__

Well-Known Member
Mar 6, 2014
Ok, then your proposed solution should work, except o/c that your VLAN #1 should not actually be VLAN 1, since that's usually the all-access/no-VLAN VLAN. So use 1 for general usage (IPMI, other boxes), 2 for ISP traffic, and 3 (or none) for box-to-box traffic.
 

svtkobra7

Active Member
Jan 2, 2017
Nah, the GLAN label made me think it was Gigabit LAN.
  • Sorry for that ... Just using the naming convention on the SMCI quick reference guide ...

So you don't have 2 x 1 GbE + 2 x 10 GbE in each box, but only 2 x 10 GbE which you call GLAN ... now I get it.
  • Yes sir.
  • Reference above (if there is a better naming convention, please advise, as I'm more than happy to use it).
  • I told you the diagram was the world's worst, but I tried to call out the 2 x 10 GbE in my diagram, i.e. "10G"
 

Myth

Member
Feb 27, 2018
Does ESXi use SMB to communicate between the two servers? What protocol would it use? TCP/IP, anything under that?

If you used Windows you could use SMB Direct 3.0 or higher and it would multipath automagically. So your clients could get 2 x 1 GigE LAN ports into the switch and it would multipath to the server to double their performance. If you installed a quad-port 1 GbE PCIe card you could get 4 GigE into the switch from the server. But only if SMB RDMA 3.0 or higher is used; basically all ports need to have an IP address on the same subnet as the main port, but everything will be mapped to the main port.
 

svtkobra7

Active Member
Jan 2, 2017
Ok, then your proposed solution should work, except o/c that your VLAN #1 should not actually be VLAN 1, since that's usually the all-access/no-VLAN VLAN. So use 1 for general usage (IPMI, other boxes), 2 for ISP traffic, and 3 (or none) for box-to-box traffic.
  • NICE!!! At least I have an elementary understanding of networking :)
  • Just not the technical expertise to deploy :(
  • Literally every time I've previously tried to configure VLANs on that Layer 3 Cisco switch, I've ended up locking myself out and having to reset the damn thing, despite spending hours researching / attempting to deploy. I eventually get too frustrated to continue and give up.
  • Is there any advice / guidance you can offer (or resources to get me started)? The switch is a Cisco SG300-20 20-Port Gigabit Managed Switch
And since you asked (you didn't, I know), here is everything finally "racked", which I finished yesterday (save for the addition of a few more Optane drives which are still boxed).
  • 2 x 4U vertical racks = emulates a 4-post rack = allows for an "upside-down" mount = 5.2°C 7200 RPM HDD temp favorability vs. a "right-side-up" mount
  • 1 x 2U vertical rack = 1U for the switch + 1U for future use
Would you believe that if you or I were strong enough, and all the cables were disconnected, we could lift that plywood panel off the wall and carry it away? I engineered it for ease of removal (for when I deconstruct my datacenter and sell), and that plywood panel is only held in place by gravity (with the help of a French cleat ... though of course behind the drywall I created a strong "framework" by inserting wood 2x4 verticals into the flimsy metal 2x4s and adding 2x6s as horizontal blocking).

 

Rand__

Well-Known Member
Mar 6, 2014
Re diagram - yes, that looked like GLAN + 10G. If you had put a 1G at IPMI it might have been clearer (or it's clear to everybody but me, which might also be the case).
Re backpanel nomenclature - that's the same for 1 GbE and 10 GbE NICs ;)
 

Rand__

Well-Known Member
Mar 6, 2014
Nice and tidy. And are those NVMe-backplane-enabled boxes? (gleaning from the other pic in your gallery) - makes me jealous :)
 

svtkobra7

Active Member
Jan 2, 2017
Re diagram - yes, that looked like GLAN + 10G. If you had put a 1G at IPMI it might have been clearer (or it's clear to everybody but me, which might also be the case).
  • but IPMI is 1/10 G, i.e. "Fast" Ethernet ;)
  • Point noted - I threw it together quite quickly.
Re backpanel nomenclature - that's the same for 1 GbE and 10 GbE NICs ;)
  • So I had it right? That is a nice change of pace!!! :):):)
 

Rand__

Well-Known Member
Mar 6, 2014
Does this help?
Edit: <<< REMOVED useless screenshots >>>

I suppose you also need the ESX side, but I have 4 interfaces per box with a mix of dvSwitches and regular ones.
Basically my ISP interfaces are directly attached to the switch (VLAN 2), and all else is done via NIC1 (fallback management via vmk20) or the high-speed Mellanox cards (VLAN 1, all above 4), mapped to a dvSwitch with one vmk per VLAN.
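If it helps as a starting point on the SG300 side, the VLAN part should boil down to roughly this in the Small Business CLI (port numbers per your plan, VLAN numbers per my suggestion above; I'm typing this from memory, so double-check against the SG300 admin guide):

  configure
  vlan database
  vlan 2-3
  exit
  ! port 1: ISP uplink, access port in the ISP VLAN
  interface gi1
  switchport mode access
  switchport access vlan 2
  exit
  ! port 2: Server A "WAN / LAN" vSwitch - trunk, VLAN 1 untagged + VLAN 2 tagged
  ! (the WAN portgroup on the ESXi side then carries the VLAN 2 tag)
  interface gi2
  switchport mode trunk
  switchport trunk allowed vlan add 2
  exit
  ! port 3: Server B "LAN" vSwitch stays in the default VLAN 1, so nothing to change
  end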
 

svtkobra7

Active Member
Jan 2, 2017
Nice and tidy. And are those NVMe-backplane-enabled boxes? (gleaning from the other pic in your gallery) - makes me jealous :)
  • I'm OCD what can I say?
  • I wish they had an NVMe backplane ... I think you are referring to the pic with the red arrow; I was asking somebody what that dingleberry was. Nothing to make you jealous here: System: SMCI SuperStorage 6027R-E1R12T | Chassis: SuperChassis 826BE16-R920LPB | Motherboard: X9DRH-7TF | Backplane: BPN-SAS2-826EL1. Since each has 16 x 16 GB RAM, and until DDR4 comes down in price (I don't even want to think what that would cost in DDR4 - but now I'm curious, $2k - $2.5k???), X9 / v2 CPUs / ECC DDR3 will have to work.
  • I added a 12G SAS / 6G SATA hot-swap kit (2 x 2.5") as I was originally booting off Intel S3500s prior to moving to booting from Optane, but I thought about adding the NVMe 2 x 2.5" hot-swap kit instead. Not that I need the hot swap (or the kit at all at this point), but I really like it (considering the 12 x HDD bays are full). It was more cost effective to go NVMe AIC, as that kit has 2 x OCuLink connectors, and by the time you go one of several routes to connect OCuLink to the mobo, you have spent ~$100 each on cables/adapters ...
 

svtkobra7

Active Member
Jan 2, 2017
I don't think that IPMI links at 100 Mbit?
Both of my X9 boards (current & prior) do ... maybe X10+ uses faster NICs?? What are you running?



O/C with my luck - I wouldn't be surprised if I cut some really crappy cables ...