Switchless 10GbE Point-to-Point Connection between ESXi Servers [how?]


svtkobra7

Active Member
Jan 2, 2017
I want to make sure I end up using your suggestions (I plan to put an end to the tinkering), so I'm just seeking clarity on the comments below (sorry to ask; I read them over and over and couldn't figure out what you meant):
Well, I do prefer mirrors over stripes or RAID-Zs, that's true, so I'd have used those
(c) - Create two identically sized vdisks and create a mirrored partition in FN
So two partitions in FN from two vDisks would mean giving up the storage partition comprising the balance of the drive ... is that what you meant? Last option in the image ...
 

Rand__

Well-Known Member
Mar 6, 2014
1. No need to always follow my recommendations ;) Weigh the options and make your own choice. If you're wrong I can do the old 'told you so'; otherwise I've learnt something new. Win-win for me ;)

What I meant was to create an identically sized vdisk on each Optane, pass them to FN, and create a mirror on them. You lose space and write speed but gain safety and read speed.
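For anyone following along, a rough sketch of what that looks like from the FreeNAS shell, assuming the two vdisks show up as da1 and da2 (device and pool names are placeholders; the GUI volume manager does the same thing):

Code:
# list the disks ESXi passed in as vdisks
camcontrol devlist
# build a mirrored pool from the two identically sized vdisks
zpool create optane mirror /dev/da1 /dev/da2
zpool status optane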
 

svtkobra7

Active Member
Jan 2, 2017
1. No need to always follow my recommendations ;) Weigh the options and make your own choice.
  • OK fine, maybe I tried to be a little too pleasing with my comment. ;)
  • Maybe you have a point, I followed you a little too deep down the NVMe rabbit hole and now can't pay my credit card bill.
weigh the options and make your own choice.
  • 100% agree. I'm perhaps spending a little too much time looking at this now as (1) I've never had SLOGs that actually work before and (2) I really don't intend to do too much fiddling going forward. Point = I want to make the best decision I can with the data I have (and if I need to change it, no issue).
  • Perhaps part of the disconnect ... I wanted to understand what this new option you proposed was ... it's tough to evaluate something when that something is undefined/unclear to you. Fair? [rhetorical]
If you're wrong I can do the old 'told you so'; otherwise I've learnt something new. Win-win for me ;)
  • If I were a betting man, I'd put my money on you and we can leave it at that.
  • And I'd never do such a thing! OK I would, since I'm jealous of how much you know.
What I meant was to create an identically sized vdisk on each Optane, pass them to FN, and create a mirror on them. You lose space and write speed but gain safety and read speed.
  • We are 100% in sync until "create a mirror" ...
  • OK, so create a mirrored SLOG. Forget creating the tank from the spare capacity. Create a zvol for block storage on the HDD tank, as I will have no more local storage in ESXi (rough sketch at the end of this post).
  • Maybe it is the way you phrased it. You gain write speed overall (you are adding a SLOG), but I agree you lose some of it with a mirrored SLOG versus a stripe.
  • How do you get read speed from a SLOG? Personally, my benchmarks have always shown the addition of a SLOG to add read speed (I've never been able to figure out why), but are you saying there is an absolute read speed gain, or a read speed gain vs. a striped SLOG?
I'm sorry, I feel silly for not following you. I've been more dense than usual lately.

This is what you are suggesting:

with effectively 280GB - 16GB of over provisioning per Optane (forget about boot for a sec)
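If I've read you right, a rough sketch of that layout from the FreeNAS shell (pool, partition, and zvol names are placeholders, and the zvol size is just an example):

Code:
# attach the two 16 GB Optane slices as a mirrored SLOG to the existing HDD pool
zpool add tank log mirror /dev/da1p1 /dev/da2p1
# carve a zvol out of the HDD pool for iSCSI block storage
zfs create -V 500G -o volblocksize=16K tank/iscsi0
zpool status tank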
 

Rand__

Well-Known Member
Mar 6, 2014
One step back - I am here:
(a) would you slice in ESXi, i.e.
  • nvme0 = 20 GB vDisk (SLOG1 for HDD pool) + 220 GB vDisk for iSCSI (slightly smaller due to ESXi + FreeNAS boot);
  • nvme1 = 20 GB vDisk (SLOG2 for HDD pool) + 240 GB vDisk for iSCSI;
  • so I end up with two mirrored SLOGs for the HDD pool, but they are on different devices;
  • and I end up with 460 GB of mirrored storage for iSCSI, again on different devices; so
  • provided both devices aren't taking a beating at the same time and only one or the other is, I believe this is my optimal play. And if neither is getting hammered, it definitely is more performant than not striping;
(b) or would you simply attach 2 vDisks (240 GB + 260 GB) and slice in ZFS (gpart create / add, etc.) to end up with the same slices?
So what you did was to take a 20 GB slice from each and add that as a mirrored SLOG to the HDD pool - good.
Then you took a 240 GB + 220 GB slice for iSCSI and created a 460 GB stripe (I assume also in FreeNAS), which is basically a RAID 0.
Here I say don't do that; create two 220 GB slices, pass them to FN, and create a 220 GB mirrored iSCSI disk (where you gain resilience and read speed but lose write speed and size).
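If you do go the (b) route and slice inside FreeNAS, a rough sketch of the gpart side, assuming the vdisks show up as da1/da2 and roughly a 16 GB log slice per device (labels and sizes are placeholders):

Code:
# GPT label each passed-in vdisk
gpart create -s gpt da1
gpart create -s gpt da2
# first slice for the SLOG, the remainder for the iSCSI mirror
gpart add -t freebsd-zfs -s 16G -l opt0log da1
gpart add -t freebsd-zfs -l opt0block da1
gpart add -t freebsd-zfs -s 16G -l opt1log da2
gpart add -t freebsd-zfs -l opt1block da2

The mirrors can then be built against the /dev/gpt/ labels rather than raw daX names, which keeps them stable across reboots.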
 

svtkobra7

Active Member
Jan 2, 2017
One step back - I am here:
  • I'll point the finger at myself, but I'm really glad you followed up, as I think your recommendations, now that I properly understand them, are absolutely the right path.
So what you did was to take a 20 GB slice from each and add that as a mirrored SLOG to the HDD pool - good.
  • Correct, except I used 16 GB slices.
  • Also, I didn't previously test a mirrored SLOG configuration (still learning, but logic suggests it is obviously the way to go, and the numbers back it too).
  • In Figure 1 below, there is the pool with on-disk ZIL at the far left, plus benchmarks with a single slice and then two two-slice variations: (a) "stripe" (although they aren't really striped, they are just slogging together) and (b) mirrored. (a) is what I previously presented, not (b).
  • The current testbed on 6.7 (moved from 6.5 as your argument was correct) produces sync=disabled HDD / no-SLOG writes of 605 MB/s, and with a mirrored SLOG sync writes are 589 MB/s (the sync toggles used for the comparison are sketched at the end of this post).
  • Being within 3% of sync=disabled speed seems amazing to me.
  • I'd say SLOGs are conquered (man, they were a lot of trouble prior) and we can lock that configuration in twice over and feel great about it.
Then you took a 240 GB + 220 GB slice for iSCSI and created a 460 GB stripe (I assume also in FreeNAS), which is basically a RAID 0.
Here I say don't do that; create two 220 GB slices, pass them to FN, and create a 220 GB mirrored iSCSI disk (where you gain resilience and read speed but lose write speed and size).
  • Correct; initially the sizes were offset due to space required for ESXi and FreeNAS boot on one of the drives; however, your advice was taken and now they are of equal size.
  • In hindsight it is perfectly clear what you intended me to hear. I missed it because I somehow got confused and thought the + read / - write comments were related to the SLOG.
  • My unconveyed intent = play around and quantify the performance of a couple of RAID layouts, but primarily ensure Optane can take the parallel-use scenarios (we know it can), deploy it as a SLOG, and move on, creating the actual iSCSI / NFS mount later.
  • As to the specifics you dive into, I think I'm perfectly in sync with your recommendation ...
  • The drives are physically local to ESXi 6.7 and presented as virtual disks to FreeNAS 11.1-U6.
  • Still speaking to presence in ESXi, each of the two 240 GB virtual disks (480 GB in aggregate) was thick provisioned, eagerly zeroed, and attached to an LSI Logic SAS SCSI controller.
  • Now attached to the FreeNAS VM, inside that environment each drive was partitioned in two: (1) a 16 GB partition labeled Opt[x]log[x] to be tested in multiple SLOG configurations, and (2) a 224 GB partition labeled Opt[x]block[x] to be tested in multiple zpool and then block storage configurations. So we end up with four partitions.
  • My initial concern was only that mirroring results in half the size of a stripe, but honestly it should be enough for my needs, especially when you consider I'll have a mirrored pool on each server. As soon as I recalled that, I completely agreed with taking the redundant approach. Should be good enough - see Figure 2.
  • I finally installed vCenter Server and haven't played with it too much, but I think it allows you to share local datastores, and now it is so easy to move VMs. I've been missing out!
Figure 1
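For anyone repeating the comparison, a rough sketch of the sync toggles behind those numbers (the pool name is a placeholder):

Code:
# baseline run: take the SLOG out of the write path entirely
zfs set sync=disabled tank
# comparison run: force every write through the mirrored SLOG
zfs set sync=always tank
# back to the default once done benchmarking
zfs set sync=standard tank
# confirm the mirrored log vdev is present and healthy
zpool status tank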
 

Rand__

Well-Known Member
Mar 6, 2014
Sounds like you are getting somewhere - glad to hear it:)


I finally installed vCenter Server and haven't played with it too much, but I think it allows you to share local datastores, and now it is so easy to move VMs. I've been missing out!
And I'm not sure what you mean by this?
You can't share out local datastores AFAIK, but of course you can move VMs between two servers' local datastores.
 

svtkobra7

Active Member
Jan 2, 2017
Sounds like you are getting somewhere - glad to hear it:)
  • Thanks ... but really only thanks to you! (seriously).
  • It's bizarre that I had so many prior issues with NVMe and then it just starts, and keeps, working one day. I wonder if my old board didn't like it for some reason (rhetorical).
  • I'm done now, AFAIK, save for one item ... the network ...
It was so easy to directly connect the servers that I wonder if I really need a switch. Maybe I keep the 10GbE link between the two and make myself figure this VLAN BS out; after more hours looking into it, I found numerous reports of other people hitting the same issue while researching a solve [I need a WAN + LAN port for pfSense]. Or just buy another NIC (but I like clean, and that isn't the clean approach).
  • I wonder if I could do it in NSX ... WAN in NIC1 on 1st server ... 10 GbE link between the two on NIC2 ... LAN out NIC1 on 2nd server ... hrmm ... Back to reality ... I've never used any of that before.
And I'm not sure what you mean by this?
You can't share out local datastores AFAIK, but of course you can move VMs between two servers' local datastores.
  • You probably don't know what I mean because I don't know what I mean!
  • Which means I'm probably going to break something soon.
  • I had issues installing it initially, so I gave it another try and got it to work. And I made myself stop playing around with it, as it isn't a priority atm. So I'm speaking completely unintelligently, as yesterday was the first time I had ever been in vCenter Server (literally).
  • But yes, what you said, or what they said => vSphere Documentation Center
  • It's going to be so nice to clone VMs ...
This question may seem super elementary to you, but when you were trying to sell me another host for HA/DRS, you implied that it had to be on site, but that you could potentially put the witness off site in a stretched cluster (given a number of requirements are met, right)? The piece I'm missing is the least expensive of the three, so while it may cost more than a $5/month VPS, it could potentially be hosted in the cloud, right?
 

Rand__

Well-Known Member
Mar 6, 2014
Lol, I am not selling you anything :p I just inform you about possible options based on my extensive knowledge of unsuccessful attempts to get it running the way I want it.
Yes, the witness can be offsite, no biggie; requirements are fairly low nowadays. Please note: don't put your pfSense box on the vSAN in that case, or you might be in a catch-22 situation if vSAN is down (yes, been there).

I think in your situation you could get a small third box for pfSense attached to the internet router & switch (which would alleviate your 10G switch issue) and then run a vSAN cluster on your two ESXi boxes with a cloud witness or a dedicated hosted server, whatever is to your liking.
 

svtkobra7

Active Member
Jan 2, 2017
[ Target audience = 10G/40G networking noobs (like myself) who may search for this later ... I had some concern that using a NIC that could not be passed through to FreeNAS (2 port card, 1 port needed for ESXi, all ports or none passed through) would present complexity; however, it was a piece of cake.
Networking pros = Please feel free to enhance as you see fit, but this is way above your paygrade :) ]

It ended up being stupidly simple to create a direct connection between the ESXi hosts and then provide that virtualized networking to FreeNAS. In summary, on each host: (1) create a virtual switch, (2) create a port group, (3) assign the port group to the FreeNAS VM, and (4) finally, in FreeNAS, assign a static IP address.

[I had been overthinking this with regard to vmkernels / TCP/IP stacks / routing a non-routable address ... blah blah, and none of that need cause concern ... the only place you even assign any IP address is in the VM]

Config
  • On each ESXi host: MCX354A-FCBT (HP 649281-B21), flashed w/ 2.42.5000 fw
ESXi-01
  1. Create standard virtual switch - e.g. "40G"
    • MTU = 9000
    • Uplink = Designated Physical NIC (defined above)
    • Security = Accept MAC address changes + Accept Forged transmits
  2. Create Port Group - e.g. "40G"
    • Virtual Switch = 40G (as created in #1)
  3. Assign port group "40G" to the "FreeNAS-01" VM on "ESXi-01"
    • Network Adapter = "40G"
    • Adapter Type = VMXNET3
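For anyone who prefers the ESXi shell over the GUI, a rough sketch of the equivalent (the vmnic name is a placeholder; check esxcli network nic list for the ConnectX-3 port on your host):

Code:
# create the vSwitch and enable jumbo frames
esxcli network vswitch standard add --vswitch-name=40G
esxcli network vswitch standard set --vswitch-name=40G --mtu=9000
# bind the ConnectX-3 port as the uplink (vmnic4 is a placeholder)
esxcli network vswitch standard uplink add --vswitch-name=40G --uplink-name=vmnic4
# accept MAC address changes + forged transmits
esxcli network vswitch standard policy security set --vswitch-name=40G --allow-mac-change=true --allow-forged-transmits=true
# port group for the FreeNAS VM
esxcli network vswitch standard portgroup add --vswitch-name=40G --portgroup-name=40G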
FreeNAS-01
  1. Network > Interfaces > Add Interface
    • Interface Name = 40G
    • IPv4 Address = 10.2.0.100
    • IPv4 Netmask = /16
    • Options = mtu 9000
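The GUI entry above is what persists across reboots; for a quick, non-persistent sanity check from the FreeNAS shell (the VMXNET3 NIC shows up as vmxN in FreeBSD, so vmx1 here is a placeholder):

Code:
# temporary only - the interface settings in the GUI are what survive a reboot
ifconfig vmx1 inet 10.2.0.100/16 mtu 9000 up
ifconfig vmx1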
ESXi-02
  • Repeat the same steps as for ESXi-01.
FreeNAS-02
  • Repeat the same steps as for FreeNAS-01, changing the IP address to something different yet on the same subnet.
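To verify the link end to end once both sides are up, a rough sketch (assuming FreeNAS-02 was given 10.2.0.200):

Code:
# -D sets don't-fragment, so an 8000-byte payload proves jumbo frames survive end to end
ping -D -s 8000 10.2.0.200
# optional throughput check if iperf3 is present (run 'iperf3 -s' on FreeNAS-02 first)
iperf3 -c 10.2.0.200 -P 4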