Troubleshooting a borked network setup in ESXi 8


Railgun

Active Member
Jul 28, 2018
Not too long ago, I added a second HBA to my host to split my drives for my NAS. As I didn't pay close attention to what I was actually doing, I think I moved my NIC to another slot. Oddly, SR-IOV became disabled in the BIOS; I discovered that while having a helluva time getting ESXi to re-enable SR-IOV after the fact, which borked my firewall and other things. I can't explain why it became disabled, but in the grand scheme that's neither here nor there for the moment. Suffice to say, it's been re-enabled and all my VMs are humming along...sort of.

It's an H12SSL-i board with an EPYC 7543. The NIC is an Intel E810-XXVDA2. As noted, SR-IOV is enabled and I create the full whack of VFs across both ports. All switch hardware is Unifi: the host connects to a USW-Pro-Aggregation via 25Gb MMF, which in turn connects to a USW-Enterprise-8-PoE via 10Gb copper, and then to my desktop, also on 10Gb copper.
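
For what it's worth, here's roughly how I'm checking the SR-IOV/VF state from the ESXi shell. The uplink name (vmnic4) is just an example, and the icen module name and max_vfs parameter are my assumptions about how the E810 native driver exposes this, so check the parameters list output before trusting the last command:

    # list physical NICs and confirm the E810 ports link at 25Gb
    esxcli network nic list

    # show the VFs instantiated on a given uplink (example name)
    esxcli network sriovnic vf list -n vmnic4

    # see what the E810 native driver exposes (module name assumed to be icen)
    esxcli system module parameters list -m icen

    # one way to set the VF count per port, assuming max_vfs is the right parameter
    esxcli system module parameters set -m icen -p "max_vfs=16,16"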

I'm testing via iperf3 both on the same segment from desktop to host, across segments from desktop to host, and within the host across VMs on different segments.
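
For reference, the tests look roughly like this; the address is a placeholder, not my actual subnet:

    # on the receiving end
    iperf3 -s

    # single stream, 30 seconds
    iperf3 -c 192.168.10.20 -t 30

    # five parallel streams
    iperf3 -c 192.168.10.20 -t 30 -P 5

    # reverse the direction (server sends, client receives)
    iperf3 -c 192.168.10.20 -t 30 -P 5 -R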

The issue is that I'm seeing a boatload of retransmissions as reported by iperf itself. I'm less concerned with the tests to/from the desktop, since I'm getting a ridiculous amount between VMs.

On the network side, FEC is enabled on the switchport in question. And while I do see some discards and RX errors, they're on the order of 0.00001%, so I'm taking that with a grain of salt.
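
For comparison on the host side, these are the per-uplink counters I'm looking at from the ESXi shell (vmnic4 is again just an example uplink name):

    # driver, firmware and link details for the uplink
    esxcli network nic get -n vmnic4

    # RX/TX error and drop counters as seen by the host
    esxcli network nic stats get -n vmnic4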

This whole effort was brought about by the fact that my NAS, running as a TrueNAS Core VM, dropped from about 1GB/s to 400MB/s on sequential transfers, even when the data is in ARC. The pool is 11 mirrored vdevs, and the transfers are being written to one of three NVMe SSDs in my desktop. The previous setup was 7 vdevs of 3-disk RAIDZ1 and sequential reads/writes were near 1GB/s. Local disk performance is ~4.5GB/s.

Happy to take suggestions in what to look at here.

Across a single connection, I'm seeing 14% CPU utilization and nearly 6Gbps throughput with ~500 retransmissions.
Across three connections, I'm seeing about 44% utilization and about 16.5Gbps with ~1200 retransmissions.
Across five, I saw a peak of 74% and ~22Gbps with anywhere from 4500 to 20k retransmissions.

Across five from VM to VM in the same segment, I can get ~32Gbps with 0 retransmissions.
Across one from VM to VM in the same segment, I can get about the same, but it's inconsistent WRT retransmissions.

So, part of this could simply be an artifact of iperf, but given the issues I see with my NAS and the slower performance since the change, something leads me to believe there's a misconfig somewhere, or some other issue.

I'm not a VMware expert, and determining where the issue lies (assuming there is one) is what I'm trying to figure out.
 

DavidWJohnston

Well-Known Member
Sep 30, 2020
When using something like iperf3 where lots of packets are being sent from a faster segment to a slower segment, seeing drops isn't unusual. The traffic can fill the receive-side buffers, and the switch may be sending back PAUSE frames. QoS rules for traffic heading to your 10G network, and some buffer tweaking, might help.
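
If you want to see whether flow control is actually in play, ESXi can show the pause settings per uplink (assuming your build has this namespace):

    # show whether RX/TX pause (802.3x flow control) is enabled on each uplink
    esxcli network nic pauseParams list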

If you use the -R flag to reverse the packet flow, do the retransmits stop?

Have you checked the MTU settings? (All of them - they are in several places) If you reduce the size of the iperf3 packets, does the problem go away?
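
A few places to look, roughly; the address is a placeholder, and the Windows command only shows the MTU rather than setting it:

    # ESXi: vSwitch and vmkernel MTU
    esxcfg-vswitch -l
    esxcli network ip interface list

    # TrueNAS Core (FreeBSD): per-interface MTU and offload flags
    ifconfig

    # Windows desktop: per-interface MTU
    netsh interface ipv4 show subinterfaces

    # iperf3: clamp the MSS to force smaller packets
    iperf3 -c 192.168.10.20 -M 1200 -t 30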

Can you do some performance testing from your NAS to something on the same network segment? You don't even have to write it to disk, just send it to /dev/null and see how fast you can download.
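
Something like this from a box on the same segment would do it, assuming the share is mounted at /mnt/nas and there's a large file on it (status=progress is GNU dd, so drop it on BSD):

    # read a big file off the NAS and throw it away, network path only
    dd if=/mnt/nas/bigfile.bin of=/dev/null bs=1M status=progress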

You could also try plugging your NAS into your 10G network temporarily for a test. Even with 10G you should get more than 400MB/s.

Also in ESXi, if you are using "Promiscuous Mode" on any of your vSwitches or port groups, this will cause much higher CPU load.
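
You can check that from the ESXi shell too; the vSwitch and port group names below are just examples:

    # security policy (promiscuous / MAC changes / forged transmits) on a vSwitch
    esxcli network vswitch standard policy security get -v vSwitch0

    # and on a specific port group
    esxcli network vswitch standard portgroup policy security get -p "VM Network"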
 

Railgun

Active Member
Jul 28, 2018
When using something like iperf3 where lots of packets are being sent from a faster segment to a slower segment, seeing drops isn't unusual. The traffic can fill the receive-side buffers, and the switch may be sending back PAUSE frames. QoS rules for traffic heading to your 10G network, and some buffer tweaking, might help.
Strictly speaking, yes, though I'd expect TCP to control this to a degree; I guess iperf doesn't quite work that way and just happens to use that protocol. No QoS rules are in play here, as they're unnecessary: on the 10Gb path it's only me, and on the 25Gb path I'll never saturate it with the current setup.

If you use the -R flag to reverse the packet flow, do the retransmits stop?

Have you checked the MTU settings? (All of them - they are in several places) If you reduce the size of the iperf3 packets, does the problem go away?
1500 all the way through. Unfortunately, the Windows client doesn't report retransmissions in this direction; only the sending server side reports that. That said, from an iperf perspective I was able to improve things: for whatever reason I cannot recall, hardware offloading for the NIC was disabled in TrueNAS.
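
For anyone following along, this is roughly how the offload state shows up from the TrueNAS Core (FreeBSD) shell. The interface name iavf0 is just my guess for the E810 VF, so check ifconfig for the real one; the persistent fix is the hardware offloading setting on the interface in the TrueNAS UI, since these commands only last until reboot:

    # look at the options=... line for RXCSUM/TXCSUM/TSO/LRO
    ifconfig iavf0

    # re-enable the common offloads until the next reboot
    ifconfig iavf0 rxcsum txcsum tso lro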

Can you do some performance testing from your NAS to something on the same network segment? You don't even have to write it to disk, just send it to /dev/null and see how fast you can download.

You could also try plugging your NAS into your 10G network temporarily for a test. Even with 10G you should get more than 400MB/s.

Also in ESXi, if you are using "Promiscuous Mode" on any of your vSwitches or port groups, this will cause much higher CPU load.
As mentioned, I'm testing both within and across segments. Promiscuous mode was enabled, but disabling it made no difference. Oddly, before I ran some of the additional tests, I inexplicably started getting ~700-800MB/s on sequential reads...with zero other changes (and this was before changing the hardware offload setting above).

I'm going to partly blame Unifi here, as their switches leave much to be desired in my experience.

Running more than 2 streams from server to client does in fact overrun the egress port from the aggregation switch to the office switch. However, normal file transfers do not exhibit this behavior.

That all being said, from VM to VM in the same segment I'm effectively capped at 10Gb, but only because the Windows VM I'm testing against is using the vmxnet driver. I can't find a VF driver for Win10.

EDIT: After a reboot of the host, it drops to crap again. So something is definitely borked here.
 