Not too long ago, I added a second HBA to my host to split my drives for my NAS. As I didn't pay too close attention to what I was actually doing, I think I moved my NIC to another slot. Oddly, SR-IOV ended up disabled in the BIOS; I discovered that while having a helluva time getting ESXi to re-enable SR-IOV after the fact, which borked my firewall and other things. I can't explain why it would have become disabled, but in the grand scheme that's neither here nor there for the moment. Suffice it to say, it's been re-enabled and all my VMs are humming along... sort of.
It's an H12SSL-i board with an EPYC 7543. The NIC is an Intel E810-XXVDA2. As noted, SR-IOV is enabled and I create the full whack of VFs across both ports. All switch hardware is UniFi; in this case I'm testing across this host, connected to a USW-Pro-Aggregation via 25Gb MMF, which is in turn connected to a USW-Enterprise-8-PoE via 10Gb copper, to my desktop, also on 10Gb copper.
I'm testing via iperf3 on the same segment from desktop to host, across segments from desktop to host, and within the host between VMs on different segments.
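For context, the tests are plain iperf3 TCP runs with iperf3 -s on the far end. Below is a rough sketch of how the numbers are being pulled; the hostnames, stream counts, and JSON field paths are just illustrative assumptions, not my exact setup:

#!/usr/bin/env python3
# Sketch: run iperf3 against a target and report throughput and retransmits.
# Assumes iperf3 is on PATH and a server (iperf3 -s) is already running
# on the target. Hostnames and stream counts are placeholders.
import json
import subprocess

def run_iperf3(server, streams=1, seconds=30):
    # --json makes iperf3 emit machine-readable results on stdout
    out = subprocess.run(
        ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

if __name__ == "__main__":
    for target, streams in [("vm-other-segment", 1), ("vm-other-segment", 5)]:
        result = run_iperf3(target, streams)
        sent = result["end"]["sum_sent"]  # aggregate sender-side stats for TCP
        gbps = sent["bits_per_second"] / 1e9
        print(f"{target} x{streams}: {gbps:.1f} Gbps, {sent['retransmits']} retransmits")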
The issue is that I'm seeing a boatload of retransmissions as reported by iperf itself. I'm less concerned with the tests to/from the desktop, since I'm getting a ridiculous number even between VMs.
On the network side, FEC is enabled on the switch port in question. And while I do see some discards and RX errors, they're on the order of 0.00001%, so I'm taking that with a grain of salt.
This whole effort was brought about by the fact that my NAS, running as a TrueNAS Core VM, dropped in sequential transfers from about 1 GB/s to 400 MB/s, even when the data is in ARC. That's across 11 mirror vdevs, writing to one of three NVMe SSDs in my desktop; for reference, 1 GB/s is roughly 8 Gb/s, close to saturating the 10Gb link to the desktop, while 400 MB/s is only about 3.2 Gb/s. The previous setup was 7 vdevs of 3-disk RAIDZ1 and sequential reads/writes were near 1 GB/s. Local disk performance is ~4.5 GB/s.
Happy to take suggestions on what to look at here.
Across a single connection, I'm seeing 14% CPU utilization and nearly 6Gbps throughput with ~500 retransmissions.
Across three connections, I'm seeing about 44% utilization and about 16.5Gbps with ~1200 retransmissions.
Across five, I saw a peak of 74% and ~22Gbps with anywhere from 4500 to 20k retransmissions.
Across five from VM to VM in the same segment, I can get ~32Gbps with 0 retransmissions.
Across one connection from VM to VM in the same segment, I can get about the same, but it's inconsistent WRT retransmissions.
So, part of this could simply be an artifact of iperf, but given the issues I see with my NAS and the slower performance since the change, I'm inclined to believe there's a misconfig somewhere, or some other issue.
I'm not a VMware expert, and determining where the issue lies, assuming there is one, is what I'm trying to do.