VMware 6.5 and A2SDi-16C-HLN4F (cluster 2-node)


Scoped

New Member
Apr 14, 2018
2
0
1
42
I tested with CIFS traffic and got 110 MB/s on a 1 GbE link, which is about what you would expect considering all other VMkernel traffic was also going over this NIC.

using version 6
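
As a rough sanity check on that figure, here is a minimal Python sketch. It assumes 1 GbE means 1,000,000,000 bits/s of raw line rate and ignores Ethernet, IP, and CIFS overhead; the function name is only for illustration.

```python
def link_utilization(observed_mb_per_s, link_gbit_per_s=1.0):
    """Return observed throughput as a fraction of the raw line rate."""
    # 1 Gbit/s = 1000 Mbit/s = 125 MB/s in decimal units (8 bits per byte).
    line_rate_mb_per_s = link_gbit_per_s * 1000 / 8
    return observed_mb_per_s / line_rate_mb_per_s


if __name__ == "__main__":
    # 110 MB/s is the transfer rate reported in the post above.
    print(f"{link_utilization(110):.0%} of raw line rate")
```

With protocol overhead taken into account, a result in this range is close to what a single gigabit link can deliver.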
 

omega

New Member
Apr 4, 2018
2
0
1
44
Craig, thank you so much! Works great for me. I have a Supermicro E200-9A, and after injecting the drivers into ESXi 6.7 I was able to get it installed. It's been working for almost 2 weeks now. Speeds seem very good, close to a gigabit when transferring data.
 

Kev

Active Member
Feb 16, 2015
461
111
43
41
Isn't the X553 a 10Gb-capable MAC? Can anyone do performance tests with this driver in ESXi 6.5 or 6.7?
 

JJ Duru

New Member
Sep 15, 2018
14
3
3
Well, it's finally ready for release.

I finished the main code merge about two weeks ago. Since then I've been testing and tweaking to ensure the driver loads and operates properly.

I have named the driver ixgbe_x553_7 to indicate that it's the ixgbe driver but specifically for Intel X553/7 devices. In the attached vib, I've mapped the driver to load only for the device IDs listed below.
Code:
8086:15c2, 8086:15c3, 8086:15c4
8086:15c6, 8086:15c7, 8086:15c8
8086:15ca, 8086:15cc, 8086:15ce
8086:15e4, 8086:15e5
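
To check whether a given NIC falls inside that list, something like the following Python sketch could be used. The ID set is copied from the list above; the helper name and the idea of feeding it a `vendor:device` string read from your own PCI listing are assumptions for illustration, not part of the driver.

```python
# Vendor:device IDs the VIB is mapped to, copied from the list above.
SUPPORTED_IDS = {
    "8086:15c2", "8086:15c3", "8086:15c4",
    "8086:15c6", "8086:15c7", "8086:15c8",
    "8086:15ca", "8086:15cc", "8086:15ce",
    "8086:15e4", "8086:15e5",
}


def is_supported(pci_id):
    """Return True if a 'vendor:device' hex string is in the mapped list."""
    return pci_id.strip().lower() in SUPPORTED_IDS


if __name__ == "__main__":
    # 8086:15e4 is the ID reported for the A2SDi-16C-HLN4F in this post.
    print(is_supported("8086:15E4"))
```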

I have tested the driver with ESXi 6.7 on my Supermicro A2SDi-16C-HLN4F motherboard which has 4 x X553 NICs (device ID 8086:15e4). I successfully tested the following configurations:
  • ESXi 6.7
    • As a VMkernel NIC
  • VM CentOS 7.4 x64
    • Standard NIC connected to a virtual switch
    • PCI passthrough device
  • VM Win 7 x64
    • Standard NIC connected to a virtual switch
    • PCI passthrough device (device was seen by OS but no Windows driver available)
  • VM Win 10 x64
    • Standard NIC connected to a virtual switch
    • PCI passthrough device (device was seen by OS but no Windows driver available)
[ ... ]
Hi Craig,

I started using this driver you compiled, in the following setup:
- mobo: Supermicro A2SDi-2C-HLN4F (same ixgbe NICs as on your motherboard)
- ESXi 6.7: customized to the latest patch set (ESXi670-201808001, release date 14 Aug 2018, build 9484548)
- the ixgbe driver you attached to this thread
- 3 VMs:
  - 2 x CentOS 7 x64, for DNS/DHCP and authoritative DNS
  - 1 x pfSense x64, internet gateway

The load average on each individual VM is negligible: each of the above machines is running with the load average between 0.09 and 0.45.
However, when monitoring the ESXi host from the native HTML5 client, the CPU usage hovers between 20% and 55%, which is really high.
By comparison, I have another ESXi 6.7 host running 4 virtual machines on lower-clocked CPUs, and its overall CPU usage peaks at 30%.

So, something in the networking area of this combo (X553 NICs, the driver attached to this thread, and ESXi 6.7) is not functioning correctly.
I did not notice low throughput: when testing internet connectivity I get the advertised 120 Mb/s, a sign that the X553 NIC is doing its job. How it's doing its job may be another matter.

Do you experience high CPU usage on the ESXi host? Have you started using a different driver since your last post?
Any help is appreciated. Thank you.
 

Craig Thomson

New Member
Mar 5, 2018
18
15
3
... the CPU usage hovers between 20% and 55%, which is really high.
... So, something in the networking area of this combo, x553 NICs with the driver attached to this thread and ESXi 6.7 is not functioning correctly.
Hi JJ,

I'm sorry to hear you're experiencing problems. I'm still using build 6 of my driver (attached to post #37 of this thread). I am not experiencing (and have not experienced) any CPU load problems. In fact, I've not experienced problems of any kind with this driver.

My setup:
  • Supermicro A2SDi-16C-HLN4F
  • Driver net-ixgbe_x553_7-4.5.3-6.x86_64
  • ESXi 6.7 customized to ESXi-6.7.0-20180604001-standard (release date 25 June 2018)
  • 6 VMs (2 x Solaris 11, 2 x CentOS 7 x64, 1 x CentOS 7 x32, 1 x Ubuntu 16.04 x64)
The load average on each individual VM is, most of the time, negligible (same as you). When I monitor the ESXi host itself (via the HTML5 embedded host client) the CPU usage is 0.31% (min 0.3%, max 7.7%, avg 0.59%).

Even when my VMs are working hard (1 is a web/db server, 2 are compilers, 1 does video encoding) my load average never gets above 25% (but remember, I'm working with 16 cores).

Can I ask, what makes you suspect the load is caused by the network and/or the driver?

The only time I've experienced ESXi load that did not appear to be attributable to a guest VM was with Solaris 11 (11/11). The cause was an interrupt storm due to an incompatibility between ESXi and the Solaris interrupt timing mode. It was a problem unique to that specific version of Solaris and did not show up as load in the Solaris VM (i.e. ESXi load 100%, Solaris VM load 0%). Adding the following line to /etc/system on the Solaris VM solved the problem.
Code:
set pcplusmp:apic_timer_preferred_mode = 0x0
I mention this only to show that finding the cause of ESXi load can sometimes be tricky. I suspect you'll have to do much more investigation to get to the root cause.

As a starting point, what does the output of esxtop show?
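
If the esxtop output is captured in batch mode (for example `esxtop -b -d 5 -n 12 > esxtop.csv`), the resulting CSV can be summarized offline. The following is a minimal Python sketch, not a definitive tool: it averages every column whose header contains a given substring. The substring `% Util Time` and the sample header names below are assumptions, so check them against the header row of your own capture.

```python
import csv
import io
from statistics import mean


def average_matching_columns(csv_text, needle):
    """Average every CSV column whose header contains `needle`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if needle in name]
    samples = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            try:
                samples[header[i]].append(float(row[i]))
            except (ValueError, IndexError):
                continue  # skip blank or malformed cells
    return {name: mean(vals) for name, vals in samples.items() if vals}


if __name__ == "__main__":
    # Made-up two-sample capture; real esxtop headers are much longer.
    sample = (
        "time,cpu0 % Util Time,cpu1 % Util Time\n"
        "10:00:00,20.0,50.0\n"
        "10:00:05,40.0,60.0\n"
    )
    for name, avg in average_matching_columns(sample, "% Util Time").items():
        print(name, avg)
```

Comparing the per-CPU averages with the per-VM figures is one way to see whether the load is attributable to a guest or to the host itself.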
 

JJ Duru

New Member
Sep 15, 2018
14
3
3
Darn it, now I realize what happened: I used your driver from post #32, not post #37.

Back to the drawing board. I'll do the reinstalls and provide updates.
 

Craig Thomson

New Member
Mar 5, 2018
18
15
3
Before you go and reinstall everything, I should tell you that I don't believe build 6 of the driver will make any difference to your problem.

Both build 5 and build 6 are very similar. Build 6 differs only in some performance tweaks relating to throughput. I can't see that the differences between build 5 and build 6 would solve this issue. I suspect you'll encounter the same result.

Also, I know some people are still using build 5 and are not experiencing your load problem.

Rather than reinstall everything, I would focus on investigating the problem. From what I gather, the issue is not impacting services, it's just overworking your CPU, so you have time on your side.
 

JJ Duru

New Member
Sep 15, 2018
14
3
3
Craig,

That's the thing: time is not on my side. With a needy family that consumes internet with bread, I have to keep the interwebz available as much as possible (hence the highly available internal network architecture).

I performed the reinstall and I get lower CPU usage overall. With all the HA-relevant services moved over to the A2SDi-2C-HLN4F host, and with Netflix streaming to one of the machines, the CPU graph shows usage between 10% and 18%, with spikes up to 26-30% at times.

When running the speedtest.net test and attaining 120 Mbit/s download and 11-12 Mbit/s upload, the CPU usage jumps as follows:
- one core jumps to 63.38%
- one core jumps to 29.59%
The spike happens during the download phase.
I expected the two cores not to be equally loaded because, as far as I know, the PF packet filter inside pfSense runs on one CPU only.
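
For reference, the two per-core figures above can be folded into a single host-level number. This is a minimal Python sketch, assuming the host figure is simply the mean of the per-core percentages on this two-core board; the function name is hypothetical.

```python
from statistics import mean


def host_cpu_percent(per_core_percent):
    """Average per-core utilization into one host-wide percentage."""
    return mean(per_core_percent)


if __name__ == "__main__":
    # Per-core peaks reported during the speedtest download phase.
    print(f"{host_cpu_percent([63.38, 29.59]):.1f}% across both cores")
```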

Given the fact that the overall CPU usage is down, and the box is not performing any internal VLAN routing (therefore I do not get high latencies for the inter VLAN routing), I declare the operation a stunning success.

And all I can say is a big THANK YOU for creating this driver.

P.S. I suspect that whatever buffer numbers you changed, it worked. I declare myself disappointed that I chose a dual core mobo - never again.
 

Craig Thomson

New Member
Mar 5, 2018
18
15
3
Given the fact that the overall CPU usage is down... I declare the operation a stunning success.
Hi JJ,

I'm really glad performance has improved for you, but I have to say I'm also quite surprised. You've made me really curious now. Over the next few weeks I'll do some tests of my own.

My previous testing was all done using a private internal LAN and simple file transfers. I never tested any kind of networking software (like pfSense) nor did I test any kind of Internet traffic. I'm now wondering - could a different traffic profile, or a different type of load, yield different results?

I'll report back here with my findings.

And all I can say is a big THANK YOU for creating this driver.
You're welcome. :)
 

IT33513

New Member
Mar 14, 2018
6
0
1
32
UK
Hello guys
I want to create a lab environment using two A2SDi-16C-HLN4F boards and VMware 6.5 (cluster). I wanted to know if any of you have found compatibility problems with this combination.
I have read that the X553 network cards are not recognized.
Another thing: the cluster should handle around 20 VMs. Is the CPU able to handle this load?

Thank you
May I ask what you need a 2-node HA cluster for?
 

Stril

Member
Sep 26, 2017
191
12
18
41
Hi!

Is there any news about X553 support?

I want to buy a board with two X553 NICs and 10GBASE-T.

Does it work with vSphere 6.7U1?

Best wishes