HP t740 Thin Client as an HP Microserver Gen7 Upgrade (finally!)

WANg

Well-Known Member
Jun 10, 2018
1,080
679
113
43
New York, NY
Okay, folks - just a quick FYI -

I was able to land a refurbished HP t740 thin client from eBay at a fairly substantial discount from quoted pricing (and as such, my existing order from my usual channel partner has been cancelled).

Downsides?
- It only comes with 8GB of DDR4 (to be upgraded to 64GB later, but I will be "borrowing" DIMMs from broken machines @ work to push it to 16 for now).
- If it doesn't do ARIFwd/ACSCtl (SRIOV), it'll just be a faster t730 but with one less Ethernet port
- What the heck am I going to do with the t730 now? (probably as an HTPC to replace my 2011 MacMini, but it'll need a Bluetooth/Wifi combo card to play that role)

Shipping is quoted for next Wednesday (April 22nd, 2020).
Right now the plan is to use it for testing purposes first, i.e. Windows 10 IoT, Debian Linux 10, Proxmox and then eventually VMware ESXi 6.5U3, 6.7 or possibly 7.0. I don't expect to replace the t730 until most things are "squared away" and the t740 is ready as a drop-in replacement. I would also want to upgrade the MSG7/N40L on the other end to use TrueNAS or something else.

Expect the following:
Rationale for replacement
Pricing analysis (versus its contemporaries)
Detailed dmesg/lspci listings and lstopo diagram
Power consumption/noise profiling (versus the t730)
Compatibility/suitability with accessories (various PCIe cards in my inventory)
Discussion regarding the t740's Windows 10 IoT build
Possibility of turning it into a Microsoft Teams Room appliance (like the Elite Slice G2)
Gaming performance analysis (I did one for the t730, why not for the t740?)

Stay tuned, folks. This could potentially be a fun one.
 

WANg

Update:

The machine miraculously showed up at my doorstep early this afternoon. Turns out that the machine was shipped from Nassau County, which is about 40 miles away from my home in NYC - the estimate from Unionized Package Smashers (UPS) was a bit pessimistic. So yes, the Great Horned Owl (the code name for the Ryzen embedded V1000) has landed.

Here are some early observations and photos:

- The power supply lead has been switched from the HP 7.4mm (black ring tip) to the 4.5mm (blue ring tip), so if you are coming in from the t620/t620 plus/t630/t730 machines and your t740 does not ship with a power brick, you'll want to pick up an adapter for it. The adapters are all generic (since HP does not seem to make a 7.4mm to 4.5mm adapter, just the other way around).

Interesting side-note: HP and Dell seem to use the same dimensions and general polarity for their older chargers (7.4mm for the old stuff, 4.5mm for the newer stuff, center positive polarity). However, their voltages are off by about 0.5V in general - they are theoretically interchangeable, but I am not going around plugging my HP Elitebooks into Dell PA12 power bricks just to see what happens (it'll work but the battery will refuse to charge). If you want a more future-friendly way to deal with multiple brick types, there are USB-PD to various barrel adapters on eBay, just for operational flexibility.

Here's the power brick (HP model 710473-001), which uses the HP 4.5mm x 3.0mm center positive polarity tip. Below is a comparison between the old and new power leads.





Some seller silliness is evident here - the device is advertised as "refurbished but in perfect cosmetic condition"...except that they bundled the wrong stand with it. The newer model uses one which sits a bit taller and should be octagonal in shape. Not really a big deal. Either they can send me one, or I'll buy one later.

I paid only 400 USD in Q1 2020 (including taxes and shipping). The cheapest I've seen it go for new at the time was between 650 and 750 USD, so that's some savings for you right there, and in my opinion, compared to the 200-300 USD pricing on eBay for the t730, this is definitely worth the money at that price point. Of course, we do have to keep in mind that the t740 will be considered "current" for the next 4-5 years (much like the t730 when it came out in 2015), so don't expect a replacement any time soon.


The machine serial number is in a latch on the bottom (which is also where you mount the VESA100 stand, or if you prefer the device to sit horizontal). I plan on having this one be horizontal once I get the right stand.



One of my pet peeves is that due to the rounder contour of the t740, the power lead is at a 10 degree angle from the vertical, which looks really, really odd.



Opening it isn't difficult, as the instructions are printed on the inside edge of the top cover (yes, they actually made the bottom stationary and the top removable...which is the opposite of the t730).



So what does it look like inside?



The t620/730 are DDR3/3L units, while the t540/630/640/740 are DDR4 Notebook DIMM units. Looks like the machine has 2 4GB DIMMs as starters, and those will have to be replaced. I have questions on whether the RAM limit is 32GB (reported for the Great Horned Owl platform) or 64GB (common to Raven Ridge machines). The seller omitted the RAM shield, which I am not all that happy about (it's a passive heatsink/EM shield, and I don't see an FRU part listing so I can't third party order this).

I am expecting significantly better performance (compared to the GX415GA, GX420CA on the t620 or the RX427BB on the t730), as rough estimates of performance (based on Passmark) have this machine equal to the Ryzen 5 2400GE (the specs are closer to the Ryzen 5 2600H). We'll have to see about that.

The boot media is on the M.2 slot. You'd figure HP would've been able to just buy some cheap M.2 SATA SSDs. But nope...



This boot media unit (the Mothim SD7 eMMC) looks custom made. Here it is next to my SATA SSD (Intel 600p?). Remember, Key-BM is SATA while Key-M is NVMe.



For those who are wondering about the t740's abilities, here's the earliest lspci -vv, dmidecode and the dmesg dumps. Note that these are taken from within PartedMagic and on the initial v1.04 BIOS.

Quick summary for those who are not about to dig through the logs -
Is the hardware SRIOV capable? Yes (but with a caveat to be covered later)
Can it boot NVMe, SATA and SD7? Yes on all counts
What does power consumption look like when you spin VMs up? Needs to be tested.
What about noise? Rough estimates based on spinning the machine to 80 Celsius via stress-ng say that the t740 is about 20% noisier than the t730. I am not sure whether that is due to the missing stand, the missing DIMM cover or something else. More testing is needed.

Here's an lstopo graph I made representing the machine after BIOS update to the latest, Proxmox upgraded to 6.1 (latest) and some stuff...enabled. The Vega 8 used in the thin client is allocated 1GB by default, and hence the RAM count of 6866MB.

@arglebargle, remember the fun of trying to pass 7 Solarflare VFs into the t730...? Here's the t740 passing 254 VFs (that's the max of 127 VFs per port, 2 ports).
(Note: I have NO idea whether this is working or not. I'll need to fire up a VM to see how this is being consumed.)



...more to come later.
 

arglebargle

H̸̖̅ȩ̸̐l̷̦͋l̴̰̈ỏ̶̱ ̸̢͋W̵͖̌ò̴͚r̴͇̀l̵̼͗d̷͕̈
Jul 15, 2018
656
235
43
Heyyyy, this is looking great. NVMe plus SATA plus 10 or 40Gb in the PCIe slot has me very interested in picking up a handful of these when I can get them <$400 or so. Let me know what happens with SR-IOV, if VF passthrough actually functions (and you can release VFs when shutting down VMs) I might just spend some money and jump on these early.

That custom NVMe eMMC module is pretty slick, if they offered one in ~64-128GB that would be perfect low power storage for something like a Pinebook.
 

PigLover

Moderator
Jan 26, 2011
3,012
1,314
113
@arglebargle - The comment about the Pinebook confuses me - the Pinebook already has eMMC capability and it is replaceable up to 128GB (and fairly cheap). Not sure how an eMMC-on-NVMe would help for that one.
 

arglebargle

H̸̖̅ȩ̸̐l̷̦͋l̴̰̈ỏ̶̱ ̸̢͋W̵͖̌ò̴͚r̴͇̀l̵̼͗d̷͕̈
@PigLover - The Pinebook Pro (PBP) has an NVMe expansion header but doesn't really benefit from NVMe speeds most of the time (you bottleneck waiting on the CPU rather than storage in most cases). Lower power storage expansion using eMMC on NVMe should provide better runtime on battery unless the eMMC/NVMe controller is power hungry. It's totally unnecessary, but could be a nice option. This is also the first time I've seen eMMC packaged on NVMe so it brought that to mind.
 

WANg

@arglebargle - Well, you are in luck (kinda). I got mine for 400 USD including shipping + handling (the seller is located in Uniondale, NY and supposedly has a few more on hand). As for VFs? Well, this is an interesting one. I only have the Solarflare SFN5122F as a suitable test card (the SFN7322F Flareon is too hot, and I don't have Intel i520s or an extra Mellanox MCX354A to work on).

I could confirm the following characteristics:

- SRIOV seems to be working (i.e. VFs off the SFN5122F are allocated and can be assigned to guest VMs)

However, guest OSes can't seem to consume the Solarflare VFs. All 4 VMs that I have end up in the same situation as this one (running Debian 10 x64):

Code:
root@Passthrough01:~# lspci | grep Solar
01:00.0 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarstorm]
02:00.0 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarstorm]

root@Passthrough01:~# lshw -C net -businfo
Bus info          Device      Class       Description
=====================================================
pci@0000:01:00.0              network     SFC9020 Virtual Function [Solarstorm]
pci@0000:02:00.0              network     SFC9020 Virtual Function [Solarstorm]
pci@0000:06:12.0              network     Virtio network device
virtio@2          ens18       network     Ethernet interface

root@Passthrough01:~# lshw -C net
  *-network UNCLAIMED
       description: Ethernet controller
       product: SFC9020 Virtual Function [Solarstorm]
       vendor: Solarflare Communications
       physical id: 0
       bus info: pci@0000:01:00.0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msix cap_list
       configuration: latency=0
       resources: memory:fe810000-fe811fff memory:fe800000-fe80ffff
  *-network UNCLAIMED
       description: Ethernet controller
       product: SFC9020 Virtual Function [Solarstorm]
       vendor: Solarflare Communications
       physical id: 0
       bus info: pci@0000:02:00.0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msix cap_list
       configuration: latency=0
       resources: memory:fe610000-fe611fff memory:fe600000-fe60ffff
No driver claimed the detected devices on the guest VM, and this is indeed the case on Debian 10, Ubuntu Server (latest LTS release) and CentOS 7. Solarflare dropped support for the SFN-5/6xxx series virtual functions in modern distributions. My guess is that the older controllers needed their drivers updated to some newer APIs, and instead of a rewrite they want to push the 7xxxs and above instead (they also operate under a different licensing model). I'll likely need to grab an old Ubuntu 10 or 12 LTS image and spin it up as a guest just to see what happens.
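
For anyone repeating this check across several guests, here is a minimal Python sketch that picks the UNCLAIMED nodes out of `lshw -C net` text. The helper name `unclaimed_devices` and the trimmed sample are assumptions for illustration, not part of lshw or any Solarflare tooling, and it only understands the plain-text layout shown above.

```python
from typing import List

# Trimmed sample in the same layout as the lshw output above (illustrative only).
SAMPLE = """\
  *-network UNCLAIMED
       description: Ethernet controller
       product: SFC9020 Virtual Function [Solarstorm]
       bus info: pci@0000:01:00.0
  *-network UNCLAIMED
       description: Ethernet controller
       product: SFC9020 Virtual Function [Solarstorm]
       bus info: pci@0000:02:00.0
"""


def unclaimed_devices(lshw_text: str) -> List[str]:
    """Return the bus info of every lshw node whose header says UNCLAIMED."""
    found = []
    in_unclaimed = False
    for line in lshw_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("*-"):
            # A new node starts here; remember whether a driver claimed it.
            in_unclaimed = stripped.endswith("UNCLAIMED")
        elif in_unclaimed and stripped.startswith("bus info:"):
            found.append(stripped.split(":", 1)[1].strip())
    return found


for bus in unclaimed_devices(SAMPLE):
    print("no driver bound:", bus)
```

In a guest you would feed it the real output (for example via `subprocess.run(["lshw", "-C", "net"], capture_output=True, text=True).stdout`) instead of the sample string.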

*UPDATE* - Nope, Ubuntu 12 LTS VMs do not recognize the VFs. This could be an issue with how the sfc driver on the PF/hypervisor seems to crap out unrecognized VFs. I might have to grab a SATA M.2 card, install an old Linux distro like CentOS 6 and see how it behaves with SFC PF/VF allocations on the t740.

And then, there is this headscratcher - any VFs beyond the first 6 do not work, and I can't tell if it's unique to this card, or to the t740. Any attempt to start VMs assigned with Solarflare VFs 01:01.0 and above will result in an error where the hypervisor spikes access to the additional VFs by reporting the PCIe device type as type ff, and the guest VM which sees it will not try to initialize it.

Code:
[Tue Apr 21 20:21:45 2020] vfio-pci 0000:01:01.0: Hardware bug: VF reports bogus INTx pin 255
[Tue Apr 21 20:21:45 2020] vfio-pci 0000:01:01.0: vfio_cap_init: hiding cap 0xff@0xff
[Tue Apr 21 20:21:45 2020] vfio-pci 0000:01:01.0: vfio_cap_init: hiding cap 0xff@0xff
[Tue Apr 21 20:21:45 2020] vfio-pci 0000:01:01.0: vfio_cap_init: hiding cap 0xff@0xff
[Tue Apr 21 20:21:45 2020] vfio-pci 0000:01:01.0: vfio_cap_init: hiding cap 0xff@0xff
[Tue Apr 21 20:21:45 2020] vfio-pci 0000:01:01.0: vfio_cap_init: hiding cap 0xff@0xff
[Tue Apr 21 20:21:45 2020] vfio-pci 0000:01:01.0: vfio_cap_init: hiding cap 0xff@0xff
[Tue Apr 21 20:21:45 2020] vfio-pci 0000:01:01.0: vfio_cap_init: hiding cap 0xff@0xff
This effectively leaves me with only 6 supposedly working VFs, but even then, since no modern distro recognizes the VFs, they can't be used. Honestly, if it's 6 working VFs at 10GbE line speeds, that's pretty good already. Of course, if it's a quad port 1GbE card, then that's less useful.

At this stage, I can do any of the 4 things:
a) Buy a PCIe slot extender and get the Flareon outside of the chassis, and then put a fan under the Flareon to keep it cool
b) Order a Mellanox MCX354A-FCBT
c) Order an Intel i520-T4.
d) Order a bunch of M.2 SATA cards (128-256GB please) and adapters for working with EIDE (so I can recycle the 128GB models in my retrogaming machines)

I actually did all 4...because useful.

Okay, I am going to postpone further SRIOV testing. Power consumption testing will require me to take my Etekcity Voltson down from the t730 on the rack...which will not happen until the next downtime window on Saturday.

Next up, time to do some perf and noise testing.

Here's the Passmark baseline in Windows 10 IoT. The CPU is around 7600-7700 CPUMarks...which is in the same ballpark as, say, Ryzen 5 2400GE.

Here's a quick shot of what stress -c 12 does to your Ryzen embedded machine...



and the temp reading after about 10 minutes of that:



And here it is in 0.26 load (idle) recovering to the low temp within 90 seconds.



...more to come.
 

WANg

So, today's t740 adventure has to do with the "fun" aspect of it...can we use it as a cheap-ish HTPC?
(Keep in mind that they retail for 700 USD but there are "best offer" auctions on evilbay close to 500 USD, and the only direction the pricing will go is down)

As @SwanRonson asked in the previous postings on my t730 misadventures....can the t740 possibly make for a good HTPC?

Well, let's check out the CPUMarks comparison versus the Ryzen 5 2400GE (its closest APU cousin) and the RX427BB (The chip it replaced on the t730)



Wow. The RX427BB used to be quoted at around 3300 - either more test results show that the early numbers were a little higher than assumed, or the Spectre patches really did a number on the performance of the piledriver cores. Anyways, that's almost a 2.5x increase in relative performance. How about some live numbers from an actual Passmark test?



Well, the CPU is slightly better than the by-the-books number up on top. Alright. So that's the CPU. What about the GPU featureset?



The Vega 8 has 512 unified shaders, 16 ROPs and 32 texture mapping units across 8 compute units (hence Vega 8). When clocked to 1333 MHz it should have roughly 1365 Gigaflops of computing power (512 shaders x 2 FLOPs per clock x 1.333 GHz, assuming single precision floats). That's nearly 3x the raw performance on the Radeon R7s of the t730s, and close to the XBox One Durango. The 1GB of VRAM is taken from the main memory.
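
As a rough sanity check on that Gigaflops figure, the usual back-of-the-envelope formula is shader count x 2 FLOPs per clock (one fused multiply-add) x clock speed. The sketch below just does that arithmetic; the function name is made up for illustration, and the result is a theoretical single-precision peak, not a benchmark.

```python
def peak_gflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak, assuming every shader retires one FMA (2 FLOPs) per clock."""
    return shaders * flops_per_clock * clock_mhz / 1000.0


# Shader count and clock as quoted in the post above.
vega8 = peak_gflops(shaders=512, clock_mhz=1333)
print(f"Vega 8 @ 1333 MHz: {vega8:.0f} GFLOPS (theoretical peak)")
```

With the numbers quoted in the post this comes out a little under 1.4 TFLOPS, in the same ballpark as the figure above.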

So how do games run on the t740 in general? Well, I only have Wii/Gamecube and PS2 games at my disposal.

First, Super Mario Galaxy via the latest Dolphin Wii/GC emulator build:



At native resolution with 2x FSAA, it runs at 60-80% of native speed. Not too brilliant, but then there might be settings that need to be tweaked.

What about Zelda Twilight Princess (GC) on Dolphin?

Cutscene:


40-60% native speed. What about in-game?



About 80%. Note that the CPU and the GPU are not stressed whatsoever, so it could just be poorly optimized code or drivers.

How about Auto Modellista (GC)?



That one is 80-100%.

Huh. The current Dolphin nightly build doesn't look all that optimized for the Ryzen. I'll likely need to do a side-by-side comparison with the t730.

Hmmm...maybe I should re-test this with the ISOs loaded inside the eMMC rather than having them sit on a USB-C thumb drive. It could be a fillrate issue...?

Okay, what about the PS2 via PCSX2?



100% framerate, and neither the CPU nor the GPU looks stressed. Ghostbusters for PS2?



Not bad either. So with an improved Dolphin build we might be able to run some GC/Wii stuff better.

Alright, what about video playback for HTPC use cases?

The Ryzen APUs should all have VCN (video core next) 1.0 encode/decode blocks for H.264 and H.265 (HEVC) at up to 4k resolution. VLC should be able to show us those capabilities.

Here's 140Mbps 10 bit HEVC playback at 4k resolution:



Here is 250Mbps 10 bit HEVC at 4k resolution:


Note that while the GPU workload shot up to around 90%, this remained usable for up to 400Mbps HEVC.

What about Google VP9, used by Youtube?



Again, looks like it's somewhat working at 4k resolution - the CPU utilization went to 37% but the GPU is at 15%.

Oh yeah, VCN 1.0 does not like 8k resolution YouTube videos and will crash Google Chrome.

How about video transcoding/encoding using its built-in video coding engine (VCE), considering that Intel has Quicksync and nVidia has NVENC?

Eh...not so good. A transcode from 140Mbps HEVC10 to 1080p30 resulted in about 11 fps, which is likely dismal compared to the Quicksync circuitry on Kaby Lake or Coffee Lake.

For video consumption, very likely yes. For transcoding/streaming? Nope.

 

WANg

Okay, back to the t740 microserver testing malarkey.

So I got a private message or 2 pointing out some issues with the Dolphin (Gamecube/Wii emulator) methodology, namely:

a) AMD Vega drivers for OpenGL kinda suck (understandable since it's an obsolete/deprecated graphics API)

b) AMD Vega Vulkan drivers are much better (Oh really?), DX11 drivers are fine, but Dolphin seems to flip out on DX12 when used with AMD Vega (which also seems to be the case on my Dell XPS15 with the Kaby Lake-G SoC)

c) Running the game ISOs off the USB3 drive induces a higher CPU workload

Alright, so I re-ran those tests using Vulkan and by copying the game ISOs to my eMMC SSD, and the results were...enlightening:



Super Mario Galaxy (Wii) now runs at full native resolution @100% (2x is still out of the question though, due to some complex lighting effects, but 2x FSAA is totally fine). Note GPU utilization at only 40%...



Zelda Twilight Princess (Gamecube) runs at 2x native resolution (720p) at 100%. Note CPU utilization as being fairly low, and GPU utilization at...50%.



Auto Modellista (Gamecube) runs at 3x native resolution (1080p) at 100%. GPU utilization is at 80% but otherwise, it’s fine.



Now, to clarify video decode, I should splash up the codec, bitrate and GPU utilization. HEVC @ 4k, 400Mbps and the CPU is less than 5%. The video decode circuit is hitting nearly 88%, though...



Google/On2 VP9 (Youtube native encoding) doesn't seem to use a hardware accelerated (VCE/DXVA) codepath with a 4k/125Mbps stream. That could be VLC, though.

And regarding VCE video encode, VCN 1.0 is supposed to be able to encode/decode up to 4k resolution for HEVC/H.264, but I can't tell for sure. Either the codepath shipped with Handbrake doesn't support VCN 1.0, or VCN 1.0 doesn't work very well on Vega 8. Either way...here's what it looks like re-encoding an HEVC@4k 400Mbps stream to 1080p30 - 10 fps and not much action on the video encoding engine.



This could be a potential issue for Kodi or Plex on this box. More testing is needed, once again.

Next up, I just received the Mellanox ConnectX-3 for more SRIOV testing.

More to come...
 

WANg

Okay, so to get back to the SRIOV capabilities of the t740.

I got the Mellanox ConnectX3 (MCX354A-FCBT) and had it configured and running on Proxmox 6.1. No special trickery was needed for the most part, although I had some headaches dealing with the num_vfs allocation arguments on mstconfig (single number, a bunch of commas, and colons, huh...?)

AFAIK, no issues with the VFs on Proxmox and its associated Debian Linux - allocation on VM start, deallocation on VM stop, reassignment to other VMs, and all that mumbo jumbo. Everything looks clean and squared away. However, I ran into the same issue that I had with the Solarflare 5-series when assigning more than 7 VFs - if you try to do it, the bus controller returns a -255 and any further VFs will be "spiked" (as in, they'll come back as invalid PCIe devices). As @arglebargle mentioned numerous times, this is likely the inability of the hardware to do ARI (Alternative Routing-ID Interpretation) forwarding - a feature taken for granted on Xeon or Threadripper/Epyc hardware but not always on the consumer side of things. It's not well documented by the hardware makers, and for a big thin client that we are hacking to get certain things done, it's also not a major priority.

I am not sure whether this is a hardware issue (the PCIe controller says that it can do ARI Forwarding) or rather an AGESA firmware/BIOS disablement issue. My guess is that when we get our hands on a DFI clone of this board we'll know. Or perhaps a later version of the BIOS will enable it.

Now, onto ESXi - I ordered and obtained 3 inexpensive 64GB SATA SSDs (15 USD each, not a great deal but cheap and fast-delivering enough given the COVID19 logistical insanity outdoors) - one is for 6.5U3, one is for 6.0U3 Patch 22, and one more I have yet to decide. Building a custom ISO (needed due to the lack of native driver support for Realtek NICs past ESXi 5.5) is not difficult at all provided that you have access to PowerCLI 5 (which implies a Win10 machine with working PowerShell capabilities) and V-Front's ESXi-Customizer-PS script v2.6.0. All you really need to do is something like:

Code:
.\ESXi-Customizer-PS-v2.6.0.ps1 -v65 -vft -load net-r8168
And it should create a custom ISO for you in about 10 minutes. Then use Rufus to make a bootable USB drive, disable Secure Boot on the t740 and go to town.
(Note: The same ISO can be used on the t730 as well, and possibly any of the thin clients requiring the Realtek drivers).

ESXi 6.5 works for the most part, except for the fact that the native ConnectX3 drivers (nmlx4) do not support SRIOV functionality on ESXi 6.5. In order for this to work we'll likely have to disable the native drivers and use the older, deprecated OFED drivers.

(to be continued...)
 

WANg

@WANg
Regarding SR-IOV, are we on the t730 level except for the deallocation issue known with t730?
On the t740, it's a bit complicated...

In Linux, you don't need to hack the kernel to allow unsafe data passing between PCIe endpoints (PCIe Access Control Services being present on the Ryzen Embedded SoC), so for something like Proxmox, it works out of the box. VF allocations up to 7 VFs are tested for create, release, reallocate, release, and return-from-allocate. Everything looks fine.

However, PCIe Alternative Routing-ID Interpretation (which allows you to address more than 7 VFs per PCIe endpoint) is either missing or disabled on the firmware/BIOS.
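
To make the "more than 7 VFs" limit concrete, here is a small sketch of the function-number arithmetic. It assumes VFs are numbered contiguously right after the PFs on the same device, which is a simplification (real cards advertise a First VF Offset and VF Stride in their SR-IOV capability), and the function name is hypothetical. Without ARI a PCIe device has a 3-bit function number, so 8 functions in total; with ARI the 5-bit device number is reused, giving 256.

```python
def max_vfs_on_device(num_pfs: int, ari_enabled: bool) -> int:
    """VFs that fit in one device's function-number space after the PFs.

    Assumption: PFs and VFs share the same bus and device number and are
    packed contiguously; this is an illustration, not a model of any card.
    """
    function_slots = 256 if ari_enabled else 8
    return function_slots - num_pfs


# Single-PF card (like the ConnectX-3 listing in this thread), no ARI forwarding.
print(max_vfs_on_device(num_pfs=1, ari_enabled=False))
# Dual-PF card (assumed for the dual-port Solarflare), no ARI forwarding.
print(max_vfs_on_device(num_pfs=2, ari_enabled=False))
# Same dual-PF card if ARI forwarding were available.
print(max_vfs_on_device(num_pfs=2, ari_enabled=True))
```

Under those assumptions the three calls give 7, 6 and 254, which lines up with the 7 usable Mellanox VFs, the 6 usable Solarflare VFs and the 254 VFs the Solarflare card advertises.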

Now, for ESXi (and this is me testing it for the past 2 days), I ran into 3 issues:
a) ESXi 6.0+ native drivers (nmlx4) for Mellanox do not support SRIOV on ConnectX3 cards.
b) Mellanox OFED driver v2.4 for 5.5/6.0 does not work on this machine (it'll fail to allocate enough space for the routing table, fail to initialize the mlx4_core driver and simply not claim the device).
c) Mellanox OFED driver v1.9 for 5.5/6.0 does work on this machine, but for some reason it won't initialize VFs either.

Code:
/var/log/vmkernel.log:2020-05-01T23:26:06.070Z cpu5:65983)<6>mlx4_core: Initializing 0000:01:00.0
/var/log/vmkernel.log:2020-05-01T23:26:06.070Z cpu5:65983)<4>mlx4_core 0000:01:00.0: Enabling SR-IOV with 4 VFs
/var/log/vmkernel.log:2020-05-01T23:26:06.070Z cpu5:65983)LinPCI: vmklnx_enable_vfs:1371: enabling 4 VFs on PCI device 0000:01:00.0
/var/log/vmkernel.log:2020-05-01T23:26:06.071Z cpu5:65983)PCI: 786: Unable to allocate 0x4000000 bytes in pre fetch mmio for PF=0000:01:00.0 VF-BAR[2]
/var/log/vmkernel.log:2020-05-01T23:26:06.071Z cpu5:65983)PCI: 866: Failed to allocate and program VF BARs on 0000:01:00.0
/var/log/vmkernel.log:2020-05-01T23:26:06.071Z cpu5:65983)WARNING: PCI: 1470: Enable VFs failed for device @0000:01:00.0, Please make sure the system has proper BIOS installed and enabled for SRIOV.
/var/log/vmkernel.log:2020-05-01T23:26:06.071Z cpu5:65983)WARNING: LinPCI: vmklnx_enable_vfs:1375: unable to enable SR-IOV on PCI device 0000:01:00.0
/var/log/vmkernel.log:2020-05-01T23:26:06.071Z cpu5:65983)<3>mlx4_core 0000:01:00.0: Failed to enable SR-IOV, continuing without SR-IOV (err = 0).
/var/log/vmkernel.log:2020-05-01T23:26:11.583Z cpu5:65983)<4>mlx4_core 0000:01:00.0: 64B EQEs/CQEs supported by the device but not enabled
/var/log/vmkernel.log:2020-05-01T23:26:12.799Z cpu5:65983)VMK_PCI: 765: device 0000:01:00.0 allocated 22 MSIX interrupts
/var/log/vmkernel.log:2020-05-01T23:26:12.799Z cpu5:65983)MSIX enabled for dev 0000:01:00.0
/var/log/vmkernel.log:2020-05-01T23:26:12.864Z cpu5:65983)PCI: driver mlx4_core claimed device 0000:01:00.0
I actually had to upgrade the machine to ESXi 6.5...

Code:
# Grab the installer vib package at: https://my.vmware.com/group/vmware/patch#search
# Download ESXi650-201912002.zip
# scp the file to the box and drop it in /tmp/
esxcli software vib install -d /tmp/ESXi650-201912002.zip
# Reboot
blacklist the native nmlx4 drivers...

Code:
[root@dash:~] esxcli system module list | grep nmlx4
nmlx4_core                           true        true
nmlx4_en                             true        true
nmlx4_rdma                           true        true

esxcli system module set -e false -m nmlx4_rdma
esxcli system module set -e false -m nmlx4_en
esxcli system module set -e false -m nmlx4_core

[root@dash:~] esxcli system module list | grep nmlx4
nmlx4_rdma                         false       false
nmlx4_en                           false       false
nmlx4_core                         false       false

just so I can retain OFED and use the new, ESXi 6.5+ parameter of:

Code:
VMkernel.Boot.pciBarAllocPolicy=1
This parameter is used to minimize the initial PCIe BAR allocation (in case it’s a BAR allocation issue)

So by default, the log level for ESXi 6.5 is much less verbose than 6.0. We'll need to set

Code:
config.HostAgent.log.level
To trivia (or very verbose) to get ESXi to give me something useful.

Code:
[root@dash:~] grep 05-02T01:43 /var/log/vmkernel.log  | grep 0000:01 | grep mmio
/var/log/vmkernel.log:2020-05-02T01:43:04.343Z cpu2:65983)PCI: 786: Unable to allocate 0x2000000 bytes in pre fetch mmio for PF=0000:01:00.0 VF-BAR[2]
Well, let's see. I specified 4 VFs in the Mellanox card config...and it asked for 0x2000000 bytes (that's 32MBytes; the earlier log above asked for 0x4000000, or 64MBytes, for 4 VFs). And it failed on PCIe BAR allocation. Okay, does that BAR size allocation differ with the VF count? Let's set it to 12 (the maximum quoted).

Allocate and reboot, then scan the logs for the timestamp and look for mmio...

Code:
[root@dash:~] grep 05-02T01:53 /var/log/vmkernel.log  | grep 0000:01 | grep mmio
2020-05-02T01:53:49.286Z cpu2:65981)PCI: 786: Unable to allocate 0xc000000 bytes in pre fetch mmio for PF=0000:01:00.0 VF-BAR[2]
Huh. 0xc000000 is 192MBytes, or 12*16. So the PCIe BAR space changes according to the number of VFs allocated. What about 1? Set and reboot.

Code:
[root@dash:~] grep 05-02T01:59 /var/log/vmkernel.log  | grep 0000:01 | grep mmio
2020-05-02T01:59:33.116Z cpu3:65983)PCI: 786: Unable to allocate 0x1000000 bytes in pre fetch mmio for PF=0000:01:00.0 VF-BAR[2]
Yep. I am guessing...some kind of PCIe BAR allocation issue due to the BIOS, or something funky with the driver?
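
The hex-to-megabyte arithmetic on those failed allocations can be double-checked with a few lines of Python. This is only a sketch: the helper names are made up, and the two request sizes are copied from the 12-VF and 1-VF log lines above.

```python
MIB = 1024 * 1024


def to_mib(size_bytes: int) -> int:
    """Convert a BAR request size in bytes to whole MiB."""
    return size_bytes // MIB


def per_vf_mib(size_bytes: int, num_vfs: int) -> float:
    """Average BAR space requested per VF, in MiB."""
    return size_bytes / MIB / num_vfs


# VF count configured on the card -> bytes ESXi tried to allocate for VF-BAR[2]
requests = {12: 0xC000000, 1: 0x1000000}

for vfs, size in requests.items():
    print(f"{vfs:>2} VFs: {to_mib(size)} MiB total, {per_vf_mib(size, vfs):.0f} MiB per VF")
```

Both requests work out to 16 MiB per VF, so the size ESXi asks for scales linearly with the VF count.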

@arglebargle what do you think? I don't think ESXi allows you to mess with PCIe BAR allocation size too much, at least not beyond the boot parameter for fit-first or fit-small. I think I'll need to boot into Proxmox and look at the PCI settings again, and then use mstconfig to check PCIe BAR allocation size (if it's handled on the card side).

Update: Just checked the PCIe Virtual Functions on Proxmox...16MB BARs per VF there and no issues.

Code:
01:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
    Flags: bus master, fast devsel, latency 0, IRQ 49
    Memory at fe700000 (64-bit, non-prefetchable) [size=1M]
    Memory at 80000000 (64-bit, prefetchable) [size=16M]
    Expansion ROM at fe600000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 3
    Capabilities: [48] Vital Product Data
    Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [c0] Vendor Specific Information: Len=18 <?>
    Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [148] Device Serial Number 00-02-c9-xx-xx-xx-xx-xx
    Capabilities: [154] Advanced Error Reporting
    Capabilities: [18c] #19
    Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
    Kernel driver in use: mlx4_core
    Kernel modules: mlx4_core

01:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Flags: bus master, fast devsel, latency 0
    [virtual] Memory at 81000000 (64-bit, prefetchable) [size=16M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [9c] MSI-X: Enable+ Count=108 Masked-
    Capabilities: [40] Power Management version 0
    Kernel driver in use: mlx4_core
    Kernel modules: mlx4_core

01:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Flags: fast devsel
    [virtual] Memory at 82000000 (64-bit, prefetchable) [size=16M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [9c] MSI-X: Enable- Count=108 Masked-
    Capabilities: [40] Power Management version 0
    Kernel modules: mlx4_core

01:00.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Flags: fast devsel
    [virtual] Memory at 83000000 (64-bit, prefetchable) [size=16M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [9c] MSI-X: Enable- Count=108 Masked-
    Capabilities: [40] Power Management version 0
    Kernel modules: mlx4_core

01:00.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Flags: fast devsel
    [virtual] Memory at 84000000 (64-bit, prefetchable) [size=16M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [9c] MSI-X: Enable- Count=108 Masked-
    Capabilities: [40] Power Management version 0
    Kernel modules: mlx4_core

01:00.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Flags: fast devsel
    [virtual] Memory at 85000000 (64-bit, prefetchable) [size=16M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [9c] MSI-X: Enable- Count=108 Masked-
    Capabilities: [40] Power Management version 0
    Kernel modules: mlx4_core

01:00.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Flags: fast devsel
    [virtual] Memory at 86000000 (64-bit, prefetchable) [size=16M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [9c] MSI-X: Enable- Count=108 Masked-
    Capabilities: [40] Power Management version 0
    Kernel modules: mlx4_core

01:00.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    Flags: fast devsel
    [virtual] Memory at 87000000 (64-bit, prefetchable) [size=16M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [9c] MSI-X: Enable- Count=108 Masked-
    Capabilities: [40] Power Management version 0
    Kernel modules: mlx4_core
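For a quick total of how much prefetchable MMIO the VFs are asking for, a listing like the one above can be piped through a small filter. This is just a sketch: `sum_vf_bars` is a made-up helper name, and it only understands sizes that lspci reports in megabytes (the `[size=16M]` form).

```shell
# Sum the "[virtual] Memory at ... [size=NM]" lines from `lspci -v` output.
# Reads lspci text on stdin and prints "<count> VFs, <total> MB".
# Hypothetical helper; sizes not expressed in M are ignored.
sum_vf_bars() {
    awk '
        /\[virtual\] Memory at/ {
            if (match($0, /size=[0-9]+M/)) {
                # Skip the 5 characters of "size=" and drop the trailing "M".
                total += substr($0, RSTART + 5, RLENGTH - 6)
                count++
            }
        }
        END { printf "%d VFs, %d MB\n", count, total }
    '
}
```

Usage would be something like `lspci -v -s 01:00 | sum_vf_bars`; seven VFs at 16M each come to 112 MB.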
 

WANg

Oh yeah. Looks like BAR size is defined on the card itself:

Code:
root@Proxmox01:~#  mlxconfig -d /dev/mst/mt4099_pci_cr0 query

Device #1:
----------

Device type:    ConnectX3   
Device:         /dev/mst/mt4099_pci_cr0

Configurations:                              Next Boot
         SRIOV_EN                            True(1)     
         NUM_OF_VFS                          12         
         LINK_TYPE_P1                        ETH(2)     
         LINK_TYPE_P2                        ETH(2)     
         LOG_BAR_SIZE                        4           
         BOOT_PKEY_P1                        0           
         BOOT_PKEY_P2                        0           
         BOOT_OPTION_ROM_EN_P1               True(1)     
         BOOT_VLAN_EN_P1                     False(0)   
         BOOT_RETRY_CNT_P1                   0           
         LEGACY_BOOT_PROTOCOL_P1             PXE(1)     
         BOOT_VLAN_P1                        1           
         BOOT_OPTION_ROM_EN_P2               True(1)     
         BOOT_VLAN_EN_P2                     False(0)   
         BOOT_RETRY_CNT_P2                   0           
         LEGACY_BOOT_PROTOCOL_P2             PXE(1)     
         BOOT_VLAN_P2                        1           
         IP_VER_P1                           IPv4(0)     
         IP_VER_P2                           IPv4(0)     
         CQ_TIMESTAMP                        True(1)
Whoops.
LOG_BAR_SIZE is set to 4, and 2^4 gives the 16MB per BAR seen in the lspci output. Let's fix that by reverting it back to the default (0)...which is 8MB if I remember correctly.
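Before changing it, a quick sanity check on the arithmetic. This sketch assumes LOG_BAR_SIZE is a plain base-2 exponent in megabytes, which is consistent with the value 4 and the 16M BARs in the lspci listing; `log_bar_size_to_mb` is a hypothetical helper, not part of the Mellanox tools.

```shell
# Convert a LOG_BAR_SIZE value to a BAR size in MB.
# ASSUMPTION: the setting is log2 of the size in MB (4 -> 16MB).
log_bar_size_to_mb() {
    echo $((1 << $1))
}

log_bar_size_to_mb 4
```

Under that reading 8MB would be an exponent of 3, so whether 0 really maps to 8MB (for example as a "use the firmware default" value) is worth confirming against the firmware documentation rather than trusting this helper.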

Code:
root@Proxmox01:~# mlxconfig -d /dev/mst/mt4099_pci_cr0 set LOG_BAR_SIZE=0

Device #1:
----------

Device type:    ConnectX3   
Device:         /dev/mst/mt4099_pci_cr0

Configurations:                              Next Boot       New
         LOG_BAR_SIZE                        4               0          

Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
Alright, let's swap that SSD back and see how it fares.

Code:
[root@dash:~]  grep 05-02T03:09 /var/log/vm*.log   | grep 00:01:00
/var/log/vmkwarning.log:2020-05-02T03:09:58.889Z cpu0:65983)WARNING: PCI: 1470: Enable VFs failed for device @0000:01:00.0, Please make sure the system has proper BIOS installed and enabled for SRIOV.
/var/log/vmkwarning.log:2020-05-02T03:09:58.889Z cpu0:65983)WARNING: LinPCI: vmklnx_enable_vfs:1375: unable to enable SR-IOV on PCI device 0000:01:00.0
*Gah*. Well, that didn't work. It's either the card (doubt it), the BIOS (likely), or the drivers (no SRIOV support in the native 6.5 drivers, and who knows if it plays ball on the old OFED drivers). I could purge the 1.9 and go back to 2.4 just to see what happens.
I would like to grab a ConnectX-4 to test the mlx5 drivers, but those are, what, 200-300 dollars each?
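To pull the affected PCI addresses out of vmkwarning.log without eyeballing the whole thing, a small filter helps. It is only a sketch: `sriov_failed_devices` is a made-up name, and the pattern is based solely on the "Enable VFs failed for device @..." wording in the log lines above, so other ESXi builds may word it differently.

```shell
# List the PCI addresses for which ESXi logged "Enable VFs failed".
# Reads log text on stdin, prints one unique address per line.
# Hypothetical helper; the pattern is taken from the log lines in this post.
sriov_failed_devices() {
    sed -n 's/.*Enable VFs failed for device @\([0-9a-fA-F:.]*\),.*/\1/p' | sort -u
}
```

On the ESXi shell that would be run as `sriov_failed_devices < /var/log/vmkwarning.log`.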

Let's see if the SolarFlare Flareon will fare any better.

Note: The SolarFlare Flareon 10GbE cards run really hot, so I am waiting for a PCIe extender cable to be delivered. I want to get that card as far away from the t740 internals as possible even for testing purposes.
 

csp-guy

Active Member
Jun 26, 2019
365
121
43
Hungary, Budapest
OMG, I'm losing a lot of hair over problems with ESXi and Mellanox. :-(

I plan to test it with an Intel X710-T4. I am waiting for my shipment from the UK, then I will replace my Dell-branded i350-T4.
 

WANg

csp-guy said:
OMG, I'm losing a lot of hair over problems with ESXi and Mellanox. :-(

I plan to test it with an Intel X710-T4. I am waiting for my shipment from the UK, then I will replace my Dell-branded i350-T4.
You are concentrating on the wrong strategy. It's as likely to be a hardware issue as a software limitation on the t740. Figure out how to activate SRIOV on the i350-T4, and then scan your VMware logs to see whether there are issues with the OS/BIOS itself.
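For the first half of that, the usual route on ESXi is a driver module parameter. The sketch below only prints the command instead of running it. Treat the specifics as assumptions: the module name (`igb` here, but it may be `igbn` if the native driver is in use), the `max_vfs` parameter with one comma-separated entry per port, and the `max_vfs_cmd` helper name itself are mine, so check them against your driver's documentation.

```shell
# Dry run: print (do not execute) an esxcli command requesting VFs per port.
# ASSUMPTIONS: module name and the max_vfs parameter format; verify both
# against the documentation for the driver that is actually loaded.
max_vfs_cmd() {
    module=$1   # driver module name, e.g. igb
    ports=$2    # number of ports on the card
    vfs=$3      # VFs requested per port
    list=""
    i=0
    while [ "$i" -lt "$ports" ]; do
        # Build a comma-separated list with one entry per port.
        list="${list:+$list,}$vfs"
        i=$((i + 1))
    done
    printf 'esxcli system module parameters set -m %s -p "max_vfs=%s"\n' "$module" "$list"
}
```

`max_vfs_cmd igb 4 2` prints a command requesting 2 VFs on each of the four ports; a reboot is normally needed before a module parameter takes effect.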
 

WANg

Alright, I got some additional items yesterday and today which should make things...a bit easier. First item is the actual t740 stand (which is octagonal in shape and fits both the t640 and the t740 models) - if you do buy one, make sure that the correct stand has been supplied, since it's not compatible with the t630 stands. After I got the stand installed I decided to take a few photos to illustrate comparative sizing between the t740, the t730 and my 2011 Mac Mini (a decent machine, but likely due for replacement soon).

Here's the t740 with the mounting done on the bottom:



And here it is stacked on top of the t730 (its big brother that will eventually be replaced)...



(Note that it's about the same size as my Microserver G7 and almost a perfect thematic match)

And here it is on the vertical stand.



Yeah, the 60% size difference versus the Mac Mini is fairly evident.

A further question is...can you reuse the fiber card from the t730? In one word? No. More specifically: not as it stands, but maybe, if you are willing to mod the chassis. At the very least you'll need an M.2 Type E extension cable to clear some obstructions, namely the FFC ribbon cable for the USB ports on the right, and one of the mounting posts for the fan/blower assembly on the left. You'll need to run it at a 90 degree angle via an extender (preferably not a ribbon one). Theoretically possible, but quite a challenge to pull off.



Of course, you'll probably still need to modify the blanker plate on the rear since, unlike the PCIe slot, there is no molded breakout panel for the card.
 

WANg

Alright, so here's the meat and potatoes of today's adventure - how would the Solarflare Flareon (SFN7122F) work with Proxmox and ESXi given its penchant for crashing machines?

Well, here's a comparison between the SFN5122F, the Flareon, and the Mellanox ConnectX-3 VPI card.



Note the PCIe slot extender and a USB powered fan. Yes, we are going to move the Flareon outside the chassis to test it, and make sure the temperature does not shoot through the roof.



Notice that the card is sitting outside, and the fan has been temporarily moved aside. Since we moved the card outside it also makes the SSD unit much easier to swap out.

So, what to do first?
Let's boot up Proxmox/Debian and check VF functionality.

We'll need to ensure that the Flareon firmware is up to date with the sfupdate tool.

Code:
root@Proxmox01:~/sfutils-8.0.1.1005/usr/sbin# ./sfupdate
Solarstorm firmware update utility [v8.0.1]
Copyright Solarflare Communications 2006-2018, Level 5 Networks 2002-2005

enp1s0f0np0 - MAC: 00-0f-53-xx-xx-xx
    Firmware version:   v6.2.6
    Controller type:    Solarflare SFC9100 family
    Controller version: v6.2.5.1000
    Boot ROM version:   v5.0.2.1000

This utility contains more recent Boot ROM firmware [v5.2.2.1004]
   - run "sfupdate --write" to perform an update
This utility contains more recent controller firmware [v6.2.7.1001]
   - run "sfupdate --write" to perform an update

enp1s0f1np1 - MAC: 00-0F-53-xx-xx-xx
    Firmware cannot be accessed via this interface,
    please use enp1s0f0np0.
root@Proxmox01:~/sfutils-8.0.1.1005/usr/sbin# ./sfupdate --write
Solarstorm firmware update utility [v8.0.1]
Copyright Solarflare Communications 2006-2018, Level 5 Networks 2002-2005
enp1s0f0np0: updating controller firmware from 6.2.5.1000 to 6.2.7.1001
enp1s0f0np0: writing controller firmware
[100%] Complete                                                            
[100%] Complete                                                            
enp1s0f0np0: updating Boot ROM from 5.0.2.1000 to 5.2.2.1004
enp1s0f0np0: writing Boot ROM
[100%] Complete                                                            
enp1s0f0np0: writing version information
enp1s0f1np1: writing version information                                    
[100%] Complete                                                            
enp1s0f1np1: Firmware is not accessible, please use enp1s0f0np0.
root@Proxmox01:~/sfutils-8.0.1.1005/usr/sbin# ./sfupdate
Solarstorm firmware update utility [v8.0.1]
Copyright Solarflare Communications 2006-2018, Level 5 Networks 2002-2005

enp1s0f0np0 - MAC: 00-0F-53-xx-xx-xx
    Firmware version:   v7.6.9
    Controller type:    Solarflare SFC9100 family
    Controller version: v6.2.7.1001
    Boot ROM version:   v5.2.2.1004

The Boot ROM firmware is up to date
The controller firmware is up to date

enp1s0f1np1 - MAC: 00-0F-53-xx-xx-xx
    Firmware cannot be accessed via this interface,
    please use enp1s0f0np0.
Okay, that needed a reboot. Once we are done with the reboot, we'll need to use the sfboot utility to set up the hardware.

Code:
root@Proxmox01:~/sfutils-8.0.1.1005/usr/sbin# ./sfboot
Solarflare boot configuration utility [v8.0.1]
Copyright Solarflare Communications 2006-2018, Level 5 Networks 2002-2005

enp1s0f0np0:
  Boot image                            Disabled
  Physical Functions on this port       1
  PF MSI-X interrupt limit              32
  Virtual Functions on each PF          0
  VF MSI-X interrupt limit              8
  Port mode                             Default
  Firmware variant                      Auto
  Insecure filters                      Default
  MAC spoofing                          Default
  Change MAC                            Default
  VLAN tags                             None
  Switch mode                           Default
  RX descriptor cache size              32
  TX descriptor cache size              16
  Total number of VIs                   2048
  Event merge timeout                   8740 nanoseconds

enp1s0f1np1:
  Boot image                            Disabled
  Physical Functions on this port       1
  PF MSI-X interrupt limit              32
  Virtual Functions on each PF          0
  VF MSI-X interrupt limit              8
  Port mode                             Default
  Firmware variant                      Auto
  Insecure filters                      Default
  MAC spoofing                          Default
  Change MAC                            Default
  VLAN tags                             None
  Switch mode                           Default
  RX descriptor cache size              32
  TX descriptor cache size              16
  Total number of VIs                   2048
  Event merge timeout                   8740 nanoseconds
Note that SRIOV is disabled and no VFs are assigned. How do we use this utility?

Code:
Usage: sfboot [-i adapter] [options] [configurable parameters]

Available options are:
  -i, --adapter=(ethn|XX:XX:XX:XX:XX:XX|XX-XX-XX-XX-XX-XX)
    Operate only on the specified adapter or comma-separated list of adapters.
  -l, --list
    Show the adapter list and exit.
  -c, --clear
    Reset all adapter configuration settings to their default.
    Note that 'Global' options are not reset when adapters are specified.
  -r, --repair
    Restore configuration settings to firmware defaults, must be used with clear option. 7xxx series only.
  -f, --factory-reset
    Reset configuration settings to factory defaults. Available on X2 adapters and later. When used with -i please specify the interface corresponding to the first port on each adapter. Firmware versions will not be reset, so sfupdate must be run immediately afterwards.
  -V, --version
    Display the version and exit.
  -s, --silent
    Silent mode, output errors only.
  -v, --verbose
    Verbose mode.
  -y, --yes
    Update without prompting.
  -h, --help
    Print this help message and exit.

  Configurable Parameters:

  boot-image=all|optionrom|uefi|disabled
    Specifies which firmware image the adapter will provide to the host.
    (Effective from next reboot.)
    - Global option : applies to all ports on the NIC

  link-speed=auto|10g|1g|100m
    Specifies the Network Link Speed of the Adapter.

  linkup-delay=<delay time in seconds>
    Specifies the value to be used for the Link Delay

  banner-delay=<delay time in seconds>
    Specifies the dwell time after the sign-on banner is shown.

  bootskip-delay=<delay time in seconds>
    Specifies the boot skip delay time.

  boot-type=pxe|disabled
    Selects the type of boot. (Effective from next reboot.)

  pf-count=<pf count>
    This is the number of available PCIe PFs on this physical network port.
    Note that MAC address assignments may change after altering this setting.

  msix-limit=8|16|32|64|128|256|512|1024
    Specifies the maximum number of MSI-X interrupts each PF may use.

  sriov=enabled|disabled
    Enable SR-IOV support for operating systems that support it.

  vf-count=<vf count>
    This is the number of Virtual Functions advertised to the operating system
    for each Physical Function on this physical network port.
    The Solarstorm SFC9000 device has a total limit of 1024 interrupts.
    The Solarstorm SFC9100 device has a total limit of 2048 interrupts.
    The Solarstorm SFC9200 device has a total limit of 2048 interrupts.
    Depending on the value of msix-limit and vf-msix-limit, some of these
    Virtual Functions may not be useable.

  vf-msix-limit=1|2|4|8|16|32|64|128|256
    Specifies the maximum number of MSI-X interrupts each VF may use.

  port-mode=<port-mode-description>
    Configures the port mode to use.
    This is for 7xxx, 8xxx and 2xxx series adapters only.
    Note that MAC address assignments may change after altering this setting.
    The valid values for this setting depend on the series and model of the NIC.
    For 7xxx series adapters the port-mode may be one of:
      7x22: default, [1x10g], [1x10G][1x10G]
      7x24: default, [1x10G][1x10G], [4x10G]
      7x42: default, [1x10G][1x10G], [4x10G], [1x40G][1x40G]
    For 8xxx series adapters the port-mode may be one of:
      8x22: default, [1x10G][1x10G]
      8x41: default, [1x40g]
      8x42: default, [4x10G], [2x10G][2x10G], [1x40G][1x40G]
    For 2xxx series adapters the port-mode may be one of:
      X2522: default, [1x10/25G][1x10/25G]
      X2541: default, [4x10/25G], [2x50G], [1x100G]
      X2542: default, [4x10/25G], [2x10/25G][2x10/25G], [2x50G], [1x50G][1x50G], [1x100G]
    - Global option : applies to all ports on the NIC

  firmware-variant=full-feature|ultra-low-latency|capture-packed-stream|rules-engine|dpdk|auto
    Specifies the firmware variant to use.
    This is for SFC9100 and SFC9200 family adapters only.
    Note that full-feature firmware is necessary to use SR-IOV.
    The default is auto.
    - Global option : applies to all ports on the NIC

  insecure-filters=default|enabled|disabled
    Grant or revoke a privilege to bypass on filter security for
    non-privileged functions on this port.
    This is for SFC9100 and SFC9200 family adapters only.
    This is required for some applications, but reduces security in
    virtualized environments.
    - Global option : applies to all ports on the NIC

  mac-spoofing=default|enabled|disabled
    Grant or revoke the privilege to transmit packets with other source
    MAC addresses for non-privileged functions on this port.
    This is for SFC9100 and SFC9200 family adapters only.
    - Global option : applies to all ports on the NIC

  change-mac=default|enabled|disabled
    Grant or revoke the privilege to change the unicast
    MAC address for non-privileged functions on this port.
    This is for SFC9100 and SFC9200 family adapters only.
    - Global option : applies to all ports on the NIC

  pf-vlans=<tag>[,<tag>[,...]]|None
    Specifies a VLAN tag for each PF on the port in a comma-separated
    list. Each VLAN tag must be unique for the port and be in the
    range [0..4094]. Specifying pf-vlans=None will clear all VLAN tags
    on the port. Pf-vlans should be included after pf-count on the
    sfboot command line.
    If the number of PFs is changed then the VLAN tags will be cleared.

  switch-mode=default|sriov|partitioning|partitioning-with-sriov|pfiov
    Specifies the mode of operation that a port will be used in.
    The values for pf-count, vf-count and pf-vlans settings should be
     appropriate for the switch-mode.

  rx-dc-size=8|16|32|64
    Specifies the size of the descriptor cache for each receive queue.
    - Global option : applies to all ports on the NIC

  tx-dc-size=8|16|32|64
    Specifies the size of the descriptor cache for each transmit queue.
    - Global option : applies to all ports on the NIC

  vi-count=<vi count>
    Sets the total number of VIs that will be available on the NIC.
    - Global option : applies to all ports on the NIC

  event-merge-timeout=<timeout in nanoseconds>
    Specifies the timeout in nanoseconds for RX event merging.
    A timeout of 0 means that event merging is disabled.
    - Global option : applies to all ports on the NIC
Hm. Looks like we'll need to set the correct parameters...
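Before picking a vf-count, a rough check against the interrupt limit quoted in the help text (2048 for the SFC9100) doesn't hurt. This is a back-of-the-envelope sketch: `msix_budget` is a hypothetical helper, and it assumes the budget is simply every PF's msix-limit plus every VF's vf-msix-limit added up across both ports, which may not be exactly how the firmware accounts for it.

```shell
# Rough MSI-X budget check for a Solarflare adapter.
# ASSUMPTION: used = ports * pf_count * (msix_limit + vf_count * vf_msix_limit)
# Arguments: ports pf_count msix_limit vf_count vf_msix_limit total_limit
msix_budget() {
    ports=$1
    pf_count=$2
    msix_limit=$3
    vf_count=$4
    vf_msix_limit=$5
    total_limit=$6
    used=$((ports * pf_count * (msix_limit + vf_count * vf_msix_limit)))
    if [ "$used" -le "$total_limit" ]; then
        echo "$used of $total_limit interrupts: fits"
    else
        echo "$used of $total_limit interrupts: over budget"
    fi
}
```

With the values reported by sfboot above (2 ports, 1 PF each, PF limit 32, VF limit 8) and 6 VFs per PF, `msix_budget 2 1 32 6 8 2048` works out to 160 interrupts, nowhere near the limit.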

Code:
root@Proxmox01:~/sfutils-8.0.1.1005/usr/sbin# ./sfboot sriov=enabled switch-mode=sriov pf-count=1 vf-count=6
Solarflare boot configuration utility [v8.0.1]
Copyright Solarflare Communications 2006-2018, Level 5 Networks 2002-2005

enp1s0f0np0:
  Boot image                            Disabled
  Physical Functions on this port       1
  PF MSI-X interrupt limit              32
  Virtual Functions on each PF          6
  VF MSI-X interrupt limit              8
  Port mode                             Default
  Firmware variant                      Full feature / virtualization
  Insecure filters                      Default
  MAC spoofing                          Default
  Change MAC                            Default
  VLAN tags                             None
  Switch mode                           SR-IOV
  RX descriptor cache size              32
  TX descriptor cache size              16
  Total number of VIs                   2048
  Event merge timeout                   8740 nanoseconds

Alert: A cold reboot is required as a result of this configuration change!

enp1s0f1np1:
  Boot image                            Disabled
  Physical Functions on this port       1
  PF MSI-X interrupt limit              32
  Virtual Functions on each PF          6
  VF MSI-X interrupt limit              8
  Port mode                             Default
  Firmware variant                      Full feature / virtualization
  Insecure filters                      Default
  MAC spoofing                          Default
  Change MAC                            Default
  VLAN tags                             None
  Switch mode                           SR-IOV
  RX descriptor cache size              32
  TX descriptor cache size              16
  Total number of VIs                   2048
  Event merge timeout                   8740 nanoseconds

Alert: A cold reboot is required as a result of this configuration change!
Okay, let's do a reboot and see what happens...

Code:
root@Proxmox01:~# lspci -v | grep 01:00
01:00.0 Ethernet controller: Solarflare Communications SFC9120 (rev 01)
01:00.1 Ethernet controller: Solarflare Communications SFC9120 (rev 01)
Huh. The 2 ports are initialized but no VFs showed up. What the...
Oh, looks like the VF count has to be written to the PF's sriov_numvfs entry in sysfs before any VFs get allocated -

Code:
echo 3 >  /sys/class/net/enp1s0f0np0/device/sriov_numvfs

(dmesg)
[  216.914636] pci 0000:01:00.2: [1924:1903] type 00 class 0x020000
[  216.915174] pci 0000:01:00.2: Adding to iommu group 12
[  216.915217] pci 0000:01:00.2: Using iommu direct mapping
[  216.915507] sfc 0000:01:00.2 (unnamed net_device) (uninitialized): Solarflare NIC detected
[  216.915512] sfc 0000:01:00.2: enabling device (0100 -> 0102)
[  216.915840] sfc 0000:01:00.2 (unnamed net_device) (uninitialized): no PTP support
[  216.918998] pci 0000:01:00.3: [1924:1903] type 00 class 0x020000
[  216.919301] pci 0000:01:00.3: Adding to iommu group 13
[  216.919354] pci 0000:01:00.3: Using iommu direct mapping
[  216.919571] sfc 0000:01:00.3 (unnamed net_device) (uninitialized): Solarflare NIC detected
[  216.919580] sfc 0000:01:00.3: enabling device (0100 -> 0102)
[  216.919902] sfc 0000:01:00.3 (unnamed net_device) (uninitialized): no PTP support
[  216.922036] sfc 0000:01:00.2 enp1s0f0np0v0: renamed from eth0
[  216.949538] pci 0000:01:00.4: [1924:1903] type 00 class 0x020000
[  216.950010] pci 0000:01:00.4: Adding to iommu group 14
[  216.950048] pci 0000:01:00.4: Using iommu direct mapping
[  216.950352] sfc 0000:01:00.4 (unnamed net_device) (uninitialized): Solarflare NIC detected
[  216.953903] sfc 0000:01:00.4: enabling device (0100 -> 0102)
[  216.954113] sfc 0000:01:00.3 enp1s0f0np0v1: renamed from eth0
[  216.954282] sfc 0000:01:00.4 (unnamed net_device) (uninitialized): no PTP support
[  217.001557] sfc 0000:01:00.4 enp1s0f0np0v2: renamed from eth0
Oh good. It showed up. How about re-allocation to a bigger number?

Code:
root@Proxmox01:~/sfutils-8.0.1.1005/usr/sbin# echo 6 >  /sys/class/net/enp1s0f0np0/device/sriov_numvfs
-bash: echo: write error: Device or resource busy

(dmesg)
[  231.574198] sfc 0000:01:00.0: 3 VFs already enabled. Disable before enabling 6 VFs
[  260.111015] pci 0000:01:01.0: [1924:1903] type 7f class 0xffffff
[  260.111930] pci 0000:01:01.0: unknown header type 7f, ignoring device
[  261.136124] sfc 0000:01:00.1 enp1s0f1np1: Failed to enable SRIOV VFs
[  268.250893] pci 0000:01:01.0: [1924:1903] type 7f class 0xffffff
[  268.251795] pci 0000:01:01.0: unknown header type 7f, ignoring device
[  269.263699] sfc 0000:01:00.1 enp1s0f1np1: Failed to enable SRIOV VFs

root@Proxmox01:~/sfutils-8.0.1.1005/usr/sbin# echo 0 >  /sys/class/net/enp1s0f1np1/device/sriov_numvfs
(dmesg)
[  269.263699] sfc 0000:01:00.1 enp1s0f1np1: Failed to enable SRIOV VFs
[  309.378901] pci 0000:01:00.2: Removing from iommu group 12
[  309.468328] pci 0000:01:00.3: Removing from iommu group 13
[  309.560393] pci 0000:01:00.4: Removing from iommu group 14
Okay, the VFs disappear cleanly. Now what about re-allocating to 6 and then initializing 3 VMs where each is allocated 2 VFs?
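The "disable before enabling" rule from dmesg can be wrapped in a small helper so the busy error doesn't bite again. A minimal sketch: `set_numvfs` and the `SYSFS_NET` override are hypothetical names (the override is only there so the function can be tried against a fake directory tree), while `sriov_numvfs` and `sriov_totalvfs` are the standard sysfs attributes.

```shell
# Set the VF count on a PF, dropping to 0 first if VFs are already enabled.
# SYSFS_NET is a hypothetical override for testing; defaults to /sys/class/net.
set_numvfs() {
    iface=$1
    want=$2
    dev="${SYSFS_NET:-/sys/class/net}/$iface/device"

    max=$(cat "$dev/sriov_totalvfs") || return 1
    cur=$(cat "$dev/sriov_numvfs") || return 1

    if [ "$want" -gt "$max" ]; then
        echo "$iface supports at most $max VFs, not $want" >&2
        return 1
    fi

    # Nothing to do if the count already matches.
    [ "$cur" -eq "$want" ] && return 0

    # Going from one non-zero count to another is refused with
    # "Device or resource busy", so disable the existing VFs first.
    if [ "$cur" -ne 0 ]; then
        echo 0 > "$dev/sriov_numvfs" || return 1
    fi

    # If the target is 0, the write above already did the job.
    if [ "$want" -ne 0 ]; then
        echo "$want" > "$dev/sriov_numvfs" || return 1
    fi
}
```

As root, `set_numvfs enp1s0f0np0 6` would then handle the 3-to-6 case in one go.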

Code:
root@Proxmox01:~/sfutils-8.0.1.1005/usr/sbin# echo 6 >  /sys/class/net/enp1s0f0np0/device/sriov_numvfs

(dmesg)
[  320.932061] pci 0000:01:00.2: [1924:1903] type 00 class 0x020000
[  320.932564] pci 0000:01:00.2: Adding to iommu group 12
[  320.932615] pci 0000:01:00.2: Using iommu direct mapping
[  320.932984] sfc 0000:01:00.2 (unnamed net_device) (uninitialized): Solarflare NIC detected
[  320.932991] sfc 0000:01:00.2: enabling device (0100 -> 0102)
[  320.933346] sfc 0000:01:00.2 (unnamed net_device) (uninitialized): no PTP support
[  320.939096] pci 0000:01:00.3: [1924:1903] type 00 class 0x020000
[  320.939424] pci 0000:01:00.3: Adding to iommu group 13
[  320.939456] pci 0000:01:00.3: Using iommu direct mapping
[  320.939643] sfc 0000:01:00.3 (unnamed net_device) (uninitialized): Solarflare NIC detected
[  320.939647] sfc 0000:01:00.3: enabling device (0100 -> 0102)
[  320.939980] sfc 0000:01:00.3 (unnamed net_device) (uninitialized): no PTP support
[  320.942143] sfc 0000:01:00.2 enp1s0f0np0v0: renamed from eth0
[  320.973768] pci 0000:01:00.4: [1924:1903] type 00 class 0x020000
[  320.974098] pci 0000:01:00.4: Adding to iommu group 14
[  320.974136] pci 0000:01:00.4: Using iommu direct mapping
[  320.974396] sfc 0000:01:00.4 (unnamed net_device) (uninitialized): Solarflare NIC detected
[  320.974404] sfc 0000:01:00.4: enabling device (0100 -> 0102)
[  320.974808] sfc 0000:01:00.4 (unnamed net_device) (uninitialized): no PTP support
[  320.975427] sfc 0000:01:00.3 enp1s0f0np0v1: renamed from eth0
[  321.003400] pci 0000:01:00.5: [1924:1903] type 00 class 0x020000
[  321.003820] pci 0000:01:00.5: Adding to iommu group 15
[  321.003855] pci 0000:01:00.5: Using iommu direct mapping
[  321.004182] sfc 0000:01:00.5 (unnamed net_device) (uninitialized): Solarflare NIC detected
[  321.007890] sfc 0000:01:00.5: enabling device (0100 -> 0102)
[  321.011034] sfc 0000:01:00.5 (unnamed net_device) (uninitialized): no PTP support
[  321.023509] sfc 0000:01:00.4 enp1s0f0np0v2: renamed from eth0
[  321.061825] pci 0000:01:00.6: [1924:1903] type 00 class 0x020000
[  321.062198] pci 0000:01:00.6: Adding to iommu group 16
[  321.062248] pci 0000:01:00.6: Using iommu direct mapping
[  321.062469] sfc 0000:01:00.6 (unnamed net_device) (uninitialized): Solarflare NIC detected
[  321.062474] sfc 0000:01:00.6: enabling device (0100 -> 0102)
[  321.067593] sfc 0000:01:00.6 (unnamed net_device) (uninitialized): no PTP support
[  321.067659] sfc 0000:01:00.5 enp1s0f0np0v3: renamed from eth0
[  321.295574] pci 0000:01:00.7: [1924:1903] type 00 class 0x020000
[  321.296049] pci 0000:01:00.7: Adding to iommu group 17
[  321.296086] pci 0000:01:00.7: Using iommu direct mapping
[  321.296730] sfc 0000:01:00.7 (unnamed net_device) (uninitialized): Solarflare NIC detected
[  321.296737] sfc 0000:01:00.7: enabling device (0100 -> 0102)
[  321.297083] sfc 0000:01:00.7 (unnamed net_device) (uninitialized): no PTP support
[  321.297302] sfc 0000:01:00.6 enp1s0f0np0v4: renamed from eth0
[  321.348361] sfc 0000:01:00.7 enp1s0f0np0v5: renamed from eth0
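For the "3 VMs with 2 VFs each" idea, the passthrough assignments could be planned with a dry run like the one below. It only prints `qm set` commands and runs nothing. The VM IDs are hypothetical, and `plan_vf_passthrough` is a made-up helper; the VF addresses would be the 01:00.2 through 01:00.7 functions from the dmesg output.

```shell
# Dry run: print the qm commands that would hand out VFs to VMs.
# Usage: <list of VF addresses on stdin> | plan_vf_passthrough PER_VM VMID...
# Hypothetical helper; it prints commands and does not run them.
plan_vf_passthrough() {
    per_vm=$1
    shift
    for vmid in "$@"; do
        slot=0
        while [ "$slot" -lt "$per_vm" ]; do
            # Stop quietly if we run out of VF addresses.
            read -r addr || return 0
            echo "qm set $vmid -hostpci$slot $addr"
            slot=$((slot + 1))
        done
    done
}
```

Feeding it six addresses with `plan_vf_passthrough 2 101 102 103` (101-103 being placeholder VM IDs) gives each VM a hostpci0 and a hostpci1 entry. Since each VF landed in its own IOMMU group, nothing in the log suggests they couldn't be passed through separately, though that still needs testing with actual VMs.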
Well, allocating, reallocating and deallocating VFs all worked just fine in Linux. What about ESXi? Oh be still my headache...
 

WANg

Okay, ESXi.

Note that there are no native drivers for the Solarflares below the SFN8000 series. If you want support, you'll need to install the deprecated VMKLinux based drivers.

Code:
[root@dash:~] esxcli network nic list
Name    PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description                                   
------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  -----------------------------------------------------------------------------------------
vmnic0  0000:02:00.0  r8168   Up            Up            1000  Full    7c:d3:0a:xx:xx:xx  1500  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller

[root@dash:~] esxcli software vib install -d /tmp/sfc-4.10.10.1001-offline_bundle-8917756.zip
Installation Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Reboot Required: true
   VIBs Installed: Solarflare_bootbank_net-sfc_4.10.10.1001-1OEM.550.0.0.1331820
   VIBs Removed:
   VIBs Skipped:
[root@dash:~] reboot
Okay, after the reboot let's see what happens.

Code:
[root@dash:~] esxcli network nic list
Name    PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description                                                                       
------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  -----------------------------------------------------------------------------------------
vmnic0  0000:01:00.0  sfc     Up            Up           10000  Full    00:0f:53:xx:xx:xx  1500  Solarflare SFC9120                                                                 
vmnic1  0000:02:00.0  r8168   Up            Up            1000  Full    7c:d3:0a:xx:xx:xx  1500  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
vmnic2  0000:01:00.1  sfc     Up            Up           10000  Full    00:0f:53:xx:xx:xx  1500  Solarflare SFC9120
Oh good, it worked, and I didn't have to blacklist the native drivers like I did with the Mellanox.
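To confirm which driver claimed which port without squinting at the wide table, the listing can be trimmed down to three columns. Just a sketch: `nic_drivers` is a made-up helper, and it assumes the column order shown in the output above (name, PCI device, driver).

```shell
# Reduce `esxcli network nic list` output to "name pci-address driver".
# Reads the table on stdin; header and separator rows are skipped because
# they do not start with "vmnic". Hypothetical helper.
nic_drivers() {
    awk '$1 ~ /^vmnic[0-9]+$/ { print $1, $2, $3 }'
}
```

On the host that would be `esxcli network nic list | nic_drivers`, and adding `| grep ' sfc$'` narrows it to the Solarflare ports.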

Alright, it looks like the NICs show up in the GUI. Let's request that SR-IOV be turned on, allocate 2 VFs, and reboot...
Let's see what happens:

Code:
[root@dash:~] grep 2020-05-03 /var/log/vmkwarning.log | grep 0000:01

2020-05-03T00:01:23.359Z cpu1:65982)WARNING: PCI: 1470: Enable VFs failed for device @0000:01:00.0, Please make sure the system has proper BIOS installed and enabled for SRIOV.
2020-05-03T00:01:23.359Z cpu1:65982)WARNING: LinPCI: vmklnx_enable_vfs:1375: unable to enable SR-IOV on PCI device 0000:01:00.0
2020-05-03T00:01:23.365Z cpu1:65982)WARNING: PCI: 1470: Enable VFs failed for device @0000:01:00.1, Please make sure the system has proper BIOS installed and enabled for SRIOV.
2020-05-03T00:01:23.365Z cpu1:65982)WARNING: LinPCI: vmklnx_enable_vfs:1375: unable to enable SR-IOV on PCI device 0000:01:00.1
Well, that looks like a big fat no for ESXi once again. Alright. Last shot. How about an Intel i350-T4? That's next...
 

csp-guy

I have tested the Intel X710-T4 10GbE Ethernet card.

It gets extremely hot after a few minutes of use; it is not usable without an active cooling solution.

I also checked the Intel X540-T2. It does not run as hot as the X710, so it seems better.
 