Can't get SFN7002F vf working under Proxmox VE 6.2

RobstarUSA

Active Member
Sep 15, 2016
I've done some reading and tried to get this working. I used "alien" under Proxmox to convert the Solarflare DKMS RPM to a .deb, and the driver built successfully. I also used alien to convert the amd64 utilities RPM and installed that .deb as well.

Here is some info on my setup: Cisco UCS C220 M3, 2xE5-2643, 64G ram

root@vm3:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.3.18-2-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt
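The command line above requests the IOMMU, but VF passthrough also needs it to be active. A minimal Python sketch that checks both the kernel command line and whether /sys/kernel/iommu_groups is populated (the kernel lists one directory per IOMMU group there once the IOMMU is in use). The function name and the root parameter are illustrative only; root exists so the check can be pointed at a test directory.

```python
from pathlib import Path


def iommu_status(root: str = "/") -> dict:
    """Report whether the IOMMU was requested and whether it appears active.

    `root` is a hypothetical parameter for testing; on a live host leave it "/".
    """
    base = Path(root)
    cmdline = (base / "proc/cmdline").read_text().split()
    groups_dir = base / "sys/kernel/iommu_groups"
    # One subdirectory per IOMMU group; zero groups means the IOMMU is not
    # in use, even if intel_iommu=on was passed on the command line.
    groups = [p for p in groups_dir.iterdir() if p.is_dir()] if groups_dir.is_dir() else []
    return {
        "intel_iommu_on": "intel_iommu=on" in cmdline,
        "passthrough_mode": "iommu=pt" in cmdline,
        "iommu_groups": len(groups),
    }
```

On the host, iommu_status() with no arguments should report both flags as True and a non-zero group count.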

root@vm3:~# sfboot
Solarflare boot configuration utility [v8.1.2]
Copyright 2002-2020 Xilinx, Inc.

enp3s0f0np0:
  Boot image                            Option ROM and UEFI
    Link speed                          Negotiated automatically
    Link-up delay time                  5 seconds
    Banner delay time                   2 seconds
    Boot skip delay time                5 seconds
    Boot type                           PXE
  Physical Functions on this port       1
  PF MSI-X interrupt limit              32
  Virtual Functions on each PF          3
  VF MSI-X interrupt limit              8
  Port mode                             Default
  Firmware variant                      Full feature / virtualization
  Insecure filters                      Default
  MAC spoofing                          Default
  Change MAC                            Default
  VLAN tags                             None
  Switch mode                           SR-IOV
  RX descriptor cache size              32
  TX descriptor cache size              16
  Total number of VIs                   2048
  Event merge timeout                   8740 nanoseconds

enp3s0f1np1:
  Boot image                            Option ROM and UEFI
    Link speed                          Negotiated automatically
    Link-up delay time                  5 seconds
    Banner delay time                   2 seconds
    Boot skip delay time                5 seconds
    Boot type                           PXE
  Physical Functions on this port       1
  PF MSI-X interrupt limit              32
  Virtual Functions on each PF          3
  VF MSI-X interrupt limit              8
  Port mode                             Default
  Firmware variant                      Full feature / virtualization
  Insecure filters                      Default
  MAC spoofing                          Default
  Change MAC                            Default
  VLAN tags                             None
  Switch mode                           SR-IOV
  RX descriptor cache size              32
  TX descriptor cache size              16
  Total number of VIs                   2048
  Event merge timeout                   8740 nanoseconds

I have an onboard igb NIC whose VFs DO show up:

root@vm3:~# lspci | grep -i I350
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
01:10.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:10.1 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:10.4 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:10.5 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:11.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:11.1 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:11.4 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:11.5 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:12.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:12.1 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:12.4 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:12.5 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:13.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
01:13.1 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)

Does anyone know what else I need to do to get the VFs to show up so I can assign them to VM guests?
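The sfboot output above shows 3 VFs per PF in SR-IOV switch mode, but that setting only tells the firmware how many VFs it may expose. On Linux, VFs are normally created at runtime by writing the desired count to the generic PCI sysfs attribute sriov_numvfs (the shell equivalent is echo 3 > /sys/class/net/enp3s0f0np0/device/sriov_numvfs). The Python sketch below assumes the sfc driver in use implements that standard SR-IOV interface, which the thread does not confirm for this driver build. The function name and the net_root parameter are hypothetical; net_root only exists so the logic can be exercised against a test directory.

```python
from pathlib import Path


def enable_vfs(iface: str, count: int, net_root: str = "/sys/class/net") -> int:
    """Ask the PF behind network interface `iface` to create `count` SR-IOV VFs.

    Uses the generic PCI sysfs attributes sriov_totalvfs and sriov_numvfs
    (root required on a real host). Returns the VF count read back afterwards.
    """
    dev = Path(net_root) / iface / "device"
    total = int((dev / "sriov_totalvfs").read_text())
    if not 0 <= count <= total:
        raise ValueError(f"{iface}: can create 0-{total} VFs, asked for {count}")
    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text())
    if current != count:
        if current and count:
            # The kernel refuses to change a non-zero VF count directly,
            # so drop back to 0 first.
            numvfs.write_text("0")
        numvfs.write_text(str(count))
    return int(numvfs.read_text())
```

Run as root, enable_vfs("enp3s0f0np0", 3) should make the new functions appear in lspci as Virtual Functions, like the I350 ones above. The sriov_numvfs value does not survive a reboot, so it would need to be reapplied at boot (for example from a systemd unit or an interface hook).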

Solarflare-specific NIC info:

root@vm3:~# sfupdate
Solarflare firmware update utility [v8.1.2]
Copyright 2002-2020 Xilinx, Inc.
Loading firmware images from /usr/share/sfutils/sfupdate_images

enp3s0f0np0 - MAC: XX-XX-XX-XX-XX-X0
Firmware version: v7.8.4
Controller type: Solarflare SFC9100 family
Controller version: v6.2.7.1001
Boot ROM version: v5.2.2.1006

The Boot ROM firmware is up to date
The controller firmware is up to date

enp3s0f1np1 - MAC: XX-XX-XX-XX-XX-X1
Firmware version: v7.8.4
Controller type: Solarflare SFC9100 family
Controller version: v6.2.7.1001
Boot ROM version: v5.2.2.1006

The Boot ROM firmware is up to date
The controller firmware is up to date

(MAC addresses redacted)

What I'd like to do is assign VFs to guests, which can then use them like physical NICs and add their own VLANs as necessary.
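Once a VF is passed through, the guest sees it as an ordinary NIC and can tag its own traffic with a standard 802.1Q subinterface. A minimal Python sketch wrapping the iproute2 command ip link add link PARENT name PARENT.ID type vlan id ID; the function names and the eth0 interface name in the test are placeholders. Whether the VF actually passes tagged frames may also depend on the PF and firmware filter settings (such as sfboot's VLAN tags and insecure filters options), which this does not check.

```python
from __future__ import annotations

import subprocess

IFNAME_MAX = 15  # Linux interface names are limited to 15 characters


def vlan_add_cmd(parent: str, vlan_id: int) -> list[str]:
    """Build the iproute2 command that creates an 802.1Q subinterface."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID must be 1-4094, got {vlan_id}")
    name = f"{parent}.{vlan_id}"
    if len(name) > IFNAME_MAX:
        raise ValueError(f"interface name {name!r} exceeds {IFNAME_MAX} characters")
    return ["ip", "link", "add", "link", parent,
            "name", name, "type", "vlan", "id", str(vlan_id)]


def add_vlan(parent: str, vlan_id: int) -> None:
    """Create the subinterface and bring it up (needs root inside the guest)."""
    subprocess.run(vlan_add_cmd(parent, vlan_id), check=True)
    subprocess.run(["ip", "link", "set", "dev", f"{parent}.{vlan_id}", "up"], check=True)
```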

Am I going about this wrong? I don't have a ton of experience with Solarflare, but I've read rave reviews here, and the NICs seem to be decently featured and cheap.

Thanks in advance!
 

RobstarUSA

Active Member
Sep 15, 2016
Also, here is what shows up in dmesg when the module loads:
[ 522.208041] Solarflare NET driver v4.15.6.1004
[ 522.208168] sfc 0000:03:00.0 (unnamed net_device) (uninitialized): Solarflare NIC detected: device 1924:0903 subsys 1924:800a
[ 522.209167] sfc 0000:03:00.0 (unnamed net_device) (uninitialized): no PTP support
[ 522.214918] sfc 0000:03:00.1 (unnamed net_device) (uninitialized): Solarflare NIC detected: device 1924:0903 subsys 1924:800a
[ 522.215162] sfc 0000:03:00.0 enp3s0f0np0: renamed from eth0
[ 522.215874] sfc 0000:03:00.1 (unnamed net_device) (uninitialized): This function initialised before the primary function (PCI function 0).
[ 522.215875] sfc 0000:03:00.1 (unnamed net_device) (uninitialized): PTP may not work if this function is passed through to a VM.
[ 522.215876] sfc 0000:03:00.1 (unnamed net_device) (uninitialized): Manually rebind the PCI functions such that function 0 binds first.
[ 522.215926] sfc 0000:03:00.1 (unnamed net_device) (uninitialized): no PTP support
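The log above says PCI function 1 (0000:03:00.1) initialised before function 0 and suggests manually rebinding so that function 0 binds first. A minimal Python sketch of that using the generic PCI driver bind/unbind files under /sys/bus/pci/drivers/sfc; the function name and the drivers_root parameter (for testing) are hypothetical. Unbinding drops the network interfaces and any VFs on those functions, so run it from the console rather than over a link it affects.

```python
from __future__ import annotations

from pathlib import Path


def rebind_in_order(bdfs: list[str], driver: str = "sfc",
                    drivers_root: str = "/sys/bus/pci/drivers") -> list[str]:
    """Unbind the listed PCI functions from `driver`, then bind them back
    in list order, so the first entry (function 0) probes first.

    Returns the writes performed, for logging.
    """
    drv = Path(drivers_root) / driver
    actions = []
    # Unbind in reverse list order; only functions currently bound to the
    # driver have an entry in its sysfs directory.
    for bdf in reversed(bdfs):
        if (drv / bdf).exists():
            (drv / "unbind").write_text(bdf)
            actions.append(f"unbind {bdf}")
    # Bind in the requested order: function 0 first.
    for bdf in bdfs:
        (drv / "bind").write_text(bdf)
        actions.append(f"bind {bdf}")
    return actions
```

On this host the call would be rebind_in_order(["0000:03:00.0", "0000:03:00.1"]), run as root.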
 

crackelf

New Member
Apr 11, 2021
Following up here trying to learn more about these cards myself: did you ever find out more about this? I have similar 7000 series cards that report as "SFC9100 Family", and they all work perfectly (with PTP!) except one. I updated the dysfunctional (and severely out of date) card, so it's a few versions newer than the other cards.

Everything I've learned so far has mostly come from this thread from the very helpful @WANg :

My issue so far: changes I make with sfboot under Debian read back correctly in sfboot, but they don't show up in the card's BIOS after a reboot. I'm still hacking my way through this and have had no luck getting the card to connect to a switch as a regular NIC. Oddly, it connects just fine to the other cards when I set IPs manually, but it can't get an address via DHCP for whatever reason.

I attached some photos of the differences between the two cards' BIOS screens. Both cards report identical settings in sfboot:

Code:
  Boot image                            Option ROM only
    Link speed                          Negotiated automatically
    Link-up delay time                  5 seconds
    Banner delay time                   2 seconds
    Boot skip delay time                5 seconds
    Boot type                           Disabled
  Physical Functions on this port       1
  PF MSI-X interrupt limit              32
  Virtual Functions on each PF          0
  VF MSI-X interrupt limit              8
  Port mode                             Default
  Firmware variant                      Auto
  Insecure filters                      Default
  MAC spoofing                          Default
  Change MAC                            Default
  VLAN tags                             None
  Switch mode                           Default
  RX descriptor cache size              32
  TX descriptor cache size              16
  Total number of VIs                   2048
  Event merge timeout                   8740 nanoseconds

Functional BIOS global settings: (photo, s-l1600.jpg)

Dysfunctional BIOS global settings: (photo, s-l1602.jpg)

Functional PF0 settings: (photo, s-l1601.jpg)

Dysfunctional PF0 settings: (photo, s-l1603.jpg)

Here are a few of the sfboot options I've tried:
--clear, vf-msix-limit=8, vf-count=0, firmware-variant=full-feature, firmware-variant=ultra-low-latency, switch-mode=sriov, vf-count=1, switch-mode=pfiov pf-count=2, insecure-filters=enabled
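Since sfboot reports the same settings for both cards while their BIOS screens differ, it helps to diff the two sfboot dumps mechanically rather than by eye. A minimal Python sketch, assuming the two-column sfboot layout shown above (setting name, a gap of two or more spaces, then the value); the function names are illustrative. An empty diff only means the settings sfboot reads back match; it says nothing about what the Boot ROM setup screen shows, which is exactly the discrepancy described here.

```python
from __future__ import annotations

import re
from typing import Optional


def parse_sfboot(text: str) -> dict[str, dict[str, str]]:
    """Parse sfboot output into {port: {setting: value}}.

    Expects an unindented "port:" header followed by indented
    "setting<two or more spaces>value" lines. Settings that appear
    before any port header are filed under the port name "".
    """
    ports: dict[str, dict[str, str]] = {}
    port = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace() and line.rstrip().endswith(":"):
            port = line.strip()[:-1]
            continue
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:  # banner/copyright lines have no wide gap; skip them
            ports.setdefault(port, {})[parts[0]] = parts[1]
    return ports


def diff_sfboot(first: str, second: str) -> list[tuple[str, str, Optional[str], Optional[str]]]:
    """List (port, setting, first_value, second_value) for every mismatch."""
    a, b = parse_sfboot(first), parse_sfboot(second)
    diffs = []
    for port in sorted(a.keys() | b.keys()):
        sa, sb = a.get(port, {}), b.get(port, {})
        for key in sorted(sa.keys() | sb.keys()):
            if sa.get(key) != sb.get(key):
                diffs.append((port, key, sa.get(key), sb.get(key)))
    return diffs
```

Usage would be to save each card's output (for example sfboot > good.txt and sfboot > bad.txt, hypothetical file names) and pass the two texts to diff_sfboot.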
 

WANg

Well-Known Member
Jun 10, 2018
Hm...Same cards, or different cards? Also, are the cards standard/generic, or are they OEM versions of the cards? Which Debian distribution?
 

crackelf

New Member
Apr 11, 2021
The myth, the legend! Thank you for hopping on this thread.

I have a handful of Solarflare-branded "Solarflare Flareon Ultra SFN7122F" cards from this eBay link. The seller mentions they "have been tested and firmware has been updated to version 7.4.4".

The photos with version 5.2.0 are how most of them arrived out of the box. The 5.2.2 version card was on version 4.2 (or something in the 4's... I can't remember at this rate), and I manually upgraded it thinking that was the problem.

edit: these very well could have been flashed with something completely different (an OEM firmware etc). Is there any way I could check? Also, would it be possible to somehow export the working flash's memory and flash that onto the dysfunctional card..?

I have a few different boxes running either Debian 10 or 11, but most importantly two identical Debian 11 boxes with ARI forwarding and SR-IOV enabled that I've been testing these on.

My current theory is that I simply don't understand how these cards work (this is my first foray into fiber), so maybe I'm not writing their memory correctly. I've been reading the manuals from the Xilinx website and have tried their bootable ISO as well as their converted rpm package.

A note: I'm using the default sfc driver that ships with Debian. I haven't yet tried the DKMS driver, and would prefer not to call on DKMS here if I can get it working with the "standard" driver.

Let me know if you want me to include any more information & thanks for any help you may have!
 

WANg

Well-Known Member
Jun 10, 2018
No worries -

That's fine with not using the dkms drivers - in fact, I use the stock drivers myself. Of course, I don't actively use the SFN7322F at the moment.

Why yes, you can back up the firmware on a good card and then force-write it to a suspected bad card: sfupdate has backup, write, and force options. Take a look at table 24 for the command-line options. Unless the vendor tells you, knowing whether the card is generic or OEM-specific isn't super useful.

So, okay. The thing that I would focus on is:

a) When you use sfboot on the non-sane Solarflare card, run it with the -c -f options first, so whatever usage-imbued idiocy is blown away. Power cycle that card immediately afterwards (I mean an actual power-down and power-up) and see what it does. If that doesn't work, remember that Solarflare still has the older firmware out there; newer isn't always better.

b) Remember that any change you make with sfboot requires a power cycle. That's why I put sfboot on Debian on an older thin client (HP gt7725), run it, make sure the card comes back up the way I want, and then move the card to whatever equipment it needs to run in (I do the same with my Mellanox ConnectX-3 cards). Having some "scrub" hardware is always a good idea.
 

crackelf

New Member
Apr 11, 2021
Thanks for the direction & link! Thankfully I have two 1:1 test benches, so I've been able to power-cycle till the cows come home. I can't imagine doing this on a server in real time.

Functional card:
Code:
    Firmware version:   v7.4.4
    Controller type:    Solarflare SFC9100 family
    Controller version: v6.2.7.1001
    Boot ROM version:   v5.2.0.1004
Dysfunctional card:
Code:
    Firmware version:   v8.0.1
    Controller type:    Solarflare SFC9100 family
    Controller version: v6.2.7.1001
    Boot ROM version:   v5.2.2.1006
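The two sfupdate reports above can also be compared numerically. Version strings should be compared as integer tuples rather than as text, because as strings "v10.0" sorts before "v8.0". A minimal Python sketch, assuming sfupdate's "Key: vX.Y.Z" line format shown above; the function names are illustrative.

```python
import re

# Matches "Key: v1.2.3.4" lines, e.g. "Boot ROM version:   v5.2.2.1006".
_VERSION_LINE = re.compile(r"^\s*(?P<key>[^:]+?):\s*v?(?P<ver>\d+(?:\.\d+)*)\s*$")


def parse_versions(text: str) -> dict:
    """Map each 'Key: vX.Y.Z' line of sfupdate output to an integer tuple."""
    out = {}
    for line in text.splitlines():
        m = _VERSION_LINE.match(line)
        if m:  # non-version lines such as "Controller type: ..." are skipped
            out[m["key"].strip()] = tuple(int(p) for p in m["ver"].split("."))
    return out


def compare_versions(first: str, second: str) -> dict:
    """Return {field: 'first is newer' | 'second is newer'} for fields that differ."""
    a, b = parse_versions(first), parse_versions(second)
    result = {}
    for key in a.keys() & b.keys():
        if a[key] != b[key]:
            result[key] = "first is newer" if a[key] > b[key] else "second is newer"
    return result
```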
The only differences seem to be the Firmware version and the Boot ROM version. -f (factory reset) unfortunately isn't an option for these 7xxx-series cards, so I messed around with sfupdate --backup a bit and got it to spit out two files:
Code:
BOOTROM_2_4_v5.2.2.1006.dat
Code:
MCPU_3_20_v6.2.7.1001.dat
Force writing an image is now possible, but I only have controller firmware (v6.2.7), not "Firmware" firmware (v7.4.4 or 8.0.1)... I also can't seem to find a way to write the boot ROM to it, but maybe I'm missing something here. I tried with the --image= argument, but it only accepts the MCPU.dat file.

You would think that an officially packaged update would get it to cooperate, but this is clearly above my pay-grade hahaha. I still can't get this one card to connect to a router or switch like the others can, and it still isn't reporting correctly in the BIOS despite sfboot showing changes. The sfutils package is pretty limiting but understandably so. I can't for the life of me imagine how the other cards were set up initially.

Straying from the topic: how has your experience been with other cards (like the Mellanox or newer Solarflare)? I went for these Solarflare cards initially because I like their open-source nature and tend to avoid installing proprietary binary blobs *cough NVIDIA* if possible, but at this rate I'm not sure I could fix the other cards if an upgrade or settings change went wrong on them.
 

WANg

Well-Known Member
Jun 10, 2018
ooof, I hear ya. Does sfupdate -c at least clear some of the ridiculousness? At this stage it sounds like the card might be acting up - I would assume that you swapped DAC cables/fibers+SFPs just to be on the safe side?

As for later Solarflare cards... not really. Solarflare changed their licensing model for the cards, and I never found their post-SFN7xxx cards all that appealing. At my previous gig we were actually in the process of switching over to Chelsio Terminator T4s when the business failed. The Solarflare was considered interesting simply because it can do the entire TCP stack in hardware, so you can write code that works with, say, a matching engine for a trading algorithm and get packets pushed in and out at a trading venue as quickly as possible (that's the OpenOnload drivers instead of the standard Linux drivers). For most people the standard drivers are just fine as-is.

As for Mellanox, there's a more practical reason to use it: the SFN7xxx series of 10GbE cards run really freaking hot (9 W off the heatsink) compared to the ~5 W of the SFN5122. For my use case (a thin client without much internal heat dissipation, wired up to the HPe MSG7 running iSCSI), the 40GbE ConnectX-3 VPI simply ran cooler (around 5 W), and I was able to get ex-NetApp 1 m QSFP DACs for something stupid cheap. It just made more sense. That said, if I'd had the money I would have gone with ConnectX-4 instead, since it's much better supported in ESXi 7.
 

crackelf

New Member
Apr 11, 2021
WANg said:
ooof, I hear ya. Does sfupdate -c at least clear some of the ridiculousness? At this stage it sounds like the card might be acting up - I would assume that you swapped DAC cables/fibers+SFPs just to be on the safe side? As for later Solarflare...not really. Solarflare changed their licensing model for the cards, and I never found their post-SFN7xxx cards to be all that appealing. At the previous gig we were actually in the process of switching over to the Chelsio Terminator T4s when it fell.

I had some DACs lying around that worked perfectly with other cards, swapped transceivers with known-working cables, and tried a few other iterations of swapping components, but no dice. It turns out I had to clear everything manually in the BIOS. I'd had enough of staring at that blue BIOS screen, so I ended up returning the misbehaving card to the original seller and ordering a slew of different ones to try out, including more Solarflares of the same model and an Intel X520.

I was actually looking at some Chelsio cards before all of this, but had a hard time sourcing them from eBay. People around the forums seem to have had bad experiences with the few sellers offering reasonable prices. This is all for a home lab, so I'm trying not to break the bank if I can... although I'm mostly interested in this as a learning opportunity to see what fiber can do.

WANg said:
The Solarflare was considered interesting simply because it can do the entire TCP stack in hardware and you can write code that works with, say, a matching engine for a trade algo and get packets pushed in/out at a trade venue as quickly as possible (that's the OpenOnload drivers instead of the standard Linux drivers). For most people the standard drivers are just fine as-is.
Funny enough: I'm a finance escapee who got a little too excited by the server room at my old branch, so I'm actually really interested in the quant aspect of this whole fiber equation. OpenOnload looked pretty exciting to learn when I was first looking into these, but I haven't had the chance to dive deeper into these cards since this one hasn't been chatting properly with my Brocade. Are there any resources you can recommend for getting started on OpenOnload?

WANg said:
As for Mellanox, well, there's a more practical reason to use it...the SFN7xxx series of 10GbE cards run really freaking hot (9w off the heatsink) compared to the ~5w on the SFN5122, and for my usage cases (thin client with not much internal heat transfer for wiring up to the HPe MSG7 running iSCSI), the 40GbE ConnectX3 VPI simply ran cooler (around ~5w), and I was able to get ex-Netapp QSFP 1m DACs for something stupid cheap. It just made more sense. That being said, if I have the money I would've gone ConnectX4 instead since it's much better supported in ESXi 7.
Hahah you don't say... I actually bought a couple of USB-powered Noctua fans to slap on these bad boys because they were physically too hot to touch. I've been eyeballing the ConnectX-3s but decided against them, since my single NVMe pool doesn't have a partner to saturate a 40GbE link. Also, these older Brocades are too sweet of a deal. If you have any semi-affordable 40GbE switches you can recommend, I could definitely be talked into upgrading!
 

crackelf

New Member
Apr 11, 2021
Just an update: the new cards I ordered all work and behave correctly! It's a miracle. The cards are all happy with KVM/QEMU/libvirt on both Debian 10 and 11 (kernels 4.19 and 5.10), and iperf reports over 9 Gb/s.
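For context on the iperf figure: a 10GbE link cannot carry 10 Gb/s of TCP payload, because every frame also carries Ethernet, IP, and TCP overhead. A back-of-the-envelope sketch, assuming IPv4, no VLAN tag (a tag would add 4 bytes per frame), and TCP timestamps enabled as on a default Linux install. With a 1500-byte MTU the best case works out to about 9.41 Gb/s (about 9.49 Gb/s without timestamps), so a result above 9 Gb/s is close to what the link allows; jumbo frames raise the ceiling to about 9.90 Gb/s.

```python
# Per-frame bytes on the wire beyond the MTU:
# 8 preamble/SFD + 14 Ethernet header + 4 FCS + 12 inter-frame gap.
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12


def tcp_goodput_gbps(line_rate_gbps: float, mtu: int = 1500,
                     ip_tcp_headers: int = 40, tcp_options: int = 12) -> float:
    """Best-case TCP payload rate for a given Ethernet line rate and MTU.

    ip_tcp_headers=40 is a 20-byte IPv4 header plus a 20-byte TCP header;
    tcp_options=12 accounts for the TCP timestamp option.
    """
    payload = mtu - ip_tcp_headers - tcp_options
    return line_rate_gbps * payload / (mtu + ETH_WIRE_OVERHEAD)
```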