X520-DA2 only one port detected


WanWizard

New Member
Jun 13, 2021
UK
flexcoders.co.uk
Hello All,

My issue isn't strictly home related, but given the expertise here with refurbished hardware, I thought I'd give it a shot.

I have a stack of Supermicros (used by a non-profit outfit), all connected to a 10G IBM BladeCenter stack using DAC cables. All of them use an Intel X520-DA2 card to provide a redundant connection.

This all works fine, but I have three of these cards where only one port is detected by the BIOS. It doesn't matter which machine I plug them into (X8 or X9 series mainboards); I even tried a tower with an Asus Z68 board at home, to no avail.

To be clear, this is not a rejected-optics issue (most people I ask point me to that): the problem is that lspci only shows one port, so the OS doesn't even come into play.

Any clue what this could be?
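Each port of a dual-port X520 is its own PCI function (`.0` and `.1` on the same bus/device), so a quick way to confirm from Linux what actually enumerated is to count functions in sysfs (`lspci -d 8086:10fb` shows the same thing). A minimal Python sketch, assuming the standard `/sys/bus/pci/devices` layout and the `8086:10fb` IDs that appear in the dmesg output later in this thread; `x520_functions` is just an illustrative helper name:

```python
from collections import defaultdict
from pathlib import Path

# Vendor/device IDs of the 82599 controller on the X520-DA2, as printed in the
# dmesg output further down this thread: [8086:10fb].
INTEL_VENDOR = "0x8086"
X520_DEVICE = "0x10fb"


def x520_functions(sysfs_root="/sys/bus/pci/devices"):
    """Map each X520 slot ('domain:bus:device') to the PCI function numbers found.

    Entries in sysfs_root are named like '0000:02:00.0' and contain 'vendor'
    and 'device' files holding hex IDs such as '0x8086'.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    slots = defaultdict(list)
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip()
            device = (dev / "device").read_text().strip()
        except OSError:
            continue  # not a PCI device directory, or unreadable
        if vendor == INTEL_VENDOR and device == X520_DEVICE:
            slot, func = dev.name.rsplit(".", 1)
            slots[slot].append(int(func, 16))
    return dict(slots)


if __name__ == "__main__":
    for slot, funcs in x520_functions().items():
        status = "both ports" if len(funcs) == 2 else "only one function enumerated"
        print(f"{slot}: functions {funcs} ({status})")
```

A healthy card shows functions `[0, 1]` under one slot; the cards described here would show only `[0]`.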
 

CyklonDX

Well-Known Member
Nov 8, 2022
Check the SFP modules. The X520 is quite picky and only wants SFPs with the 'IN' marking on them.

(Example image not preserved.)
Next thing is firmware: you may have to upgrade the firmware of the port and/or the SFP.
So I would recommend looking for firmware upgrades for your SFP/NIC; they should be upgraded one by one.
 

WanWizard

Like I wrote, it has nothing to do with modules. Those are rejected by the driver (unless overridden), but as the BIOS doesn't even detect the second port, the driver doesn't come into play.

Also, I use DAC cables made by FS, no optics, which are fully compatible at both ends.

Again, like I wrote, if I swap the NIC for another one, it works fine.

All hardware is supplied by a reputable supplier, not bought from eBay. The cards are genuine Intel with yottamark.

If I search for firmware, I only find a reference from Intel saying the firmware is supposed to be embedded in the drivers.
 

CyklonDX

When you were flashing, did it display two devices? The NVM flash tool should show both ports as separate devices to flash.

(Example screenshot not preserved; the device name in it doesn't matter.)
For all intents and purposes, if there aren't two (second screenshot not preserved), that means they are broken, or you are doing something wrong.
 

WanWizard

No, the second one isn't enumerated by the BIOS, which stops everything.
Code:
Intel(R) Ethernet Flash Firmware Utility
BootUtil version 1.7.03.0
Copyright (C) 2003-2019 Intel Corporation

Port Network Address Location Series  WOL Flash Firmware                Version
==== =============== ======== ======= === ============================= =======
  1   14DAE913A0E3     0:25.0 Gigabit YES FLASH Not Present
  2   001B217EC780     2:00.0 10GbE   N/A UEFI,iSCSI                    -------
What I do find in dmesg:
Code:
[    0.195791] pci 0000:02:00.0: BAR 7: no space for [mem size 0x00100000 64bit]
[    0.195793] pci 0000:02:00.0: BAR 7: failed to assign [mem size 0x00100000 64bit]
[    0.195795] pci 0000:02:00.0: BAR 10: no space for [mem size 0x00100000 64bit]
[    0.195797] pci 0000:02:00.0: BAR 10: failed to assign [mem size 0x00100000 64bit]
According to Supermicro, these messages are normal when Intel VT is disabled, but it isn't disabled here.

Does this suggest the second interface is disabled because it can't allocate the memory for it?

For the first interface, the output is:

Code:
[    0.166763] pci 0000:02:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit]
[    0.166764] pci 0000:02:00.0: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit] (contains BAR0 for 64 VFs)
[    0.166791] pci 0000:02:00.0: reg 0x190: [mem 0x00000000-0x00003fff 64bit]
[    0.166793] pci 0000:02:00.0: VF(n) BAR3 space: [mem 0x00000000-0x000fffff 64bit] (contains BAR3 for 64 VFs)
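The sizes in these log lines are consistent with each other: `reg 0x184` and `reg 0x190` each decode 0x4000 bytes (16 KiB) per virtual function, and 64 VFs × 16 KiB = 0x100000 (1 MiB), exactly the size that BAR 7 and BAR 10 fail to get. In Linux's resource numbering, indices 7-12 are the SR-IOV VF BARs, which suggests these failures concern the SR-IOV windows of 02:00.0 itself rather than a missing 02:00.1. A small Python sketch of that mapping and arithmetic (the regex and helper names are illustrative, and the log timestamps are trimmed):

```python
import re

# Linux numbers a PCI device's resources as 0-5 (standard BARs), 6 (expansion ROM)
# and 7-12 (the six SR-IOV VF BARs; PCI_IOV_RESOURCES in include/linux/pci.h).
IOV_FIRST_RESOURCE = 7


def resource_name(index):
    """Translate the 'BAR n' index from a kernel log line into what it refers to."""
    if index <= 5:
        return f"BAR{index}"
    if index == 6:
        return "ROM"
    return f"VF BAR{index - IOV_FIRST_RESOURCE}"


def vf_window_size(per_vf_size, num_vfs):
    """Size of one SR-IOV VF BAR window: one per-VF aperture for each VF."""
    return per_vf_size * num_vfs


FAILED = re.compile(r"BAR (\d+): failed to assign \[mem size (0x[0-9a-f]+)")


def failed_bars(dmesg_text):
    """Return (resource name, size in bytes) for each 'failed to assign' line."""
    return [(resource_name(int(idx)), int(size, 16))
            for idx, size in FAILED.findall(dmesg_text)]


log = """\
pci 0000:02:00.0: BAR 7: no space for [mem size 0x00100000 64bit]
pci 0000:02:00.0: BAR 7: failed to assign [mem size 0x00100000 64bit]
pci 0000:02:00.0: BAR 10: no space for [mem size 0x00100000 64bit]
pci 0000:02:00.0: BAR 10: failed to assign [mem size 0x00100000 64bit]
"""

# reg 0x184 / 0x190 each decode 0x4000 bytes (16 KiB) per VF, for 64 VFs.
window = vf_window_size(0x4000, 64)
print(hex(window))  # 0x100000, i.e. 1 MiB
for name, size in failed_bars(log):
    print(name, hex(size), "matches VF window" if size == window else "differs")
```

Both failed allocations come out as `VF BAR0` and `VF BAR3` at 0x100000, matching the VF windows listed for the first interface.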
 

CyklonDX

That's the PCI memory BAR. The host register interface attempts to provision a buffer for the device. The X520-DA2 attempts to establish 2x 32-bit BARs; it's not as if it won't work because of that. Try enabling Intel DMA and such in the BIOS, then enable BAR at the OS level to get rid of this error. But it's potentially faulty memory, faulty CPU(s), a faulty card, a faulty mobo; it could be pretty much anything.
(And what if you put your module in the 2nd port? Is it still dead? If the 2nd one appears and the 1st one disappears, then it's likely an issue with the CPU, a faulty card or bad memory.)

// You mentioned other X520s? If you are getting different results with them, then the ones that do not work are most likely faulty.
 

WanWizard

Every server we run in this rack has an X520-2 (DA2/SR2), with 1.5m DAC cables to the switch stack in the middle of the rack. So we have about 20 in operation at the moment. Some also use an X540-AT2 card, which we use point-to-point for cluster main sync / heartbeat links.

The two X520-2 NICs currently on my desk came out of two of those servers (the installer used one of the X540 ports to cover for the failing X520 port, with an RJ45 transceiver in the switch). On my last DC visit about two weeks ago I swapped the failing ones for two spares we had in stock. On both spares the second port works fine, without BIOS changes on the servers. The servers run ESXi, so I don't know how easy it is to find the firmware versions.

I tried the failing card this morning, both with two 850nm transceivers (without cable) and with a DAC cable in a loop, but that didn't change anything.

I can't rule out an issue with my test machine and its old Asus P8Z68-V board, but it is all I have here. All I can say is that the LSI SAS 9210-8i has no issues in the same slot (I know this doesn't mean much). The board doesn't have many setup options; there's no mention of DMA.

I just plugged in the other card, and got exactly the same output in dmesg:

Code:
[    0.165377] pci 0000:02:00.0: [8086:10fb] type 00 class 0x020000
[    0.165404] pci 0000:02:00.0: reg 0x10: [mem 0xf0000000-0xf007ffff 64bit pref]
[    0.165417] pci 0000:02:00.0: reg 0x18: [io  0xe000-0xe01f]
[    0.165444] pci 0000:02:00.0: reg 0x20: [mem 0xf0080000-0xf0083fff 64bit pref]
[    0.165457] pci 0000:02:00.0: reg 0x30: [mem 0xf7d00000-0xf7d7ffff pref]
[    0.165549] pci 0000:02:00.0: PME# supported from D0 D3hot
[    0.165592] pci 0000:02:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit]
[    0.165594] pci 0000:02:00.0: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit] (contains BAR0 for 64 VFs)
[    0.165620] pci 0000:02:00.0: reg 0x190: [mem 0x00000000-0x00003fff 64bit]
[    0.165621] pci 0000:02:00.0: VF(n) BAR3 space: [mem 0x00000000-0x000fffff 64bit] (contains BAR3 for 64 VFs)
[    0.165932] pci 0000:02:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:1c.0 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[    0.195058] pci 0000:02:00.0: BAR 7: no space for [mem size 0x00100000 64bit]
[    0.195060] pci 0000:02:00.0: BAR 7: failed to assign [mem size 0x00100000 64bit]
[    0.195062] pci 0000:02:00.0: BAR 10: no space for [mem size 0x00100000 64bit]
[    0.195064] pci 0000:02:00.0: BAR 10: failed to assign [mem size 0x00100000 64bit]
[    5.578520] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8 XDP Queue count = 0
[    5.578833] ixgbe 0000:02:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:1c.0 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[    5.578929] ixgbe 0000:02:00.0: MAC: 2, PHY: 1, PBA No: E68785-003
[    5.578932] ixgbe 0000:02:00.0: 00:1b:21:7e:cb:1c
[    5.581199] ixgbe 0000:02:00.0: Intel(R) 10 Gigabit Network Connection
[    5.584327] ixgbe 0000:02:00.0 enp2s0: renamed from eth0
[   26.878510] ixgbe 0000:02:00.0: registered PHC device on enp2s0
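The "16.000 Gb/s available PCIe bandwidth" line is plain link arithmetic: at 5.0 GT/s, 8b/10b encoding leaves 4 Gb/s per lane, so x4 gives 16 Gb/s and x8 gives 32 Gb/s. On the Z68 test board the card only trained at x4 behind root port 00:1c.0. That caps throughput below what two 10G ports can use at line rate, but a narrow link on its own should not hide a PCI function. A quick Python sketch of the calculation (the helper name is illustrative; encoding rules cover PCIe Gen1 to Gen5):

```python
def link_bandwidth_gbps(gt_per_s, lanes):
    """Usable PCIe link bandwidth in Gb/s, after line-encoding overhead.

    Gen1/Gen2 (2.5 and 5.0 GT/s) use 8b/10b encoding; Gen3 to Gen5 (8-32 GT/s)
    use 128b/130b. Gen6 flit mode is not modelled here.
    """
    if gt_per_s <= 5.0:
        efficiency = 8 / 10
    else:
        efficiency = 128 / 130
    return gt_per_s * efficiency * lanes


x4 = link_bandwidth_gbps(5.0, 4)  # the width the card trained at on the Z68 board
x8 = link_bandwidth_gbps(5.0, 8)  # what the card is capable of
print(f"x4: {x4:.3f} Gb/s, x8: {x8:.3f} Gb/s")

# Two 10GbE ports at line rate need roughly 20 Gb/s per direction.
print("x4 covers both ports at line rate:", x4 >= 2 * 10)
```

The printed figures match the 16.000 and 32.000 Gb/s values in the ixgbe log lines above.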
 

CyklonDX

While a bit unconventional: I presume those were in heavy use (and those cards are cheap to replace), so you can check with a multimeter whether there's a short, or whether current even reaches the port. First look for any visible damage or bulging on the ceramic capacitors and resistors (on both sides, around the dead port), then check each one of them; it could be a cap or resistor that is failing. (If there's no short and no visible issue, you can try baking it; hopefully that will temporarily help if the solder balls under the core have worn out.)
 

WanWizard

They are indeed in heavy use: this is a CI build system for Yocto embedded Linux, covering about 110 different platforms, and all servers use the 10G network to access shared build data on SSD clusters.

But AFAIK the second port on these two cards never worked, hence the workaround with the X540-AT2 card the installer implemented.

"Cheap" depends on your point of view, I guess. The non-profit association that runs this depends entirely on user donations; the company I work for sponsors it with the rack space and a bit of admin (which is where I come in ;)). Our preferred supplier asks £120 per card.

I think I'll just put these back in stock with an "only port 1 works" label, and ask them to start saving to add two other cards to our stock.

Thanks for your help so far!
 

CyklonDX

I get them from eBay for 60 USD.
But due to the crappy Linux driver (i.e. having to recompile the driver to change some settings), I'm moving to Mellanox ConnectX-4 25Gb models (they go for 120-140 USD each, 250 USD for 'new').


// Under variable heavy use (temps ramp up, then a period of low utilization / cool temps), the solder balls under the core/components often wear out; it's the same story as old consoles and Foxconn-assembled boards suffering from the same issue.
(Embedded video not preserved. Good video; a lot of other components suffer from the same issues, even today.)
 

WanWizard

Hardware is usually easier / cheaper to get stateside than in Europe. Mellanox is unavailable here.

I'm also a bit wary of fakes being sold on eBay, and I'm particular about OEMs (not always mentioned): Dell cards, for example, don't work in the Supermicros without cutting/isolating pin 9 (the server doesn't boot).

One of our other refurb suppliers just came back saying they have plenty of stock coming in from an ISP that is busy upgrading. Let's see what they come up with; at least they'll come with some warranty.