SFP+ cards with ASPM support?


RzTen1

New Member
Nov 13, 2022
I've got two ConnectX-3 cards in two different machines with ASPM enabled and they seem to be working fine. One is a MCX312B dual port 'Pro' card and the other is a MCX311A single port. Both have ASPM L0s on. Linux is booted with 'pcie_aspm=force pcie_port_pm=force' and while one machine sets it up correctly the other does not and has to be overridden with the enable-aspm.sh script.
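To confirm what the kernel ended up with after booting with those parameters, the active ASPM policy and the command line can be inspected. This is only a rough sketch: it assumes a kernel built with ASPM support that exposes /sys/module/pcie_aspm/parameters/policy (active entry shown in square brackets), and the function name is made up for illustration.

```shell
#!/bin/bash
# Print the active ASPM policy from a policy string such as
# "default performance [powersave] powersupersave".
active_aspm_policy() {
    local policy_line=$1
    local re='\[([a-z]+)\]'
    # The active policy is the word inside the square brackets.
    if [[ $policy_line =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        echo "unknown"
    fi
}

POLICY_FILE=/sys/module/pcie_aspm/parameters/policy
if [[ -r $POLICY_FILE ]]; then
    echo "ASPM policy: $(active_aspm_policy "$(cat "$POLICY_FILE")")"
    # Show any pcie_aspm= option the kernel was booted with.
    grep -o 'pcie_aspm=[^ ]*' /proc/cmdline || echo "pcie_aspm not set on the kernel command line"
fi
```

The policy only describes what the kernel is willing to enable; the per-device LnkCtl lines in lspci -vv still show what was actually set.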

The machine that doesn't require tweaking is an ASRock X470 Taichi that required enabling ASPM through the 'secret' UEFI override menu program. (See https://www.chiphell.com/thread-2289027-1-1.html) The other box is an ASRock X470 Gaming K4 and even though this one has the ASPM options in the BIOS menu they don't actually seem to work and it requires the enable-aspm.sh script.

For what it's worth if the root port the card is connected to doesn't have L1 enabled (even though I'm enabling L0s on the card and not L1 and the root port doesn't support L0s) I do get "AER: Multiple Corrected error received" events every couple of seconds.
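To get a feel for how often those events fire, the kernel log can be counted. A minimal sketch, assuming the log text matches the message quoted above (other kernel versions may word it differently); the function name is hypothetical.

```shell
#!/bin/bash
# Count kernel log lines reporting corrected AER events.
# Reads log text on stdin, so it can be fed from dmesg or journalctl -k.
count_aer_corrected() {
    # grep -c exits non-zero when the count is 0; keep the exit status clean.
    grep -c 'AER: Multiple Corrected error received' || true
}

# Example (needs permission to read the kernel log):
#   dmesg | count_aer_corrected
```

Comparing the count before and after changing the root port setting gives a quick indication of whether the change helped.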

The machine that doesn't set ASPM correctly has a startup script that looks like this:
#ASPM overrides
/usr/sbin/enable-aspm.sh 00:01.2 2 #root port
/usr/sbin/enable-aspm.sh 02:00.1 2 #sata controller
/usr/sbin/enable-aspm.sh 02:00.2 2 #pci-e bridge
/usr/sbin/enable-aspm.sh 03:01.0 3 #pci-e bridge
/usr/sbin/enable-aspm.sh 03:07.0 3 #pci-e bridge
/usr/sbin/enable-aspm.sh 05:00.0 1 #1g ethernet
/usr/sbin/enable-aspm.sh 0a:00.0 3 #1g ethernet
/usr/sbin/enable-aspm.sh 01:00.0 1 #10g ethernet
/usr/sbin/enable-aspm.sh 0b:00.0 2 #sata controller

I think it's working but the equipment I have attached right now isn't fine grained enough to detect a power change less than 10W so I can't really say if it's saving any energy or not.

I've tweaked the enable-aspm.sh script a little from the default version linked above as I was having issues with it correctly reading the PCI configuration space on some cards and it was trying to set root ports to the same values as their downstream counterparts even if they didn't support the setting in question. If anyone wants the tweaked version it's here:
#!/bin/bash
# Copyright (c) 2010-2013 Luis R. Rodriguez <mcgrof@do-not-panic.com>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.


# ASPM Tuning script
#
# This script lets you enable ASPM on your devices in case your BIOS
# does not have it enabled for some reason. If your BIOS does not have
# it enabled it is usually for a good reason so you should only use this if
# you know what you are doing. Typically you would only need to enable
# ASPM manually when doing development and using a card that typically
# is not present on a laptop, or using the cardbus slot. The BIOS typically
# disables ASPM for foreign cards and on the cardbus slot. Check also
# if you may need to do other things than what is below on your vendor
# documentation.
#
# To use this script You will need for now to at least query your device
# PCI endpoint and root complex addresses using the convention output by
# lspci: [<bus>]:[<slot>].[<func>]
#
# For example:
#
# 03:00.0 Network controller: Atheros Communications Inc. AR9300 Wireless LAN adaptor (rev 01)
# 00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
#
# The root complex for the endpoint can be found using lspci -PP
#
# For more details refer to:
#
# en:users:documentation:aspm [Linux Wireless]

# You just need to pass these two values as arguments:

ENDPOINT=$1

# We'll only enable the last 2 bits by using a mask
# of :3 to setpci, this will ensure we keep the existing
# values on the byte.
#
# Hex Binary Meaning
# -------------------------
# 0 0b00 L0 only
# 1 0b01 L0s only
# 2 0b10 L1 only
# 3 0b11 L1 and L0s
ASPM_SETTING=$2

function aspm_setting_to_string()
{
case $1 in
0)
echo -e "${BLUE}L0 only${NORMAL}, ${RED}ASPM disabled${NORMAL}"
;;
1)
echo -e "${GREEN}L0s only${NORMAL}"
;;
2)
echo -e "${GREEN}L1 only${NORMAL}"
;;
3)
echo -e "${GREEN}L1 and L0s${NORMAL}"
;;
*)
echo -e "${RED}Invalid${NORMAL}"
;;
esac
}


###################################################################
# Do not edit below here unless you are sending me a patch
###################################################################
#
# TODO: patches are welcome until we submit to
# PCI Utilities upstream.
#
# This can be improved by in this order:
#
# * Accept arguments for endpoint and root complex address, and
# desired ASPM settings
# * Look for your ASPM capabilities by querying your
# LnkCap register first. Use these values to let you
# select whether you want to enable only L1 or L1 & L0s
# * Searching for your root complex for you
# * Search for your PCI device by using the driver
# * Disable your driver and ask to reboot ?
# * Rewrite in C
# * Write ncurses interface [ wishlist ]
# * Write GTK/QT interface [ wishlist ]
# * Submit upstream as aspm.c to the PCI Utilities, which are
# maintained by Martin Mares <mj@ucw.cz>

# Pretty colors
GREEN="\033[01;32m"
YELLOW="\033[01;33m"
NORMAL="\033[00m"
BLUE="\033[34m"
RED="\033[31m"
PURPLE="\033[35m"
CYAN="\033[36m"
UNDERLINE="\033[02m"

# we can surely read the spec to get a better value
MAX_SEARCH=20
SEARCH_COUNT=1
ASPM_BYTE_ADDRESS="INVALID"

if [[ $# -ne 2 ]]; then
echo "Usage: ./enable-aspm.sh ENDPOINT ASPM_SETTING"
exit 1
fi

ENDPOINT_PRESENT=$(lspci -s $ENDPOINT)

if [[ $(id -u) != 0 ]]; then
echo "This needs to be run as root"
exit 1
fi

if [[ ${#ENDPOINT_PRESENT} -eq 0 ]]; then
echo "Endpoint $ENDPOINT is not present"
exit 1
fi

function device_present()
{

PRESENT=$(lspci -s $1)
COMPLAINT="${RED}not present${NORMAL}"

if [[ ${#PRESENT} -eq 0 ]]; then
if [[ $2 != "present" ]]; then
COMPLAINT="${RED}disappeared${NORMAL}"
fi

echo -e "Device ${BLUE}${1}${NORMAL} $COMPLAINT"
return 1
fi
return 0
}

function find_aspm_byte_address()
{
device_present $1 present
if [[ $? -ne 0 ]]; then
exit
fi

SEARCH=$(setpci -s $1 34.b)
# We know on the first search $SEARCH will not be
# 10 but this simplifies the implementation.
while [[ $SEARCH != 10 && $SEARCH_COUNT -le $MAX_SEARCH ]]; do
END_SEARCH=$(setpci -s $1 ${SEARCH}.b)

# Convert hex digits to uppercase for bc
SEARCH_UPPER=$(printf "%X" 0x${SEARCH})

if [[ $END_SEARCH = 10 ]]; then
ASPM_BYTE_ADDRESS=$(echo "obase=16; ibase=16; $SEARCH_UPPER + 10" | bc)
break
fi

SEARCH=$(echo "obase=16; ibase=16; $SEARCH_UPPER + 1" | bc)
SEARCH=$(setpci -s $1 ${SEARCH}.b)

let SEARCH_COUNT=$SEARCH_COUNT+1
done

if [[ $SEARCH_COUNT -ge $MAX_SEARCH ]]; then
echo -e "Long loop while looking for ASPM word for $1"
return 1
fi
return 0
}

function enable_aspm_byte()
{
device_present $1 present
if [[ $? -ne 0 ]]; then
exit
fi

find_aspm_byte_address $1
if [[ $? -ne 0 ]]; then
return 1
fi

# Bail out before touching config space if no PCIe capability was found.
if [[ $ASPM_BYTE_ADDRESS = "INVALID" ]]; then
echo -e "No ASPM byte could be found for $(lspci -s $1)"
return 1
fi

ASPM_BYTE_HEX=$(setpci -s $1 ${ASPM_BYTE_ADDRESS}.b)
ASPM_BYTE_HEX=$(printf "%X" 0x${ASPM_BYTE_HEX})
# setpci doesn't support a mask on the query yet, only on the set,
# so to verify a setting on a mask we have no other option but
# to do this ourselves. The mask matches the :3 used for the write below.
DESIRED_ASPM_BYTE_HEX=$(printf "%X" $(( (0x${ASPM_BYTE_HEX} & ~0x3) | 0x${ASPM_SETTING} )))

echo -e "$(lspci -s $1)"
echo -en "\t${YELLOW}0x${ASPM_BYTE_ADDRESS}${NORMAL} : ${CYAN}0x${ASPM_BYTE_HEX}${GREEN} --> ${BLUE}0x${DESIRED_ASPM_BYTE_HEX}${NORMAL} ... "

device_present $1 present
if [[ $? -ne 0 ]]; then
exit
fi

# Avoid setting if already set
if [[ $ASPM_BYTE_HEX = $DESIRED_ASPM_BYTE_HEX ]]; then
echo -n -e "[${GREEN}SUCCESS${NORMAL}] (${GREEN}already set${NORMAL}) "
aspm_setting_to_string $ASPM_SETTING
return 0
fi

# A mask of :3 only writes the last 2 bits (the ASPM control field)
setpci -s $1 ${ASPM_BYTE_ADDRESS}.b=${ASPM_SETTING}:3

sleep 0.1

ACTUAL_ASPM_BYTE_HEX=$(setpci -s $1 ${ASPM_BYTE_ADDRESS}.b)
ACTUAL_ASPM_BYTE_HEX=$(printf "%X" 0x${ACTUAL_ASPM_BYTE_HEX})

# Do not retry if the write failed to take effect.
# If it failed, there is likely a good reason and you should look
# into that.
if [[ $ACTUAL_ASPM_BYTE_HEX != $DESIRED_ASPM_BYTE_HEX ]]; then
echo -e "[${RED}FAIL${NORMAL}] (0x${ACTUAL_ASPM_BYTE_HEX})"
return 1
fi

echo -n -e "[${GREEN}SUCCESS]${NORMAL} "
aspm_setting_to_string $ASPM_SETTING

return 0
}

device_present $ENDPOINT not_sure
if [[ $? -ne 0 ]]; then
exit
fi

echo -e -n "${CYAN}Device${NORMAL}: "
enable_aspm_byte $ENDPOINT
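The read-modify-write the script performs on the Link Control byte comes down to a little bit arithmetic that can be tried without touching any hardware. A minimal sketch, assuming only the two ASPM control bits matter; the function name is made up for illustration and is not part of the script above.

```shell
#!/bin/bash
# Compute the Link Control low byte after applying an ASPM setting.
# $1: current byte as hex without 0x prefix (as printed by setpci), e.g. 40
# $2: ASPM setting 0-3 (0 = disabled, 1 = L0s, 2 = L1, 3 = L0s and L1)
desired_lnkctl_byte() {
    local current=$1 setting=$2
    # Clear bits 1:0, then OR in the requested setting; other bits are kept.
    printf '%X\n' $(( (0x$current & ~0x3) | (setting & 0x3) ))
}

desired_lnkctl_byte 40 2   # prints 42
desired_lnkctl_byte 43 0   # prints 40
```

Keeping the mask the same for the expected value and for the setpci write avoids a false FAIL when an unrelated bit in the byte happens to be set.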
 

barrenechea

New Member
Dec 1, 2022
Well, I'm installing a ConnectX-5 (MCX512A-ACAT), and I have had no luck with ASPM.

As soon as I install it on the mobo, my package C-states go from C8 to C2, and power usage goes from 16W to 37W with the ConnectX-5 card idling and nothing connected to it (measuring whole PC consumption with Sonoff POWR2). After unplugging the card, power usage at idle goes back to 16W.

After playing with the 'enable-aspm' script (thank you, @RzTen1!) I enabled ASPM, and even though I see no weird kernel messages, power consumption has not changed.

root@HomeServer:~# lspci -vv | awk '/ASPM/{print $0}' RS= | grep --color -P '(^[a-z0-9:.]+|ASPM )'
00:01.0 PCI bridge: Intel Corporation Device a70d (rev 01) (prog-if 00 [Normal decode])
LnkCap: Port #2, Speed 32GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <4us, L1 <16us
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk+ // 00:01.0 gets ASPM disabled with the card plugged in, enforced ASPM
00:1a.0 PCI bridge: Intel Corporation Device 7ac8 (rev 11) (prog-if 00 [Normal decode])
LnkCap: Port #25, Speed 16GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
00:1c.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 (rev 11) (prog-if 00 [Normal decode])
LnkCap: Port #1, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
00:1c.2 PCI bridge: Intel Corporation Device 7aba (rev 11) (prog-if 00 [Normal decode])
LnkCap: Port #3, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L1 <64us
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
01:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-V (rev 03)
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L1, Exit Latency L1 <4us
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
I'm hesitant to keep it plugged in, considering it's more than doubling the previous power consumption, even with the ASPM hacks seemingly stable and turned on.
 

RzTen1

New Member
Nov 13, 2022
It looks like maybe the ConnectX-5 might not support ASPM? It should still say if L1/L0s are supported even when they're disabled and it looks like yours is reporting 'ASPM not supported' which won't work even if you force it since it isn't there.
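A quick way to check what a device advertises before trying to force anything is to look only at its LnkCap line. The following is a sketch under the assumption that lspci -vv prints the capability in the form seen in this thread ("ASPM L0s L1," or "ASPM not supported"); the function name is hypothetical.

```shell
#!/bin/bash
# Report which ASPM states a device advertises, given one LnkCap line
# from `lspci -vv`. Prints "none", "L0s", "L1", "L0s L1", or "unknown".
aspm_from_lnkcap() {
    local line=$1
    local re='ASPM ((L0s|L1)( L1)?),'
    if [[ $line == *"ASPM not supported"* ]]; then
        echo "none"
    elif [[ $line =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        echo "unknown"
    fi
}

# Example (run as root so lspci can read the full capability list):
#   aspm_from_lnkcap "$(lspci -vv -s 01:00.0 | grep -m1 'LnkCap:')"
```

If this reports "none", writing the LnkCtl bits may still change what lspci shows under LnkCtl, but the link has no low-power state to enter.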

Have you tried playing with the port power settings? I think it's AUTO_POWER_SAVE_LINK_DOWN_P1 and P2 in mlxconfig. I believe they default to disabled. That won't help if you're using both ports, but if you're only using one it might make a difference.

It looks like the CX-5 should also support the attribute ADVANCED_POWER_SETTINGS. I can't tell what additional options that might turn on since I don't have a 5 card to play with, but it could be why ASPM isn't showing up. Also take a look at ADVANCED_PCI_SETTINGS and PCI_BUS0_RESTRICT as there are suboptions in there that should impact ASPM, mainly PCI_BUS0_RESTRICT_ASPM: "When FALSE, PCI bus will not have ASPM enabled. Valid when PCI_BUS_RESTRICT is TRUE."
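For reference, the port power-save options would be set roughly as follows. This is a sketch that only prints the commands for review instead of running them; the device address 01:00.0 is an example, and whether a given card and firmware accept these parameters is an assumption to verify with a query first.

```shell
#!/bin/bash
# Print (not run) the mlxconfig commands for the link-down power-save options.
# $1: device as mlxconfig expects it, e.g. a PCI address such as 01:00.0
print_power_save_cmds() {
    local dev=$1 port
    # Query first to confirm the parameters exist on this card.
    echo "mlxconfig -d $dev query"
    for port in P1 P2; do
        echo "mlxconfig -d $dev set AUTO_POWER_SAVE_LINK_DOWN_${port}=1"
    done
}

print_power_save_cmds 01:00.0
```

Settings written this way are stored in the card's configuration and normally take effect only after a reboot or firmware reset.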
 

barrenechea

New Member
Dec 1, 2022
Thank you for your input! I've already enabled AUTO_POWER_SAVE_LINK_DOWN_P1, AUTO_POWER_SAVE_LINK_DOWN_P2, ADVANCED_PCI_SETTINGS and ADVANCED_POWER_SETTINGS and no luck. There are no settings starting with PCI_BUS0_ on CX-5 (or at least mine, the ACAT version)

Here is the list of available settings I have, if you're curious. Nothing about ASPM.

# mstconfig q

Device #1:
----------

Device type: ConnectX5
Name: MCX512A-ACA_Ax_Bx
Description: ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; tall bracket; ROHS R6
Device: /sys/bus/pci/devices/0000:01:00.0/config

Configurations: Next Boot
MEMIC_BAR_SIZE 0
MEMIC_SIZE_LIMIT _256KB(1)
HOST_CHAINING_MODE DISABLED(0)
HOST_CHAINING_CACHE_DISABLE False(0)
HOST_CHAINING_DESCRIPTORS Array[0..7]
HOST_CHAINING_TOTAL_BUFFER_SIZE Array[0..7]
FLEX_PARSER_PROFILE_ENABLE 0
FLEX_IPV4_OVER_VXLAN_PORT 0
ROCE_NEXT_PROTOCOL 254
ESWITCH_HAIRPIN_DESCRIPTORS Array[0..7]
ESWITCH_HAIRPIN_TOT_BUFFER_SIZE Array[0..7]
PF_BAR2_SIZE 0
PF_NUM_OF_VF_VALID False(0)
NON_PREFETCHABLE_PF_BAR False(0)
VF_VPD_ENABLE False(0)
PF_NUM_PF_MSIX_VALID False(0)
PER_PF_NUM_SF False(0)
STRICT_VF_MSIX_NUM False(0)
VF_NODNIC_ENABLE False(0)
NUM_PF_MSIX_VALID True(1)
NUM_OF_VFS 8
NUM_OF_PF 2
PF_BAR2_ENABLE False(0)
SRIOV_EN True(1)
PF_LOG_BAR_SIZE 5
VF_LOG_BAR_SIZE 0
NUM_PF_MSIX 63
NUM_VF_MSIX 11
INT_LOG_MAX_PAYLOAD_SIZE AUTOMATIC(0)
PCIE_CREDIT_TOKEN_TIMEOUT 0
MAX_ACC_OUT_READ 0
ACCURATE_TX_SCHEDULER False(0)
PARTIAL_RESET_EN False(0)
SW_RECOVERY_ON_ERRORS False(0)
RESET_WITH_HOST_ON_ERRORS False(0)
DISABLE_SLOT_POWER_LIMITER False(0)
ADVANCED_POWER_SETTINGS True(1)
CQE_COMPRESSION BALANCED(0)
IP_OVER_VXLAN_EN False(0)
MKEY_BY_NAME False(0)
ESWITCH_IPV4_TTL_MODIFY_ENABLE False(0)
PRIO_TAG_REQUIRED_EN False(0)
UCTX_EN True(1)
PCI_ATOMIC_MODE PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
TUNNEL_ECN_COPY_DISABLE False(0)
LRO_LOG_TIMEOUT0 6
LRO_LOG_TIMEOUT1 7
LRO_LOG_TIMEOUT2 8
LRO_LOG_TIMEOUT3 13
LOG_TX_PSN_WINDOW 7
LOG_MAX_OUTSTANDING_WQE 7
ROCE_ADAPTIVE_ROUTING_EN False(0)
TUNNEL_IP_PROTO_ENTROPY_DISABLE False(0)
ICM_CACHE_MODE DEVICE_DEFAULT(0)
TX_SCHEDULER_BURST 0
ZERO_TOUCH_TUNING_ENABLE True(1)
LOG_MAX_QUEUE 17
LOG_DCR_HASH_TABLE_SIZE 11
MAX_PACKET_LIFETIME 0
DCR_LIFO_SIZE 16384
ROCE_CC_PRIO_MASK_P1 255
ROCE_CC_PRIO_MASK_P2 255
CLAMP_TGT_RATE_AFTER_TIME_INC_P1 True(1)
CLAMP_TGT_RATE_P1 False(0)
RPG_TIME_RESET_P1 300
RPG_BYTE_RESET_P1 32767
RPG_THRESHOLD_P1 1
RPG_MAX_RATE_P1 0
RPG_AI_RATE_P1 5
RPG_HAI_RATE_P1 50
RPG_GD_P1 11
RPG_MIN_DEC_FAC_P1 50
RPG_MIN_RATE_P1 1
RATE_TO_SET_ON_FIRST_CNP_P1 0
DCE_TCP_G_P1 1019
DCE_TCP_RTT_P1 1
RATE_REDUCE_MONITOR_PERIOD_P1 4
INITIAL_ALPHA_VALUE_P1 1023
MIN_TIME_BETWEEN_CNPS_P1 4
CNP_802P_PRIO_P1 6
CNP_DSCP_P1 48
CLAMP_TGT_RATE_AFTER_TIME_INC_P2 True(1)
CLAMP_TGT_RATE_P2 False(0)
RPG_TIME_RESET_P2 300
RPG_BYTE_RESET_P2 32767
RPG_THRESHOLD_P2 1
RPG_MAX_RATE_P2 0
RPG_AI_RATE_P2 5
RPG_HAI_RATE_P2 50
RPG_GD_P2 11
RPG_MIN_DEC_FAC_P2 50
RPG_MIN_RATE_P2 1
RATE_TO_SET_ON_FIRST_CNP_P2 0
DCE_TCP_G_P2 1019
DCE_TCP_RTT_P2 1
RATE_REDUCE_MONITOR_PERIOD_P2 4
INITIAL_ALPHA_VALUE_P2 1023
MIN_TIME_BETWEEN_CNPS_P2 4
CNP_802P_PRIO_P2 6
CNP_DSCP_P2 48
LLDP_NB_DCBX_P1 False(0)
LLDP_NB_RX_MODE_P1 OFF(0)
LLDP_NB_TX_MODE_P1 OFF(0)
LLDP_NB_DCBX_P2 False(0)
LLDP_NB_RX_MODE_P2 OFF(0)
LLDP_NB_TX_MODE_P2 OFF(0)
DCBX_IEEE_P1 True(1)
DCBX_CEE_P1 True(1)
DCBX_WILLING_P1 True(1)
DCBX_IEEE_P2 True(1)
DCBX_CEE_P2 True(1)
DCBX_WILLING_P2 True(1)
KEEP_ETH_LINK_UP_P1 True(1)
KEEP_IB_LINK_UP_P1 False(0)
KEEP_LINK_UP_ON_BOOT_P1 False(0)
KEEP_LINK_UP_ON_STANDBY_P1 False(0)
DO_NOT_CLEAR_PORT_STATS_P1 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P1 True(1)
KEEP_ETH_LINK_UP_P2 True(1)
KEEP_IB_LINK_UP_P2 False(0)
KEEP_LINK_UP_ON_BOOT_P2 False(0)
KEEP_LINK_UP_ON_STANDBY_P2 False(0)
DO_NOT_CLEAR_PORT_STATS_P2 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P2 True(1)
NUM_OF_VL_P1 _4_VLs(3)
NUM_OF_TC_P1 _8_TCs(0)
NUM_OF_PFC_P1 8
VL15_BUFFER_SIZE_P1 0
QOS_TRUST_STATE_P1 TRUST_PCP(1)
NUM_OF_VL_P2 _4_VLs(3)
NUM_OF_TC_P2 _8_TCs(0)
NUM_OF_PFC_P2 8
VL15_BUFFER_SIZE_P2 0
QOS_TRUST_STATE_P2 TRUST_PCP(1)
DUP_MAC_ACTION_P1 LAST_CFG(0)
MPFS_MC_LOOPBACK_DISABLE_P1 False(0)
MPFS_UC_LOOPBACK_DISABLE_P1 False(0)
UNKNOWN_UPLINK_MAC_FLOOD_P1 False(0)
SRIOV_IB_ROUTING_MODE_P1 LID(1)
IB_ROUTING_MODE_P1 LID(1)
DUP_MAC_ACTION_P2 LAST_CFG(0)
MPFS_MC_LOOPBACK_DISABLE_P2 False(0)
MPFS_UC_LOOPBACK_DISABLE_P2 False(0)
UNKNOWN_UPLINK_MAC_FLOOD_P2 False(0)
SRIOV_IB_ROUTING_MODE_P2 LID(1)
IB_ROUTING_MODE_P2 LID(1)
PHY_AUTO_NEG_P1 DEVICE_DEFAULT(0)
PHY_RATE_MASK_OVERRIDE_P1 False(0)
PHY_FEC_OVERRIDE_P1 DEVICE_DEFAULT(0)
PHY_AUTO_NEG_P2 DEVICE_DEFAULT(0)
PHY_RATE_MASK_OVERRIDE_P2 False(0)
PHY_FEC_OVERRIDE_P2 DEVICE_DEFAULT(0)
PF_TOTAL_SF 0
PF_SF_BAR_SIZE 0
PF_NUM_PF_MSIX 63
ROCE_CONTROL ROCE_ENABLE(2)
PCI_WR_ORDERING per_mkey(0)
MULTI_PORT_VHCA_EN False(0)
PORT_OWNER True(1)
ALLOW_RD_COUNTERS True(1)
RENEG_ON_CHANGE True(1)
TRACER_ENABLE False(0)
IP_VER IPv4(0)
BOOT_UNDI_NETWORK_WAIT 0
UEFI_HII_EN True(1)
BOOT_DBG_LOG False(0)
UEFI_LOGS DISABLED(0)
BOOT_VLAN 1
LEGACY_BOOT_PROTOCOL PXE(1)
BOOT_RETRY_CNT NONE(0)
BOOT_INTERRUPT_DIS False(0)
BOOT_LACP_DIS True(1)
BOOT_VLAN_EN False(0)
BOOT_PKEY 0
P2P_ORDERING_MODE DEVICE_DEFAULT(0)
ATS_ENABLED True(1)
DYNAMIC_VF_MSIX_TABLE False(0)
FORCE_ETH_PCI_SUBCLASS False(0)
ADVANCED_PCI_SETTINGS True(1)
SAFE_MODE_THRESHOLD 10
SAFE_MODE_ENABLE True(1)
ATS_ENABLED=1 and TRACER_ENABLE=0 are the opposite of their default values, just me playing around. Firmware on its latest available version.
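With a dump this long, it can help to filter it down to the settings whose names mention power or PCI. A small sketch, assuming the query output keeps the "NAME VALUE" layout shown above; the function name and the keyword list are illustrative choices, not anything defined by the tool.

```shell
#!/bin/bash
# Keep only the settings whose names mention ASPM, POWER, or PCI.
# Reads `mstconfig q` output on stdin.
filter_power_settings() {
    # Match on the setting name (uppercase identifier at the start of the line)
    # so that the "Device:" path and the description line are skipped.
    grep -E '^[[:space:]]*[A-Z0-9_]*(ASPM|POWER|PCI)[A-Z0-9_]*[[:space:]]' || true
}

# Example:
#   mstconfig -d 01:00.0 q | filter_power_settings
```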
 

RzTen1

New Member
Nov 13, 2022
Yeah, it looks like all of the PCI bus options are missing. That's weird, as I wouldn't expect them to remove features from newer cards, but I guess they've pulled support for it for some reason. The only other thing I can think of is that you may need a certain driver or firmware level for the option to appear. You could open a ticket with NVIDIA to find out for sure, as that card should still be under active support.
 

barrenechea

New Member
Dec 1, 2022
May be unrelated to the topic, but if someone's interested: it seems that you can cross-flash the MCX512A-ACAT card with MCX512A-ADAT firmware.

Sadly, it seems to not fix the missing ASPM support...
... But now I have a ConnectX-5 Ex, Speed 16GT/s (PCIe 4.0) according to lspci :oops:

# mstconfig q

Device #1:
----------

Device type: ConnectX5
Name: MCX512A-ADA_Ax_Bx
Description: ConnectX-5 Ex EN network interface card; 25GbE dual-port SFP28; PCIe3.0/4.0 x8
Device: /sys/bus/pci/devices/0000:01:00.0/config

Configurations: Next Boot
MEMIC_BAR_SIZE 0
MEMIC_SIZE_LIMIT _256KB(1)
HOST_CHAINING_MODE DISABLED(0)
HOST_CHAINING_CACHE_DISABLE False(0)
HOST_CHAINING_DESCRIPTORS Array[0..7]
HOST_CHAINING_TOTAL_BUFFER_SIZE Array[0..7]
FLEX_PARSER_PROFILE_ENABLE 0
FLEX_IPV4_OVER_VXLAN_PORT 0
ROCE_NEXT_PROTOCOL 254
ESWITCH_HAIRPIN_DESCRIPTORS Array[0..7]
ESWITCH_HAIRPIN_TOT_BUFFER_SIZE Array[0..7]
PF_BAR2_SIZE 0
PF_NUM_OF_VF_VALID False(0)
NON_PREFETCHABLE_PF_BAR False(0)
VF_VPD_ENABLE False(0)
PF_NUM_PF_MSIX_VALID False(0)
PER_PF_NUM_SF False(0)
STRICT_VF_MSIX_NUM False(0)
VF_NODNIC_ENABLE False(0)
NUM_PF_MSIX_VALID True(1)
NUM_OF_VFS 8
NUM_OF_PF 2
PF_BAR2_ENABLE False(0)
SRIOV_EN True(1)
PF_LOG_BAR_SIZE 5
VF_LOG_BAR_SIZE 0
NUM_PF_MSIX 63
NUM_VF_MSIX 11
INT_LOG_MAX_PAYLOAD_SIZE AUTOMATIC(0)
PCIE_CREDIT_TOKEN_TIMEOUT 0
MAX_ACC_OUT_READ 0
ACCURATE_TX_SCHEDULER False(0)
PARTIAL_RESET_EN False(0)
SW_RECOVERY_ON_ERRORS False(0)
RESET_WITH_HOST_ON_ERRORS False(0)
DISABLE_SLOT_POWER_LIMITER False(0)
ADVANCED_POWER_SETTINGS True(1)
CQE_COMPRESSION BALANCED(0)
IP_OVER_VXLAN_EN False(0)
MKEY_BY_NAME False(0)
ESWITCH_IPV4_TTL_MODIFY_ENABLE False(0)
PRIO_TAG_REQUIRED_EN False(0)
UCTX_EN True(1)
PCI_ATOMIC_MODE PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
TUNNEL_ECN_COPY_DISABLE False(0)
LRO_LOG_TIMEOUT0 6
LRO_LOG_TIMEOUT1 7
LRO_LOG_TIMEOUT2 8
LRO_LOG_TIMEOUT3 13
LOG_TX_PSN_WINDOW 7
LOG_MAX_OUTSTANDING_WQE 7
ROCE_ADAPTIVE_ROUTING_EN False(0)
TUNNEL_IP_PROTO_ENTROPY_DISABLE False(0)
ICM_CACHE_MODE DEVICE_DEFAULT(0)
TX_SCHEDULER_BURST 0
ZERO_TOUCH_TUNING_ENABLE False(0)
LOG_MAX_QUEUE 17
LOG_DCR_HASH_TABLE_SIZE 11
MAX_PACKET_LIFETIME 0
DCR_LIFO_SIZE 16384
ROCE_CC_PRIO_MASK_P1 255
ROCE_CC_PRIO_MASK_P2 255
CLAMP_TGT_RATE_AFTER_TIME_INC_P1 True(1)
CLAMP_TGT_RATE_P1 False(0)
RPG_TIME_RESET_P1 300
RPG_BYTE_RESET_P1 32767
RPG_THRESHOLD_P1 1
RPG_MAX_RATE_P1 0
RPG_AI_RATE_P1 5
RPG_HAI_RATE_P1 50
RPG_GD_P1 11
RPG_MIN_DEC_FAC_P1 50
RPG_MIN_RATE_P1 1
RATE_TO_SET_ON_FIRST_CNP_P1 0
DCE_TCP_G_P1 1019
DCE_TCP_RTT_P1 1
RATE_REDUCE_MONITOR_PERIOD_P1 4
INITIAL_ALPHA_VALUE_P1 1023
MIN_TIME_BETWEEN_CNPS_P1 4
CNP_802P_PRIO_P1 6
CNP_DSCP_P1 48
CLAMP_TGT_RATE_AFTER_TIME_INC_P2 True(1)
CLAMP_TGT_RATE_P2 False(0)
RPG_TIME_RESET_P2 300
RPG_BYTE_RESET_P2 32767
RPG_THRESHOLD_P2 1
RPG_MAX_RATE_P2 0
RPG_AI_RATE_P2 5
RPG_HAI_RATE_P2 50
RPG_GD_P2 11
RPG_MIN_DEC_FAC_P2 50
RPG_MIN_RATE_P2 1
RATE_TO_SET_ON_FIRST_CNP_P2 0
DCE_TCP_G_P2 1019
DCE_TCP_RTT_P2 1
RATE_REDUCE_MONITOR_PERIOD_P2 4
INITIAL_ALPHA_VALUE_P2 1023
MIN_TIME_BETWEEN_CNPS_P2 4
CNP_802P_PRIO_P2 6
CNP_DSCP_P2 48
LLDP_NB_DCBX_P1 False(0)
LLDP_NB_RX_MODE_P1 OFF(0)
LLDP_NB_TX_MODE_P1 OFF(0)
LLDP_NB_DCBX_P2 False(0)
LLDP_NB_RX_MODE_P2 OFF(0)
LLDP_NB_TX_MODE_P2 OFF(0)
DCBX_IEEE_P1 True(1)
DCBX_CEE_P1 True(1)
DCBX_WILLING_P1 True(1)
DCBX_IEEE_P2 True(1)
DCBX_CEE_P2 True(1)
DCBX_WILLING_P2 True(1)
KEEP_ETH_LINK_UP_P1 True(1)
KEEP_IB_LINK_UP_P1 False(0)
KEEP_LINK_UP_ON_BOOT_P1 False(0)
KEEP_LINK_UP_ON_STANDBY_P1 False(0)
DO_NOT_CLEAR_PORT_STATS_P1 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P1 True(1)
KEEP_ETH_LINK_UP_P2 True(1)
KEEP_IB_LINK_UP_P2 False(0)
KEEP_LINK_UP_ON_BOOT_P2 False(0)
KEEP_LINK_UP_ON_STANDBY_P2 False(0)
DO_NOT_CLEAR_PORT_STATS_P2 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P2 True(1)
NUM_OF_VL_P1 _4_VLs(3)
NUM_OF_TC_P1 _8_TCs(0)
NUM_OF_PFC_P1 8
VL15_BUFFER_SIZE_P1 0
QOS_TRUST_STATE_P1 TRUST_PCP(1)
NUM_OF_VL_P2 _4_VLs(3)
NUM_OF_TC_P2 _8_TCs(0)
NUM_OF_PFC_P2 8
VL15_BUFFER_SIZE_P2 0
QOS_TRUST_STATE_P2 TRUST_PCP(1)
DUP_MAC_ACTION_P1 LAST_CFG(0)
MPFS_MC_LOOPBACK_DISABLE_P1 False(0)
MPFS_UC_LOOPBACK_DISABLE_P1 False(0)
UNKNOWN_UPLINK_MAC_FLOOD_P1 False(0)
SRIOV_IB_ROUTING_MODE_P1 LID(1)
IB_ROUTING_MODE_P1 LID(1)
DUP_MAC_ACTION_P2 LAST_CFG(0)
MPFS_MC_LOOPBACK_DISABLE_P2 False(0)
MPFS_UC_LOOPBACK_DISABLE_P2 False(0)
UNKNOWN_UPLINK_MAC_FLOOD_P2 False(0)
SRIOV_IB_ROUTING_MODE_P2 LID(1)
IB_ROUTING_MODE_P2 LID(1)
PHY_AUTO_NEG_P1 DEVICE_DEFAULT(0)
PHY_RATE_MASK_OVERRIDE_P1 False(0)
PHY_FEC_OVERRIDE_P1 DEVICE_DEFAULT(0)
PHY_AUTO_NEG_P2 DEVICE_DEFAULT(0)
PHY_RATE_MASK_OVERRIDE_P2 False(0)
PHY_FEC_OVERRIDE_P2 DEVICE_DEFAULT(0)
PF_TOTAL_SF 0
PF_SF_BAR_SIZE 0
PF_NUM_PF_MSIX 63
ROCE_CONTROL ROCE_ENABLE(2)
PCI_WR_ORDERING per_mkey(0)
MULTI_PORT_VHCA_EN False(0)
PORT_OWNER True(1)
ALLOW_RD_COUNTERS True(1)
RENEG_ON_CHANGE True(1)
TRACER_ENABLE True(1)
IP_VER IPv4(0)
BOOT_UNDI_NETWORK_WAIT 0
UEFI_HII_EN True(1)
BOOT_DBG_LOG False(0)
UEFI_LOGS DISABLED(0)
BOOT_VLAN 1
LEGACY_BOOT_PROTOCOL PXE(1)
BOOT_RETRY_CNT NONE(0)
BOOT_INTERRUPT_DIS False(0)
BOOT_LACP_DIS True(1)
BOOT_VLAN_EN False(0)
BOOT_PKEY 0
P2P_ORDERING_MODE DEVICE_DEFAULT(0)
ATS_ENABLED True(1)
DYNAMIC_VF_MSIX_TABLE False(0)
EXP_ROM_UEFI_x86_ENABLE True(1)
EXP_ROM_PXE_ENABLE False(0)
FORCE_ETH_PCI_SUBCLASS False(0)
ADVANCED_PCI_SETTINGS True(1)
SAFE_MODE_THRESHOLD 10
SAFE_MODE_ENABLE True(1)
 

barrenechea

New Member
Dec 1, 2022
I got a reply from an employee on the NVIDIA forums regarding the CX5:

ASPM as mentioned in lspci output, is not supported by the HCA. However, you may check if the feature can be enabled in BIOS/system drivers.

In order to check system related details, it would be great if you can reach out to your server vendor.
I guess ASPM is a no-go on newer Mellanox cards --- I haven't been able to find ASPM-supported CX6 either.
 

h0schi

Member
Oct 24, 2020
Germany
Quick feedback:
The X710 supports ASPM.
I replaced the Mellanox ConnectX-3 card in my Sophos XG firewall with an Intel X710-DA2.

Here is the lspci output for both cards:


Mellanox Connect-X 3 (CX312A)

LnkCap: Port #8, Speed unknown, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
Intel X710-DA2

LnkCap: Port #0, Speed unknown, Width x8, ASPM unknown, Latency L0 <2us, L1 <16us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
 

Gio

Member
Apr 8, 2017
Out of curiosity did you measure / compare watt usage?
 

h0schi

Member
Oct 24, 2020
Germany
The problem is not the power consumption of the cards themselves.
Both cards consume about 5W.

The problem is that without ASPM support the CPU cannot reach deeper sleep states.
 

Gio

Member
Apr 8, 2017
What is the lowest C state that the X710 is giving you? I saw some posts on unraid forums about C7.
 

Survivor7171

New Member
Jun 1, 2023
@Gio what does the script look like to set LnkCtl to enable L1?

I have set pcie_aspm=force and enabled ASPM in the BIOS.

Power control is set to auto for the PCIe device:

Code:
root@host:~# cat /sys/bus/pci/devices/0000\:01\:00.0/power/control
auto

But the NIC still doesn't use ASPM...

Code:
root@host:~# lspci -vv | awk '/ASPM/{print $0}' RS= | grep --color -P '(^[a-z0-9:.]+|ASPM )'
01:00.0 Ethernet controller: Aquantia Corp. AQC100 10G Ethernet MAC controller [AQtion] (rev 02)
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s unlimited, L1 unlimited
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
Note that NVMe drives etc. are using ASPM, so it works in general.

edit: I also upgraded firmware, still not working.
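Besides the enable-aspm.sh script earlier in the thread, recent kernels (5.5 and later, as far as I know) expose per-link ASPM switches in sysfs under the device's link/ directory. The sketch below assumes that interface is present and that the kernel, not the firmware, controls ASPM on this link; the function name and the optional sysfs-root argument are made up for illustration.

```shell
#!/bin/bash
# Enable ASPM L1 for one device through the per-link sysfs attribute.
# $1: PCI address with domain, e.g. 0000:01:00.0
# $2: sysfs root (defaults to /sys; overridable to try the logic on a scratch directory)
enable_l1_via_sysfs() {
    local addr=$1 root=${2:-/sys}
    local attr="$root/bus/pci/devices/$addr/link/l1_aspm"
    if [[ ! -w $attr ]]; then
        echo "no writable $attr (older kernel, ASPM not OS-controlled, or not root)" >&2
        return 1
    fi
    # Writing 1 asks the kernel to enable L1 on this link.
    echo 1 > "$attr" || return 1
    # Read the value back to see whether the kernel accepted it.
    cat "$attr"
}

# Example (as root):
#   enable_l1_via_sysfs 0000:01:00.0
```

If the link/ directory is missing for the device, the kernel is not managing ASPM for that link, and setting LnkCtl directly with setpci (as the script does) is the remaining option.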
 

custom90gt

Active Member
Nov 17, 2016
Curious if there are any 40Gb cards with ASPM support. I had no idea my CX3 was keeping my system from going into deeper sleep states.
 

FucaBala

New Member
Sep 17, 2023
Well,

I bought a pair of ConnectX-4 cards, model MCX4121-ACAT (dual 25Gb). I have exactly the same issue, but I think I'm making progress.

My system is a 12600K on a Z690 Aorus Elite, with 64GB of RAM and 4 NVMe drives.

Booting Ubuntu, the system uses roughly 35W (measured at the wall), with package C-state 2 as the deepest state reached.
When I install powertop and run powertop --auto-tune, it drops to 14W and the package state goes to C8 (extremely efficient considering I have 5 fans on the case plus 2 on the CPU cooler).

With the Mellanox card installed, the system boots at 45W.
Running powertop as before does absolutely nothing: same 45W and same package state C2, even though powertop's tunables show ASPM as available for the card.

I updated the card's firmware and then, while reading the documentation, found that there are A LOT of settings you can configure on the card:
PCI_BUS0_RESTRICT_ASPM False(0)
KEEP_ETH_LINK_UP_P1 True(1)
KEEP_IB_LINK_UP_P1 False(0)
KEEP_LINK_UP_ON_BOOT_P1 False(0)
KEEP_LINK_UP_ON_STANDBY_P1 False(0)
DO_NOT_CLEAR_PORT_STATS_P1 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P1 False(0)
KEEP_ETH_LINK_UP_P2 True(1)
KEEP_IB_LINK_UP_P2 False(0)
KEEP_LINK_UP_ON_BOOT_P2 False(0)
KEEP_LINK_UP_ON_STANDBY_P2 False(0)
DO_NOT_CLEAR_PORT_STATS_P2 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P2 False(0)
ADVANCED_PCI_SETTINGS False(0)

In case you need it, the command to check the configuration is mlxconfig query. You need to install the Mellanox firmware utility software first (I did this on a Windows 11 machine and it works perfectly). The tool is called MFT, link is below:

I will go deeper into the documentation to learn a bit more, but as far as I can tell, the card came from the vendor with all the power-saving features deactivated.

Another thing I found is that my card is a Huawei, so the firmware is vendor-specific. I will try to find a way to flash a generic firmware on it and see how it goes.

Anyway, I haven't solved the problem, but I hope it helps with something.

Cheers.
 

FucaBala

New Member
Sep 17, 2023
HWInfo 7.30 shows it's available, but not enabled. This is with Link State Power Management set to Moderate, the default setting with the Ryzen Balanced profile. The ConnectX-4 is plugged into the MB's 2nd M.2 Key M slot, via a riser (mITX motherboard). I don't know if that affects things.

EDIT: the GPU states L0s and L1 supported, but also disabled. I actually cannot find a single PCIe device that states Enabled for ASPM. Looking into what this means, since it seems a bit wrong. I'll note my GPU also states it's running at 2.5GT/s, which is PCIe 1 speeds. If I load it up, it reports 8.0GT/s, but I need to restart HWINFO to see that change. GPUz notes it immediately. Off topic at this point. I had previously thought this link rate changing was part of ASPM, but apparently not? Or is this reporting Disabled, as in "it's not being used at the usec HWINFO was polling for info?"

Mate, I have a similar situation.
I believe there is a parameter we need to set on the card so that ASPM can be activated.

Running mlxconfig -d <INSERT YOUR DEVICE ID HERE> i
shows all the configuration options the device accepts. Look what I found:

PCI_BUS0_RESTRICT_ASPM=<False|True> When FALSE, PCI bus will not have ASPM enabled.
Valid when PCI_BUS_RESTRICT is TRUE.

I got 2 cards and both came with ASPM deactivated from the factory.

It's too late to test now, but I will do it during the weekend.
 

FucaBala

New Member
Sep 17, 2023
So, another update. I set up a new machine (an HP ProDesk) with a fresh Ubuntu install and went through all the possible configurations. These are the ones I will try:
PCI_BUS0_RESTRICT_ASPM=<False|True> When FALSE, PCI bus will not have ASPM enabled. Valid when PCI_BUS_RESTRICT is TRUE.
For my card, the bus index in this parameter goes from 0 to 7.

PCI_BUS00_ASPM=<False|True> When FALSE, PCI bus 00 will not have ASPM enabled.
For my card, it goes up to bus 27.

I will give it a try tomorrow and post the results from powertop and the power usage at the wall.
 