Beware of EMC switches sold as Mellanox SX6XXX on eBay

lambdafunction

New Member
Jun 22, 2021
I've put the modified hwd from lab.netservers.ro in place, and this is a boot attempt of 3.6.8012 on an SX6012. It fails to boot fully:
Code:
Mellanox MLNX-OS Switch Management

switch-534580 login: admin
Password:
Last login: Mon Jul 12 22:25:57 on ttyS0

Mellanox Switch

    System is initializing!
This may take a few minutes


    Modules are being configured
% A fatal internal error occurred


Mellanox MLNX-OS Switch Management

switch-534580 login:
I've got the debug log from my last boot. Here is where I think the problem starts:
Code:
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.ERR]: srm_dev_path_add(), srm.c:591, build 1: srm_dev_path_add: failed to add path to DP table, error: Device or resource busy
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.ERR]: hwd_ll_add_to_dpt(3839): Failed to add PCI SRM path for NRU:[1]. code:[14000,generic error]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[SX] from:[not-present] to:[error], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD1] from:[not-present] to:[ready], because:[Module ok]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD2] from:[not-present] to:[ready], because:[Module ok]
After this, there are many more messages, nearly all with the reason "Failed to add to DPT":
Code:
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.WARNING]: hwd_set_initial_fan_speed: Skipping FRU:[MGMT] since NRU:[SWITCH_BOARD_0] not in DPT
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.WARNING]: hwd_stm_error_flow: Device error flow for FRU:[MGMT] because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[SX] from:[error] to:[fatal], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/fatal] for FRU:[MGMT] NRU:[MGMT/SX]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD1] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/MONITOR_BOARD1]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD2] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/MONITOR_BOARD2]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD3] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/MONITOR_BOARD3]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD4] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/MONITOR_BOARD4]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD5] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/MONITOR_BOARD5]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD6] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/MONITOR_BOARD6]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD7] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/MONITOR_BOARD7]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD8] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/MONITOR_BOARD8]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[SPEED_BOARD1] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/SPEED_BOARD1]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[SPEED_BOARD2] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/SPEED_BOARD2]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[SPEED_BOARD3] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/SPEED_BOARD3]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[SPEED_BOARD4] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/SPEED_BOARD4]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[VPD1] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/VPD1]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[VPD2] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/VPD2]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[LED_BOARD1] from:[ready] to:[not-present], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/not-present] for FRU:[MGMT] NRU:[MGMT/LED_BOARD1]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_fru_state: Moving FRU:[MGMT] from:[powered-on] to:[fatal], because:[Failed to add to DPT]
Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: Sending event:[/system/chassis/events/state-change/fatal] for FRU:[MGMT] NRU:[]
Thoughts? Is there another hwd I can use?
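For anyone wading through a similar hwd log: most of those lines are the same cascade repeated per NRU, so it helps to reduce them to the first error plus the final state of each NRU. Below is a minimal sketch of that idea in Python. The names `summarize`, `NRU_MOVE` and `SAMPLE` are my own, the regex is only based on the line format visible in the excerpt above, and it has not been checked against other MLNX-OS versions. It summarizes the log; it does not diagnose the cause.

```python
import re
from collections import Counter

# Pattern inferred from the hwd_stm_set_nru_state lines in the log excerpt
# (an assumption; other hwd builds may format these lines differently).
NRU_MOVE = re.compile(
    r"hwd_stm_set_nru_state: Moving FRU:\[(?P<fru>[^\]]*)\], "
    r"NRU:\[(?P<nru>[^\]]*)\] from:\[(?P<old>[^\]]*)\] "
    r"to:\[(?P<new>[^\]]*)\], because:\[(?P<why>[^\]]*)\]"
)

# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.ERR]: srm_dev_path_add(), srm.c:591, build 1: srm_dev_path_add: failed to add path to DP table, error: Device or resource busy",
    "Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[SX] from:[not-present] to:[error], because:[Failed to add to DPT]",
    "Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD1] from:[not-present] to:[ready], because:[Module ok]",
    "Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[SX] from:[error] to:[fatal], because:[Failed to add to DPT]",
    "Jul 12 19:14:27 switch-534580 hwd[6933]: TID 1208135472: [hwd.NOTICE]: hwd_stm_set_nru_state: Moving FRU:[MGMT], NRU:[MONITOR_BOARD1] from:[ready] to:[not-present], because:[Failed to add to DPT]",
]


def summarize(lines):
    """Return (first [hwd.ERR] line, final state per NRU, count per reason)."""
    first_error = None
    final_state = {}
    reasons = Counter()
    for line in lines:
        # The earliest ERR line is usually the root of the cascade.
        if first_error is None and "[hwd.ERR]" in line:
            first_error = line.strip()
        m = NRU_MOVE.search(line)
        if m:
            # Later transitions overwrite earlier ones, leaving the final state.
            final_state[m["nru"]] = m["new"]
            reasons[m["why"]] += 1
    return first_error, final_state, reasons


if __name__ == "__main__":
    err, states, reasons = summarize(SAMPLE)
    print("first error:", err)
    for nru, state in states.items():
        print(f"{nru}: {state}")
    print(dict(reasons))
```

To run it on a full log, pass an open file instead of `SAMPLE`, e.g. `summarize(open("messages"))`, where `messages` is a placeholder for wherever you saved the debug log.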
 

lambdafunction

Okay, many thanks are in order here.
  • @mpogr for the initial guide and inducing me to lighten my wallet a few hundred bucks
  • @nbritton for details on transferring and running the manufacture script
  • @nasbdh9 for the shortened version path
  • @SGS for conversion tips and tricks
Things I learned:
  • Fast path: run manufacture.sh -a -m ppc -f /path/to/image, then use the MLNX-OS image fetch/install commands to follow the shortened version upgrade path
  • The _shell command can be activated in 3.4.x with a special license
  • You don't need to add a new user to keep shell access, just set a password on the existing root user who already has /bin/bash as their shell:
    mddbreq /config/db/initial set modify - /auth/passwd/user/root/password string '$1$.......'
  • Figure 11 from the OCP version of this switch is surprisingly useful for I2C addresses
  • A Raspberry Pi connected to the main I2C header (the one you have to unplug the jumper from) can see more interesting things once you add the I2C mux:
    echo pca9548 0x70 > /sys/bus/i2c/devices/i2c-1/new_device
  • Vanilla 3.6.8012 running happily means I finally don't have to listen to those fans at 100%
    My wife is pretty happy about quiet fans too :D
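
In case it helps anyone scripting the Raspberry Pi side: the `echo pca9548 0x70 > .../new_device` step above is just a write to a sysfs file, so it can be wrapped in a few lines of Python. This is a minimal sketch, not something I ran against the switch. The function names and the `sysfs_root` parameter are my own (the parameter only exists so the code can be tried against a scratch directory), the bus number 1 and address 0x70 come from the command above, and the `channel-*` entries are what the Linux i2c-mux core normally creates under the mux device, so treat that part as an assumption for your kernel. Writing to the real `/sys` needs root.

```python
from pathlib import Path


def add_i2c_device(driver, addr, bus=1, sysfs_root="/sys"):
    """Equivalent of: echo <driver> <addr> > /sys/bus/i2c/devices/i2c-<bus>/new_device"""
    new_device = (
        Path(sysfs_root) / "bus" / "i2c" / "devices" / f"i2c-{bus}" / "new_device"
    )
    # The kernel expects "<driver name> <hex address>", e.g. "pca9548 0x70".
    new_device.write_text(f"{driver} {addr:#04x}\n")
    return new_device


def mux_channels(addr, bus=1, sysfs_root="/sys"):
    """List the channel-N entries under the mux device, if the driver bound."""
    # I2C client devices are named "<bus>-<4-digit hex address>", e.g. "1-0070".
    dev = Path(sysfs_root) / "bus" / "i2c" / "devices" / f"{bus}-{addr:04x}"
    return sorted(p.name for p in dev.glob("channel-*"))


if __name__ == "__main__":
    add_i2c_device("pca9548", 0x70)   # same effect as the echo command above
    print(mux_channels(0x70))         # each channel maps to a new i2c-N bus
```

Each channel that shows up corresponds to an extra I2C bus behind the mux, which is where the additional devices become visible.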