Beware of EMC switches sold as Mellanox SX6XXX on eBay

Eong

Member
Dec 24, 2019
This means something is wrong with your minios. Don't build minios yourself; download it instead. Read this thread carefully: someone has already shared their minios image. Download it, reflash it, and booting into minios takes only seconds, not minutes or hours.
The mini one? Remember to set the log level via kernel args. It is mentioned in the guide. :)
 

up3up4

Member
Jun 10, 2018
After converting to 3.6.8012, the switch is unable to change system-profile to eth-single-switch. Does anybody have the same issue?
 

t2_tony

New Member
Mar 29, 2020
Los Angeles Metro
Thanks "team STH", appreciate the feedback. To clarify:

minios kernel image used: MLNX-bare-jffs2 (hence my discomfort with the NAND errors), mlnxbase

Otherwise, my only deviations from guide v1.2 were grabbing vmlinuz-uni and fdt-uni from image-PPC_M460EX-ppc-m460ex-20190222-075342.tgz inside image-PPC_M460EX-3.6.8012.img (and skipping the mod drops for the chad, hwd, and ibd binaries), then using MT_1270111020.bin instead of MSX6012.bin.

I'll wait for any additional comments; I might go for round three before going on call (possibly a week-long hiatus).

Thanks again
 

Eong

Member
Dec 24, 2019
Quoting up3up4: "After converting to 3.6.8012, the switch is unable to change system-profile to eth-single-switch. Does anybody have the same issue?"
It does seem to be an issue. However, you can choose the profile you want when you edit the shell script under /etc. Remember putting the number 3 there? Change it to 4 and you will get the eth profile.
I will make some further modifications when I have spare time.
 

up3up4

Member
Jun 10, 2018
Quoting Eong: "It does seem to be an issue. However, you can choose the profile you want when you edit the shell script under /etc. Remember putting the number 3 there? Change it to 4 and you will get the eth profile."
Thank you, Eong! Could you explain what profiles 1, 2, and 3 mean?
 

klui

Active Member
Feb 3, 2019
Quoting up3up4: "Thank you, Eong! Could you explain what profiles 1, 2, and 3 mean?"
I presume these numbers are associated with Setup > Virtual Switch Mgmt > System Profile. I don't understand why @Eong suggests using 4 when the drop-down menu on 3.6.1002 lists the profiles in this order:
  1. ib
  2. eth-single-switch
  3. vpi-single-switch
Are you using vpi-single-switch? I just kept mine on VPI, and I can change ports to Ethernet with no problems.

My question for you, @up3up4: what is your ASIC version after upgrading to 3.6.8012? You can find it in the web UI under System > Modules > Part Info > Asic FW version, or from the CLI with enable, then sh asic. I told flint to overwrite what was installed before (9.9.1260) with 9.3.8170, since I thought it was better to match the ASIC FW with 1002's distro FW.
 

up3up4

Member
Jun 10, 2018
Quoting klui: "My question for you, @up3up4: what is your ASIC version after upgrading to 3.6.8012?"
Asic FW version: 9.4.5110
I think the 9.9.1260 came from EMC.
 

klui

Active Member
Feb 3, 2019
Finally took the time to convert mine to 3.6.1002 and here are my notes.

Noise with one PSU connected (the SX6012 rests on top of my ICX6610-48P and my phone stayed in the same place, so the SX6012 is a little closer to it than the ICX):
  • In U-Boot, noise hovered at 1.3 kHz @ 62.7 dB
  • The first noise drop after booting into MLNX-OS was to 1 kHz @ 59.5 dB
  • The final noise drop was to 650 Hz @ 48.5 dB
  • The /opt/tms/bin/tc fan mod brought it to 200 Hz/400 Hz @ 40 dB (using 27%, recommended for 2 PSUs)
I thought the standard tc settings were bearable, but the noise does get annoying after a while; the switch is sitting right in front of me. The ICX6610 was very close, with noise hovering around 670 Hz @ 50 dB. The tc fan mod made a big difference: the switch is now as quiet as an SRX240. Monitoring software will be affected because fan thresholds haven't been modified. EDIT: the previous sentence was a mistake. Monitoring software had heuristics for telemetry ranges that were confused after I applied the patch.

I made a mistake when I first transferred vmlinuz-uni. I thought the file address was 40,000, but it should have been 400,000. imls was the first indication that something was wrong:
Legacy Image at FF040000:
Verifying Checksum ... Bad Data CRC
Running mlxlinux gave the following error, which made me panic:
ERROR: Did not find a cmdline Flattened Device Tree
Could not find a valid device tree
Rebooting didn't work, but a power cycle and reflashing the vmlinuz and fdt files did.
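imls flagged the problem through the image's data CRC. As a hedged sketch of the same check, assuming vmlinuz-uni is a standard U-Boot legacy (mkimage) image as the imls "Legacy Image" line suggests, the Python below validates the 64-byte big-endian header and the CRC32 of the payload in a local copy before you TFTP it over. It only checks the file; it cannot tell you whether you wrote it to the right flash address. `check_legacy_uimage` and `make_legacy_uimage` are hypothetical helpers, the latter only for building a test image.

```python
import struct
import zlib

# Legacy U-Boot image header (struct image_header): seven big-endian uint32
# fields, four uint8 fields, then a 32-byte name; 64 bytes in total.
HEADER_FMT = ">7I4B32s"
HEADER_LEN = struct.calcsize(HEADER_FMT)
UIMAGE_MAGIC = 0x27051956


def crc32(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF


def check_legacy_uimage(blob: bytes) -> dict:
    """Report header/data CRC status for a legacy uImage held in memory."""
    if len(blob) < HEADER_LEN:
        raise ValueError("too short for a legacy uImage header")
    fields = struct.unpack(HEADER_FMT, blob[:HEADER_LEN])
    magic, hcrc, _time, size, load, entry, dcrc = fields[:7]
    if magic != UIMAGE_MAGIC:
        raise ValueError(f"bad magic 0x{magic:08x}")
    # The header CRC is computed with the ih_hcrc field (bytes 4..7) zeroed.
    header = bytearray(blob[:HEADER_LEN])
    header[4:8] = bytes(4)
    data = blob[HEADER_LEN:HEADER_LEN + size]
    return {
        "name": fields[11].rstrip(b"\0").decode(errors="replace"),
        "load": load,
        "entry": entry,
        "header_ok": crc32(bytes(header)) == hcrc,
        # A "Bad Data CRC" from imls corresponds to this check failing.
        "data_ok": len(data) == size and crc32(data) == dcrc,
    }


def make_legacy_uimage(payload: bytes, name: bytes = b"test") -> bytes:
    """Hypothetical test helper; the os/arch/type/comp bytes are illustrative."""
    def pack(hcrc: int) -> bytes:
        return struct.pack(HEADER_FMT, UIMAGE_MAGIC, hcrc, 0, len(payload),
                           0, 0, crc32(payload), 5, 7, 2, 0, name)
    return pack(crc32(pack(0))) + payload
```

On the TFTP host you would call something like `check_legacy_uimage(open("vmlinuz-uni", "rb").read())` and only transfer the file if both `header_ok` and `data_ok` are true.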

TFTP with jumbo frames was quite fast: around 30 seconds to transfer the distro tgz. This was just a direct connection between an old laptop and the switch.

Even with reboot symlinked to busybox, it didn't work the first couple of times: no messages were displayed, and I had to use reboot -f. Maybe it had something to do with the file system being worked on during those first few reboots. It works fine now under both the recovery and regular partitions.

The uncorrectable ECC errors are annoying, but there is nothing we can do about them. The other somewhat annoying issue is the recovery OS at mtdblock6. With loglevel set to 2, I don't get any kernel output when I boot, but that's a minor annoyance. The bigger one is the error messages about tty2, tty3, and tty4:
can't open /dev/tty4: No such file or directory
I tried to create the devices as in Linux 3.13 (Ubuntu 14.04; the MLNX-OS distro is based on the 3.10 kernel), but that didn't work, and the regular distro doesn't have tty[234].
can't open /dev/tty4: No such file or directory
#
# ls -al tty4
lrwxrwxrwx 1 0 0 9 May 3 20:04 tty4 -> /dev/null
# rm /dev/tty4; busybox mknod -m 666 /dev/tty4 c 4 4
can't open /dev/tty4: No such device or address
When I create the symlinks in the partition, they are removed the next time I use it. The correct way is probably to create the symlinks in /etc/init.d/rcS after proc and sysfs are mounted.

I also ran into the login problem. The cause seems to be removing the symlink to /var/opt/tms/output/passwd prematurely, before the system has replaced the 2nd (locked) field with x after initial setup. I think the symlink should be renamed rather than removed, because there is no /var/opt... directory in the recovery OS at mtdblock6. @Eong was right: the system relies on keeping changes in the /var partition, and while you can change your passwords, new users can't be added. The system adds the following users after initial configuration:
  • nfsnobody
  • statsd
  • xmladmin
  • xmluser
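To check whether initial setup has finished rewriting the password file before you rename the symlink, a sketch like this parses a copy of a passwd-format file, lists accounts whose second field is not yet x, and reports which of the four users above are still missing. It assumes the standard seven-field, colon-separated passwd layout; `audit_passwd` is a hypothetical name.

```python
# Users the system adds after initial configuration (from the list above).
EXPECTED_USERS = ("nfsnobody", "statsd", "xmladmin", "xmluser")


def audit_passwd(text: str, expected=EXPECTED_USERS):
    """Return (accounts whose password field is not 'x', missing expected users)."""
    password_field = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:  # skip anything that isn't a standard passwd entry
            continue
        password_field[fields[0]] = fields[1]
    not_x = sorted(user for user, pw in password_field.items() if pw != "x")
    missing = [user for user in expected if user not in password_field]
    return not_x, missing
```

If `not_x` is non-empty or `missing` still lists users, setup probably hasn't completed yet, and removing the link at that point would reproduce the login problem described above.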
I couldn't get the Fabric Inspector to work after running the mddbreq commands. I wonder whether that's due to the missing symlink, with Fabric Inspector requiring additional users to be created.

Root (mtdblock7) is 95% full, but there is plenty of space on mtdblock6.

I've noticed some discarded packets on mgmt0. I wonder if it's because Apache listens on IPv6, which I don't use:
[Mon May 04 15:02:28 2020] [warn] (101)Network is unreachable: connect to listener on [::]:80

/var/opt/tms/output/httpd.conf
Listen [::]:80
Listen 0.0.0.0:80
Listen [::]:443
Listen 0.0.0.0:443
Disabling IPv6 doesn't have any effect on httpd.conf.

I was waiting for Ethernet to be confirmed working on 8012 before trying it, but I looked at /opt/tms/bin/tc and found that if you search for
00 01 00 00 00 00 00 04 00 00 00 04 00 00 00 04
00 04 00 01 00 05
in a hex editor, you will find 16 instances; under 1002, you will find 6. Right before the hex string, there are 4 different "signatures":
10 18 77 D8
10 18 78 74
10 18 78 88
10 18 78 C4
While 1002 had similar signatures, what's more interesting are the bytes that follow the hex string. For 8012, there are basically 2 varieties:
28 00 00 0A 28 00 00
3C 00 00 0A 28 00 00
These are quite similar to 1002's values, which are:
00 28 00 0A 00 28 00
00 3C 00 0A 00 28 00
It could mean that if you search for
10 18 77 D8 00 01 00 00 00 00 00 04 00 00 00 04 00 00 00 04
00 04 00 01 00 05 28 00 00 0A 28 00 00
and replace with
10 18 77 D8 00 01 00 00 00 00 00 04 00 00 00 04 00 00 00 04
00 04 00 01 00 05 1B 00 00 0A 1B 00 00
you might get the same behavior with 8012 for 27% PWM with both PSUs and
10 18 77 D8 00 01 00 00 00 00 00 04 00 00 00 04 00 00 00 04
00 04 00 01 00 05 19 00 00 0A 19 00 00
for 25% PWM with one PSU.
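The search-and-replace above can be scripted so that nothing is written unless the pattern matches the expected number of times. This is a sketch under the interpretation guessed above, that the variable bytes are fan PWM percentages (0x28 = 40, 0x3C = 60, 0x1B = 27, 0x19 = 25), so the "3C 00 00 0A 28 00 00" variety would be `pattern(sig, 60, 40)`. The helper names are hypothetical; keep an untouched copy of tc either way.

```python
from pathlib import Path

# The 22 bytes that follow each signature in /opt/tms/bin/tc.
COMMON = bytes.fromhex("00 01 00 00 00 00 00 04 00 00 00 04 00 00 00 04"
                       " 00 04 00 01 00 05")


def pattern(signature_hex: str, first_pct: int, second_pct: int) -> bytes:
    """signature + COMMON + the 7-byte 8012 tail, e.g. 28 00 00 0A 28 00 00."""
    for pct in (first_pct, second_pct):
        if not 0 <= pct <= 100:
            raise ValueError(f"PWM percentage out of range: {pct}")
    tail = bytes([first_pct, 0x00, 0x00, 0x0A, second_pct, 0x00, 0x00])
    return bytes.fromhex(signature_hex) + COMMON + tail


def patch_bytes(data: bytes, old: bytes, new: bytes, expected: int = 1) -> bytes:
    """Replace old with new only if old occurs exactly `expected` times."""
    if len(old) != len(new):
        raise ValueError("replacement must keep the binary the same length")
    found = data.count(old)
    if found != expected:
        raise ValueError(f"expected {expected} match(es), found {found}")
    return data.replace(old, new)


def patch_file(src: str, dst: str, old: bytes, new: bytes, expected: int = 1) -> None:
    """Write a patched copy to dst; the original file at src is left untouched."""
    Path(dst).write_bytes(patch_bytes(Path(src).read_bytes(), old, new, expected))
```

For the 27% case: `patch_file("tc", "tc.patched", pattern("10 18 77 D8", 40, 40), pattern("10 18 77 D8", 27, 27))`. Requiring an exact match count is a deliberate choice: a pattern that appears zero times or more often than expected is a sign the binary differs from what was analysed here.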

A byproduct of changing the files in /opt/tms/bin after initial setup is that when you run cli, it performs a reconfiguration.

I looked at the files modified in /opt/tms/bin, and the changes appear to be to PPC code. I couldn't find the same signatures in ibd, but isn't that responsible for IB? Maybe it is related to the inability to switch profiles?

This has been an interesting exercise because I had to use U-Boot so much. I've learned a lot and must say "thank you" to @mpogr for doing the initial PoC and to others for streamlining it and making it easier to perform. I will spend some time creating a script to build the recovery JFFS2 image and will post it when I am done.
 

Freebsd1976

Active Member
Feb 23, 2018
Quoting klui's post above on searching /opt/tms/bin/tc in a hex editor for the fan-speed byte pattern and replacing it:
 
I searched for 000100000000000400000004000000040004000100052800000A280000000063000000000003000000000000000000000000000000000000 and modified it; it works.
 

Eong

Member
Dec 24, 2019
Quoting up3up4: "Thank you, Eong! Could you explain what profiles 1, 2, and 3 mean?"
I didn't look into it myself. A friend who tested this told me we can use another number instead of 3 in that customxxxx.sh. He suggested that 4 will work if you want the eth-single-switch profile. I don't remember what those numbers mean; maybe you can read the script and find out.
Or you can use the VPI profile for now and wait for further modifications. There are also some other minor issues I need to fix. Sorry, I've been too busy recently.
 

neggles

is 34 Xeons too many?
Sep 2, 2017
Melbourne, AU
omnom.net
You do not need to replace ibt. I didn't see any issues for now.
It does have a limitation for the LR4, I can confirm that. It's complicated. Compatible modules may or may not work. The one from FS doesn't work. I cannot try all of the brands. Too expensive. :(
FRU information from 6036 may help. I am still waiting for mine to arrive.
I'm kind of surprised that nobody's discovered this hidden menu yet, but here's a fix for that:
Code:
configure terminal
fae cable-stamping-unlock 40g_lr4
exit
write memory
Not 100% sure I remember that right, but fae cable-stamping-unlock ? will show you the available unlocks - there's one other, which IIRC allows using 56Gb rates over 40Gb-rated DACs.

The fae menu has an awful lot of fun commands hidden inside. It doesn't show up in a regular ? or help listing, but if you type fae ? while in configure mode, it'll list (most of) the available commands. I believe flint is included (not sure; my switch is re-imaging itself after I messed up the latest conversion attempt), along with a number of other executables from the base Linux OS and a whole bunch of fun debug/manufacturing commands.

Be careful. Several of the commands in here can near-irrevocably break your switch.
 

Rand__

Well-Known Member
Mar 6, 2014
This is from a fairly up-to-date SX6036, not a converted EMC one.

Code:
aaa                            Configure AAA Features
action                         Increase timeout for a management a/sync action
advanced                       Enable advanced mode
apitester                      apitester
backup-polling                 Enable automatic backup polling
baudrate                       Set baudrate fae settings
buffer-profile                 Configure a buffer profile and its attributes
cable-info-cache               Cable info cache options
cable-stamping-lock            Lock cable stamping
cable-stamping-unlock          Unlock cable stamping
cc-mgr                         Obscure congestion control commands
chad-retries                   Defines the number of retries the chad daemon will make to attach to the MGMTd before resetting the
                               switch (10-no reset)
change-locallinks-timeout      Set the timeout for local links
cli                            Max number of CLI sessions
configuration                  Manipulating clear text config files
cru-buff-debug                 FAE command for debug of cru buffer
delete-guid2lid-file           Delete guid2lid cache from machine
dump                           Dump packet debug command
enable-system-m-key            Enable System M Key feature
eula                           Modify EULA functionality
exit                           Leave "fae" mode
eye                            Cable eye opening configuration commands
fdb-auto-learned               Enabled fdb auto-learned mode
fi-override                    Fabric inspector tries to ignore unknown ibdiagnet data
file                           Upload all debug dump files to a remote host
filesystem-recovery            Run filesystem-recovery
flint                          flint
flush-entity-table             Flushes entPhysicalTable and loads data again
fw-auto-update                 Set handling of fw auto update
fw-package-test                Test firmware package in current system
ha                             Modify the 'other' side
health                         Health daemon configuration
help                           View description of the interactive help system
hwd                            hwd tracing
i2c                            i2c
i2c-access                     Set handling of i2c access
i2c-reset-on-stuck             Enable i2c reset bus when bus is stuck
ibdiagnet                      ibdiagnet
iblinkinfo                     iblinkinfo
ibnetdiscover                  ibnetdiscover
ibnodes                        ibnodes
ibportstate                    ibportstate
ibqueryerrors                  ibqueryerrors
ibr                            Set ibr fae commands
ibroute                        ibroute
ibstat                         ibstat
ibstatus                       ibstatus
ibswitches                     ibswitches
ibtracert                      ibtracert
install-chip-fw                Install a chip firmware image
interface                      Configure external ports locking
interrupt                      Set interrupt options
intsim                         Simulate interrupts and events
ip                             Configure ip settings
iss                            Disable/Enable logging for ISS sub-class
lacp                           Configure LACP protocol settings
lag-events-verbosity           FAE command for change verbosity of lag and port events
libport-deinit                 DeInit libport
libport-init                   Init libport
log-change                     Change log level
log-show-levels                Print log levels
logging                        Logging command
lspci                          lspci
max                            Max port speed
mcra                           mcra
md5sum                         Run md5sum on loaded file
md5sum-fetch                   Upload a file from a remote host
mdns                           Configure mdns intervals
mellagra                       mellagra tool
metad                          MetaD commands
mlag                           Set mlag fae commands
mlxcables                      mlxcables tool
mlxdump                        mlxdump
mlxfwmanager                   mlxfwmanager
mlxi2c                         mlxi2c
mst                            mst
mst-autostart                  Start mst server on init
mstdump                        mstdump
mtserver                       mtserver
no                             Negate certain fae settings
ntp-key-limit-override         Override the NTP Key Limit
other-spine-ready-timeout      Set wait for other mgmt spine timeout
perfquery                      perfquery
policer-bind                   Bind the policer
policer-set                    Edit the policer
port-mirror                    fae command for port mirror
power-budget                   Set handling of power budget
power-off-on-error             Power off device on error flow
pra                            FAE commands for PRA process
print                          Print options
print-device-table             Portd device table
print-hwd-device-table         Dump of hwd device STM table
print-ib-config-table          IBD configuration table
print-ib-device-table          IBD configuration table
print-ib-dr-table              IBD direct route table
print-ib-ifindexs-table        IBD ifindexs table
print-ib-virtual-ifindexs-table IBD virtual ifindexs table
print-md-ib-ports-table        IBD configuration table
print-mfm-devices-table        Print devices mfm table
print-port                     Prints ports lib DB to LOG
print-ports-table              Portd device table
print-sx-net-table             Portd print to the log info of sx net lib and spm
process                        Modify daemon auto-start to true
profile                        Profile options
ps                             Linux ps command. Display info about the active processes.
puppet-agent                   Puppet agent fae commands
qp0-rdq-entries-number         Configure qp0 rdq number of entries (used for MADs)
                               This command is only affective if followed by
                               'configuration write' and 'reload'.
rdq-rate-limiter               Change a specific rdq rate limiter
refresh                        Global refresh commands
remote-configured              Enable remote-configured
restart-daemons-timeout        Configure daemons restart timeout and retry in seconds
rm-var                         Delete files from '/var' folder
run                            <Run a script>
sa-db-file                     Allow writing SA DB file
sa-db-ssd                      Store SA DB on SSD (survives reboots)
saquery                        saquery
sdksniffer                     Sdk sniffer commands
setpci                         setpci
sflow                          fae command for sflow
show                           Display system configuration or status
show-invalid-cable             Allow showing cables with invalid checksum
show-packet-drop-counter       Show packets drop counter in driver (application too slow to read packets)
show-sw-rate-limiter-queues    Show statistics of sw rate limiter's active queues
sma                            Enable SMA/PMA debug function
sminfo                         sminfo
smpquery                       smpquery
socat                          Packet light connector tool
tc                             Temperature daemon configuration
top                            Linux Top command. Display system summary and list of running tasks
trace                          Trace register debug command
traffic-control                Traffic control configuration
ufma                           UFM agent port number
update                         Update file
vlan                           General menu for fae commands for vlan
vlan-limit-override            Set pvrst max vlan limit override
vsr                            VSR tracing
welcome-pop-up                 Change the welcome pop-up functionality
xml-rest-allow-all             Allow Xtree iterate for all trees
 

neggles

is 34 Xeons too many?
Sep 2, 2017
Melbourne, AU
omnom.net
I thought I saw flint in there; now to find out whether the fae flint command will play ball with the -allow_psid_change -override_cache_replacement flags. My SX6012 is currently rolling through its firstboot process, so once it's done freaking out over ECC errors, I'll give it a try. It saves a couple of reboots.

I think you can use fae commands to adjust the fan thresholds too - fae tc ? should be enlightening.
 

Rand__

Well-Known Member
Mar 6, 2014
Quoting neggles: "I think you can use fae commands to adjust the fan thresholds too - fae tc ? should be enlightening."
Code:
(config fae) # tc ?
ambient-temperature            Use ambient temperature readings to determine minimal fan speed
qsfp-cables-temperature        Enable temperature readings from qsfp cables
range                          Temperature control change range parameters
(config fae) # tc ambient-temperature ?
enable                         Enable ambient temperature readings to determine minimal fan speed
(config fae) # tc range ?
manual                         Enable/Disable TC manual ranges
qsfp                           Set Temperature control range for qsfp
sx                             Set Temperature control range for sx
(config fae) # tc range sx ?
cold-threshold                 Set Temperature control range for sx
gap                            Set Temperature control range for sx
hot-threshold                  Set Temperature control range for sx
too-hot                        Set Temperature control range for sx
 

neggles

is 34 Xeons too many?
Sep 2, 2017
Melbourne, AU
omnom.net
Okay, so it hit 2am where I live and I called it quits for the night. It takes forever to log into the local console; it hangs on 'Modules are being configured'...

Anyway, once it's logged in, you can indeed run flint from the system. As far as I can tell, there's no easy way to access the internal filesystem; you're meant to use a USB flash drive plugged into the port on the front via an OTG adapter, but ones for mini-USB are hard to come by. Fortunately, there's a command that'll copy a file over scp to a known path, so you can do the following:

[EDIT]
I screwed up the snippet below; mt51000_pci_cr0 is NOT the correct device path.
Correct path is /dev/mst/mt51000_pciconf0. Don't use cr0, cr0 is the 'cache replacement' device path and while flint will happily burn to it, it won't work.

Use the output of your own 'mst status' command, though I think it should be the same for all of the SX6012s.

Side note, you can also use http:// or tftp:// in the md5sum-fetch section.
[/EDIT]

Code:
sx6012 [standalone: master] # fae
sx6012 [standalone: master] (fae) # md5sum-fetch scp://user:password@serverip:/path/to/MT_1270110020.bin
sx6012 [standalone: master] (fae) # flint -i /tmp/user.md5sum q
Image type:          FS2
FW Version:          9.4.5110
FW Release Date:     12.2.2019
Device ID:           51000
Description:         Node             Sys image
GUIDs:               0000000000000000 0000000000000000
Description:         Base             Switch
MACs:                    000000000000     000000000000
VSD:                 n/a
PSID:                MT_1270110020
sx6012 [standalone: master] (fae) # flint -allow_psid_change -override_cache_replacement -d /dev/mst/mt51000_pci_cr0 -i /tmp/user.md5sum b

-W- Firmware flash cache access is enabled. Running in this mode may cause the firmware to hang.

    Current FW version on flash:  9.4.5110
    New FW version:               9.4.5110

    Note: The new FW version is the same as the current FW version on flash.

Do you want to continue ? (y/n) [n] : y

Burning FS2 FW image without signatures - OK
Restoring signature                     - OK
sx6012 [standalone: master] (fae) # exit
sx6012 [standalone: master] #
So you don't need to reboot into your minilinux image to flash firmware on-switch!
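Given the cr0 mix-up in the EDIT above, a couple of guard rails may be worth scripting on the machine where you capture the switch's output, before any b (burn): pick the pciconf0 device out of your mst status output, and confirm the queried image's PSID. This is a sketch assuming device paths appear as whitespace-delimited /dev/mst/... tokens in mst status output and that flint ... q prints Key: value lines as in the transcript above; the function names are hypothetical.

```python
import re


def pick_mst_device(mst_status_text: str) -> str:
    """Return the one /dev/mst/*pciconf0 path found in `mst status` text.

    Assumption: paths appear as whitespace-delimited /dev/mst/... tokens.
    The *_pci_cr0 cache-replacement path is never returned.
    """
    paths = set(re.findall(r"/dev/mst/\S+", mst_status_text))
    candidates = sorted(p for p in paths if p.endswith("pciconf0"))
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one pciconf0 device, found {candidates}")
    return candidates[0]


def parse_flint_query(text: str) -> dict:
    """Collect 'Key:   value' lines from `flint -i <image> q` output (first key wins)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and key not in info:
            info[key] = value.strip()
    return info


def check_before_burn(query_text: str, expected_psid: str) -> str:
    """Raise unless the image PSID matches; otherwise return its FW version."""
    info = parse_flint_query(query_text)
    if info.get("PSID") != expected_psid:
        raise ValueError(f"image PSID {info.get('PSID')!r} != expected {expected_psid!r}")
    return info.get("FW Version", "unknown")
```

With the query output shown above, `check_before_burn(output, "MT_1270110020")` would return "9.4.5110", and the path from `pick_mst_device` is what goes after -d.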

The tc commands Rand__ posted look promising, but I've managed to blow out the fan RPM sensing and PWM control IC on my SX6012 (along with one of the power supplies) by hooking up mis-wired fans, so I can't test :( It's also weirdly sluggish, so I think I screwed up more than just one chip and PSU. Replacement time! I'm going to shove some Noctuas in this one and let them run flat out in the meantime; it still switches fine.
 

klui

Active Member
Feb 3, 2019
Quoting neggles: "Anyway, once it's logged in, you can indeed run flint from the system; there's no easy way to access the internal filesystem, as far as I can tell"
@Labs posted a method in this thread that will help: https://forums.servethehome.com/ind...-as-mellanox-sx6xxx-on-ebay.10786/post-227694

It involves using U-Boot to boot into single-user mode.

Thanks, @Rand__, for those commands. 3.6.1002 only has tc range, and there is no way to view the default values. /var/log/messages gives a clue about them (on a patched tc binary), but I would think the qsfp and sx commands are for optics rather than the chassis, and I am reluctant to enable manual.

May 20 03:00:40 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: hwd_set_initial_fan_speed: Set fan power, FRU:[MGMT], type:[MGMT], fan:[SPEED_BOARD1], fan type:[34,SX_FAN_2DRWR_FIX], number:[1] speed:[60%]
May 20 03:00:40 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: hwd_set_initial_fan_speed: Set fan power, FRU:[MGMT], type:[MGMT], fan:[SPEED_BOARD2], fan type:[34,SX_FAN_2DRWR_FIX], number:[1] speed:[60%]
May 20 03:00:40 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: hwd_set_initial_fan_speed: Set fan power, FRU:[MGMT], type:[MGMT], fan:[SPEED_BOARD3], fan type:[34,SX_FAN_2DRWR_FIX], number:[1] speed:[60%]
May 20 03:00:40 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: hwd_set_initial_fan_speed: Set fan power, FRU:[MGMT], type:[MGMT], fan:[SPEED_BOARD4], fan type:[34,SX_FAN_2DRWR_FIX], number:[1] speed:[60%]
.
.
May 20 03:01:37 sx6012 health[4794]: [health.NOTICE]: health_sys_init_per_device_type: thresholds parameters :{ interrupt_enable:[1], spine_fan_max_rpm:[0], spine_fan_speed:[0], core_temp_max:[105], core_temp_min:[100], ambient_temp_max:[80], ambient_temp_min:[70], chassis_fan_max_rpm:[18000], chassis_fan_speed:[1440],ps_fan_max_rpm:[13000], ps_fan_speed:[1040] }
May 20 03:01:37 sx6012 health[4794]: [health.NOTICE]: health_sys_init_per_device_type: DB parameters:{ fans_size:[4], ps_size:[2], leaf_size:[1], spine_size:[0] }
.
.
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: Initializing fan_board:[/MGMT/FAN1]
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: Initializing fan_board:[/MGMT/FAN2]
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: Initializing fan_board:[/MGMT/FAN3]
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: Initializing fan_board:[/MGMT/FAN4]
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: TC device type:[11,tc device Dingo] has:[1] Leaf's affected by chassis fans
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: TC device type:[11,tc device Dingo] has:[0] Spine's affected by spine fans
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: TC device type:[11,tc device Dingo] has:[0] management modules affected by chassis fans
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: TC is setting Chassis fans to:[27%], Spine fans to:[27%] (minimal speed)
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 11: requested by: (system)
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 11: descr: Set Fan Speed
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 11: param: Set Fan Speed fan module: "/MGMT/FAN1"
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 11: param: Set Fan Speed fan number: 1
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 11: param: Set Fan Speed fan speed: 27
May 20 03:01:37 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: hwd_handle_action_request: action:[/system/chassis/actions/set-fan-speed]
May 20 03:01:37 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: handle_set_fan_action: Set fan speed, device:[/MGMT/FAN1] fan_num:[1] speed:[27%]
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 11: status: completed with success
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: interval:[0], TC have changed board:[/MGMT/FAN1], fan_num:[1], to speed:[27%], from speed:[0%]
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 12: requested by: (system)
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 12: descr: Set Fan Speed
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 12: param: Set Fan Speed fan module: "/MGMT/FAN2"
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 12: param: Set Fan Speed fan number: 1
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 12: param: Set Fan Speed fan speed: 27
May 20 03:01:37 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: hwd_handle_action_request: action:[/system/chassis/actions/set-fan-speed]
May 20 03:01:37 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: handle_set_fan_action: Set fan speed, device:[/MGMT/FAN2] fan_num:[1] speed:[27%]
May 20 03:01:37 sx6012 mibd[4799]: [mibd.NOTICE]: mibd_init_fdr_license: license=SDR
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 12: status: completed with success
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: interval:[0], TC have changed board:[/MGMT/FAN2], fan_num:[1], to speed:[27%], from speed:[0%]
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 13: requested by: (system)
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 13: descr: Set Fan Speed
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 13: param: Set Fan Speed fan module: "/MGMT/FAN3"
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 13: param: Set Fan Speed fan number: 1
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 13: param: Set Fan Speed fan speed: 27
May 20 03:01:37 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: hwd_handle_action_request: action:[/system/chassis/actions/set-fan-speed]
May 20 03:01:37 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: handle_set_fan_action: Set fan speed, device:[/MGMT/FAN3] fan_num:[1] speed:[27%]
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 13: status: completed with success
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: interval:[0], TC have changed board:[/MGMT/FAN3], fan_num:[1], to speed:[27%], from speed:[0%]
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 14: requested by: (system)
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 14: descr: Set Fan Speed
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 14: param: Set Fan Speed fan module: "/MGMT/FAN4"
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 14: param: Set Fan Speed fan number: 1
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 14: param: Set Fan Speed fan speed: 27
May 20 03:01:37 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: hwd_handle_action_request: action:[/system/chassis/actions/set-fan-speed]
May 20 03:01:37 sx6012 hwd[4839]: TID 1208134112: [hwd.NOTICE]: handle_set_fan_action: Set fan speed, device:[/MGMT/FAN4] fan_num:[1] speed:[27%]
May 20 03:01:37 sx6012 mgmtd[4155]: [mgmtd.NOTICE]: Action ID 14: status: completed with success
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: interval:[0], TC have changed board:[/MGMT/FAN4], fan_num:[1], to speed:[27%], from speed:[0%]
May 20 03:01:37 sx6012 temp_control[4790]: [tc.NOTICE]: TC starts as:[tc device Dingo] mode
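To confirm what tc actually applied without reading the whole log, a sketch like this pulls the per-fan "TC have changed board" lines and the health thresholds out of /var/log/messages text. The regular expressions assume the exact message formats shown above, which may differ between MLNX-OS releases; the function names are hypothetical.

```python
import re

# Matches e.g. "TC have changed board:[/MGMT/FAN1], fan_num:[1], to speed:[27%], from speed:[0%]"
TC_CHANGE = re.compile(
    r"TC have changed board:\[(?P<board>[^\]]+)\], fan_num:\[(?P<fan>\d+)\], "
    r"to speed:\[(?P<new>\d+)%\], from speed:\[(?P<old>\d+)%\]"
)
# Matches "name:[value]" pairs such as "core_temp_max:[105]".
KEY_VALUE = re.compile(r"(\w+):\[([^\]]*)\]")


def fan_changes(log_text: str) -> dict:
    """Map (board, fan_num) -> (old %, new %), keeping the last change seen."""
    changes = {}
    for m in TC_CHANGE.finditer(log_text):
        changes[(m["board"], int(m["fan"]))] = (int(m["old"]), int(m["new"]))
    return changes


def health_thresholds(log_text: str) -> dict:
    """Parse the integer fields from the first 'thresholds parameters' line."""
    for line in log_text.splitlines():
        if "thresholds parameters" in line:
            body = line.split("thresholds parameters", 1)[1]
            return {k: int(v) for k, v in KEY_VALUE.findall(body) if v.isdigit()}
    return {}
```

Run against the excerpt above, `fan_changes` should map each of /MGMT/FAN1 to /MGMT/FAN4 to (0, 27), and `health_thresholds` exposes values such as chassis_fan_max_rpm and chassis_fan_speed.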

I'm not sure how to access /system/chassis paths through Mellanox's utilities. mdreq/mddbreq seem to only look at initial/factory settings.
 

SGS

Member
May 24, 2017
Try this

syntax:
mdreq action /system/chassis/actions/set-fan-speed fan_module string {module_name} fan_number int8 {fan_number} fan_speed int8 {fan_speed} set_max uint8 {max_speed}

example:
mdreq action /system/chassis/actions/set-fan-speed fan_module string "/MGMT/FAN1" fan_number int8 1 fan_speed int8 27 set_max uint8 50
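To set all four chassis fans, the per-fan commands can be generated instead of typed. This sketch only builds command lines and runs nothing. It assumes the four fan modules /MGMT/FAN1 to /MGMT/FAN4 with fan number 1 each, as in klui's log above; the range checks are my assumptions rather than documented limits, and what set_max actually controls isn't stated here. The function names are hypothetical.

```python
import shlex

ACTION = "/system/chassis/actions/set-fan-speed"
FAN_MODULES = ("/MGMT/FAN1", "/MGMT/FAN2", "/MGMT/FAN3", "/MGMT/FAN4")


def set_fan_speed_argv(module: str, fan_number: int, speed: int, max_speed: int) -> list:
    """Build the argument list for SGS's mdreq set-fan-speed syntax."""
    # Assumed ranges: fan speed as a percentage, set_max within uint8,
    # fan_number within a positive int8.
    if not 0 <= speed <= 100:
        raise ValueError("fan speed is assumed to be a percentage (0..100)")
    if not 0 <= max_speed <= 255:
        raise ValueError("set_max must fit in a uint8")
    if not 0 < fan_number <= 127:
        raise ValueError("fan_number must fit in a positive int8")
    return ["mdreq", "action", ACTION,
            "fan_module", "string", module,
            "fan_number", "int8", str(fan_number),
            "fan_speed", "int8", str(speed),
            "set_max", "uint8", str(max_speed)]


def all_fans(speed: int, max_speed: int, modules=FAN_MODULES) -> list:
    """One shell-quoted command line per fan module (fan number 1 each)."""
    return [shlex.join(set_fan_speed_argv(m, 1, speed, max_speed)) for m in modules]
```

`all_fans(27, 50)` produces four lines in the same form as the example above, one per module, which you can paste into the switch shell.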
 

neggles

is 34 Xeons too many?
Sep 2, 2017
Melbourne, AU
omnom.net
Maybe this SX6012 is dead, after all. It's behaving really weirdly, no doubt because of the blown fan chip. The issues I'm seeing seem to be mostly centered on the I2C bus: it essentially won't identify any optics/DACs no matter what I do :( It also seems to require an fae mst restart after every boot before half of it works, and I keep getting PCI errors when trying to use other things.

Here's a gist showing bootlogs etc. I've reloaded the OS two or three times now, using a little script I made that shoves the modified files in the right place, and manually confirmed it's all correct with md5sum checks... Does the boot log look OK? I know for a fact that I'm missing some U-Boot environment variables, so I'd really appreciate it if someone could PM me their 'printenv' output in case anything that actually matters is missing... I really should've backed it up when I first messed with the thing ages ago.
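Once someone shares their printenv output, comparing it with yours is easier to do mechanically than by eye. A sketch, assuming the usual U-Boot printenv format of one name=value per line (lines without '=', such as the "Environment size" summary, are skipped, and multi-line values are not handled); the function names are hypothetical.

```python
def parse_printenv(text: str) -> dict:
    """Parse U-Boot 'printenv' output into {name: value}."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and name and " " not in name:
            env[name] = value
    return env


def diff_env(mine: str, reference: str):
    """Return (missing from mine, only in mine, present in both but different)."""
    a, b = parse_printenv(mine), parse_printenv(reference)
    missing = sorted(set(b) - set(a))
    extra = sorted(set(a) - set(b))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return missing, extra, changed
```

The `missing` list is the one that matters here; entries in `changed` such as MAC addresses are expected to differ between switches and shouldn't be copied blindly.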

I also hit the whole "mount: /lib/libblkid.so.1: no version information available (required by /lib/libmount.so.1)" error on first boot, but not since, and oddly I didn't get the wall of ECC errors on firstboot. On a previous attempt, I fixed the libblkid.so.1 error by TFTPing out vmlinuz-uni and fdt-uni from /mnt/root2/boot/ instead of using the ones I extracted on my Linux laptop, but it's just... weird.

I suspect the thing's just toast. Might pop it open and remove the dead chip with a hot air pencil in case it's just i2c bus problems, or just admit defeat and replace it.

Stupid custom fan pinouts. Even Cisco and HPE don't do that without having the decency to use a custom connector.