Mellanox/EMC SX6012 - Revival Attempt


gb00s

Well-Known Member
Jul 25, 2018
1,177
587
113
Poland
I'm not sure the status-lights bar and the reset-button are working. Whatever I do I get no lights, even with the switch now fully booted.

console_emc1.png

... and with current IP ;)

console_emc3.png

Unfortunately, the fans are still running at 100%. It's hard to work on the switch with this noise.

I have no knowledge of the EMC switch OS. Well, it seems I have to go straight back to prison over the weekend and read the MLNX-OS flash guide. Is there a quick fix for the fans, or does anyone know how to reset the switch so that the fans spin down?

Thanks.

PS: It seems this is already a 'converted' switch ... I SSH'd into it and it shows me MLNX-OS.

mlnx_os.png
 

gb00s

I'm unable to get into the subsystem and unable to log in to the WebGUI. Module configuration and login both fail with 'Fatal Error'.

Good night.
 

gb00s

I'm not getting anywhere. Something about the OS, or how it was converted, doesn't feel right. I don't have much knowledge of MLNX-OS or these switches. I can log in to the switch. Fine. But I cannot load any modules. 'mst status' doesn't even show the switch itself. Several common Mellanox commands do not work at all. If I try to log in to the WebGUI, I'm confronted with 'Fatal Error'. Changing the IP etc. does not work either, even with the config saved for the next reboot. I also can't reset the switch so that I'm greeted by ZTP after the reboot.

Nothing has changed regarding the status lights and fans. I installed a 120mm Noctua so I don't go crazy while 'fighting' with this piece of junk.
 

gb00s

Already thought about it. Thanks.

PS: But I don't have a U-Boot password in case something goes wrong. Also, the reset 'button' doesn't seem to work. I tried it several times and the switch doesn't react to it, nor is the status-light bar working. That's why I was trying to get more info about the switch, the flashing and MLNX-OS in case something goes wrong. I don't want to fully brick it.
 

up3up4

Member
Jun 10, 2018
85
29
18
There is no password in EMC's U-Boot. In any case, you can simply try it: keep pressing Ctrl-B all the way.
 

RedX1

Active Member
Aug 11, 2017
132
144
43
Hi,

It looks like you have made some significant progress.

Seeing that you already have access to the management plane, I would attempt a "Zeroconf":

https://docs.mellanox.com/display/MLNXOSv381000/Management+Interface+Commands

Zeroconfig.JPG

https://community.mellanox.com/s/article/howto-get-started-with-mellanox-switches

Mellanox Jump Start..JPG

I have used this method to initially gain access to these switches and this might enable you to get to the web GUI.

There are obviously other issues with this unit, but I hope this helps.

Good luck.

RedX1
 

gb00s

Thanks @RedX1. But I had already decided to go for a clean setup, and I already regret it. I don't want to be ungrateful, but this 'Conversion Guide' has some issues and is confusing (e.g. Step 6.2.B). Then there are all these annoying ECC error messages. Also, I have I/O errors all over the console. For someone not familiar with MLNX-OS, Step 6.2.B is shittily explained. Boot here and go there and ... Don't get me wrong. It's booting, a lot faster and without the error messages from the install that was on the switch when I got it. I already ran the wizard, and the modules are prepared atm. But why all the other issues with jffs2 reading and writing? I/O errors .... Jesus

Now I'm reading through all the different opinions on what to do and how, again .... Like, wtf am I doing. Just frustrated.

EDIT: Nope, module configuration always fails with a fatal internal error. Maybe that's due to my confusion about Step 6.2.B and therefore wrong actions ... :confused:
 

RageBone

Active Member
Jul 11, 2017
617
159
43
I still think that you have something seriously wrong / dead power-wise on the switch board.

Since you had to clean one of the PSUs, have you cleaned the rest too?
I'm still waiting for a shit-ton of pictures to help you look for "it".
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
gb00s said: "I don't want to be ungrateful, but this 'Conversion Guide' has some issues and is confusing"

Which version specifically are you referring to? Mine does not even have 6.2.B, but I might have cleaned it up after doing the conversion.

Also don't forget that when this guide was written, there was exactly one person who had figured out the whole process, and the guide was a write-up of his experiments that he shared with another brave soul willing to try ;)
Back then (2016) the EMC switches were new on the market and nobody actually knew how many hw versions there were, whether all steps would be applicable to all of them, whether it would work on the 6012 and 6018, and so on.
Even when I converted mine in early 2017 there were open issues like problems with keeping link speed after reboots and so on.

Nowadays, with lots and lots of conversions having been done and ready-made files to do everything, that guide (at least the old version) is not up to date any more, so it is understandably quite confusing.
 

gb00s

RageBone said: "I still think that you have something seriously wrong / dead power-wise on the switch board.

Since you had to clean one of the PSUs, have you cleaned the rest too?
I'm still waiting for a shit-ton of pictures to help you look for "it"."

I disagree for the moment. The 'previously' defective PSU is out. I'm just running it with one PSU. The huge number of ECC errors is not uncommon during the conversion. I already learned that. What alarms me are the tons of I/O errors and jffs2 read/write errors. But that could be a firmware issue, as I haven't been able to flash the firmware yet due to my confusion over part 6.2.B in the guide. Whenever I change to /dev/mtdblock7 and do the 'run mlxlinux', I boot to the login screen, but logging in fails due to fatal errors during the load/config of the modules. Always ...

The only issue I have with any LEDs is the amber LED on the daughterboard right next to the heatsink.
 

gb00s

Rand__ said: "Which version specifically are you referring to? Mine does not even have 6.2.B, but I might have cleaned it up after doing the conversion."

Link >> Index of /

Rand__ said: "Also don't forget that when this guide was written, there was exactly one person who had figured out the whole process, and the guide was a write-up of his experiments that he shared with another brave soul willing to try ;) Back then (2016) the EMC switches were new on the market and nobody actually knew how many hw versions there were, whether all steps would be applicable to all of them, whether it would work on the 6012 and 6018, and so on. Even when I converted mine in early 2017 there were open issues like problems with keeping link speed after reboots and so on.

Nowadays, with lots and lots of conversions having been done and ready-made files to do everything, that guide (at least the old version) is not up to date any more, so it is understandably quite confusing."

Nothing is worse than wrong documentation compared to no documentation. Update it or delete it ...
 

Rand__

gb00s said: "Nothing is worse than wrong documentation compared to no documentation. Update it or delete it ..."

Well, I'd disagree; old/outdated documentation is better than no documentation at all. I don't think ppl would be able to convert the switch without the guide ;)

Not entirely sure what's confusing about 6.2.B though?


Code:
Step 6.2.B: …from the linux shell on the switch
To flash the firmware locally you need shell access to the running MLNX-OS
{In order to get shell access ...}
Boot into scratch linux (mtdblock6) and adjust admin shell in passwd to /bin/bash (you will need to remove the symlink of /mnt/root2/etc/passwd and create a local copy in /mnt/root2/etc).
{... or the switch would call the MLX mgmt shell and you wouldn't be able to access the OS to run flint...}

Transfer created firmware file to the switch using tftp or scp (either in scratch Linux or MLNX-OS) 
{... so you have the file to flash it locally since you don't have working networking yet ...}

Boot MLNX-OS by running these commands in U-Boot
setenv rootdev /dev/mtdblock7
run mlxlinux
...
 

gb00s

No, if you break something because of wrong documentation, or make a big mistake because of it, then you are better off without it. You should know that better.

Well, I 'booted' into mtdblock6 to adjust the passwd, then uploaded the firmware and rebooted into mtdblock7, only to find myself stuck with fatal errors while loading modules at the login. Then step 7 refers to the 'First Boot' .... What is a boot here? A boot like a boot or a boot like ???

Now I've started the whole process again and I'm already stuck in step 2, 'run mlxlinux', which leaves me with a kernel panic and no helpful logs. Why does this POS not refer to the uploaded kernel from the image? Then it attempts to reboot automatically.
 

gb00s

Again ... No matter what I do I always get the FATAL ERROR part while trying to boot/login in mtdblock7 ...

Launching CLI...

System is initializing!
This may take a few minutes

Modules are being configured

Please read and accept the End User License Agreement located at:
Local copy of the EULA can be uploaded using "file eula upload" command

% A fatal internal error occurred

Mellanox MLNX-OS Switch Management

SX6012 login:
If I log in, it loops: FATAL ERROR, and then I get the login prompt again.
 

klui

Well-Known Member
Feb 3, 2019
824
453
63
Your switch either has hardware problems or wasn't converted properly.
  • My 6012 running 1002 does not show any SX_NET_LIB error just before Starting internal_startup
  • It doesn't display Unloading MST PCI module, Unloading MST PCI configuration module
The fact you had to remove a PSU to get this far probably indicates some surge going through the currently bad PSU that affected your switch. If mst status doesn't show mt51000_pciconf0 you have bigger problems.

I do agree that the initial documentation is on the rough side and a lot of people have been caught off guard in various places, but it can be overcome. It's to the original author's credit that the guide was made available, and it has worked for many folks.

The important parts about step 6.2.B are that you need to let the system boot once w/out flashing the firmware so the admin account is enabled and other accounts are created by the first-run process. And if you want to change the shell, you should move the symlinks for passwd and shadow instead of deleting them. I've converted 2 and I never got a kernel panic. Again, the first boot after conversion (but before flashing the firmware) has a lot of ECC errors. Wait it out. Subsequent boots won't have them as long as you change the loglevel. That's just the way jffs works with this switch, no way around that. Other embedded systems seem to have similar issues while using jffs.
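The "move the symlink, don't delete it" advice can be sketched roughly as follows. This is only an illustration, not the guide's procedure: the function name and the ".symlink" suffix are made up here, and it assumes readlink -f can resolve the link targets from the environment you run it in (with the root mounted under /mnt/root2, an absolute link target may need that prefix added by hand).

```shell
#!/bin/sh
# Sketch only. The function name and the ".symlink" suffix are illustrative,
# and readlink -f is assumed to resolve the link targets from where this runs.
localize_account_files() {
    etc_dir=$1
    for name in passwd shadow; do
        link="$etc_dir/$name"
        # Skip anything that is not a symlink (already local, or absent)
        [ -L "$link" ] || continue
        target=$(readlink -f "$link") || return 1
        [ -f "$target" ] || return 1
        # Move the symlink aside instead of deleting it, then copy the real file in
        mv "$link" "$link.symlink" || return 1
        cp "$target" "$link" || return 1
    done
    # Set the admin account's login shell (last passwd field) to /bin/bash
    sed 's|^\(admin:.*:\)[^:]*$|\1/bin/bash|' "$etc_dir/passwd" > "$etc_dir/passwd.new" &&
        cat "$etc_dir/passwd.new" > "$etc_dir/passwd" &&
        rm "$etc_dir/passwd.new"
}
```

From the scratch Linux this would be called as localize_account_files /mnt/root2/etc. Keeping the old links as passwd.symlink and shadow.symlink means they can be moved back later.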

You mentioned I/O errors "all over the place." How many errors? During my switches' initial boots, I got around 65 I/O error entries each. But there were a lot more ECC errors. When you erased /dev/mtd7, /dev/mtd8, and /dev/mtd9, how many bad blocks did you get? I got around 10 across all 3 on both switches.
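To put a number on "all over the place", one option is to capture the serial console to a file and count the matching lines. A minimal sketch, assuming the boot output has been saved to a log file; the match strings are guesses at how the kernel words these messages and will likely need adjusting to what your console actually prints.

```shell
#!/bin/sh
# Sketch only: the match strings are assumptions about the message wording
# (a loose, case-insensitive match), not the exact kernel log format.
count_boot_errors() {
    log=$1
    # grep -c prints the number of matching lines, including 0
    io=$(grep -ci 'i/o error' "$log")
    ecc=$(grep -ci 'ecc' "$log")
    echo "io_errors=$io ecc_errors=$ecc"
}
```

Running count_boot_errors on the captured log prints both counts on one line, which makes it easy to compare against the figures above.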

I would imagine there are quite a few revisions to the switch. What year/month was yours built? Look at the asset pull tab below the console port. Mine are built in 2014 Sep and 2015 Jun. Based on your picture of the headers, your I2C header is occupied which is also the case on mine built in 2014 so the conversion should work if the hardware is healthy.
 

klui

gb00s said: "Nothing is worse than wrong documentation compared to no documentation. Update it or delete it ..."

But the original author has not been replying for a while now and seems to have moved on to other things. I'm pretty sure other folks have just experimented and gotten things working but haven't updated their copy.
 

gb00s

klui said: "Your switch either has hardware problems or wasn't converted properly.
  • My 6012 running 1002 does not show any SX_NET_LIB error just before Starting internal_startup
  • It doesn't display Unloading MST PCI module, Unloading MST PCI configuration module
The fact you had to remove a PSU to get this far probably indicates some surge going through the currently bad PSU that affected your switch. If mst status doesn't show mt51000_pciconf0 you have bigger problems."

I consider this switch broken and dead. I just realized at 4:36am, while reading through all of the related threads here on STH, that the previous owner never really got this one going in two years. When I received it to play with, it already showed the FATAL ERRORS related to modules. I also know it's an engineering sample; I checked the serial number. The FATAL ERRORS have to do with not being able to flash the firmware because, as you said, I never get an mt51000_pciconf0. Yes, I get a regular mt51000. So firmware flashing is blocked. The status lights never come up. The reset button doesn't work.

I always tried to do the conversion with 8012. I just wanted to try 1002 today, but I'm not very optimistic. I've got section 6.2.B covered now. But none of it matters if you are unable to flash the firmware due to the missing mt51000_pciconf0. That's my only issue atm, but maybe the one that is unsolvable. I'm not a person who gives up quickly, but this time ....

The one question I still have is about the amber LED on the daughterboard right next to the heatsink. Unfortunately, I can't find any documentation about this LED and what it indicates. But amber isn't the most positive colour in my experience.

Picture_20210110060129.jpg

What can you do with this switch if it's broken? Spare parts? OK, one PSU is 100% fine. But the boards?

Regarding the documentation: yes, I appreciate documentation. Correct documentation. For my part, I update documentation for as long as I can, or delete it if nobody takes over the maintenance. But maybe several things worked against me here, like the engineering sample, already partially broken hardware etc.

Trying one more day and that's it.
 

necr

Active Member
Dec 27, 2017
151
47
28
124
Possibly no PCIe link to the underlying SwitchX2 board, mft can’t start, or driver has trouble accessing registers. You might want to add mft to the mini image or check lspci somehow to confirm this.
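A rough way to check both ends of this, sketched as two small filters. Both read command output on stdin, so they work on live output or on a saved capture. Assumptions: lspci may not be present on the switch image at all (hence the "somehow" above), and the helper names are made up for this example. 15b3 is the Mellanox PCI vendor ID.

```shell
#!/bin/sh
# Sketch only: both helpers read command output on stdin, so they can be fed
# from the live tools or from a saved console capture.

# True if `mst status` output lists the PCI config device named in this thread.
has_pciconf_device() {
    grep -q 'mt51000_pciconf0'
}

# True if `lspci -n` output contains a function with PCI vendor ID 15b3
# (Mellanox). Availability of lspci on the switch image is an assumption.
has_mellanox_pci_function() {
    grep -qi ' 15b3:[0-9a-f]\{4\}'
}
```

For example: mst status | has_pciconf_device || echo "no pciconf device", and lspci -n | has_mellanox_pci_function || echo "no Mellanox function on the PCI bus". If the second check fails, the CPU board is not seeing the switch ASIC at all, which would point at the PCIe link rather than at mft or the driver.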
 