Got Arista 7050QX-32 (non S) - ...hmmm..., now what? Some questions, comments, issues....

petreza

New Member
Dec 28, 2017
25
2
3
So I got a 7050QX-32 (non S), used, and want to test it before the return window closes.
Below is a list of the various issues, comments, and questions that I have. The two main ones are stability problems (reboots) and, perhaps related, running out of storage space.
I don't have experience with CLI switches so any help would be greatly appreciated. (Of course, I've downloaded the EOS manual, but that is 2000+ pages - it will take time.)

Here is the equipment I have:
- Arista 7050QX-32 (non S)
- Three (3) Mellanox ConnectX 3 Pro cards (MCX314A-BCCT)
- one IBM branded 3m QSFP Passive Copper FDR14 InfiniBand Cable P/N: 00W0057 (56Gbit/s cable)
- one of these on the way: 10Gtek - 40G QSFP+ DAC Cable - Mellanox Compatible
- access to the Serial Console
- a host running ESXi

The main use for the 7050 is a homelab SAN - I will pass a ConnectX 3 through to a TrueNAS VM, which will serve a fast NVMe RAIDZ array as iSCSI VM storage for other hosts.

Here is the cooling setup I rigged up. I have a Noctua A4-20 4-Pin fan on the way for the power supply. The three 120mm 57CFM fans cool the switch part and the two pairs of smaller heatsinks on the sides. One of the fans is tilted to blow on the memory too. The 90mm Sunon 37CFM fan cools the processor side. I glued a small heatsink on the AMD Southbridge 1119 chip. All the fans are connected to the onboard 12V input(!?!), which works as an output when the system is on. (I used a molex splitter with electrical tape over the input part of the splitter, so it becomes a coupler.)

[Attached photos of the cooling setup: 20210414_143548_small.jpg, 20210414_143448_small.jpg, 20210413_231127_small.jpg, 20210414_143401_small.jpg]

I get:
Code:
localhost>show environment temperature
System temperature status is: Ok
                                                                 Alert  Critical
                                               Temp    Setpoint  Limit     Limit
Sensor  Description                             (C)         (C)    (C)       (C)
------- ----------------------------------- ------- ----------- ------ ---------
1       Cpu temp sensor                        27.7   (N/A) N/A     95       100
2       Rear temp sensor                       30.8   (N/A) N/A     65        75
3       Board sensor                           32.0   (N/A) N/A     55        70
4       Front-panel temp sensor                28.0   (N/A) N/A     65        75
5       Trident Bottom Right Outer             46.6   (N/A) N/A    100       110
6       Trident Bottom Left Outer              48.8   (N/A) N/A    100       110
7       Trident Top Left Outer                 47.7   (N/A) N/A    100       110
8       Trident Top Right Outer                48.8   (N/A) N/A    100       110
9       Trident Bottom Right Inner             47.1   (N/A) N/A    100       110
10      Trident Bottom Left Inner              48.2   (N/A) N/A    100       110
11      Trident Top Left Inner                 47.1   (N/A) N/A    100       110
12      Trident Top Right Inner                46.6   (N/A) N/A    100       110

PowerSupply 2:
                                                                 Alert  Critical
                                               Temp    Setpoint  Limit     Limit
Sensor  Description                             (C)         (C)    (C)       (C)
------- ----------------------------------- ------- ----------- ------ ---------
1       Power supply sensor                    29.0   (N/A) N/A     60        70
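
One way to read the table above is to compute how far each reading sits below its alert limit. The following is a rough sketch meant to run on a workstation against pasted CLI text, not on the switch. The `headroom` helper and its regular expression are illustrative names, and they assume the Setpoint column always reads `(N/A) N/A`, as it does in this output.

```python
import re

# A few rows copied from the `show environment temperature` output above.
SAMPLE = """\
1       Cpu temp sensor                        27.7   (N/A) N/A     95       100
3       Board sensor                           32.0   (N/A) N/A     55        70
6       Trident Bottom Left Outer              48.8   (N/A) N/A    100       110
"""

# sensor id, description, temperature, alert limit, critical limit
ROW = re.compile(
    r"^\s*(\d+)\s+(.+?)\s+(\d+\.\d+)\s+\(N/A\) N/A\s+(\d+)\s+(\d+)\s*$"
)


def headroom(text):
    """Return (description, temp, degrees below the alert limit) per row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            desc = m.group(2)
            temp = float(m.group(3))
            alert = int(m.group(4))
            rows.append((desc, temp, round(alert - temp, 1)))
    return rows


if __name__ == "__main__":
    # Smallest margin first, so the sensor closest to its limit is on top.
    for desc, temp, margin in sorted(headroom(SAMPLE), key=lambda r: r[2]):
        print(f"{desc:<30} {temp:5.1f} C  {margin:5.1f} C below alert")
```

In the rows sampled here, the board sensor has the smallest margin (32.0 C against a 55 C alert limit), so that is the reading worth watching with a custom fan setup.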

On bootup I got the Zero Touch banner and a Fan Warning:
Too few working fans detected. If not resolved, the system will shut down in 1 minutes
When I try
zerotouch disable
I get
% Error writing to flash:/zerotouch-config
Code:
[admin@localhost flash]$ ls -l
total 594256
-rwxrwx--- 1 root  eosadmin 608484547 Jan 13  2020 EOS-4.19.1F.swi
drwxrwx--- 2 root  eosadmin      4096 Dec 29  2017 System Volume Information
-rwxrwx--- 1 root  eosadmin        25 Mar 13  2020 boot-config
drwxrwx--- 3 root  eosadmin      4096 Apr 14 15:57 debug
-rw-r--r-- 1 root  root             0 Jun 12  2020 enable3px
-rwxrwx--- 1 root  eosadmin         0 Mar 20  2019 fullrecover
drwxr-xr-x 3 root  eosadmin      4096 Apr 15 21:04 persist
drwxrwxrwx 3 root  eosadmin      4096 Jun 12  2020 schedule
-rw-rw-rw- 1 admin eosadmin      5927 Apr 15 19:23 startup-config
-rw-r--r-- 1 root  root             0 Apr 14 14:09 zerotouch-config
I handle the fans with:
environment insufficient-fans action ignore
environment fan-speed override 30
Here is the full configuration:
localhost#show startup-config
! Command: show startup-config
! Startup-config last modified at Wed Apr 14 20:08:47 2021 by admin
! device: localhost (DCS-7050QX-32, EOS-4.19.1F)
!
! boot system flash:EOS-4.19.1F.swi
!
transceiver qsfp default-mode 4x10G
!
logging console notifications
!
logging level AAA errors
logging level ACCOUNTING errors
.....
.....
(all are "errors" except the last one: )
.....
logging level ZTP informational
!

spanning-tree mode mstp
!
no aaa root
!
username admin role network-admin secret sha512 ####################################.....
!
environment insufficient-fans action ignore
environment fan-speed override 30

!
clock timezone ###############
!
interface Ethernet1/1
!
interface Ethernet1/2
!
interface Ethernet1/3
!
interface Ethernet1/4
!
interface Ethernet2/1
!
.....
.....
interface Ethernet24/4
!
interface Ethernet25
!
.....
.....
interface Ethernet32
!
interface Management1
   ip address ##.##.##.##/##
!
no ip routing
!
banner login
#######################
#######################
EOF
!
end
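
The interface list in this config can be cross-checked against the port count that `show inventory` reports later in this post (104 switched ports). The sketch below only reproduces the naming pattern visible in the listing: Ethernet1/1 through Ethernet24/4, then Ethernet25 through Ethernet32. The function name and its parameters are illustrative, and the split into 24 four-lane ports plus 8 single-interface ports is read off this output, not taken from Arista documentation.

```python
def switched_interfaces(breakout_ports=24, lanes=4, total_ports=32):
    """Build interface names following the pattern seen in the startup-config.

    Ports 1..breakout_ports appear as EthernetP/L with L in 1..lanes
    (matching `transceiver qsfp default-mode 4x10G`); the remaining ports
    appear as a single EthernetP interface.
    """
    names = [
        f"Ethernet{port}/{lane}"
        for port in range(1, breakout_ports + 1)
        for lane in range(1, lanes + 1)
    ]
    names += [
        f"Ethernet{port}" for port in range(breakout_ports + 1, total_ports + 1)
    ]
    return names


if __name__ == "__main__":
    names = switched_interfaces()
    print(names[0], "...", names[95], names[96], "...", names[-1])
    print("switched interfaces:", len(names))
```

24 x 4 + 8 gives 104, which agrees with the "Switched 104" line in the inventory, plus one Management interface for the 105 total.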
==================================================
(PS. I don't get the disk full errors anymore and the system seems to be stable for now.)
Code:
[admin@localhost ~]$ df -lh
Filesystem      Size  Used Avail Use% Mounted on
none            581M   43M  539M   8% /
none            581M   43M  539M   8% /.overlay
devtmpfs        8.0M     0  8.0M   0% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           1.9G  436K  1.9G   1% /run
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs           581M   76K  581M   1% /tmp
tmpfs            64M  484K   64M   1% /.deltas
tmpfs           1.9G     0  1.9G   0% /var/run/netns
tmpfs           388M     0  388M   0% /var/core
tmpfs           388M   49M  339M  13% /var/log
tmpfs           1.0G  3.7M 1021M   1% /var/shmem
/dev/sda1       1.9G  1.2G  645M  66% /mnt/flash
Q1: (see above) First, the disk full problem. Here are some of the error messages that I get:
<<DATE>><<TIME>> localhost Strata: % STRATA-3-UNEXPECTED_RESTART: Unexpected restart of the Strata agent Strata_Fixed System occured
<<DATE>><<TIME>> localhost EventMon: % EVENTMON-3-DB_Write_Failed: A sqlite database or disk is full exception occured when writing to the EventMon database
localhost login: admin
Password:
login: write lastlog failed: No space left on device
Warning: The following file systems have less than 10% free space left:
tmpfs (on /var/log) 0% (0 Available)
Please remove configuration such as tracing and clean up the space.
Any advice on what to delete / disable?
(I had to replace the dead CR2032 battery - the system would not keep the time when I unplugged the switch power. Maybe this issue is related to that.)
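
To keep an eye on this, the `df -lh` output can be checked against the same "less than 10% free" threshold that the warning above mentions. This is a minimal sketch for pasted output, run off-switch. The `low_space` helper is an illustrative name, and the `FULL` sample is a hypothetical reconstruction of the failing state (only the "0% available on /var/log" part comes from the warning; the sizes are made up for the example).

```python
# Three lines copied from the healthy `df -lh` output above.
HEALTHY = """\
Filesystem      Size  Used Avail Use% Mounted on
none            581M   43M  539M   8% /
tmpfs           388M   49M  339M  13% /var/log
/dev/sda1       1.9G  1.2G  645M  66% /mnt/flash
"""

# Hypothetical: what /var/log might have looked like when it was full.
FULL = """\
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           388M  388M     0 100% /var/log
/dev/sda1       1.9G  1.2G  645M  66% /mnt/flash
"""


def low_space(df_text, min_free_pct=10):
    """Return (mount point, free percent) for filesystems below the threshold.

    Free percent is derived from df's rounded Use% column, so it is only
    approximate.
    """
    flagged = []
    for line in df_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        free_pct = 100 - int(fields[4].rstrip("%"))
        if free_pct < min_free_pct:
            flagged.append((fields[5], free_pct))
    return flagged


if __name__ == "__main__":
    print("healthy:", low_space(HEALTHY))
    print("full:   ", low_space(FULL))
```

Note that `/var/log` is a 388M tmpfs, separate from the 1.9G flash on `/mnt/flash`, so the two can fill up independently.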



==================================================
(PS. I don't get the disk full errors anymore and the system seems to be stable for now.)
Q2: Stability Problems:
Several times the system would reboot at random - when left alone, when plugging in cables....
For now I assume it is related to the disk full problem above. Once that is resolved, I will check the stability again.
(I had to replace the dead CR2032 battery - the system would not keep the time when I unplugged the switch power. Maybe this issue is related to that.)

==================================================
Q3: EOS version:
Comes with 4.19.1F - should I just keep this one or try to upgrade?
On a Win95-to-Win10 scale, where does the 4.19.1F on this switch sit:
Win95--Win98--Win2000--WinXP--Win7--Win10
or is it one of the crappy ones:
WinME, WinVista, Win8
I don't plan to use anything fancy - maybe (Private) VLANs, so just keep it?

==================================================
(PS I changed the fan of the power supply so now it does not matter)
Q4: Shortly after I get the login prompt, I get:
<<DATE>><<TIME>> localhost Rib: Commence routing updates
then the PS fan spins up full blast and gradually spins down to the 30% setting I have given it. It gets pretty loud and I have to smother it with a pillow for 2 minutes. Any way to disable the spinup-down?

==================================================
(PS I cannot recreate the green light anymore - it was happening only when the switch had the memory full issues - will make a new post)
Q5: With the IBM cable (above) sometimes I get a green light on ports 25-32, sometimes I don't. What is the significance of the green light - does it guarantee full-speed communication, or is it just an indicator that something is connected (electrically) on the other side, with no guarantee that it can communicate?

==================================================
Q6: Is there a way to test the switch with just one cable?
(PS I received the 10Gtek cable but it also does not work)

==================================================
Q7: The /mnt/flash location has the empty file enable3px.
Does that mean that this configuration has already been tinkered with? If so, how can I do a reset to clear any other settings left by the previous user?
(PS The same location also has an empty file fullrecover - I don't know if this is from the reseller or someone was trying to do the same thing I am trying to do before me - will make a new post)

==================================================
Q8: I tried
speed forced 40gfull
shutdown
no shutdown
on Et25
and /mnt/flash has enable3px
but I cannot get the green light when connected to a ConnectX 3 Pro. (I was able to get a green light sometimes, before the driver installation for the ConnectX 3 also updated the firmware from 2.34.5000 to 2.42.5000. I now believe that was a fluke from when the unit was stuck without storage space.)
Any other ideas how to establish a link?
??? service unsupported-transceiver <stringname> <8-digit hex key> ???

Code:
localhost(config-if-Et25)#show interface Eth25 status
Port       Name           Status       Vlan     Duplex Speed  Type         Flags
Et25                      notconnect   1        unconf unconf 40GBASE-CR4

localhost(config-if-Et25)#show interface Eth25
Ethernet25 is down, line protocol is down (notconnect)
  Hardware is Ethernet, address is 0000.0000.0000 (bia 001c.7352.eaca)
  Ethernet MTU 9214 bytes
  Unconfigured, Unconfigured, auto negotiation: off, uni-link: n/a
  Down 27 minutes, 29 seconds
  Loopback Mode : None
  3 link status changes since last clear
  Last clearing of "show interface" counters never
  5 minutes input rate 0 bps (- with framing overhead), 0 packets/sec
  5 minutes output rate 0 bps (- with framing overhead), 0 packets/sec
     0 packets input, 0 bytes
     Received 0 broadcasts, 0 multicast
     0 runts, 0 giants
     0 input errors, 0 CRC, 0 alignment, 0 symbol, 0 input discards
     0 PAUSE input
     0 packets output, 0 bytes
     Sent 0 broadcasts, 0 multicast
     0 output errors, 0 collisions
     0 late collision, 0 deferred, 0 output discards
     0 PAUSE output
(PS
Code:
localhost(config)#show interfaces eth25, 29 transceiver properties
Name : Et25
Administrative Speed: 40G
Administrative Duplex: full
Operational Speed: unconfigured
Operational Duplex: unconfigured
Media Type: 40GBASE-CR4

Name : Et29
Administrative Speed: 40G
Administrative Duplex: full
Operational Speed: unconfigured
Operational Duplex: unconfigured
Media Type: 40GBASE-CR4
Code:
localhost(config)#show interfaces eth25, 29 capabilities
Ethernet25
  Model:        DCS-7050QX-32
  Type:         40GBASE-CR4
  Speed/Duplex: 10G/full,40G/full(default),auto
  Flowcontrol:  rx-(unknown),tx-(unknown)
Ethernet29
  Model:        DCS-7050QX-32
  Type:         40GBASE-CR4
  Speed/Duplex: 10G/full,40G/full(default),auto
  Flowcontrol:  rx-(unknown),tx-(unknown)
Code:
localhost(config)#show inventory
System information
  Model                    Description
  ------------------------ ----------------------------------------------------
  DCS-7050QX-32            32x QSFP+ 1RU

  HW Version  Serial Number  Mfg Date   Epoch
  ----------- -------------- ---------- -----
  02.00       ###########    2013-11-19 00.00
System has 2 power supply slots
  Slot Model            Serial Number
  ---- ---------------- ----------------
  1    Not Inserted
  2    PWR-460AC-R      ############
System has 4 fan modules
  Module  Number of Fans  Model            Serial Number
  ------- --------------- ---------------- ----------------
  1       0
  2       0
  3       0
  4       0
System has 105 ports
  Type             Count
  ---------------- ----
  Management       1
  Switched         104
System has 32 transceiver slots
  Port Manufacturer     Model            Serial Number    Rev
  ---- ---------------- ---------------- ---------------- ----
  1    Not Present
  2    Not Present
......
......
  24   Not Present
  25   Mellanox         00W0057          ###########      A1
  26   Not Present
  27   Not Present
  28   Not Present
  29   Mellanox         MC2206130-002    ###########      A
  30   Not Present
)
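
The `transceiver properties` output above can be reduced to the one thing that matters here: which ports have a recognized cable and an administrative speed but no operational speed. The sketch below parses pasted output on a workstation. `parse_blocks` and `not_linked` are illustrative names, and treating `Operational Speed: unconfigured` as "no link" is an interpretation of this output (it matches the `notconnect` status shown earlier), not something taken from Arista documentation.

```python
# Copied from the `show interfaces eth25, 29 transceiver properties` output.
SAMPLE = """\
Name : Et25
Administrative Speed: 40G
Administrative Duplex: full
Operational Speed: unconfigured
Operational Duplex: unconfigured
Media Type: 40GBASE-CR4

Name : Et29
Administrative Speed: 40G
Administrative Duplex: full
Operational Speed: unconfigured
Operational Duplex: unconfigured
Media Type: 40GBASE-CR4
"""


def parse_blocks(text):
    """Split the output into one dict per port, keyed by the field labels."""
    ports = []
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank separator lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Name":
            current = {"Name": value}
            ports.append(current)
        elif current is not None:
            current[key] = value
    return ports


def not_linked(ports):
    """Ports whose operational speed is still unconfigured."""
    return [p["Name"] for p in ports if p.get("Operational Speed") == "unconfigured"]


if __name__ == "__main__":
    ports = parse_blocks(SAMPLE)
    for p in ports:
        print(p["Name"], p["Media Type"], p["Administrative Speed"], p["Operational Speed"])
    print("no link on:", not_linked(ports))
```

Both ports report Media Type 40GBASE-CR4, so the switch does read the cables; the missing piece is the link itself.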

==================================================
Q9: The enable3px trick does not work on newer versions of EOS. What is the latest version that does support it, in case I have to resort to downgrading?


==================================================
==================================================
Q1001: Any other comments or suggestions?


PS.
I ran two Ubuntu Server instances, each with a ConnectX 3 passed through, and was able to get 37Gbit/s with 4x4 instances of iperf3 with no special options - so the cards work.
 

petreza

Apr 18 - updates to Q4, Q6, Q7, Q8 - addressed below, added Q9
 

legopc

Active Member
Nov 2, 2014
218
36
28
25
The Netherlands
If you are worried if you should send back the switch, you have already shot yourself in the foot by making those alterations. No sane seller would accept the return when they notice anything has been done with it.
 

petreza

(I have posted updates to most of the Qs above)

I am not able to establish a connection through the switch.
Reading online it seems that Arista switches should not require any configuration to establish a basic connection. As long as the "cables" are compatible, it should be just plug-n-play - like a regular dumb switch.
While transceiver compatibility is challenging, I have seen several people state that any DAC should be fine with Arista. Yet I now have two different Mellanox-compatible DACs and neither works. Both the switch and the ConnectX 3 cards see that the cables are plugged in, but no link is established. I posted output in Q8 from some relevant new commands that I learned.

Q4: I changed the power supply fan with the Noctua and it is much quieter now - comparable to my desktop computer. But if you know the answer please post for future reference.

Q5 still stands: Does green light mean data should be flowing or is there more to it?

Q6: There seems to be no way to set one of the regular ports as a management port in order to test the switch with just one cable.

Q7: In addition to enable3px there is also a fullrecover file. The fullrecover file is used to do an OS recovery from a USB drive. This further leads me to believe that this is not a system pulled from a datacenter, but that someone like me has already been trying to do what I am trying to do.

Q8: I tried a bunch of " service unsupported-transceiver <stringname> <8-digit hex key> " that I was able to find but none helped.

Added:
Q9: The enable3px trick does not work on newer versions of EOS. What is the latest version that does support it, in case I have to resort to downgrading?


Well, more to read and try. I am nowhere near giving up.

====================================

I don't get it why somebody would fan mod an arista switch which supports fan settings :(
I live in an apartment - even at 30% the PS fan was too loud, and I have to sleep 7 feet away from this thing. Never mind the jet engine takeoff sound from 5-6 fans on a reboot (Q4).

If you are worried if you should send back the switch, you have already shot yourself in the foot by making those alterations. No sane seller would accept the return when they notice anything has been done with it.
Look at it this way. This was a unit being sold with "severe water damage" but "working," and the price reflected that. I cannot afford the $600-$800 units. It did not come with fans or power supply(ies). The power supply cost me as much as the switch - just to test if it was alive. There was no way I was going to spend more than the cost of the switch just to get the standard fans, test the switch, and then remove the fans anyway. I took precautions to ensure that nothing burns out when I power the switch - is that a bad thing? If after all this the unit had never powered up, and the seller refused to accept a return on the basis of admitting to insanity, then, fine, it is all my loss.
As of right now the power supply is hardware modified so that's done. The switch seems to be working - it is just a matter of configuration to get it cranking.
 