Celestica D4040


busterswt

Member
Apr 30, 2017
66
18
8
42
Any Celestica owners out there? Curious to know what you’re running - whether it’s ICOS or Cumulus or something else.

Cheers!
JD
 

pcmoore

Active Member
Apr 14, 2018
138
48
28
New England, USA
I've been curious about these, but there don't seem to be a lot of folks using them (or rather, those that do are very quiet). Do you have one? If so, what are you using on it?
 

busterswt

Member
Apr 30, 2017
66
18
8
42
That’s a good question. Mine came without storage installed, so I had to go through the process of building ONIE and finding a NOS to install. The original Ubuntu/ICOS is unobtainable, so I went with Cumulus. I’ve got a trial license, but switchd won’t start due to an EEPROM or some other issue. I need to be able to compare some output to know for sure.

Long story short, I can’t tell you much about it, but I know there are some STH members that have one.
 

Rand__

Well-Known Member
Mar 6, 2014
6,622
1,762
113
I do have two Celesticas, but not D4040s.
One runs SONiC, the other a trial of OcNOS.
I have not managed to get a trial for Cumulus, or I would have given it a whirl.
 

Labs

Member
Mar 21, 2019
88
16
8
I played with a D4040 some months ago and it was running ICOS with L2/L3/routing/IPv6/etc. enabled. Not sure whether they all ship that way.
I also tested 40G QSFP to 4x10G SFP+ breakout cables from IBM and they worked. QSFP-to-QSFP passive cables from several different vendors were OK as well.

The SSD was a 16GB mSATA and it had 4GB of DDR3L (ECC unbuffered, I think) with an empty slot for an additional module. Both the SSD and the RAM module carried Innodisk labels.

What I noticed was that the CPU is an Intel C2xxx model, the same affected family used in the Cisco ASA and some ISR routers.
At the time I checked the specific model and stepping to confirm whether the unit I had was affected, and it was.

I don't have the switch anymore so I cannot help with more details...
 

fohdeesha

Kaini Industries
Nov 20, 2016
2,721
3,050
113
33
fohdeesha.com
Interestingly, the bug shouldn't affect the D4040 switches. The C2xxx failure mode is that the clock for the LPC bus fails. The C2 series can boot from two places: either a boot ROM connected via the LPC bus, or SPI flash. When the LPC clock fails, the CPU can no longer read a boot ROM over the LPC bus the next time you try to boot, so the device is bricked. However, for devices that boot off SPI-connected flash, it won't affect functionality. That's why some vendors had massive failure rates from this bug (like Cisco, whose ASAs booted off LPC-connected boot ROMs), and others had pretty much none.

The D4040 boots off a pair of Winbond W25Q64FV 64 Mbit SPI flash chips. I believe the two SPI chips are arranged in a redundant bootloader configuration following the Intel-recommended method for the C2000 series: https://www.intel.com/content/dam/w...ant-spi-flash-with-failover-boot-app-note.pdf
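
To make the argument above concrete, here is a minimal Python sketch of it. It models only where the boot firmware is fetched from and deliberately ignores any other peripherals that might hang off the LPC bus. The `BootSource` enum, the function name, and the device table are hypothetical names made up for illustration, not anything taken from Intel or vendor documentation.

```python
from enum import Enum


class BootSource(Enum):
    """Where the CPU fetches its boot firmware from (illustrative only)."""
    LPC_BOOTROM = "lpc"
    SPI_FLASH = "spi"


def can_fetch_firmware_after_lpc_clock_failure(source: BootSource) -> bool:
    """Return True if the boot firmware is still reachable once the LPC clock is dead.

    Assumption: only the firmware fetch path is modeled. Other devices on the
    LPC bus are not considered here.
    """
    return source is BootSource.SPI_FLASH


# Hypothetical device table based on the claims in this post.
devices = {
    "device booting from an LPC boot ROM": BootSource.LPC_BOOTROM,
    "D4040 (SPI flash)": BootSource.SPI_FLASH,
}

for name, source in devices.items():
    ok = can_fetch_firmware_after_lpc_clock_failure(source)
    print(f"{name}: firmware reachable = {ok}")
```

Under these assumptions the SPI-booting device can still read its firmware, while the LPC-booting one cannot.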
 

Labs

Member
Mar 21, 2019
88
16
8
Thanks for the explanation. I had been looking for this for some time. So it is reasonably safe to say that all switches based on Intel C2000 CPUs that boot from SPI flash are not affected by this bug. I remember there is also one Arista switch based on an Intel CPU, but I don't remember the exact model. The rest of the Arista switches are AMD-based, from what I saw.

Thanks again!
 

okrasit

Member
Jun 28, 2019
40
32
18
I have a D4040 running SONiC. I've got the platform support almost completely done; only the LEDs and QSFP hotplugging are still in the works. :rolleyes:
 

okrasit

Member
Jun 28, 2019
40
32
18
AFAIK the CPLDs on the D4040 sit on the LPC bus. If the bus were to fail, it'd probably brick the device. If there's any one place in the boot code that waits for a bit to be set or cleared, e.g. polling a register for a status change, it'd hang there forever (no boot). I think this is what's happening with most of the "failing" devices: it's not that the boot ROM isn't accessible, but that the boot code/BIOS pokes the LPC bus and hangs. :rolleyes:
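
The hang described here can be sketched in a few lines of Python. This is only an illustration of an unbounded polling loop versus a bounded one, not actual BIOS or boot code. `wait_for_bit`, `dead_bus_read`, and the assumption that a dead bus returns a constant value with the ready bit clear are all hypothetical.

```python
import time


def wait_for_bit(read_reg, mask, timeout_s=None, poll_interval_s=0.001):
    """Poll read_reg() until (value & mask) is nonzero.

    With timeout_s=None this loops forever if the bit never gets set,
    which is the hang scenario described above. With a timeout it gives
    up and returns False instead.
    """
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while True:
        if read_reg() & mask:
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


def dead_bus_read():
    # Hypothetical: a register behind a dead bus that never reports "ready".
    return 0x00


# Bounded wait: returns instead of hanging.
print(wait_for_bit(dead_bus_read, mask=0x01, timeout_s=0.05))  # False
```

Calling `wait_for_bit(dead_bus_read, mask=0x01)` without a timeout would never return, which is the "no boot" case.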

There was a mention somewhere of VDDIO at 1.8V being "safe" for the E3xxx Atoms. That might also apply to the C2xxx series, if they can even do LPC at 1.8V. :oops:
 

fohdeesha

Kaini Industries
Nov 20, 2016
2,721
3,050
113
33
fohdeesha.com
Are they really? I thought the MachXO2s in this thing were on the PCI bus, but I haven't had a chance to dive into the D4040 yet. If they're on the LPC bus and it fails, then that'd certainly cause issues.
 

okrasit

Member
Jun 28, 2019
40
32
18
They sure are. I think I'm going to add a pull-up resistor there before the unit breaks down.
 

okrasit

Member
Jun 28, 2019
40
32
18
I'm not familiar with electronic design. Did you put a 121 Ohm resistor in series with the capacitor at C410 and R562 where the resistor is covered in black tape?
The right pad of R562 is the clock signal and the top pad of C410 is just the 3.3V supply. Yes, the black blob is the resistor. I only had SMD resistors at hand, so there's a small PCB inside there with the resistor soldered on.
:rolleyes:
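
For a rough sense of scale, the load that such a pull-up puts on the clock line can be estimated with Ohm's law. This is a back-of-the-envelope sketch using the 121 Ohm and 3.3V figures mentioned in this exchange. It assumes the worst case where the line is held fully at 0V and ignores the driver's own output resistance.

```python
V_SUPPLY = 3.3    # volts, supply rail the resistor is tied to (from the post)
R_PULLUP = 121.0  # ohms, pull-up value mentioned in the question

# Worst case: the clock line is driven all the way to 0 V (assumption).
current_a = V_SUPPLY / R_PULLUP
power_w = V_SUPPLY ** 2 / R_PULLUP

print(f"{current_a * 1000:.1f} mA, {power_w * 1000:.0f} mW")  # 27.3 mA, 90 mW
```

So while the line is low, the resistor sources about 27 mA and dissipates roughly 90 mW under these assumptions.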
 

Labs

Member
Mar 21, 2019
88
16
8
From what I've read on different forums, this fix only prolongs the life of the unit: at some point they will still fail, and then they cannot be repaired anymore.

Is that true, or can it be considered a permanent fix?