Dell 3-Node AMD DCS6005

javi404

New Member
Jan 24, 2014
26
0
1
CPU0_TEMP | 39.000 | degrees C | ok | na | na | na | na | 70.000 | na
CPU1_TEMP | 32.000 | degrees C | ok | na | na | na | na | 70.000 | na
SR5650_TEMP | 97.000 | degrees C | ok | na | na | na | na | 98.000 | na
DIMM1 INLET_TEMP | 37.000 | degrees C | ok | na | na | na | na | 85.000 | na
CPU0 INLET_TEMP | 34.000 | degrees C | ok | na | na | na | na | 80.000 | na
CPU1 INLET_TEMP | 37.000 | degrees C | ok | na | na | na | na | 80.000 | na
H0_VDD_CORE_RUN | 0.984 | Volts | ok | na | 0.736 | na | na | 1.344 | na
H1_VDD_CORE_RUN | 0.984 | Volts | ok | na | 0.736 | na | na | 1.344 | na
V1P8_MEM0 | 1.776 | Volts | ok | na | 1.520 | na | na | 2.000 | na
V1P8_MEM1 | 1.792 | Volts | ok | na | 1.520 | na | na | 2.000 | na
P1V8_LAN | 1.760 | Volts | ok | na | 1.696 | na | na | 1.904 | na
VDD_3.3_RUN | 3.248 | Volts | ok | na | 3.120 | na | na | 3.632 | na
VDD_5_ALW | 4.824 | Volts | ok | na | 4.488 | na | na | 5.472 | na
VDD_5_RUN | 4.944 | Volts | ok | na | 4.488 | na | na | 5.472 | na
VBAT | 3.232 | Volts | ok | na | 2.624 | na | na | 3.536 | na
VDD_RD890_1.8RUN | 1.832 | Volts | ok | na | 1.600 | na | na | 2.000 | na
VDD_SB700_1.2RUN | 1.200 | Volts | ok | na | 1.048 | na | na | 1.352 | na
VDD_HT1_RUN | 1.200 | Volts | ok | na | 1.048 | na | na | 1.352 | na
VDD_RD890_1.1RUN | 1.096 | Volts | ok | na | 1.000 | na | na | 1.200 | na
VDDHTTX_RD890 | 1.208 | Volts | ok | na | 1.056 | na | na | 1.368 | na
VDD_3P3_DUAL | 3.280 | Volts | ok | na | 3.120 | na | na | 3.632 | na
SYS FAN 1 | 5050.000 | RPM | ok | na | 1000.000 | na | na | na | na
SYS FAN 2 | 5100.000 | RPM | ok | na | 1000.000 | na | na | na | na
SYS FAN 3 | 5100.000 | RPM | ok | na | 1000.000 | na | na | na | na
SYS FAN 4 | 5250.000 | RPM | ok | na | 1000.000 | na | na | na | na
Ear Temp 1 | 24.000 | degrees C | ok | na | na | na | na | 50.000 | na
Ear Temp 2 | 0.000 | degrees C | ok | na | na | na | na | 50.000 | na
SEL Fullness | 0x0 | discrete | 0x1000| na | na | na | na | na | na
Memory | 0x0 | discrete | 0x0000| na | na | na | na | na | na


This is an unmodified system, only the addition of an LSI SAS card for an external array in node 3.
As you can see, the temp for SR5650 is borderline. Somehow I am sure it has gotten higher than that and alarmed, but the node hasn't rebooted yet.

The LSI card blocks one of the chips that could use a fan. Tomorrow I will see what hack I can do to lower the chip's temperature.

Node 2 is off, and Node 1 is at 81-92C.

I also think node 3 doesn't get as much air as the others because of the way the fans are placed.

I'll post updates here, but I will probably move this LSI card and its array to another host at some point.
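For anyone who wants to watch for readings like the SR5650 one automatically, here is a minimal Python sketch that scans sensor rows in the format pasted above and lists temperatures within a few degrees of their limit. It assumes the ninth pipe-separated field is the upper critical threshold, which matches the table in this post (70.000 for the CPUs, 98.000 for SR5650). The function names and the 5-degree margin are made up for illustration.

```python
def parse_sensor_line(line):
    """Split one sensor row into (name, value, unit, upper_critical).

    Returns None for rows that are too short or whose value/threshold
    is not numeric (for example "na" or discrete hex values).
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 10:
        return None
    name, value, unit, upper_critical = fields[0], fields[1], fields[2], fields[8]
    try:
        return name, float(value), unit, float(upper_critical)
    except ValueError:
        return None


def near_limit(lines, margin=5.0):
    """Return temperature sensors within `margin` degrees of their limit."""
    hits = []
    for line in lines:
        parsed = parse_sensor_line(line)
        if parsed is None:
            continue
        name, value, unit, upper_critical = parsed
        # Only compare temperatures; an absolute margin makes no sense for volts.
        if unit == "degrees C" and upper_critical - value <= margin:
            hits.append((name, value, upper_critical))
    return hits


# A few rows copied from the table in this post.
SAMPLE = """\
CPU0_TEMP | 39.000 | degrees C | ok | na | na | na | na | 70.000 | na
SR5650_TEMP | 97.000 | degrees C | ok | na | na | na | na | 98.000 | na
SYS FAN 1 | 5050.000 | RPM | ok | na | 1000.000 | na | na | na | na
SEL Fullness | 0x0 | discrete | 0x1000| na | na | na | na | na | na
"""

if __name__ == "__main__":
    for name, value, limit in near_limit(SAMPLE.splitlines()):
        print(f"{name}: {value:.0f}C, limit {limit:.0f}C")
```

With the sample rows, only SR5650_TEMP is reported, since it sits 1C under its 98C limit.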
 

Datamax

New Member
Jul 25, 2017
6
2
3
36
This is the same issue I am having; some motherboards have already failed completely.

My fix has been to take the chipset heatsinks off, soak them in acetone to remove the dried-out, stiff thermal interface material (it's not transferring heat anymore), use proper compound + heatsink glue + hot glue to put the heatsinks back, and install a 40mm fan on each heatsink.

This has fixed most of the ones I have. Some still keep crashing, so I assume those have failed motherboards. I have almost run out of spare motherboards at this point too :(

Also, on some motherboards I've seen white stuff under the solid-state cap next to the chipsets, but it was on the underside of the motherboard, on the cap's solder points. I guess that is a sign of failure, but why the white stuff is on the underside of the board is a mystery :/

I need to find a good replacement model; testing some SC fat twins right now.
 

Tha_14

Server Newbie
Mar 9, 2017
73
10
8
Could be bad capacitor chemistry. Failed caps can pop from underneath (older electrolytic ones used to fail like that; they didn't pop from above as they should).

Take a closer look, and if you can, please post a pic for future reference.

Cheers
 

Datamax

New Member
Jul 25, 2017
6
2
3
36
But through the board, when there are no holes in the board? oO; Sounds like magic to me for something to go through a solid object :)

I will take photos next time I stumble across it.

Has anyone managed to fit any other motherboard in these machines? It looks like the sled is a few mm too short to fit a Supermicro X8DTT-F, and without the hotswap backplane you cannot fit C6100 motherboards either. The C6220 motherboard is way too long (and again, the connectors).

The X8DTT-F is the only one I've found that just might work with a bit of angle grinding on the sled, though the dual 20-pin connectors are a bit puzzling (I assume it's an option so you can use either one; perhaps some chassis required this design choice?).
 

Datamax

New Member
Jul 25, 2017
6
2
3
36
I also tested a 470W PSU: PS-2417-1L LF

It works on these machines, but I only kept the machine running for 10-15 minutes, as I still need to do a proper power analysis and see whether it helps with power consumption at all. Last time I checked we had 380W from the wall with the 1100W PSUs (that was with 2TB/3TB Seagates), but with more modern HDDs these seem to be using more like 320W, so I need to do full testing again to make sure.

If we can fit another motherboard in these, it just might be worth the effort to keep the chassis and modernize.
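As a rough sanity check on the PSU sizing, the wall-power figures above can be compared with the PSU ratings. This is only a sketch: it assumes the ~320W figure would also hold on the 470W unit, which has not been measured, and it divides wall draw by rated output, which overstates the real load because wall draw includes conversion losses. The function name is made up.

```python
def load_fraction(wall_watts, psu_rating_watts):
    """Wall draw as a fraction of the PSU's rated output (upper bound on load)."""
    return wall_watts / psu_rating_watts


# (label, wall draw in W, PSU rating in W); the second pairing is an assumption.
cases = [
    ("1100W PSU, 2TB/3TB Seagates", 380, 1100),
    ("470W PSU, newer HDDs (assumed draw)", 320, 470),
]

for label, wall, rating in cases:
    print(f"{label}: {load_fraction(wall, rating):.0%} of rating")
```

That works out to roughly 35% of the rating on the 1100W units and 68% on the 470W unit, so the smaller PSU would have headroom on paper, but only a proper measurement under load will tell.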
 

Tha_14

Server Newbie
Mar 9, 2017
73
10
8
Ahh, I didn't realize it was on the other side of the board. That's bizarre.

Good luck with your build.
 

javi404

New Member
Jan 24, 2014
26
0
1
I'm starting to think that the temp sensor on my one node ("SR5650_TEMP | 97.000") is stuck. I have dropped the ambient air temp by 3C, and the other node shows a drop of 2C on SR5650.

I have been logging SR5650_TEMP on all nodes for almost 24 hours, and that one has not changed at all from 97, right at the brink of 98, which would trigger the alarm.

Has anyone observed stuck sensors on these?

One day my daughter accidentally left the garage door closed for about 6 hours and it was boiling in there. I am wondering if this triggered some kind of stuck state in this sensor, because if it is at 97 now, there is no way it wasn't at 110-120 with hot ambient air recirculating and not venting.


EDIT:

Here are the readings, note NODE-2 is off:


Fri Aug 17 11:43:04 EDT 2018
Chassis
Ear Temp 1 | 24.000 | degrees C | ok | na | na | na | na | 50.000 | na
NODE-1
SR5650_TEMP | 79.000 | degrees C | ok | na | na | na | na | 98.000 | na
NODE-2
SR5650_TEMP | na | degrees C | na | na | na | na | na | 98.000 | na
NODE-3
SR5650_TEMP | 97.000 | degrees C | ok | na | na | na | na | 98.000 | na
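The "stuck sensor" suspicion can be turned into a simple check on the logged values: a reading that never moves while a comparable sensor on another node does move is suspicious. A minimal Python sketch follows; the function name, the 2-degree swing threshold, and the sample series are made up for illustration and are not the real 24-hour log.

```python
def looks_stuck(suspect, reference, min_reference_swing=2.0):
    """Return True if `suspect` never changed while `reference` moved.

    `suspect` and `reference` are lists of temperature readings taken
    at the same times from two different nodes.
    """
    if len(suspect) < 2 or len(reference) < 2:
        return False  # not enough data to say anything
    suspect_swing = max(suspect) - min(suspect)
    reference_swing = max(reference) - min(reference)
    return suspect_swing == 0 and reference_swing >= min_reference_swing


# Illustrative values only.
node3 = [97.0, 97.0, 97.0, 97.0, 97.0]
node1 = [81.0, 80.0, 79.0, 79.0, 79.0]

print(looks_stuck(node3, node1))
```

A flat reading alone does not prove the sensor is broken; it could also be pinned at the top of its range, so a check like this only says the value deserves a closer look.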
 

javi404

New Member
Jan 24, 2014
26
0
1
Datamax, how hot were the SR5650_TEMP sensors before and after your heatsink and fan fix?

I actually want to run mine in warmer air as long as they are below ~85-90C.
 

Datamax

New Member
Jul 25, 2017
6
2
3
36
Roughly a 30C drop. The hottest I've seen is 96C on the chipset, but typically it is around 80-85C or the node is in a crashed state. I believe running these hot for years may have damaged the chipsets, and they start to crash well below the threshold temp. Not all are this flaky, but many of them are.

After the changes, the lowest I have seen is around 50C, I think. After applying these fixes, 2 motherboards died at the same time.
 

Thanos

New Member
Jun 16, 2014
16
0
1
54
Hi, when you say "died", does that mean permanently? :) Two of my three boards do not respond to the IPMI power-on command (one does power on for a while and then shuts down). One of them does not respond to IPMI at all, so I guess some part has failed and the board doesn't come up far enough for IPMI to function.

Has anyone found a way to revive those boards? Or are they just for the recycling yard?
 

ashevchuk

New Member
Sep 25, 2017
3
1
3
36
Those are the symptoms of failed capacitors.
 

Thanos

New Member
Jun 16, 2014
16
0
1
54
Oh, really... yes, it usually is a failed capacitor. I read somewhere in this thread that the ones that failed for others are located under the motherboard? I hope not. If anyone has a photo or can point me to where they are, I can replace them with fresh ones... a component worth a few cents takes down the entire m/b.

Thank you for the heads up!
 

ashevchuk

New Member
Sep 25, 2017
3
1
3
36
[Attached image: TYAN_S8208.jpg]
 

deeach

New Member
Jul 15, 2019
2
4
3
Potential Dell DCS6005 NIC solution

Wanted to thank everyone on this forum for the great info, and give back a bit by sharing the following. Like many others, I found that the on-board NICs would only work with lower-end unmanaged switches; my TP-Link 1600-52ts managed switch was not able to communicate with the NICs. However, after manually setting the switch port speed, I am able to communicate reliably with the DCS6005 NICs. I left Duplex set to 'auto'. I am unable to test this on other switches, but hopefully this will help.
 

ashevchuk

New Member
Sep 25, 2017
3
1
3
36
My 6 nodes work fine with a MikroTik RB260GSP, but are unstable with the CRS326-24G-2S+RM. In my case, changing the port media mode (speed/duplex) settings makes no difference to how the network interfaces behave.
 