Hello folks,
I have the following situation and would like some advice.
I am using two dual-port 25 Gbps NICs (MCX4121A-ACAT) installed next to each other (pic attached).
Both cards are running latest available firmware.
The OS is the latest TrueNAS CORE 13.0-U6.1.
I know these cards tend to run hot.
The error in dmesg is:
mlx5_core1: WARN: mlx5_temp_warning_event:227:(pid 12): High temperature on sensors with bit set 0 0x1
mlx5_core0: WARN: mlx5_temp_warning_event:227:(pid 12): High temperature on sensors with bit set 0 0x1
mlx5_core0 and mlx5_core1 correspond to the mce0 and mce1 adapters, respectively.
When I check the temperatures reported in the console, one of the cards shows strangely high values:
Top card:
MCE0 -> dev.mce.0.hw_temperature: 107000 (107°C)
MCE1 -> dev.mce.1.hw_temperature: 107000 (107°C)
Bottom card:
MCE2 -> dev.mce.2.hw_temperature: 75000 (75°C)
MCE3 -> dev.mce.3.hw_temperature: 75000 (75°C)
I understand that this is the temperature of the chip die itself, but there is still a 32°C difference between the two cards!
There is active cooling in front of the cards at a distance of 10 cm (a 12 cm fan at ~1800 rpm).
I plan to measure the temperatures with an IR thermometer the next time I open the case.
Do you think the issue might be the heat sink on the top card?
Any other ideas are highly appreciated.
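To compare all four ports at a glance, here is a minimal Python sketch (the script name mce_temps.py is hypothetical) that parses the hw_temperature lines and converts them to °C. It assumes the sysctl value is in millidegrees Celsius, as the 107000 → 107°C reading above suggests, and that each line contains the "dev.mce.N.hw_temperature: VALUE" form shown above.

```python
import re
import sys

# Matches "dev.mce.N.hw_temperature: VALUE" anywhere in a line.
# Assumption: VALUE is in millidegrees Celsius (107000 -> 107 °C).
PATTERN = re.compile(r"dev\.mce\.(\d+)\.hw_temperature:\s*(\d+)")


def parse_temps(text: str) -> dict[int, float]:
    """Return {mce index: temperature in °C} for every matching line."""
    temps = {}
    for match in PATTERN.finditer(text):
        index, millideg = int(match.group(1)), int(match.group(2))
        temps[index] = millideg / 1000
    return temps


def spread(temps: dict[int, float]) -> float:
    """Difference between the hottest and the coolest port, in °C."""
    return max(temps.values()) - min(temps.values())


def main() -> None:
    # Assumed usage: sysctl dev.mce | grep hw_temperature | python3 mce_temps.py
    if sys.stdin.isatty():
        print("Pipe the sysctl hw_temperature lines into this script.")
        return
    temps = parse_temps(sys.stdin.read())
    for index in sorted(temps):
        print(f"mce{index}: {temps[index]:.1f} °C")
    if temps:
        print(f"spread: {spread(temps):.1f} °C")


if __name__ == "__main__":
    main()
```

With the four readings listed above, this would report a spread of 32.0 °C between the top and bottom cards.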