Is anybody successfully running Proxmox with a 6.17 or 7.0 kernel?
I'm running Proxmox 9.2.2 with the 7.0 kernel. Works great. Not doing anything stressful though.
Okay, that's encouraging to hear. I've been chasing a crazy problem for months now - mine will spontaneously reboot after a few hours running any kernel newer than 6.14.8, so 6.14.11 or any 6.17/7.0 I've tried. I've posted on the Proxmox forums with not much luck. Rock solid on 6.14.8.
So... I hope you'll forgive me asking some questions - what SSD are you running? How much RAM? Did you do any BIOS tweaks?
Crucial T500 2TB SSD, 128GB Crucial DDR5-5600 RAM, assorted BIOS tweaks to minimize power consumption and turn off LED lights. BIOS is 1.02. They pulled 1.03, which is believed to have issues, and attempting to downgrade from 1.03 to 1.02 may brick your system. I did a clean install of Proxmox 9.1 and upgraded from there. I see that there's a new 9.2 .iso, maybe grab that and try clean installing?
Okay, so we have the same RAM, different SSDs (mine is a Samsung 990 Pro). Also running BIOS 1.02, they pulled 1.03 before I had a chance to think about upgrading.
Do you remember what those assorted BIOS tweaks were? I mostly haven't done any, although I did try a few of the documented ones.
Clean installing sounds like such a... Windows... way of dealing with things, but... that does give me an idea, I could always boot the 9.2 ISO, leave it sitting there for a few hours, and see if it spontaneously reboots. If it doesn't then think about reinstalling.
Sounds like a plan. There was a guide linked to earlier in this thread for low-power operation... but on BIOS 1.02 there's a setting for 45W operation that probably makes all that redundant. Your 990 Pro runs hotter than my T500 and should have a heatsink but I doubt that's the problem.
I will look for that guide and... I don't think I noticed that 45W setting. Will look again.
I'm running the same setup as you, with Samsung 990 Pros. I've got a 3-node Proxmox cluster in my basement rack. The temperature in the basement is always around 20-21°C. All the 990 Pros run on average as shown in the image (Zabbix monitoring).
View attachment 48958
I had a similar issue at the beginning of this year with one of my nodes. The reboots, or rather system lockups, mostly happened during heavy load, such as backups to a NAS.
It took me a while to figure out that, in my case, the cause was a faulty Samsung 990 Pro. Heavy read/write activity caused the NVMe SSD to lock up, which then froze the whole system. It was never related to the NVMe temperature.
I replaced the faulty 990 Pro with a new one, and since then everything has been rock solid. I never had this issue with my other two nodes.
Of course, your reboots may have a completely different root cause, but after reading your comments, I thought I would share my experience.
@VivienM I have exactly the same issue as you, which is why I've now installed PVE 9.1 and am staying on 6.14.8.
Codex says it may be an AMD microcode update crash caused by the kernel update.
Some days ago I returned my console to the factory; they tested it and said all was fine. They just installed a brand-new Windows 11, which works fine even with the CPU and SSD at full boost. So I assume SSD temperature is not the cause.
Keep watching this thread.
How did you figure out the SSD was the issue?
I've certainly considered the SSD. Updated the firmware too, which did not help...
The system log file showed suspicious issues regarding the file system.
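For anyone who wants to check their own logs in a similar way, here is a minimal Python sketch that filters log lines for storage-related messages. The keyword list is an assumption for illustration, not something taken from the logs discussed in this thread, and `find_suspicious` is a hypothetical helper name; adjust the patterns to whatever your own system actually prints.

```python
import re

# Assumed keywords (not from this thread): adjust to match your own logs.
SUSPICIOUS = re.compile(
    r"nvme|i/o error|ext4-fs error|remounting filesystem read-only",
    re.IGNORECASE,
)


def find_suspicious(lines):
    """Return the log lines that match any of the assumed keywords."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]
```

One way to use it, assuming the journal is persistent so the previous boot is still available, is to pipe `journalctl -k -b -1` (kernel messages from the boot before the crash) into a small script that calls `find_suspicious(sys.stdin)` and prints the result. An empty result does not rule out the SSD; a hard lockup may never get the chance to write anything to disk.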
What SSD/RAM do you have?
Kingston + Samsung 980 Pro + Kioxia. Every SSD fails when tested one by one.
Power supply?
If that's aimed at me, then... maybe, but I guess my question would be, why is my power supply fine for 6.14.8 and older, but can't handle anything newer?
More directed at zzjin who has SSD problems.
I had an old Cubox ARM machine at one point and it had an SSD hung off a USB port and I retired it when the 3rd SSD in a few weeks decided to crap out. I am fairly sure that in my case the USB voltage was fluctuating and causing the problem. I retired the box and left all 3 SSDs on the shelf and came back to them about 3 years later and they had all recovered and were all completely blank. They're all still working now.
Sounds weird, my three SSDs all work fine on another machine (Intel i9 10850). For me it's not an SSD problem.