LGA 1700 Alder Lake "Servers"


chenxiaolong

New Member
Nov 16, 2020
10
6
3
Odd about the NVRAM space issue. Tried clearing the CMOS? Other steps like reseating the ram?
Yep, I've tried both of those. I tried transplanting the motherboard + CPU to a different system with no luck either. Unfortunately, I don't have enough hardware to test just the motherboard or CPU individually.
 

custom90gt

Active Member
Nov 17, 2016
325
114
43
41
Yep, I've tried both of those. I tried transplanting the motherboard + CPU to a different system with no luck either. Unfortunately, I don't have enough hardware to test just the motherboard or CPU individually.
Does the IPMI allow you to upgrade or downgrade the bios?
 

RolloZ170

Well-Known Member
Apr 24, 2016
8,984
2,856
113
germany
I'm not sure what caused it, but I did start to see weird issues after upgrading the BIOS to 3.3a initially and then 3.3b. When saving BIOS settings, it was occasionally complaining about lack of NVRAM space. Certain things just silently failed to save, like custom secure boot keys. I didn't see any large EFI variables in `/sys/firmware/efi/efivars` at the time so perhaps something else was eating up NVRAM space.
I would give a CH341A flasher a chance. It's already dead, so what could happen?
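For anyone who wants to repeat the check mentioned in the quoted post, here is a minimal sketch that lists the largest entries under `/sys/firmware/efi/efivars`. The function name `largest_efi_variables` and the `top` parameter are made up for illustration. It only reports what Linux exposes through efivarfs, so, as the quote suggests, NVRAM consumed by something the OS cannot see would not show up here.

```python
import os

def largest_efi_variables(efivars_dir="/sys/firmware/efi/efivars", top=10):
    """Return (size_in_bytes, name) pairs for the largest entries, biggest first."""
    entries = []
    for name in os.listdir(efivars_dir):
        path = os.path.join(efivars_dir, name)
        try:
            size = os.stat(path).st_size
        except OSError:
            continue  # entry vanished or cannot be inspected; skip it
        entries.append((size, name))
    entries.sort(reverse=True)  # largest size first
    return entries[:top]

if __name__ == "__main__":
    if os.path.isdir("/sys/firmware/efi/efivars"):
        for size, name in largest_efi_variables():
            print(f"{size:>8}  {name}")
    else:
        print("efivars not available (legacy boot, or efivarfs not mounted)")
```

The sizes are the file sizes the filesystem reports, which may include a small attribute header in addition to the variable data, so treat them as approximate.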
 

chenxiaolong

New Member
Nov 16, 2020
10
6
3
The manual has a bios recovery section, maybe that would help?
Unfortunately not. Based on LEDs not lighting up on the USB flash drive or with a keyboard, I'm guessing it's not getting to the point where the ports get powered up. (I tried every available USB port)

I would give a CH341A flasher a chance. It's already dead, so what could happen?
Thanks! I might give that a shot if it doesn't require desoldering anything. I had to go through that years ago with my old ThinkPad W520 and would rather not do it again.
 

rekd0514

Member
May 11, 2015
40
14
8
Does anyone have the 3.1 BIOS file for the X13SAE-F? My unraid server has been crashing every few days since I updated to 3.3 and as far as I know that is all I have changed. I would like to test that theory by reverting back. I didn't see any location to download these short of contacting Supermicro.

I read that Intel is releasing new microcode again anyway (0x12B): Intel Rolls Out Third Major "0x12B" Microcode Patch To Fix 14th & 13th Gen CPU Instability Issues

"During its ongoing investigation, Intel discovered another issue with Vmin shift in affected CPUs, which can cause the motherboard and BIOS code to request elevated voltages during idle or light activity periods."

This part makes me think the instability in the low power modes is causing the crashing because I am using powertop in unraid to reduce idle power consumption on the 13600k.
Quoting myself here, but so far 4 days uptime on 3.1 and no crashes in Unraid. I don't think I changed any BIOS setting in 3.3 that would cause crashing. I figured I would at least share my experience.
 
Jan 3, 2023
69
36
18
The tale of two 13900K's.

So my family has (or had) two 13900K's in operation. The first one is in my X13SAE based workstation, which has been well documented here, which I built in late 2022. This machine continues to be as solid as a rock. The second is my younger son's gaming computer which I built him for his birthday in mid June of 2023. This machine used the MSI MPG Z790 CARBON motherboard with a 13900K. About three months ago my son reported that the machine was starting to crash randomly, especially during games. He was using the MSI performance settings in the BIOS, which, unfortunately, set the PL's at 4096W and (probably) removed the other Intel protections.

I went ahead and upgraded the BIOS to include the microcode and settings fixes and then set the BIOS settings with standard Intel performance settings, then purchased a 14900KF for him and replaced the processor. The crashing stopped.

I opened up a support ticket on October 3 with Intel to essentially get a warranty replacement for the CPU, as these were clearly the symptoms of the Vmin shift due to ring bus voltage overdriving that has been much covered in the press. There were several communication rounds as they rightfully did an assessment of the environment and the work I had done to determine the processor was defective. In the end they agreed and I received a brand new 14900K on October 23 as a replacement for the bad 13900K that I returned to them.

A few observations. My guess is that I really am not just lucky with the 13900K on my Supermicro X13SAE based workstation. Supermicro is very conservative and does not allow extreme settings on the CPU. Pretty much the only thing you can do is set the PL's, and in this I am within the Intel performance setting recommendations. (They recommend with performance coolers, 253W/253W for PL1/2, and I am using 230W/253W. I think the Supermicro conservative approach is much better for the CPU, even before the microcode updates.)

The degradation caused by using the extreme performance settings on the gaming motherboards is real and will destroy a Raptor Lake CPU over time. Everyone using a gaming motherboard should use the Intel default or Intel performance settings and ensure the PL's are set to reasonable values, and that the current limit is not overridden. This, IIRC should be 309A maximum. Yes it will cost a couple of percent of performance compared to all limits removed where the CPU is totally dependent on thermal protection alone, but it will save you from a CPU replacement.
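On Linux, one way to see which power limits are actually in effect is the powercap (RAPL) sysfs interface. The sketch below is illustrative only: `read_power_limits` is a made-up helper, the zone path `intel-rapl:0` is assumed to be the CPU package, and whether these values match what the BIOS programmed can depend on the platform. On typical systems the `long_term` constraint corresponds to PL1 and `short_term` to PL2. The current limit (ICCMax) is not exposed here.

```python
from pathlib import Path

def read_power_limits(zone="/sys/class/powercap/intel-rapl:0"):
    """Return {constraint_name: watts} for the RAPL constraints in one zone."""
    zone = Path(zone)
    limits = {}
    for name_file in sorted(zone.glob("constraint_*_name")):
        # File names look like constraint_0_name, constraint_1_name, ...
        index = name_file.name.split("_")[1]
        limit_file = zone / f"constraint_{index}_power_limit_uw"
        try:
            name = name_file.read_text().strip()
            microwatts = int(limit_file.read_text().strip())
        except (OSError, ValueError):
            continue  # constraint missing or unreadable; skip it
        limits[name] = microwatts / 1_000_000  # microwatts to watts
    return limits

if __name__ == "__main__":
    for name, watts in read_power_limits().items():
        print(f"{name}: {watts:.0f} W")
```

A `long_term` value in the thousands of watts would be a sign that the board has effectively removed the limit, as with the 4096W setting described above.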

I do not know how the other W680 boards are with respect to CPU settings, but folks using the other boards with a Raptor Lake should pay careful attention to this.

I think I will replace one of my older computers that has an Ivy Bridge-E based motherboard with a new X13SAE and the warranty replacement CPU I just got! :)
 
Jan 3, 2023
69
36
18
I decided to change out the good 13900K in my workstation for the 14900K warranty replacement to 1) test the warranty replacement processor, and 2) gain the 5% speed increase. I ordered another X13SAE and associated parts and will replace the guts of that older Ivy Bridge computer with the 13900K when the motherboard arrives.

The 14900K is good and does appear to be about 5% faster. I set the PLs at 253W/253W, the Intel performance recommendation.
 
Jan 3, 2023
69
36
18
So... I spoke too soon about the warranty replacement 14900K. It runs for about 5 to 7 days and then the computer locks up hard without logging any error in /var/log/messages. To confirm it is the CPU I switched back in the 13900K and the machine runs perfectly.

Note that this motherboard has BIOS 3.3b, which has the 129 microcode, and has operated with my 13900K perfectly for almost two years under heavy load. So this all but confirms the 14900K CPU is not stable. I think this means there is more to this Raptor Lake story than Intel is willing to admit. This is a brand new CPU straight from the factory, running with proper power limits on a very conservative motherboard...
 

unwind-protect

Active Member
Mar 7, 2016
609
248
43
Boston
So... I spoke too soon about the warranty replacement 14900K. It runs for about 5 to 7 days and then the computer locks up hard without logging any error in /var/log/messages. To confirm it is the CPU I switched back in the 13900K and the machine runs perfectly.

Note that this motherboard has BIOS 3.3b, which has the 129 microcode, and has operated with my 13900K perfectly for almost two years under heavy load. So this all but confirms the 14900K CPU is not stable. I think this means there is more to this Raptor Lake story than Intel is willing to admit. This is a brand new CPU straight from the factory, running with proper power limits on a very conservative motherboard...
I'm patiently waiting for the gaming datacenters that proved the problem statistically to confirm or deny whether the new BIOS fixes the problem for them.
 
Jan 3, 2023
69
36
18
I think that the real situation is that 1/3 of the Raptor Lake CPUs manufactured are marginal right from the get-go, and the BIOS updates probably work for some less marginal examples. Others, when put under stress, even with the BIOS fixes, are unstable.

What is surprising is that Intel hasn't developed an internal test before shipping them out.
 

OP_Reinfold

Member
Sep 8, 2023
99
48
18
Didn't Gamers Nexus mention that 'his opinion' is that it is a fab issue, the 'power' thing is just how fast it will degrade, bottom line was that it is your money and if you're happy to believe the Intel 'issue is just relative to how much power is pumped into it and our microcode fixes it'... oh and how quickly they jumped into the Ultra series not long after that lol... even media moved on very quickly 'nothing to see here' lol.... well, a US-gov protected chip giant, say no more.

My advice, grab a 12th gen KS (a great cpu) or an Ultra (but will need a new board and don't think a 'WS' board is out for them just yet)... skip the c'raptors if you can, unless you grab one for pocket change.
 
Jan 3, 2023
69
36
18
I think it is a fab issue. If you are lucky and get a good one, I think you will be good for years as long as you don't try and pump unlimited current/wattage into it. The new microcode is an insurance policy for that. My original 13900K is an example. Been running flat out for over 2 years without a peep of trouble.
 

OP_Reinfold

Member
Sep 8, 2023
99
48
18
I think it is a fab issue. If you are lucky and get a good one, I think you will be good for years as long as you don't try and pump unlimited current/wattage into it. The new microcode is an insurance policy for that. My original 13900K is an example. Been running flat out for over 2 years without a peep of trouble.
The gamble of not knowing until the day you know is a headache most would rather not have.

The problem for energy-conscious homelabs is that most tend to go for CPUs one or two generations older than the current one, which keeps costs down to reasonable levels. In this case that option is out, considering the gamble. One could look at AMD offerings, but if one wants an inbuilt iGPU with decent decoding then that's off the table too.

Power-wise there isn't much in it now between AMD and Intel if you're running a 10G+ NIC, because it locks out the deep power states on the Intels. The Intels also constantly spike in power consumption for brief moments, which the average real-time wall power meter misses completely; the AMDs have a more predictable power curve. Obviously the best test would be to run like for like doing the exact same thing and compare the kWh used.
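One way to get an average that includes short spikes is to read the CPU's cumulative energy counter instead of sampling instantaneous power. The sketch below uses the Linux powercap (RAPL) files `energy_uj` and `max_energy_range_uj`; the helper names are made up, the zone path is an assumption, and reading `energy_uj` usually needs root on recent kernels. It covers package power only, not wall power, so it is useful for comparing the same machine under different settings rather than AMD against Intel at the plug.

```python
import time
from pathlib import Path

def average_watts(energy_start_uj, energy_end_uj, seconds, max_range_uj):
    """Average power between two counter readings, allowing for one wraparound."""
    delta_uj = (energy_end_uj - energy_start_uj) % max_range_uj
    return delta_uj / 1_000_000 / seconds  # microjoules to joules, then J/s

def measure_package_watts(zone="/sys/class/powercap/intel-rapl:0", seconds=10.0):
    """Read the cumulative energy counter twice and return the average watts."""
    zone = Path(zone)
    max_range = int((zone / "max_energy_range_uj").read_text())
    start = int((zone / "energy_uj").read_text())
    t0 = time.monotonic()
    time.sleep(seconds)
    end = int((zone / "energy_uj").read_text())
    elapsed = time.monotonic() - t0
    return average_watts(start, end, elapsed, max_range)
```

Because the counter accumulates energy continuously, brief spikes between the two reads still count toward the result, unlike a meter that samples once per second. The modulo step assumes the counter wrapped at most once during the interval.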

The thing that does my head in is that the 20/24 PCIe lane limit is just really cheap; we should have at least 32 lanes on standard consumer CPU parts.

Everything is being invested in 'cloud' tech... the days of people actually 'owning' their own gear at home are severely numbered, all world leaders have bought into that bs 2030 agenda nonsense... the future will be 'thin-clients' for the rat-race, and all your data and crunching ability is behind a cloud paywall... gaming, office, browsing, whatever, it all will happen on corporate clouds... oh well, it was only a matter of time, at least we got to enjoy it while we could.
 
Jan 3, 2023
69
36
18
The PCIE lane limitation is a good point. My old E5-2687W v2 Xeon has more PCI lanes than the new chips... You have to spend $$$ for a Xeon to get the proper amount of PCIE support.
 

ITN0B

New Member
Apr 7, 2024
4
3
3
I'm looking to get the MW34-SP0 rev 1.1. The listed operating temperature is 10°C to 40°C.
Meanwhile, my main PC's motherboard easily reaches 40°C. Should I be worried about that?
 

JanR

Member
Nov 5, 2023
40
24
8
I think that the real situation is that 1/3 of the Raptor Lake CPUs manufactured are marginal right from the get-go, and the BIOS updates probably work for some less marginal examples. Others, when put under stress, even with the BIOS fixes, are unstable.
I agree with that.

At work, we have a 13900K running stable 24/7 on an X13SAE-F since last summer. At home, I have a 14900K on an X13SAE running since October 2023. In this time, it crashed four times without any trace in the Linux logs. Four times does not sound that bad, but this is a workstation/server running 24/7 and I bought the very expensive W680 board specifically to avoid such problems. Its predecessor, a Xeon X3470 on a server board, ran 13 years (!) 24/7 without a single crash (obviously with some reboots for Linux kernel updates in between) - that is what I expect from this class of hardware.

Some more information: both machines are set to PL1=PL2=125 W, so there is no excessive thermal stress. Furthermore, once the issues appeared in the press, I reduced the allowed turbo frequencies for both machines. In the case of the 14900K, I used 5.2 GHz for P-cores and 4.2 GHz for E-cores. This limits Vcore to just 1.2 V even under single-core load.

However, based on what I have read, the voltage spikes we can see using the onboard monitoring are not the only problem; there are also very short spikes that are usually not caught by these measurements. Therefore, my 1.2 V is only what I see. This does not mean that there are no short-term peaks.

More observations: The first crash occurred within the first two months with no limits on frequencies. Then, it ran more or less stable with microcode 0x123 (no BIOS update, only Linux microcode update) and limited frequencies. With 0x125 (again via Linux) it crashed once after some months, and with 0x12b it crashed twice in three weeks (first with the Linux microcode, second after updating the BIOS to 4.1). The 13900K at work runs the same OS with the same configuration and got more or less the same microcode updates - no issues so far.
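When correlating crashes with microcode versions like this, it helps to confirm which revision is actually loaded, whether it came from the BIOS or from the Linux early-load mechanism. On x86 Linux, `/proc/cpuinfo` reports a `microcode` field per logical CPU. The sketch below simply collects the distinct values; `microcode_revisions` is a made-up helper name.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions (as ints) found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))  # values look like 0x12b
    return revisions

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            revs = microcode_revisions(f.read())
        print(", ".join(hex(r) for r in sorted(revs)) or "no microcode field found")
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

Seeing more than one value in the output would mean that not all cores report the same revision, which is worth knowing before drawing conclusions from crash timing.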

Therefore, I am starting to consider using Intel's warranty to swap the CPU. So far, I have heard that they do not offer to send a new processor first, so I have to find some LGA1700 CPU for the meantime since this machine has to run. I just hope I get something better than @James C. Owens with his 14900K that crashed although it never saw the high voltages utilized by older microcode versions.

So on the 13900K vs. 14900K issue he and I observed the same behavior, but I wonder whether this really has something to do with 13th vs. 14th generation, since this is the same die in the same version. The only difference is clock speed, but three out of my four crashes happened at turbo frequency settings below (!) those of the 13900K. The one this morning was at 5.2/4.2 GHz with PL1=PL2=110 W and it most likely happened at idle (but this does not mean that there was no short activity that loaded one or more cores for some time).

Does anybody have a benchmark for Linux that definitely crashes an affected processor?

I already played with mprime, but it did not cause any problems.