ASUS KRPA-U16 not working with EPYC 7D13 (100-000000322)

zedei

New Member
Jul 3, 2021
11
1
3
My ASUS KRPA-U16 motherboard works fine with an EPYC ROME 7742. I upgraded the BIOS to the latest versions for both ROME and MILAN using the BMC web UI. With the EPYC 7D13 processor installed, the board does not POST. If I replace the CPU with the original 7742, it works again.
It's hard to find any info on this 7D13 processor. AMD tech support doesn't even know what it is. lol.
 

rootwyrm

Member
Mar 25, 2017
60
68
18
www.rootwyrm.com
The 7D13 is a low power variant of the 7V13, both of which are contract customer CPUs. Not OEM-only - contract customer only. You would have to extract the signed AGESA including the signature from the source system board. And the 7D13 variant has extremely specific microcode.
 

RolloZ170

Well-Known Member
Apr 24, 2016
1,556
402
83
54
It's hard to find any info on this 7D13 processor. AMD tech support doesn't even know what it is.
The 7D13 is, IMHO, the MILAN counterpart of the ROME 7D12: a Dell OEM part, preinstalled in datacenter servers (replaced as customers need) and then vendor locked.
 

RageBone

Active Member
Jul 11, 2017
548
138
43
The 7D13 is, IMHO, the MILAN counterpart of the ROME 7D12: a Dell OEM part, preinstalled in datacenter servers (replaced as customers need) and then vendor locked.
PSB locks cause a POST code in the 78 to 7F range, if I remember correctly.
Anything else is not caused by a vendor lock.
Missing microcode is the most reasonable explanation, in my opinion.
Especially if those CPUs are indeed of a different stepping than retail.
 

rootwyrm

PSB locks cause a POST code in the 78 to 7F range, if I remember correctly.
Anything else is not caused by a vendor lock.
Missing microcode is the most reasonable explanation, in my opinion.
Especially if those CPUs are indeed of a different stepping than retail.
You are correct (I can't say exactly which, because it does vary by BIOS.) It will also indicate PSB fuse trip if you have the appropriate hookups (Asus board doesn't though.) The 7D13 is, as I said, a low-power contract customer part with a unique SKU and unique microcode. Which is literally all the information I or anyone else has on it to date. This may, in fact, be the first sample to be found in the wild at all.

And honestly, even I don't know that I could get it working based on that. AGESA's signed, and even attempting to run any variant microcode would require breaking seal. With the right board you could load some special pieces into the BIOS, but that's no guarantee the CPUID's even present. And retail boards literally can't do it - it's more than just using SPI to write the BIOS, even without PSB enable.
 

RageBone

@rootwyrm I understand what you are saying, but some things don't make sense to me because of your choice of words.

PSB fuse trip if you have the appropriate hookups (Asus board doesn't though.)
What hookups do you mean that the KRPA supposedly doesn't have?

And what do you mean with "seal" and breaking it?
AGESA's signed, and even attempting to run any variant microcode would require breaking seal.
You specifically mentioned signatures earlier, so I think you mean something else?

Are you talking about very specific pieces of "something" or are you just generalizing?
And what is "the right board" to you?
Do you mean the Reference Boards?
With the right board you could load some special pieces into the BIOS
I kinda disagree; so far my SM H11SSL and TRX40 boards were the most supportive and accepting when ****ing with them.
I guess here it depends on the definition of "it". What is it you mean?
And retail boards literally can't do it
 

ExecutableFix

Active Member
Nov 25, 2019
123
60
28
I owned a 7D12 for a while and ran it in my home system. It does NOT require a custom microcode (in fact, none of the custom SKUs do because they share the same stepping as retail parts) and it was not vendor locked. A SKU is only vendor locked if it has previously been installed in a Dell or HP system. These SKUs are not meant for HP or Dell, so there’s no factory lock applied to them.

The 7D12 only has 4 memory channels which indicates it was made for a specific board. It will still work on normal boards, but the memory has to be in the right slots, which is why you need to try to boot it with one DIMM first. My guess would be that the same applies to the 7D13.
 

RageBone

Oh, I might’ve confused HP and Lenovo. I know HP has the ability to if they want
I was told HP does it their own undisclosed "proprietary" way.
Lenovo and Dell use PSB to lock CPUs.

Again, in case of a PSB lock, POST is expected to get stuck at a code from 78 to 7F.
78 is the code that was actually observed. As far as the crystal ball states, it should be 7D; it isn't, don't ask me why.
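
A minimal Python sketch of that range check, for anyone who wants to test a list of observed codes against it. The 0x78 to 0x7F range is only the recollection stated in this thread, not a documented AMD value, and the function name is made up for the example.

```python
# POST code range recalled in this thread for PSB locks.
# Assumption: this is forum recollection, not a documented value.
PSB_LOCK_RANGE = range(0x78, 0x80)  # 0x78..0x7F inclusive


def looks_like_psb_lock(post_code: str) -> bool:
    """Return True if a two-digit hex POST code falls in the recalled range."""
    return int(post_code, 16) in PSB_LOCK_RANGE


for code in ("78", "7D", "50"):
    print(code, looks_like_psb_lock(code))
```

A code outside this range does not prove the CPU is unlocked; it only means the symptom does not match what was seen on locked parts.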

My guess would be that the same applies to the 7D13.
At this point, we simply don't know enough, and I can only hope that the OP starts delivering, for example, POST codes.

If it is a new stepping compared to retail Milan, I am certain that it needs additional microcode, which likely isn't included.

If one were to have a BIOS meant for those CPUs, we could look further into it.
 

zedei

Here are the POST codes on the ASUS KRPA-U16. They go in a continuous cycle:
A1 34 02 50 dE Ad A1 34 02 50 dE Ad ...
 

rootwyrm

Oh, I might’ve confused HP and Lenovo. I know HP has the ability to if they want
Actually, every ODM and OEM has the capability to use PSB fusing. And HP actually does use PSB fusing. Hell, I use PSB fusing. The difference is that HP and everyone who isn't the giant bag of dicks that is Dell uses a different PSB method which does not lock the CPU to a specific motherboard. There's a general outline of some of the pieces involved in the AMD public SEV API Specification under the Platform Management section.

Quoth HP: "HPE does not use the same security technique that Dell is using for a BIOS hardware root of trust. HPE does not burn, fuse, or permanently store our public key into AMD processors which ship with our products. HPE uses a unique approach to authenticate our BIOS and BMC firmware: HPE fuses our hardware – or silicon – root of trust into our own BMC silicon to ensure only authenticated firmware is executed. Thus, while we implement a hardware root of trust for our BIOS and BMC firmware, the processors that ship with our servers are not locked to our platforms."

Translation:
HPE places the fusing on their side of the equation - the BMC - instead of burning their key into the processor and turning it into a fancy paperweight. The processor isn't generally an information-sensitive component anyway, so resale carries minimal to zero security risk. (BMC compromise, however, is high risk as it contains passwords, authentication, and network information even when power is removed. And can intercept a hell of a lot on a running system to say the least.)
This is why an HPE sourced CPU works in anything. HPE's BMC confirms the AMD public key against an HPE managed root of trust. As long as the processor is genuine and hasn't been tampered with? HPE's cool with it. CPU side keys aren't kosher? Board may brick. Tamper with the BMC to alter, say, just the HPE logo on the login screen? Board will brick. Board went and bricked? Take your $16000 of CPUs, move them to a new $800 board, and you're back in business.

On the OEM/SI side, it's about the same. Because seriously, bricking a CPU just to screw over customers is nothing but a dick move and par for the course with Dell EMC. And you can hypothetically set the CPU side PSB eFuse without locking it to a specific vendor's specific model with a specific serial number. But it makes a LOT more sense from not only a cost but a serviceability perspective to brick the much more failure prone system planar when the ARKs don't line up than it does to brick a CPU because of a BMC firmware flash failure. And it's a hell of a lot cheaper too.

"Wait, what? Then why doesn't Dell do that?"
Because Dell's ethical standards are basically 'anything that makes Mikey a few extra bucks might get you not fired.' There's a reason Intel has consistently called them the best friend money can buy. That's exactly the kind of company they are. If they can use deliberate bricking to 'justify' charging you three times as much on disservice contracts because "well we have to replace the CPUs every time too"? They'll try and find some way to make the bricked CPUs your fault at the same time.


Here are the POST codes on the ASUS KRPA-U16. They go in a continuous cycle:
A1 34 02 50 dE Ad A1 34 02 50 dE Ad ...
Decode should be:
  • Reset of external IDE/SATA (though I wonder if it may be inverted 0x1A which is DXE boot service)
  • Begin CPU transition to normal mode
  • Application processor pre-microcode
  • Memory initialization failure (retry possible)
  • Unrecoverable DXE failure in microcode/firmware load - should be specifically unable to load from the board ROM side
  • Aaaaaaand the board tries to boot with the CPU still in real mode after an unrecoverable DXE failure. Are you serious with this Asus? Sigh.
Note that this assumes Asus didn't completely mangle Aptio and just decide to make up all their own UEFI->80h status codes with no relation to reality. But the likelihood of that is basically nil. I presume that the system has zero display output, so that puts it into needing a BIOS with debugging PEIMs (probably don't have space anyway, looks to be a 64Mbit) and a JTAG (no headers present) or AMI Debug.

edit: should clarify, because it's a 2-digit and not full UEFI code, DE can only be interpreted as a 'generalized DXE microcode/firmware load failure.' There isn't enough information to identify if it is IOD, CCX, or peripheral.
 

zedei

These are the code meanings as per the manual:
A1: IDE reset
34: CPU post-memory initialization
02: microcode
50: Memory initialization error
dE: not listed
Ad: Ready to Boot event
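
For anyone who wants to annotate a longer capture, the manual lookup above can be sketched as a small Python table. The meanings are copied from the list in this post only; the `decode` helper is a made-up name, and anything not in the table is reported as "not listed" rather than guessed.

```python
# Meanings as listed from the board manual in this post.
# "dE" is deliberately absent because the manual does not list it.
MANUAL_CODES = {
    "A1": "IDE reset",
    "34": "CPU post-memory initialization",
    "02": "microcode",
    "50": "Memory initialization error",
    "AD": "Ready to Boot event",
}


def decode(sequence: str) -> list[tuple[str, str]]:
    """Map each code in a space-separated string to its manual meaning."""
    result = []
    for code in sequence.split():
        # The display shows mixed case (dE, Ad), so compare in upper case.
        meaning = MANUAL_CODES.get(code.upper(), "not listed")
        result.append((code, meaning))
    return result


for code, meaning in decode("A1 34 02 50 dE Ad"):
    print(f"{code}: {meaning}")
```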
 

zedei

Since this is an endless loop, I didn't note the starting code. Here are some codes that come before the loop:
15
C2
C8
63
21
dE
Ad
A1
34
02
50
dE etc. etc.

So it seems the last code of the cycle is 50: Memory initialization error and then it begins again with dE (whatever that means).
I have tried with different memory sticks (that work fine when the 7742 CPU is installed).
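
To separate the one-time codes from the repeating part without eyeballing it, something like the following Python sketch could be used. The input is the sequence above followed by further repetitions of the loop as reported earlier in the thread, which is an assumption about what "etc. etc." covers; `find_loop` is a hypothetical helper, not part of any tool.

```python
def find_loop(codes):
    """Return (start, period) of the earliest repeating tail, or None.

    A tail only counts as a loop if at least two full repetitions
    of it are present in the captured sequence.
    """
    n = len(codes)
    for start in range(n):
        for period in range(1, (n - start) // 2 + 1):
            # Every code from `start` on must equal the code one period later.
            if all(codes[i] == codes[i + period] for i in range(start, n - period)):
                return start, period
    return None


# Pre-loop codes plus assumed repetitions of the reported cycle.
observed = "15 C2 C8 63 21 dE Ad A1 34 02 50 dE Ad A1 34 02 50 dE Ad".split()
found = find_loop(observed)
if found:
    start, period = found
    print("pre-loop codes:", observed[:start])
    print("loop body:", observed[start:start + period])
```

The search is brute force, which is fine for the few dozen codes one can realistically write down from a two-digit display.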
 

RageBone

As far as I can see, the codes Asus provides are bonkers.

Experience shows, for instance, that without any RAM installed, you get stuck at POST code 10 and nothing else.
Expected would be something in the 50s for memory issues, likely 53 or 55 from what I remember from AMI on X99.

Additionally, POST codes on Epyc don't seem to appear in a "deterministic" order, I'd say.
On my H11SSL, code 55 returns multiple times.
According to my crystal ball, it stands for an SMBus transaction collision.
The crystal ball also seems to mainly contain error codes, which seems weird to me, but I have not found anything better yet.

POST code 15 seems to appear instead of code 10 when you have any memory installed.
Up to and including code 50, everything looks familiar to me.
I will take a closer look at my board tomorrow.
The crystal ball interprets 50 as a failure to claim ownership of SMB.
Additionally, the preceding 02 can be interpreted as a generic memory error.

What I can say so far is that you don't appear to have the usual CPU incompatibility issues that we have experienced with, for instance, ES CPUs of A0 steppings.
Such issues would likely manifest in D0 or D1.
Your system also does not quickly jump to 02 and stay there, which could be a serious PSP-BL failure.
So I'm sorry that this isn't more than a guessing game at this point.

Does the BMC log anything?
 

zedei

System log from BMC:


  • ID: 18 October 25th 2021, 9:54:34 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][libipmi_AMIOEM.c:5987]Unable to get Current Active Image 1c1 -
  • ID: 19 October 25th 2021, 9:54:34 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][misc.c:107]Error in getting dual image active image configuration::449 -
  • ID: 16 October 25th 2021, 9:54:31 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][libipmi_AppDevice.c:764]Got invalid data field for SOl permissions eventhough userr is not disabled..adjust -
  • ID: 17 October 25th 2021, 9:54:31 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][libipmi_AppDevice.c:764]Got invalid data field for SOl permissions eventhough userr is not disabled..adjust -
  • ID: 14 October 25th 2021, 9:54:29 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][rest_default.c:359]ServiceRet 0 -
  • ID: 15 October 25th 2021, 9:54:29 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][rest_default.c:393]channel no wRet 0 -
  • ID: 13 October 25th 2021, 9:53:57 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:357]Renewing DNS for eth0 interface. -
  • ID: 12 October 25th 2021, 9:53:56 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:200]DHCP monitor: Renewing eth0 interface for IPv6 -
  • ID: 11 October 25th 2021, 9:53:55 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:160]DHCP monitor: Renewing eth0 interface for IPv4 -
  • ID: 9 October 25th 2021, 9:53:49 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:189]DHCP monitor: Releasing eth0 interface for IPv4 -
  • ID: 10 October 25th 2021, 9:53:49 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:205]DHCP monitor: Releasing eth0 interface for IPv6 -
  • ID: 8 October 25th 2021, 9:53:48 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:357]Renewing DNS for eth0 interface. -
  • ID: 7 October 25th 2021, 9:53:47 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:200]DHCP monitor: Renewing eth0 interface for IPv6 -
  • ID: 6 October 25th 2021, 9:53:46 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:160]DHCP monitor: Renewing eth0 interface for IPv4 -
  • ID: 4 October 25th 2021, 9:53:41 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:189]DHCP monitor: Releasing eth0 interface for IPv4 -
  • ID: 5 October 25th 2021, 9:53:41 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:205]DHCP monitor: Releasing eth0 interface for IPv6 -
  • ID: 2 October 25th 2021, 9:53:40 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:200]DHCP monitor: Renewing eth0 interface for IPv6 -
  • ID: 3 October 25th 2021, 9:53:40 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:357]Renewing DNS for eth0 interface. -
  • ID: 1 October 25th 2021, 9:53:39 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:160]DHCP monitor: Renewing eth0 interface for IPv4 -
 

rootwyrm

Since this is an endless loop, I didn't note the starting code. Here are some codes that come before the loop:
15
C2
C8
63
21
dE
Ad
A1
34
02
50
dE etc. etc.

So it seems the last code of the cycle is 50: Memory initialization error and then it begins again with dE (whatever that means).
I have tried with different memory sticks (that work fine when the 7742 CPU is installed).
Yep. People who think they know things, like "asus codes are bonkers"?
They're proving that they know nothing at all.
I have manuals, documents, standards, and NDAs.
So, amazingly, having access to actual unimpeachable and guaranteed correct documentation, I know what I'm interpreting.

The description is exactly as I gave it, no matter what their not identical gamer board manuals say. DE is a reserved UEFI code for DXE microcode load fault. It was unable to get a current active image for a peripheral installed, leading to the DXE failure. I would have to have the CPU on the bench with one of my boards and a full toolkit. I'm quite certain 1A is an inversion of A1 (common defect.) Behavior changes none at all. It hits DXE fault attempting to load microcode or firmware, and says that it can't find enough to continue DXE.
We don't even know if this processor is fused because it can't even exit real mode.
 

RageBone

@zedei how often have you let it cycle?
Asus Boards love to reset multiple times, might actually be that everything is good and you just got spooked?
Happened to me and others before.


Yep. People who think they know things, like "asus codes are bonkers"?
To me, those lists of codes look exactly like those from the earlier Aptio 5 days on, for example, X99.
If I have to speculate, Scalable probably still uses those.

AMD, on the other hand, changed those for certain.
And the only things I have to come to that conclusion are experience on actual hardware and leaks with juicy contents, hence the crystal ball.

So yes, I actually know pretty little. That does not keep me from using the little I have to try and be productive.
Additionally, I would like to have my questions answered, to fix that lack of knowledge if possible.

Onto the interpretation of Postcodes.
At this point, all we have are two-digit hex numbers sent on port 0x80 that could be from anything, including the PSP and BIOS code.
You could take a random list and interpret the codes as a weather forecast saying that it will be sunny tomorrow.

Comparing real-world behavior and codes with the expected and documented ones should show some huge discrepancies and issues.
Since i have already mentioned some of those issues, let me add ones about my crystal ball of postcodes.

Dell and Lenovo PSB-burned CPUs get stuck at code 78, which "should" have something to do with the OEM sig not being found.
Which does not make any real sense to me.
Whereas code 0x7D is clearly labeled a PSB error code, which would be way more reasonable.
On the list for the KRPA, 78 is "ACPI Module Init" and 7D is reserved.
What does that tell you?

So, amazingly, having access to actual unimpeachable and guaranteed correct documentation, I know what I'm interpreting.
Does that make any sense to you?



One thing I am certain about is that POST codes today are only of limited use in such matters.

Speed: codes only become visible when there is an error and the boot is stuck, or when the code doesn't change for a perceivable while.
There are likely hundreds of codes between those few observed.
Some of those might make a lot more sense if we could read them.

There is more than one two-digit "postcode".
Those additional ones could indicate a more general source for the port 80 code.
For instance, whether it's a PSP BL, PEI or DXE code.

And there is more than one list of interpretations for those codes.
Or at least, for the abbreviated two-digit codes on just port 80.

Those codes were a good indicator for faults in the past.
I still can't discern what might be wrong in the OP's case.


System log from BMC:
  • ID: 18 October 25th 2021, 9:54:34 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][libipmi_AMIOEM.c:5987]Unable to get Current Active Image 1c1 -
  • ID: 19 October 25th 2021, 9:54:34 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][misc.c:107]Error in getting dual image active image configuration::449 -
  • ID: 16 October 25th 2021, 9:54:31 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][libipmi_AppDevice.c:764]Got invalid data field for SOl permissions eventhough userr is not disabled..adjust -
  • ID: 17 October 25th 2021, 9:54:31 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][libipmi_AppDevice.c:764]Got invalid data field for SOl permissions eventhough userr is not disabled..adjust -
  • ID: 14 October 25th 2021, 9:54:29 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][rest_default.c:359]ServiceRet 0 -
  • ID: 15 October 25th 2021, 9:54:29 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][rest_default.c:393]channel no wRet 0 -
  • ID: 13 October 25th 2021, 9:53:57 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:357]Renewing DNS for eth0 interface. -
  • ID: 12 October 25th 2021, 9:53:56 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:200]DHCP monitor: Renewing eth0 interface for IPv6 -
  • ID: 11 October 25th 2021, 9:53:55 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:160]DHCP monitor: Renewing eth0 interface for IPv4 -
  • ID: 9 October 25th 2021, 9:53:49 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:189]DHCP monitor: Releasing eth0 interface for IPv4 -
  • ID: 10 October 25th 2021, 9:53:49 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:205]DHCP monitor: Releasing eth0 interface for IPv6 -
  • ID: 8 October 25th 2021, 9:53:48 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:357]Renewing DNS for eth0 interface. -
  • ID: 7 October 25th 2021, 9:53:47 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:200]DHCP monitor: Renewing eth0 interface for IPv6 -
  • ID: 6 October 25th 2021, 9:53:46 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:160]DHCP monitor: Renewing eth0 interface for IPv4 -
  • ID: 4 October 25th 2021, 9:53:41 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:189]DHCP monitor: Releasing eth0 interface for IPv4 -
  • ID: 5 October 25th 2021, 9:53:41 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:205]DHCP monitor: Releasing eth0 interface for IPv6 -
  • ID: 2 October 25th 2021, 9:53:40 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:200]DHCP monitor: Renewing eth0 interface for IPv6 -
  • ID: 3 October 25th 2021, 9:53:40 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:357]Renewing DNS for eth0 interface. -
  • ID: 1 October 25th 2021, 9:53:39 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:160]DHCP monitor: Renewing eth0 interface for IPv4 -
To me, all of those posted look more like BMC events that are unrelated to Platform and CPU Boot issues.
Are there any other Logs or entries?
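
To make that kind of triage easier on a longer export, here is a rough Python sketch that splits the pasted entries by daemon and prints only the ones that are not routine DHCP activity. The regular expression is an assumption inferred from the three sample lines copied from the log above, not a documented AMI BMC log format, and `parse` is a made-up helper name.

```python
import re
from collections import Counter

# Pattern inferred from the entries pasted in this thread (an assumption).
LINE_RE = re.compile(
    r"ID: (?P<id>\d+) .*? (?P<daemon>\w+): \w+ - - "
    r"\[\d+ : \d+ (?P<level>\w+)\]\[(?P<file>[\w.]+):(?P<line>\d+)\](?P<msg>.*?) -$"
)

SAMPLE = """\
ID: 18 October 25th 2021, 9:54:34 pm AMIF02F7496EBA0 spx_restservice: spx_restservice - - [4067 : 4067 CRITICAL][libipmi_AMIOEM.c:5987]Unable to get Current Active Image 1c1 -
ID: 13 October 25th 2021, 9:53:57 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:357]Renewing DNS for eth0 interface. -
ID: 1 October 25th 2021, 9:53:39 pm AMIF02F7496EBA0 dhcpmonitor: dhcpmonitor - - [3480 : 3480 CRITICAL][dhcpmonitor.c:160]DHCP monitor: Renewing eth0 interface for IPv4 -
"""


def parse(text):
    """Return one dict per log line that matches the inferred pattern."""
    entries = []
    for raw in text.splitlines():
        match = LINE_RE.search(raw.strip())
        if match:
            entries.append(match.groupdict())
    return entries


entries = parse(SAMPLE)
print(Counter(entry["daemon"] for entry in entries))
for entry in entries:
    # Skip DHCP lease renewals; they are network housekeeping on the BMC.
    if entry["daemon"] != "dhcpmonitor":
        print(entry["id"], entry["file"], entry["msg"])
```

Note that every entry in the paste is tagged CRITICAL, including the DHCP renewals, so the level field is not a useful filter here.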