ASRock Rack ROMED8-2T motherboard has thousands of critical_interrupt BMC event log entries?

j.battermann

Member
Aug 22, 2016
82
16
8
44
I just put together a new system and, as soon as I put a PCIe 4.0 x4 M.2 adapter card in it (this one: GIGABYTE AORUS Gen4 AIC Adaptor), the event log in the BMC started showing a few messages per second stating that the BIOS sensor is reporting... something? I am not sure how to read these, and I am wondering whether I should be concerned.

The card and the M.2 drives on it work as expected, so I am not sure what to think of this, but then again, it doesn't feel right if these logs keep piling up.

Does anyone know what these mean?

Thanks,
-Joerg

[Attachment: 1647032772228.png]
 

j.battermann

[Quote] You should confront ASRock Rack support with this.

Yeah, I did / am doing that. Not 'confronting', but I reached out, and William replied quite quickly as usual, though with no solution yet. Hence my asking here, as some people here also have that board and maybe someone has seen this before.
 

jpmomo

Active Member
Aug 12, 2018
531
192
43
I have the same board and the same AIC from Gigabyte. I vaguely remember seeing similar PCI error messages. I am in the middle of running some tests and have moved my remaining AMD CPU (7443P) to a Supermicro H12SSL, as that gave me fewer headaches for these specific tests. I needed to load up seven PCIe AICs that were x16 Gen4, and the ROMED8 was the only single-socket board that would support that. I had to switch two of the seven AICs on the H12SSL to x8 due to that board's limitations.

When I am done with these tests, I can move the CPU back to the ROMED8, test again with the Gigabyte AIC, and look at the event log. I have the ROMED8 on a bench now but had it in a workstation for a bit. That workstation had a discrete GPU. I initially ran into some issues trying to update the firmware and BIOS (I had a 7763 Milan CPU in it for workstation purposes) because of the discrete GPU. I was able to update both by temporarily removing the GPU and just using the onboard VGA.

William is pretty good but they don't seem to have very comprehensive testing. Sometimes they haven't seen our use cases before but at least they try to sort them out.
 

j.battermann

Good evening @jpmomo, nice hearing from you again! Last night I figured out what the culprit was for the error messages above: while the Aorus card is PCIe 4.0 and the PCIe slot I had used it in was set to Auto (and at times manually to Gen4), one of the M.2 NVMe drives on it was actually a PCIe 3.0 one. I would have assumed that bifurcation somehow works out the PCIe speed per bifurcated link, but as soon as I took that 3.0 M.2 drive off the card, the errors (mostly) disappeared. What I have done now is keep the PCIe 4.0 M.2 drives and the 3.0 ones on separate cards (I got a second such riser card from the local Micro Center), and the errors have largely disappeared.

I say largely because there still are a few dozen such messages per 24-hour period and I have no idea why, but that's good enough for now. I'll spend some more time tracking down what these last ones are and where they are coming from, but the main culprit was the mixing of 3.0 and 4.0 M.2 drives in a bifurcated slot.
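
For anyone who wants to check for the same mix on Linux, here is a rough Python sketch, not something I got from ASRock. It assumes each NVMe controller appears under /sys/class/nvme and that its PCI device exposes a max_link_speed attribute containing a string like "16.0 GT/s PCIe". The function names are made up for illustration. It only groups drives by the generation they advertise; it does not know which drives share a riser card.

```python
import re
from pathlib import Path

# PCIe per-lane signalling rate (GT/s) -> generation number.
GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def pcie_gen(speed_text):
    """Map a sysfs speed string such as '16.0 GT/s PCIe' to a generation, or None."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", speed_text)
    if match is None:
        return None
    return GEN_BY_GTS.get(float(match.group(1)))


def group_by_generation(max_speeds):
    """Group device names by advertised PCIe generation.

    max_speeds maps a device name to its max_link_speed string.
    Devices whose speed cannot be parsed are skipped.
    """
    groups = {}
    for name, text in max_speeds.items():
        gen = pcie_gen(text)
        if gen is not None:
            groups.setdefault(gen, []).append(name)
    return {gen: sorted(names) for gen, names in groups.items()}


def read_nvme_max_speeds(sysfs_root="/sys/class/nvme"):
    """Read max_link_speed for every NVMe controller (assumed sysfs layout)."""
    root = Path(sysfs_root)
    speeds = {}
    if not root.is_dir():
        return speeds
    for ctrl in sorted(root.glob("nvme*")):
        try:
            speeds[ctrl.name] = (ctrl / "device" / "max_link_speed").read_text().strip()
        except OSError:
            continue  # attribute missing or unreadable; skip this controller
    return speeds


if __name__ == "__main__":
    groups = group_by_generation(read_nvme_max_speeds())
    for gen, names in sorted(groups.items()):
        print(f"PCIe Gen{gen}: {', '.join(names)}")
```

If more than one generation is listed, it is worth checking whether those drives sit on the same bifurcated card.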

Regarding the motherboard itself and discrete graphics card(s): do your board and BIOS have the option to set the primary graphics card? I've got that option on my ROMED6U-2L2T board, but not on the ROMED8-2T one. Curiously, in the web version of the BIOS I do have it. But then again, when I boot up e.g. Proxmox, the boot messages all appear on the discrete GPU, not on the AST/BMC remote access screen, and I have the suspicion that something's not quite right there. Again, in the normal (non-web) BIOS the entry is missing for my version (3.28L, as far as I remember right now).

Thanks!
-Joerg
 

gsrcrxsi

Active Member
Dec 12, 2018
302
102
43
I have found that these ASRock EPYC boards don't really like mixed PCIe gens with the slots left on Auto. Best to just define the speed for the device in the slot.

I had an issue like this with an EPYCD8 board, which only does PCIe Gen3. With a PCIe Gen4 GPU plugged in, it didn't like it: sometimes it threw a bunch of PCIe errors, and sometimes it wouldn't boot. Setting explicit PCIe gen speeds seems to have solved that.

Rollo, nothing bad happens when the log fills; it just overwrites the old entries.
 

Ryo

New Member
Feb 12, 2023
15
1
1
I wish I had found this forum before I bought my ASRock Rack ROMED8-2T, as I'm having the same issues as of 2/12/2023, still not fixed. When I called ASRock Rack support, their "technician" immediately accused me of damaging the CPU socket and wanted me to take my server apart, inspect the socket, and send him pictures to prove whether I had damaged it before he would provide any support at all. I think I'll be buying a Supermicro board and never use ASRock Rack again.
 

Ryo

I'm getting the same errors on 02/12/23.
 

Ryo

@j.battermann, are you having the same issue as me, where you clear the event log and upon reboot it magically reappears?
 

gsrcrxsi

What issue are you having specifically? Which version of the board do you have? There's a newer /BCM version of the board. Also, which BIOS and BMC versions?

I have one of these boards and don't have any problems with it.

If the problem you're having could be caused by a bad CPU mount or some bent pins, then I could understand why they'd want you to check that before continuing. It would be a big waste of time for you to mail the board in and have them tell you it'll be $300 to fix when they could tell you that over the phone or by email instead.
 

Ryo

I'm getting basically the same event log errors that were posted above by j.battermann, and I mean almost exactly what's posted above.
 

Obvious Potatoes

New Member
Feb 22, 2023
4
0
1
Hi everyone.
Just to join in: I am also getting this issue on a ROMED8-2T.
I have a Hyper M.2 card in the machine, a 4-port network card, an LSI 9201-16i, and an EVGA 1050 Ti.
It looks like I got 5 errors on the 21st and 3 errors on the 22nd.
I'm going to tinker with the Gen speed of the PCIe slots, but does anyone know of another solution, or can at the very least tell me whether this is bad for the hardware?
I just bought it and I want to make sure nothing on the hardware side (motherboard, CPU, RAM) is bad.
[Attachment: 1677133630659.png]
[Attachment: 1677133660624.png]
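
In case it helps anyone track whether a change actually reduces the errors, per-day counts like the ones above could be pulled from a text dump of the event log instead of being counted by eye in the web UI. Below is a minimal Python sketch. It assumes the dump is pipe-separated in the style of `ipmitool sel elist` (record ID | date | time | sensor | event | direction), which is my assumption about the format, not something verified on this board. The function name is made up.

```python
def count_events_per_day(sel_text, keyword="critical_interrupt"):
    """Count event log lines per date whose sensor field matches keyword.

    Assumes pipe-separated lines: id | date | time | sensor | event | ...
    Underscores and case are ignored so 'critical_interrupt' also matches
    a sensor field written as 'Critical Interrupt'.
    """
    wanted = keyword.lower().replace("_", " ")
    counts = {}
    for line in sel_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # not an event line in the assumed format
        date, sensor = fields[1], fields[3]
        if wanted in sensor.lower().replace("_", " "):
            counts[date] = counts.get(date, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, made up for illustration only.
    sample = "\n".join([
        "   1 | 02/21/2023 | 10:15:01 | Critical Interrupt #0xff | PCI PERR | Asserted",
        "   2 | 02/21/2023 | 10:15:02 | Critical Interrupt #0xff | PCI PERR | Asserted",
        "   3 | 02/22/2023 | 08:00:00 | Temperature CPU1 | Upper Critical | Asserted",
        "   4 | 02/22/2023 | 09:30:45 | Critical Interrupt #0xff | PCI PERR | Asserted",
    ])
    for day, count in sorted(count_events_per_day(sample).items()):
        print(day, count)
```

To use it on a real system, save the log to a file first and pass the file contents to count_events_per_day.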
 

Obvious Potatoes

What fixed this for me was running Proxmox with kernel 6.1.
The forum thread where I found this is here.
I reset the UEFI to defaults with an update, and after booting there were no more errors, even when passing through the GPU and the Hyper M.2 card.
 

Ryo

@Obvious Potatoes, is Proxmox required, or is that just your OS of choice?