Minisforum ms-01 i9-13900H 96GB with Proxmox

Feb 19, 2024
30
28
18
Hello community,

I bought a Minisforum MS-01 and fitted it with two Crucial 48GB modules, giving me 96GB. I then installed Proxmox 7.4-17, running kernel 5.15.143-1. Everything runs fine and I do not experience random reboots or anything like that. However, I see these entries in my log and they are bothering me:
1708691142113.png

I ran memtest, and I tried running with only one module, swapping the modules, and so on. I noticed that when I upgrade the kernel and PVE itself, I see the memory failure message every second. So I am running kernel 5.15.143-1 at the moment, so that I do not see the log entry every second. Am I seeing the problem because I installed 96GB instead of 64GB, or is this a kernel-related problem?

It would help me a lot, and maybe other people experiencing the same problem, if you replied to this thread sharing your experience with the same setup. I would be thankful either way: whether it works on your side with no errors in the log, or it does not work and you also see the errors.

Thank you all.
 
Feb 19, 2024
Just out of curiosity why are you not running a newer release of proxmox?
Before I posted I had actually already updated. I am running PVE 8.1.4, but I pinned the kernel to 5.15.143-1 because with 6.5.13-1 I see the memory failure.
So I bought a single 8GB module with the same specs to test my "problem", and I can confirm it is because I installed 2x 48GB of memory. When I looked via the CLI it showed 64GB supported, while the Proxmox GUI showed 96GB. So I was getting the error above because of that. I could be totally wrong, though. But after testing with 8GB everything is fine. I also tested kernel 6.5.13-1 with 8GB and saw no MCE errors or any of the other messages I was seeing before. Now I am relieved to know that it is not a hardware issue. So I guess I could ignore the error when running with 96GB of memory. It would still be nice to hear from others, though.
 

FingerBlaster

Member
Feb 27, 2019
90
41
18
I've got 2 systems, one running 96GB of Crucial and the other running 96GB of Mushkin. I'm running PVE 8.1.4 with the latest kernel and I don't see these errors in my logs at all.
 
Feb 19, 2024
I've got 2 systems, one running 96GB of Crucial and the other running 96GB of Mushkin. I'm running PVE 8.1.4 with the latest kernel and I don't see these errors in my logs at all.
OK, now that's very interesting... The thing is, I want to believe that my 2x 48GB Crucial modules are not faulty... I ran Memtest86+ from USB and did 2 passes. Maybe it needed more? But ever since it arrived (the MS-01, 2x 2TB NVMe and 2x 48GB Crucial came in one box) I was worried, because the delivery guy kind of threw the box on the floor, and when I opened it the MS-01 was lying on top of both RAM modules!!! So I suspected the RAM from the start. It looks like I'm sending them back :(

Could you please send me the link for the Mushkin kit you bought? You also have the i9-13900H, right?

What does dmidecode -t memory show you? I see this at the top:

1708737026234.png
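
For anyone comparing their own output: a minimal Python sketch of how the relevant fields of dmidecode -t memory could be pulled out. The sample text is a hypothetical excerpt in the usual dmidecode layout (the locator names and the summarize helper are made up for illustration, nothing here is copied from the screenshot). Keep in mind that Maximum Capacity is only what the firmware declares in its SMBIOS tables, so a mismatch with the installed size is a hint to investigate, not proof of an unsupported configuration.

```python
import re

# Hypothetical excerpt in the usual `dmidecode -t memory` layout.
# Values and locator names are illustrative assumptions only.
SAMPLE = """\
Physical Memory Array
\tLocation: System Board Or Motherboard
\tMaximum Capacity: 64 GB
\tNumber Of Devices: 2

Memory Device
\tSize: 48 GB
\tLocator: Controller0-ChannelA-DIMM0

Memory Device
\tSize: 48 GB
\tLocator: Controller1-ChannelA-DIMM0
"""

# Conversion factors to GB; older dmidecode versions print sizes in MB.
UNIT_GB = {"MB": 1 / 1024, "GB": 1, "TB": 1024}


def to_gb(value: str, unit: str) -> float:
    return int(value) * UNIT_GB[unit]


def summarize(text: str) -> dict:
    """Compare the firmware-reported maximum with the installed module sizes."""
    max_cap = re.search(r"Maximum Capacity:\s*(\d+)\s*(MB|GB|TB)", text)
    # Anchoring at line start skips "Volatile Size:" and similar fields;
    # empty slots ("Size: No Module Installed") simply do not match.
    sizes = re.findall(r"^\s*Size:\s*(\d+)\s*(MB|GB|TB)", text, flags=re.M)
    installed = sum(to_gb(v, u) for v, u in sizes)
    reported_max = to_gb(*max_cap.groups()) if max_cap else None
    return {
        "installed_gb": installed,
        "reported_max_gb": reported_max,
        "exceeds_reported_max": reported_max is not None and installed > reported_max,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On a live system the same function could be fed the output of subprocess.run(["dmidecode", "-t", "memory"], capture_output=True, text=True).stdout, which normally requires root.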

Thank you for helping out :)
 
Feb 19, 2024
Machine Check Exceptions aren't just from memory, there's a utility called mcelog which should tell you more about the source of the exception.
I installed rasdaemon because I heard it replaced the mcelog utility.
Using the command journalctl -u rasdaemon I see this:

1708766365321.png

I do not see any entries similar to what I see when I run journalctl | grep -i error or dmesg. Using dmesg I see this:

1708766862505.png

Or am I using rasdaemon wrong? Where else should I look? Like I said, with 8GB of memory I do not see anything, so I am leaning more towards bad DIMMs or an unsupported RAM size (which is weird when @FingerBlaster does not see any errors with the same setup).
 

nexox

Well-Known Member
May 3, 2023
678
282
63
Are all the reported page addresses in the same range? This still could be a faulty CPU or motherboard issue, but it does look like RAM.
 
Feb 19, 2024
Are all the reported page addresses in the same range? This still could be a faulty CPU or motherboard issue, but it does look like RAM.
@nexox thanks for the message. What do you mean by your first question? Don't you think that if it were a motherboard or CPU problem, I should also see the errors when running with 8GB of memory? And it's also quite interesting that some people with the same setup do not see the errors.
 

nexox

Well-Known Member
May 3, 2023
There's a hex code representing the bad memory location, the two you posted are close together, if you get a bunch more all near each other that might mean something, though I don't know exactly what. The working 8GB stick does eliminate a lot of likely failure points, but I don't know enough about the low level operation of modern DDR to tell you exactly what isn't eliminated.
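
One rough way to check whether reported addresses sit near each other is to bucket them into 1 GiB windows of physical address space. This is only a sketch: the window size is an arbitrary choice, bucket_by_gib is a made-up helper name, and the sample values are the addr= fields from the rasdaemon records posted further down in this thread.

```python
from collections import Counter

# addr= values taken from the rasdaemon mce_record lines posted later in this thread.
ADDRS = [
    "c364d6e40", "184f871f80", "810dbc480", "ca5bc8bc0",
    "ee9b2ef40", "184f931f00", "184f8f1480", "184f631580",
    "184f705c00", "f9ac56d40", "b56ee9480", "c9e4d33c0",
]


def bucket_by_gib(addrs):
    """Count addresses per 1 GiB window (shifting right by 30 bits divides by 2**30)."""
    return Counter(int(a, 16) >> 30 for a in addrs)


buckets = bucket_by_gib(ADDRS)

if __name__ == "__main__":
    for gib, count in sorted(buckets.items()):
        print(f"{gib:>3} to {gib + 1:>3} GiB: {count} record(s)")
```

Records piling up in one window would suggest a localized fault, while records scattered over many windows point away from a single bad region. Either reading is a heuristic, not a diagnosis.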
 
Feb 19, 2024
There's a hex code representing the bad memory location, the two you posted are close together, if you get a bunch more all near each other that might mean something, though I don't know exactly what. The working 8GB stick does eliminate a lot of likely failure points, but I don't know enough about the low level operation of modern DDR to tell you exactly what isn't eliminated.
@nexox Okay, so I checked the system again, and it logged something to rasdaemon more than an hour after I enabled it. Now I am not sure at all that it is not a hardware issue! This is what rasdaemon shows me. Could anyone help? I know it has something to do with the CPU's L2 cache. I am just not sure how critical it is and whether I should send back my MS-01 :(

Code:
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]: overriding event (1375) ras:mc_event with new print handler
Feb 24 11:51:50 zeus rasdaemon[2661579]: overriding event (1372) ras:aer_event with new print handler
Feb 24 11:51:50 zeus rasdaemon[2661579]: overriding event (113) mce:mce_record with new print handler
Feb 24 11:51:50 zeus rasdaemon[2661579]: overriding event (1376) ras:extlog_mem_event with new print handler
Feb 24 11:51:50 zeus rasdaemon[2661579]: Calling ras_mc_event_opendb()
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005690: mce_record:           2024-02-24 10:23:21 +0100 bank=f, status= cc31afc000011136, corrected filtering (some unreported errors in same region) Data CACHE Level-2 Data-Read Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Data CACHE Level-2 Data-Read Error Large number of corrected cache errors.  System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 8320663c686, addr= c364d6e40, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:22 +0100 bank=f, status= cc200fc000011136, corrected filtering (some unreported errors in same region) Data CACHE Level-2 Data-Read Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Data CACHE Level-2 Data-Read Error Large number of corrected cache errors.  System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 8320601c686, addr= 184f871f80, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:23 +0100 bank=f, status= cc39ef0000011136, corrected filtering (some unreported errors in same region) Data CACHE Level-2 Data-Read Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Data CACHE Level-2 Data-Read Error Large number of corrected cache errors.  System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 8324000c686, addr= 810dbc480, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:23 +0100 bank=f, status= cc39f0400001110a, corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 1, socketid= 0, misc= 838c103c686, addr= ca5bc8bc0, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 1
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:24 +0100 bank=f, status= cc3007000001110a, corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 878e323c686, addr= ee9b2ef40, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:25 +0100 bank=f, status= cc3a4c000001110a, corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 8388403c686, addr= 184f931f00, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:26 +0100 bank=f, status= cc32130000011136, corrected filtering (some unreported errors in same region) Data CACHE Level-2 Data-Read Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Data CACHE Level-2 Data-Read Error Large number of corrected cache errors.  System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 83243028686, addr= 184f8f1480, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:27 +0100 bank=f, status= cc200e000001110a, corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 87286010686, addr= 184f631580, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:28 +0100 bank=f, status= 8c3000400001110a, corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error, mci=Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 8388203c686, addr= 184f705c00, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:29 +0100 bank=f, status= cc3a800000011152, corrected filtering (some unreported errors in same region) Instruction CACHE Level-2 Instruction-Fetch Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Instruction CACHE Level-2 Instruction-Fetch Error Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 83227014686, addr= f9ac56d40, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:30 +0100 bank=f, status= cc2011000001110a, corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 878e423c686, addr= b56ee9480, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: register inserted at db
Feb 24 11:51:50 zeus rasdaemon[2661579]:            <...>-2631410 [002]     0.005691: mce_record:           2024-02-24 10:23:31 +0100 bank=f, status= cc3dba800001110a, corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error, mci=Error_overflow Corrected_error Threshold based error status: green, mca=corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, cpu_type= Intel generic architectural MCA, cpu= 2, socketid= 0, misc= 838c203c686, addr= c9e4d33c0, synd= 0, mcgstatus=0, mcgcap= c16, apicid= 8
Feb 24 11:51:50 zeus rasdaemon[2661579]: rasdaemon: mce_record store: 0x561ce6199258
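
For reference, the status= value in each record can also be decoded by hand. Below is a minimal Python sketch that assumes the architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63 down to 57, MCA error code in bits 15:0, cache hierarchy errors matching the pattern 000F 0001 RRRR TTLL). The decode_status helper and its dictionary keys are made up for illustration, and the bit positions should be checked against the manual before relying on them.

```python
# Bit positions assumed from the architectural IA32_MCi_STATUS layout
# (Intel SDM, machine-check architecture chapter); verify before relying on them.
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN", 59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Sub-fields of a cache hierarchy error code: 000F 0001 RRRR TTLL.
TT = {0: "Instruction", 1: "Data", 2: "Generic"}
LL = {0: "Level-0", 1: "Level-1", 2: "Level-2", 3: "Generic"}
RRRR = {
    0: "Generic", 1: "Generic-Read", 2: "Generic-Write", 3: "Data-Read",
    4: "Data-Write", 5: "Instruction-Fetch", 6: "Prefetch", 7: "Eviction", 8: "Snoop",
}


def decode_status(status_hex: str) -> dict:
    """Split a status= hex string into flag names and cache error sub-fields."""
    status = int(status_hex, 16)
    code = status & 0xFFFF  # MCA error code lives in the low 16 bits
    result = {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_code": f"{code:#06x}",
        "filtered": bool(code & 0x1000),  # F bit: corrected-error filtering
        "cache": None,
    }
    # Ignore the F bit (bit 12) when matching the cache hierarchy pattern.
    if (code & 0xEF00) == 0x0100:
        result["cache"] = (
            TT.get((code >> 2) & 0x3, "reserved"),
            LL[code & 0x3],
            RRRR.get((code >> 4) & 0xF, "reserved"),
        )
    return result


if __name__ == "__main__":
    # Three of the status values from the records above.
    for value in ("cc31afc000011136", "8c3000400001110a", "cc3a800000011152"):
        print(value, decode_status(value))
```

With UC and PCC clear, a record decodes as an error the hardware corrected, which matches the Corrected_error text rasdaemon prints. The decode only restates what the log already says. It does not by itself answer whether the unit should be returned.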
 
Feb 19, 2024
Well that looks like a CPU issue, again, not sure why it wouldn't happen with 8GB, but it's certainly possible.
@nexox I will check again with 8GB. I did not have rasdaemon installed before; I was just relying on the normal log. I'll update very soon. Would you say I should send the MS-01 back?
 
Feb 19, 2024
I've been running with 8GB of RAM for 1 hour and 40 minutes now, and so far I do not see any MCE errors. I am still betting on the RAM. I am returning the modules and getting new DIMMs. I'll update.
 
Feb 19, 2024
How did you finally come to that conclusion

Let us know how Minisforum handles your RMA
I got in contact with them and they told me the CPU has malfunctioned. The technical support agent said he contacted the after-sales department (Feb. 27) to arrange the replacement and that they would contact me. But I haven't heard anything from them since then. I am still patiently waiting and hoping that they are not ignoring me.

Edit: Just to let you guys know, I got an email from after-sales support and they will replace my MS-01. I'll update when I get the new one.
 
Feb 19, 2024
Okay, just to give you an update... I am returning my MS-01 through Amazon because I am still able to return it until next week. If I go through Minisforum, I have to send the hardware in, and only once they have received it can they send a new one, if they have one in stock. At the moment they are out of stock. I would also have to pay for shipping and then apply for reimbursement... I decided the hassle is not worth my time, so I am doing it through Amazon. And at the moment I do not want a replacement because of all the incompatibility and hardware issues many people are facing... I heard there is going to be a new revision of the MS-01 with better hardware compatibility, and I am thinking of waiting. I just do not know when it will be produced. I am very sad though, as I do not want to send it back. I tried another new 96GB kit: same error.
 

nicoska

Member
Feb 8, 2023
54
9
8
Okay just to give you an update... I am returning my ms-01 through amazon because I am still able to return it until next week.
...
I did the same. I sent it back to Amazon DE. It's really a shame, because it could be the perfect homelab server, but CPU issues and random reboots are not something I can live with.

Anyway, I hope to be able to buy a new one really soon.