Minisforum ms-01 i9-13900H 96GB with Proxmox


Apachez

New Member
Jan 8, 2025
29
14
3
Verify that you have UEFI enabled (you can disable CSM) and Secure Boot disabled for Proxmox to boot happily without issues.
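
If you want to double-check this from a running Proxmox install rather than from the firmware menus, here is a quick sketch (Secure Boot state needs the mokutil package, which is not installed by default):

Code:
# UEFI vs. legacy BIOS: this directory only exists when booted via UEFI
[ -d /sys/firmware/efi ] && echo "Booted via UEFI" || echo "Booted via legacy BIOS/CSM"

# Secure Boot state (apt install mokutil first)
mokutil --sb-state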
 

b-rex

Member
Aug 14, 2020
60
14
8
I am testing three Minisforum MS-01 (i5, i9-12, i9-13) in a Proxmox test setup. They are intended to replace my NUCs, which I have been running since Proxmox v5.

I noticed frequent hangs on the BIOS screen that says press <DEL> or <ESC> to enter setup (quiet boot disabled). At that point I need to power cycle; nothing else helps. After booting, it just repeats and hangs again. To get out of this, I have to actively press F7 during boot and manually select the boot drive (Proxmox). After that, on the next boot, it seems to work normally again.

View attachment 41248

I noticed that this hang seems to happen mostly right after I have been in the BIOS setup screen. By now I pretty much know that after visiting the BIOS I will have to either boot manually via F7 or power cycle.

Sometimes, even after pressing F7 and seeing the (BIOS) boot drives to choose from, it just flashes the screen black and then returns to the selection (it won't boot).

View attachment 41249

Sometimes I also got a Boot Option Restoration screen, but I am not sure if this is part of the BIOS or of Proxmox (I think it is, because once I had that screen with more options).

View attachment 41250

I am wondering whether it's actually the BIOS that is frozen at that point, or already the GRUB boot loader, etc. (which is stuff I have no idea about). I am using ext4 for my boot drive.
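
One way to narrow down whether the firmware or the boot loader is losing track of things is to look at the UEFI boot entries from the running system. A minimal sketch (efibootmgr comes along with the GRUB EFI packages on Proxmox; if the proxmox entry vanishes from this list between reboots, the firmware is the more likely culprit):

Code:
# list the UEFI boot entries and boot order as the firmware sees them
efibootmgr -v

# show whether Proxmox's own boot tool is managing the ESP (it usually isn't on a plain ext4 install)
proxmox-boot-tool status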

It might be completely unrelated to Proxmox, but once or twice I simply lost the drive (it didn't show up anymore); reinstalling onto the same drive from a USB stick made it work again.

Does this sound like an MS-01 headache, or does anything else here sound familiar?

Drives:
#1: Samsung PM9A3 2.5" U.2 SSD (boot, ext4)
#2: M.2 Micron 7400
#3: M.2 Micron 7400

They are on BIOS v1.26.

I started with the Micron M.2s as the boot drive; they exhibited the same behavior.

Edit 2025-01-14:
- It seems this was caused by the Micron 7400 2280 SSDs.
- Now that I have replaced them with Samsung 983s, the reported issue seems to be gone (it has been stable for a few days).
I had these problems too with one of mine. I'm using PM9A3 U.2/M.2. My issue went away after reseating.

I'm actually new to these... kinda bummed to see how many problems people have had with them. I have four configured identically for a small VMware management cluster. I specifically purchased these because they have OOB management, integrated 10G SFP+ networking, room for 2x 22110 M.2 / U.2, and an additional PCIe slot. But I've noticed that even with the fans pegged, they still get way too hot. I wouldn't be surprised if that's a big part of the stability problems people have been experiencing.

To be honest, I don't know how or why I ever thought these things would be able to handle any real workload. I liked the idea of a low-power, always-on setup for my lab so I could run lots of base-level services without having to keep my GPU clusters up 24/7. Since I'm likely going to have to add loud fans (and power...) to keep these things cool, I probably would have been better off going with 1U servers instead.
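
For anyone who wants to put numbers on the heat, a quick sketch for watching CPU and NVMe temperatures from the Proxmox shell (assumes the lm-sensors and nvme-cli packages, which are not installed by default):

Code:
apt install lm-sensors nvme-cli
sensors                                    # CPU package/core temperatures
nvme smart-log /dev/nvme0 | grep -i temp   # NVMe composite temperature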
 
  • Like
Reactions: pimposh

enchanted_dragonfly

New Member
Oct 23, 2024
6
1
3
Hi all. I, too, was having MS-01 stability issues last year. I installed BIOS 1.26 last fall and it seemed that things might be resolved - uptime of ~90 days. But then today I couldn't access the UI or SSH in. Strangely, my VMs were still accessible via their web UIs and SSH. I checked the console and saw the following. Any idea what this might mean? I appreciate the help.

Code:
[7287678.484947] EXT4-fs error (device dm-1) in ext4_do_update_inode:5109: Journal has aborted
[7287678.484947] EXT4-fs error (device dm-1) in ext4_do_update_inode:5109: Journal has aborted
[7287678.485046] EXT4-fs (dm-1): Remounting filesystem read-only
 

Apachez

New Member
Jan 8, 2025
29
14
3
Something bad has happened to your root partition - perhaps it ran out of space?

You probably need to boot SystemRescue or similar, mount the drive from that ramdisk environment, and manually fsck it, followed by removing logs and other unnecessary files if it does turn out to be an out-of-space situation. Otherwise, the fsck alone should fix this.
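
If it helps, here is a rough sketch of that recovery on a default Proxmox ext4/LVM install, assuming the root logical volume is pve/root (adjust to your layout), run from the SystemRescue shell:

Code:
vgchange -ay pve              # activate the LVM volume group
fsck.ext4 -f /dev/pve/root    # repair the root filesystem while it is unmounted
mount /dev/pve/root /mnt      # then mount it to clean up logs if space was the problem
du -xh --max-depth=1 /mnt | sort -h | tail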

Once you have recovered, you can wipe all old systemd journal logs (if you wish) with this command:

Code:
journalctl --rotate && journalctl --vacuum-time=1s
 

enchanted_dragonfly

New Member
Oct 23, 2024
6
1
3
Thanks, Apachez.

I have a hard time believing it's a problem with my boot drive, since my boot drive is this guy: Western Digital 500GB WD Red SN700 NVMe internal SSD. It should be a solid option, which is why I bought it - maybe even overkill for a Proxmox boot drive.

Also, it's definitely not out of space: 26.19 GB of 482.13 GB used

I was digging around on the Proxmox forums and found some folks with similar issues who recommended the following fix. I applied it and it's been stable since, but last time it was up for about 87 days before that error popped up, so I'm not sure how long I have to wait before I consider that it worked for me. Ha.

Edit /etc/default/grub and change the GRUB_CMDLINE_LINUX_DEFAULT line to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"
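
For anyone applying this: editing /etc/default/grub alone is not enough; the change only takes effect after regenerating the GRUB config and rebooting. Roughly, on a standard ext4/LVM install that boots via GRUB (ZFS installs managed by proxmox-boot-tool need proxmox-boot-tool refresh instead):

Code:
nano /etc/default/grub    # set GRUB_CMDLINE_LINUX_DEFAULT as above
update-grub               # regenerate /boot/grub/grub.cfg
reboot
# after the reboot, confirm the parameters are active
cat /proc/cmdline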
 

binboum

New Member
Apr 22, 2020
5
0
1
Hello,

I'm writing to report a persistent issue with one of my MS-01 machines, specifically segfault/CPU-related error messages. Despite updating to BIOS 1.26, the machine continues to exhibit erratic behavior, including error messages in dmesg, sudden shutdowns, and unresponsiveness to the keyboard and monitor. A reboot resolves the issue temporarily, but it's not a reliable solution. Notably, my other MS-01 unit does not experience this issue, even when running a different kernel version.
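
For comparing notes, the relevant errors can be pulled out of the kernel log with plain journalctl, nothing MS-01-specific (the previous-boot query only works if persistent journaling is enabled):

Code:
# kernel messages from the current boot mentioning segfaults or machine-check events
journalctl -k -b | grep -iE 'segfault|mce|machine check|hardware error'

# the same for the previous boot, useful after a sudden shutdown
journalctl -k -b -1 | grep -iE 'segfault|mce|machine check|hardware error'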

I've reached out to Minisforum multiple times via email (support@, contact@, and store@) to initiate the RMA process, but unfortunately, I have yet to receive a response. My concern is the lack of acknowledgment and the uncertain timeline for when I can expect a response from Minisforum. As a customer, it would be reassuring to know if Minisforum is actively working on RMA requests and if there's a standard timeframe for when I can expect a response.
I'm in pretty much the same situation, one unit more affected than the other. How have you progressed with your requests to support?
 

robthered

New Member
Jul 15, 2023
7
8
3
I have a 3x Minisforum MS-01 setup with Proxmox (2 nodes have 96GB RAM, 1 node has 64GB RAM; drive setups are a 1TB boot drive on each and 2x 2TB drives for Ceph). All are running BIOS 1.26. The workload is one VM for PBS, one Ubuntu Server VM for my Talos config, and then one Talos manager and one Talos worker per node. For some reason only one of my machines seems to randomly reboot or get into a state where I can't bring up a shell because it's locked. It's always node 1 (96GB RAM). The only way for me to get it to respond is to reboot, but lately I'm unable to get the command shell to respond via the UI, and sometimes when I go to the machine itself, it's locked up.

My networking layout is Ceph on a Thunderbolt 4 ring network with the other machines; both SFP+ ports are bonded on each node to one IP (.21, .22, .23 for example), and the 2.5GbE ports are aggregated to a different IP that I'm using as the Proxmox management address (.13, .14, .15 for example). I didn't get vPro set up, but I did aggregate all the network links (2.5GbE together, 10GbE together). Should I have the network devices separate? I haven't tried a Proxmox reinstall yet, but I'm not sure it would help. This just started happening in the last month.
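
For reference, here is roughly what that bonded layout could look like in /etc/network/interfaces; the interface names (enp2s0f0np0 and friends) and addresses are placeholders, so adjust them to whatever ip link shows on your nodes:

Code:
# 2x SFP+ bonded for VM/storage traffic (.21)
auto bond0
iface bond0 inet manual
        bond-slaves enp2s0f0np0 enp2s0f1np1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.21/24
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0

# 2x 2.5GbE bonded for the Proxmox management IP (.13)
auto bond1
iface bond1 inet manual
        bond-slaves enp87s0 enp90s0
        bond-miimon 100
        bond-mode 802.3ad

auto vmbr1
iface vmbr1 inet static
        address 192.168.1.13/24
        gateway 192.168.1.1
        bridge-ports bond1
        bridge-stp off
        bridge-fd 0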

What has me puzzled is that it's only node 1 that this is happening on; nodes 2 and 3 seem to be fine. All have the latest microcode and are all running the 1.26 BIOS.
 

randman

Member
May 3, 2020
72
12
8
I’m looking to get the ms-01. I’ve read the beginning and the last couple of pages of this thread. Am I correct to conclude that folks who were having reliability trouble earlier have had their issues resolved with version 1.26?

TBH, I only need 64GB of RAM, but the price difference for Crucial 96GB RAM versus 64 GB RAM is only $15. But more important for me is stability. Any reason that the 64 GB is more stable than 96 GB for the i9-13900H CPU?
 

robthered

New Member
Jul 15, 2023
7
8
3
I’m looking to get the ms-01. I’ve read the beginning and the last couple of pages of this thread. Am I correct to conclude that folks who were having reliability trouble earlier have had their issues resolved with version 1.26?

TBH, I only need 64GB of RAM, but the price difference for Crucial 96GB RAM versus 64 GB RAM is only $15. But more important for me is stability. Any reason that the 64 GB is more stable than 96 GB for the i9-13900H CPU?
All of my machines are on 1.26 with the latest microcode and like I said 2 nodes have 96 and one has 64... Only one of the 96GB nodes is having issues. Don't know why yet. Hoping someone finds out why.
 

randman

Member
May 3, 2020
72
12
8
All of my machines are on 1.26 with the latest microcode and like I said 2 nodes have 96 and one has 64... Only one of the 96GB nodes is having issues. Don't know why yet. Hoping someone finds out why.
While some folks' issues seem to have been resolved with 1.26, your issue, and the nature of the problem in general (it potentially only shows up after days of use), make me nervous and doubtful about the MS-01. The specs are great, the price is great, but stability for me is important.
 

SwanLab

New Member
Dec 7, 2024
5
6
3
While some folks' issues seem to have been resolved with 1.26, your issue, and the nature of the problem in general (it potentially only shows up after days of use), make me nervous and doubtful about the MS-01. The specs are great, the price is great, but stability for me is important.
I ordered the 13900, 32GB, 1TB kit in early December and just recently got it up and running. I hope to eventually upgrade the RAM to 96GB and install the best GPU that can fit, for running small LLMs in Open-WebUI. Curious if anyone has tried this and what GPU they used?

I am just about to hit 21 days uptime after the initial install with zero issues.

Here is what I did if it's any help:

  • 3D printed a new fan cover and installed a 140mm Noctua fan blowing onto the M.2s
  • Re-applied the thermal paste - thanks to @wadup 's post this was easy and I didn't have to guess where the screws were; the factory paste really is terrible
  • Moved the 1TB M.2 to the 2nd slot and installed a 4TB 990 Pro in the 1st slot (diagram)
  • Installed pure copper heatsinks on both M.2s - the new fan mod gives clearance for nice, thick 6mm heatsinks
  • Installed the v1.26 BIOS from the official page
    • Some people seem to have problems with this; if so, check here
  • Installed Proxmox on the 1TB drive and am using the 990 Pro as storage for an Ubuntu Server VM and a Windows Server VM, along with Docker/Portainer and a handful of containers
 
  • Like
Reactions: randman

randman

Member
May 3, 2020
72
12
8
I ordered the 13900, 32GB, 1TB kit in early December and just recently got it up and running. I hope to eventually upgrade the RAM to 96GB and install the best GPU that can fit, for running small LLMs in Open-WebUI. Curious if anyone has tried this and what GPU they used?

I am just about to hit 21 days uptime after the initial install with zero issues.

Here is what I did if it's any help:

  • 3D printed a new fan cover and installed a 140mm Noctua fan blowing onto the M.2s
  • Re-applied the thermal paste - thanks to @wadup 's post this was easy and I didn't have to guess where the screws were; the factory paste really is terrible
  • Moved the 1TB M.2 to the 2nd slot and installed a 4TB 990 Pro in the 1st slot (diagram)
  • Installed pure copper heatsinks on both M.2s - the new fan mod gives clearance for nice, thick 6mm heatsinks
  • Installed the v1.26 BIOS from the official page
    • Some people seem to have problems with this; if so, check here
  • Installed Proxmox on the 1TB drive and am using the 990 Pro as storage for an Ubuntu Server VM and a Windows Server VM, along with Docker/Portainer and a handful of containers
Thanks. Very useful info. Did you experience any heat-related issues before your changes, or were you just being cautious?
 

robthered

New Member
Jul 15, 2023
7
8
3
So, on another forum it was suggested that I set the memory speed to 4200 for the nodes that are having issues, and as of right now it's working. I'll let you know how long it goes before it needs to be redone.
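
For anyone following along: after changing the BIOS setting, the speed the memory is actually running at can be confirmed from the Proxmox shell (using dmidecode, installable via apt if it is missing):

Code:
dmidecode -t memory | grep -i speed
# "Speed" is the modules' rated speed; "Configured Memory Speed" is what they are actually running at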
 
  • Like
Reactions: randman