Minisforum MS-01 + QNAP JBOD issues


just_a_person

New Member
Apr 18, 2024
16
7
3
For the past 3 weeks, I've been experiencing the weirdest issue while testing new hard drives with badblocks (using this wrapper script specifically: bht).

Specifically, when using either the TL-D800S or TL-D1600S with the MS-01, I experience a non-zero amount of verify errors on all my drives. These errors start appearing anywhere between 25% and 75% through the first read pass of the test. Below is a sample output:

Screenshot from 2024-05-11 00-45-37.png

Despite the verify errors reported by badblocks, smartctl shows no issues, and all smartctl self-tests pass without any problems (long, short, conveyance). Additionally, I couldn't find any dmesg or system logs directly correlated with these errors.
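For readers unfamiliar with what a "verify error" means here: in destructive write mode, badblocks fills the device with a fixed byte pattern, reads it back, and counts every block that does not match. It repeats this for four patterns (0xaa, 0x55, 0xff, 0x00). The Python sketch below only illustrates that idea on an ordinary file. The function name `write_and_verify` and the 4096-byte block size are assumptions made for the example, not taken from bht or badblocks. It also makes no attempt to bypass the page cache, so it would not exercise the real disk path the way the actual tool does.

```python
import os
import tempfile

# The four byte patterns badblocks writes in destructive (-w) mode.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def write_and_verify(path, size, block_size=4096):
    """Fill the first `size` bytes of `path` with each pattern in turn,
    read the data back, and return the number of mismatched blocks.

    Hypothetical helper for illustration only; `size` is assumed to be
    a multiple of `block_size`.
    """
    blocks = size // block_size
    mismatched = 0
    for pattern in PATTERNS:
        expected = bytes([pattern]) * block_size
        with open(path, "r+b") as f:
            # Write pass: cover the whole region with the pattern.
            for _ in range(blocks):
                f.write(expected)
            f.flush()
            os.fsync(f.fileno())
            # Read pass: every block should come back unchanged.
            f.seek(0)
            for _ in range(blocks):
                if f.read(block_size) != expected:
                    mismatched += 1
    return mismatched


if __name__ == "__main__":
    # Demonstrate on a 1 MiB scratch file rather than a real device.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(1 << 20)
        scratch = tmp.name
    try:
        print("mismatched blocks:", write_and_verify(scratch, 1 << 20))
    finally:
        os.remove(scratch)
```

A mismatch in this kind of test means the data read back differs from the data written, which can be caused anywhere along the path (drive, enclosure, cable, HBA, PCIe link, RAM, CPU). That is why clean SMART results do not rule it out.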

Initially, I thought I had just been unlucky with shipping damage, so I returned the first eight drives I tested (a mix of Ultrastar and Exos drives). However, I grew suspicious of the TL-D800S after experiencing the same issues with the second set of eight drives I received. Therefore, I switched to the TL-D1600S, but the badblocks errors persisted.

Before starting any of my tests, I made sure to check my RAM with Memtest86+ for over five passes, detecting zero errors. I'm currently using the 96GB kit (CT2K48G56C46S5), but switching to the 64GB kit didn't seem to make any difference in my badblocks tests (I have the 13900H version). Moreover, since I'm running Proxmox, I updated the Intel microcode as suggested by Craft Computing. I've tried disabling C-states with no success, but I haven't yet tried the new 1.22 BIOS.

Most recently, I decided to test the TL-D1600S and the drives connected to a PC with an AMD CPU, and so far, I have not seen the same badblocks errors. The test hasn't completed all four read/write passes yet (which takes around a week), but at least it has gotten past the first read/write pass.

Has anyone else experienced this issue with their MS-01 + QNAP JBOD combo? I'm really confused about what could be going on, and I'm at a loss as to how to debug further.

UPDATE:
I returned the Minisforum MS-01 and replaced it with a Lenovo ThinkStation P3 Ultra. The new setup is working without any issues, so it seems the problems I experienced were specific to the MS-01 rather than related to Big/Little, C-states, or Proxmox.

just_a_person

Quoted reply: "Possibly a driver issue. You could try your same tests on Windows to confirm."
I don't think there's a direct equivalent of "badblocks" on Windows? Also, I haven't seen any of the same errors so far when running badblocks/proxmox on an AMD CPU, so I'm not sure how drivers alone would explain what I'm witnessing.
 

just_a_person

Is that different from disabling "C States" and "Race To Halt" in the BIOS? I'm happy to try toggling other BIOS settings if that may help.

skipper ohms

Member
Jan 24, 2024
35
26
18
Intel has E cores while AMD does not, which may be a contributing factor. My point is they are different chips with different features and the driver may perform differently on each platform.

I don't know the solution here. If you aren't going to try my suggestion then you'll have to figure it out some other way.
 

just_a_person

I tried to set up the AMD system identically to the MS-01: same OS (Proxmox 8.2), same badblocks script, same JBOD (TL-D1600S), and same hard drives.

If it were an issue of poorly seated hard drives, SFF cables, or the PCIe card, wouldn't I expect more frequent errors, rather than errors cropping up only after reading between 4 and 12 TB of data?

I can try vanilla Ubuntu Server later to see if it's purely a Proxmox incompatibility with Big+Little, although I would have expected Proxmox to behave like vanilla Debian when not running any VMs.
 

just_a_person

I ended up returning my MS-01 unit because I was simply getting frustrated with it (it randomly crashed overnight in the middle of another test). I should be receiving a Lenovo P3 Ultra on Friday instead, so I might at least be able to see whether the issues are MS-01 specific or an Intel Big+Little/Proxmox incompatibility.
 

just_a_person

Just received the Lenovo P3 Ultra (i7-13700) with ECC RAM and am running the badblocks test now on 12 disks. I'm using Proxmox 8.2.2, updated to the latest intel-microcode, and left all the BIOS settings at their defaults (i.e. C-states enabled and all cores on).

Will update here with initial results in a few days (takes roughly 20 hours for each pass).
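As a rough sanity check on the timeline, the per-pass figure can be scaled up to a full run. The sketch below assumes that each of the four patterns gets one write pass and one read pass, and that every pass takes about the same 20 hours. Both are simplifying assumptions, not measurements.

```python
PATTERNS = 4             # four read/write rounds, one per test pattern
PASSES_PER_PATTERN = 2   # one write pass plus one read pass
HOURS_PER_PASS = 20      # rough per-pass figure quoted above

total_hours = PATTERNS * PASSES_PER_PATTERN * HOURS_PER_PASS
print(f"{total_hours} hours, about {total_hours / 24:.1f} days")
# prints "160 hours, about 6.7 days"
```

That lines up with the "around a week" estimate for a complete run given in the first post.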
 

just_a_person

So the first two passes (1 write + 1 read) have finished on my 12 drives using the P3 Ultra, and so far, I haven't encountered any of the previous corruption errors. I'll continue the test for at least two more passes, but typically, errors would have appeared by now with the MS-01. This suggests that the issue lies with the MS-01 itself rather than with Proxmox + Intel Big/Little. My three current guesses as to what is going on are:

1. My specific unit had a unique hardware defect.
2. There's some sort of bug in the motherboard firmware causing this issue (I had updated to 1.22).
3. Somehow, there's a difference in how laptop and desktop Raptor Lake CPUs behave.

Is there anyone with a similar setup (MS-01 + [QNAP TL-D800S or QNAP TL-D1600S] + very large drives) who can test whether they experience the same issue when running badblocks? I would hate to see other people experience silent data corruption because they used a similar setup to mine.
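For anyone who wants a quick, non-destructive first check before committing to a week-long badblocks run, one option is to read the same region several times and compare checksums: a healthy read path should return identical data every time. The sketch below is only an illustration of that idea. The helper names `region_digest` and `reads_are_stable` are made up for this example. It also does nothing to bypass the page cache, so on a real system the repeated reads would have to cover far more data than fits in RAM (or the cache would have to be dropped between reads) for the comparison to mean anything.

```python
import hashlib


def region_digest(path, offset, length, chunk=1 << 20):
    """Return the SHA-256 hex digest of `length` bytes starting at `offset`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:  # reached end of file or device early
                break
            h.update(data)
            remaining -= len(data)
    return h.hexdigest()


def reads_are_stable(path, offset, length, repeats=3):
    """Read the same region `repeats` times; True if every digest matches."""
    digests = {region_digest(path, offset, length) for _ in range(repeats)}
    return len(digests) == 1
```

Usage would look like `reads_are_stable("/path/to/device-or-file", 0, 1 << 30)`, where the path is a placeholder. A `False` result would point at the read path returning different data on different attempts. A `True` result proves little on its own, since the errors in this thread only appeared after many terabytes had been read.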
 

just_a_person

Still no errors after another two passes, so I'm fairly confident now that things are working with my new setup without the MS-01.
 

rog713

New Member
Mar 15, 2024
2
0
1
I struggled with the same issues for several months. I went through 10+ LSI and QNAP HBAs thinking they were the cause of my problems. I recently dumped the MS-01 for a P3 and all my problems went away.
 

just_a_person

Oof, sucks that you had to go through all that, but I'm glad something is working for you now at least.
 

dvdplm

New Member
Jan 11, 2024
8
4
3
I’m having a somewhat similar issue with a TL-D400S enclosure. The QXP-400eS-A1164 card is not recognized at all by the Minisforum, so I got a cheap LSI card on eBay. This card recognizes my drives (4 x 16 TB Seagates), but dmesg reports frequent problems and Unraid fails drives seemingly at random. SMART tests (extended and short) pass for all drives.
I am a bit at a loss as to what to try next. Different HBA? Return the MS-01?

The QXP card works just fine in my gaming PC, which points to the Minisforum being the culprit. I tried both BIOS versions.

Help?
 

orinivan

New Member
Jun 4, 2024
9
2
3
I'm very concerned by this thread, as I bought two MS-01s and two TL-D800Ses, both running TrueNAS Scale, and have purchased two more MS-01s and two TL-D1600Ses, to also run TrueNAS Scale. I have the same memory and CPU option as OP. I'm using the HBA that came with the QNAP JBOD in both cases.

I'm running bht now (one unit has 8 x 8 TB drives, the other, 8 x 6 TB) ...

Edit: bht has done two complete sets of write-then-read passes on my 6 TB drives. It reported one error, on one drive, ~81% of the way through the first read pass, and nothing on the second. It's about 25% through the second read pass on my 8 TB drives, and no errors have been found on any disks yet. It seems I am having the issue, just not to the extent that OP had.
 