Minisforum MS-01 + QNAP JBOD issues


orinivan

New Member
Jun 4, 2024
9
2
3
I'm running TrueNAS Scale on my MS-01s with various sizes of QNAP JBODs. I had a drive fail in my TL800S the other day. After replacing the drive, the QNAP took zero notice and did not spin up the replacement. I had to shut down the TrueNAS Scale server, then power cycle the QNAP before it would recognize the drive, then restart the Scale server. Zero transparent replacement of a failed drive. BOO. Not happy AT ALL.
 

MetalPhreak

New Member
Jan 19, 2024
27
19
3
Are all of you having issues using ZFS and/or TrueNAS Scale with default settings?

I kept getting random disk dropouts under load and all sorts of chaos. Why? I can only speculate, but what fixed it was preventing the ZFS ARC cache from gobbling up all of the VM's RAM. I suspect the chipset driver doesn't like waiting for RAM to free up.

TrueNAS -> Advanced Settings -> Init/Shutdown Scripts -> add a post-init command: "echo 8589934592 >> /sys/module/zfs/parameters/zfs_arc_max". Put in whatever safe RAM limit you want, in bytes. I don't care much about caching, so I just set it to 50%.

No more issues. I'm doing a resilver across 9 disks right now while dumping 50+ GB of new data on, with none of the problems from before.

Hotplug (and removal) also went fine, with no issues.
 
Last edited:
  • Like
Reactions: _ed

_ed

New Member
Oct 28, 2024
3
0
1
Are all of you having issues using ZFS and/or TrueNAS Scale with default settings?

I kept getting random disk dropouts under load and all sorts of chaos. Why? I can only speculate, but what fixed it was preventing the ZFS ARC cache from gobbling up all of the VM's RAM. I suspect the chipset driver doesn't like waiting for RAM to free up.

TrueNAS -> Advanced Settings -> Init/Shutdown Scripts -> add a post-init command: "echo 8589934592 >> /sys/module/zfs/parameters/zfs_arc_max". Put in whatever safe RAM limit you want, in bytes. I don't care much about caching, so I just set it to 50%.

No more issues. I'm doing a resilver across 9 disks right now while dumping 50+ GB of new data on, with none of the problems from before.
Oooh, interesting. I will try this out and report back. I've been running Scale with default settings and have tried the QNAP SATA card and various HBAs, all with problems (though the LSI 9207 HBA has been the most stable of the bunch, with the caveat that you can't leave any disk bays empty).
 

MetalPhreak

New Member
Jan 19, 2024
27
19
3
I will point out that I'm virtualising TrueNAS Scale as a VM in Proxmox, with all 4x SATA controllers that show up passed through via IOMMU. Could be some weird virtualisation memory-allocation thing.
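For reference, passing controllers through to a VM on a Proxmox host looks roughly like this. This is a sketch only: the VM ID (100) and the PCI addresses are placeholders, not values from this thread; the real addresses come from `lspci -nn`, and the `qm set` calls are echoed here rather than executed:

```shell
#!/bin/sh
vmid=100  # hypothetical VM ID for the TrueNAS guest
i=0
# Placeholder PCI addresses; find the real ones with: lspci -nn | grep -i sata
for addr in 0000:00:17.0 0000:01:00.0; do
    # One -hostpciN entry per passed-through controller.
    echo "qm set ${vmid} -hostpci${i} ${addr}"
    i=$((i + 1))
done
```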
 

_ed

New Member
Oct 28, 2024
3
0
1
I'm also running Scale as a VM in Proxmox. Your fix hasn't worked for me, though I'm using an HBA with PCI passthrough. Might have a play with the controller later.
 

_ed

New Member
Oct 28, 2024
3
0
1
It's early days with testing, but so far your fix looks good! I've managed to run some large transfers with no stability issues, I/O faults, or anything like I normally get.
 

benjvfr

New Member
Feb 19, 2024
10
1
3
Has anyone tried using a QXP-800eS-A1164 card with 2x TL-D800S? Does passthrough to a TrueNAS Scale VM work correctly and detect all 16 drives? (Is bifurcation required for this setup?)

EDIT: not the QXP-800eS-A1164 but the QXP-1600eS-A1164.
 
Last edited:

pimposh

hardware pimp
Nov 19, 2022
434
270
63
No bifurcation should be needed, since there is a PLX chip onboard this card.
 

benjvfr

New Member
Feb 19, 2024
10
1
3
No bifurcation should be needed, since there is a PLX chip onboard this card.
Ok, thanks for your answer!

So on paper, the QXP-1600eS-A1164 + 2x TL-D800S setup should work without too many problems?

EDIT: It's not the QXP-800eS-A1164 card that I need, but the QXP-1600eS-A1164. Does this card also have a PLX chip?
 

benjvfr

New Member
Feb 19, 2024
10
1
3
Do not see any reason why it shouldn't.
I don't know enough at the hardware/electronics level; that's why I'm asking for details here.

I had seen that the PCIe slot of the MS-01 does not support bifurcation, and that this could cause problems when connecting several devices.

I believe I understand that the QXP-1600eS-A1164 card will manage the 2x TL-D800S itself, and therefore handle exposing the 16 disks to the system.

I would just like to avoid buying before being sure...
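One way to settle the switch-chip question on any machine that already has the card installed is to look at the PCIe topology: a PLX-style switch shows up as an extra bridge between the slot and the downstream SATA controllers. A sketch, assuming a Linux host; the fallback message is just for machines without lspci or PCI bus access:

```shell
#!/bin/sh
# -t draws the bus as a tree; a PCIe switch appears as a bridge
# fanning out to several downstream SATA/AHCI controllers.
topo=$(lspci -tv 2>/dev/null) || topo=""
[ -n "$topo" ] || topo="no PCI topology visible (lspci missing or no bus access)"
printf '%s\n' "$topo"
```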
 

qiang

New Member
Nov 23, 2024
4
1
3
For the past 3 weeks, I've been experiencing the weirdest issue while testing new hard drives with badblocks (using this wrapper script specifically: bht).

Specifically, when using either the TL-D800S or TL-D1600S with the MS-01, I see a non-zero number of verify errors on all my drives. These errors start appearing anywhere between 25% and 75% through the first read pass of the test. Below is a sample output:

View attachment 36611

Despite the verify errors reported by badblocks, smartctl shows no issues, and all smartctl self-tests pass without any problems (long, short, conveyance). Additionally, I couldn't find any dmesg or system logs directly correlated with these errors.

Initially, I thought I had just been unlucky with shipping damage, so I returned the first eight drives I tested (a mix of Ultrastar and Exos drives). However, I grew suspicious of the TL-D800S after experiencing the same issues with the second set of eight drives I received. Therefore, I switched to the TL-D1600S, but the badblocks errors persisted.

Before starting any of my tests, I made sure to check my RAM with Memtest86+ for over five passes, detecting zero errors. I'm currently using the 96GB kit (CT2K48G56C46S5), but switching to the 64GB kit didn't seem to make any difference in my badblocks tests (I have the 13900H version). Moreover, since I'm running Proxmox, I updated the Intel microcode as suggested by Craft Computing. I've tried disabling C-states with no success, but I haven't yet tried the new 1.22 BIOS.

Most recently, I decided to test the TL-D1600S and the drives connected to a PC with an AMD CPU, and so far, I have not seen the same badblocks errors. The test hasn't completed all four read/write passes yet (which takes around a week), but at least it has gotten past the first read/write pass.

Has anyone else experienced this issue with their MS-01 + QNAP JBOD combo? I'm really confused as to what could be going on, and I'm at a loss as to how to debug further.

UPDATE:
I returned the Minisforum MS-01 and replaced it with a Lenovo Thinkstation P3 Ultra. The new setup is working without any issues, so it seems the problems I experienced were exclusive to the MS-01, rather than related to Big/Little, C-states, or Proxmox.
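For anyone wanting to reproduce this kind of burn-in without the bht wrapper, the underlying tools are badblocks and smartctl. A sketch only, with a placeholder device name: the badblocks line is destructive (-w wipes the drive), so it is echoed here rather than run, and the smartctl cross-checks are left commented out:

```shell
#!/bin/sh
dev=/dev/sdX  # placeholder; substitute the drive under test

# Four-pattern destructive write/read test (DESTROYS all data on $dev):
# -w write-mode, -s show progress, -v verbose, -b block size in bytes.
cmd="badblocks -wsv -b 4096 ${dev}"
echo "would run: ${cmd}"

# Afterwards, cross-check with SMART:
# smartctl -t long ${dev}   # start a long self-test
# smartctl -a ${dev}        # dump attributes and self-test results
```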

I ran the bht test on my four 8TB HDDs, with a TL-D400S connected to the MS-01 via a QXP-400eS-A1164. All seems good after 65 hours of running.
 
  • Like
Reactions: serialoverflow

serialoverflow

New Member
Nov 4, 2024
2
0
1
I ran the bht test on my four 8TB HDDs, with a TL-D400S connected to the MS-01 via a QXP-400eS-A1164. All seems good after 65 hours of running.
That's great! Did you have to apply any fixes like pin taping, BIOS updates, or other settings to get to this point?
I've been running 4 SSDs and 4 HDDs with Unraid on an MS-01 + QNAP TL-D400S.
The 4 HDDs work fine, but the 4 SSDs throw errors which take down the whole breakout cable, and I have to reboot Unraid to restore "hotplug" capability. I've tested various enterprise SSDs and they test fine outside of this setup, but once I throw them in, I eventually get write errors and all SSDs on the cable become unavailable for I/O.
 

qiang

New Member
Nov 23, 2024
4
1
3
That's great! Did you have to apply any fixes like pin taping, BIOS updates, or other settings to get to this point?
I've been running 4 SSDs and 4 HDDs with Unraid on an MS-01 + QNAP TL-D400S.
The 4 HDDs work fine, but the 4 SSDs throw errors which take down the whole breakout cable, and I have to reboot Unraid to restore "hotplug" capability. I've tested various enterprise SSDs and they test fine outside of this setup, but once I throw them in, I eventually get write errors and all SSDs on the cable become unavailable for I/O.
I didn’t specifically do any pin taping or BIOS updates just for running the BHT tests. However, before starting, I had already updated the BIOS to the latest version (1.26V) based on recommendations I found in forum posts. Currently, I’m running TrueNAS in Proxmox VE with 4 HDDs and haven’t encountered any issues so far. I haven’t tested with SSDs yet.
 

mikewhiskey

New Member
Jan 29, 2025
2
0
1
Hi guys, I'm hoping someone can save me; I hope I didn't just make a huge mistake. I have an MS-01 running Proxmox, and I just got a QNAP TL-D400S, but I can't seem to get the QXP PCIe card detected. Could someone help?