Recommended FS for Shared storage to back NVME-oF/RoCE?


kapone

Well-Known Member
May 23, 2015
2,014
1,368
113
10.99.9.1:/cache /mnt/san01/cache nfs vers=4.1,proto=rdma,port=20049,hard,intr,rsize=1048576,wsize=1048576 0 0
That seems mostly right. Try a few combinations perhaps?

Code:
10.99.9.1:/cache /mnt/san01/cache nfs vers=4.2,proto=rdma,port=20049,defaults 0 0
The default option should fill in the rest (mostly) optimally.

Try vers=3 with and without nconnect=8 or 16 as well?

This is just to rule out some stupid mismatch between the two.
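To work through those combinations without hand-editing fstab each time, something like the following could generate the mount commands first. It is only a sketch: the function name `gen_mounts` is made up for illustration, it prints the commands rather than running them, and the version and nconnect values are just the ones suggested above.

```shell
#!/bin/sh
# Dry run: print one mount command per vers/nconnect combination.
# gen_mounts is a hypothetical helper, not part of nfs-utils.
gen_mounts() {
  server=$1
  export_path=$2
  mnt=$3
  for vers in 3 4.1 4.2; do
    for nconnect in "" 8 16; do
      opts="vers=${vers},proto=rdma,port=20049"
      # Only append nconnect when a value is set for this pass
      if [ -n "$nconnect" ]; then
        opts="${opts},nconnect=${nconnect}"
      fi
      echo "mount -t nfs -o ${opts} ${server}:${export_path} ${mnt}"
    done
  done
}

gen_mounts 10.99.9.1 /cache /mnt/san01/cache
```

Unmount between runs and repeat the same dd test for each line so the numbers stay comparable.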
 

IamSpartacus

Well-Known Member
Mar 14, 2016
2,545
656
113
Here are the results.

Code:
10.99.9.1:/cache             /mnt/san01/cache             nfs  vers=4.1,proto=tcp,hard,rsize=1048576,wsize=1048576  0 0
time sh -c 'dd if=/dev/zero of=/mnt/san01/cache/test.bin bs=1M count=4096 conv=fdatasync status=progress'
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 2.46141 s, 1.7 GB/s

real 0m2.570s
user 0m0.001s
sys 0m0.419s


Code:
10.99.9.1:/cache             /mnt/san01/cache             nfs vers=4.2,proto=rdma,port=20049,defaults  0 0
time sh -c 'dd if=/dev/zero of=/mnt/san01/cache/test.bin bs=1M count=4096 conv=fdatasync status=progress'
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 224.13 s, 19.2 MB/s

real 3m44.135s
user 0m0.002s
sys 0m0.450s


Code:
10.99.9.1:/cache /mnt/san01/cache nfs vers=3,proto=rdma,port=20049,defaults 0 0

time sh -c 'dd if=/dev/zero of=/mnt/san01-rdma/cache/test.bin bs=1M count=4096 conv=fdatasync status=progress'
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 264.74 s, 16.2 MB/s

real 4m24.745s
user 0m0.003s
sys 0m0.446s


Code:
10.99.9.1:/cache /mnt/san01/cache nfs vers=3,proto=rdma,port=20049,nconnect=8,defaults 0 0

time sh -c 'dd if=/dev/zero of=/mnt/san01-rdma/cache/test.bin bs=1M count=4096 conv=fdatasync status=progress'
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 123.908 s, 34.7 MB/s

real 2m3.928s
user 0m0.009s
sys 0m0.397s
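
For comparing the runs side by side: the rate dd prints is just bytes divided by elapsed seconds, in decimal megabytes. A small helper (the name `throughput_mbs` is hypothetical) can recompute it from the two numbers on dd's summary line:

```shell
#!/bin/sh
# throughput_mbs BYTES SECONDS -> decimal MB/s, one decimal place.
# Hypothetical helper; dd reports MB/GB in powers of 1000, so we do too.
throughput_mbs() {
  awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

throughput_mbs 4294967296 2.46141   # vers=4.1, proto=tcp run
throughput_mbs 4294967296 224.13    # vers=4.2, proto=rdma run
```

On these numbers the TCP mount comes out roughly 90 times faster than the RDMA mount for the same 4 GiB write.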
 

IamSpartacus

And the location is shared on the other end with sync option or async?
Right now it's like this. But I've done the same tests with async and it made no difference.

Code:
/cache              10.99.9.0/28(rw,sync,no_subtree_check,no_root_squash)
 

kapone

Grr...now this is just getting frustrating. Thinking hat on...
 

Greg_E

Active Member
Oct 10, 2024
504
164
43
Be glad you aren't trying to flog an HCI type of storage up to "acceptable" levels, I'll be getting back to Longhorn v2 as soon as I have my Foobernetes cluster up (Foo for using mini-PC, it's going to perform "good enough").
 

kapone

I'm glad I'm not the only one who's frustrated by this :D.
I simply don't get why your setup isn't performing (it should). I just ran the exact same test on my setup. I do have an SX6036 as the switch, but I don't think it's material. No jumbo frames involved, standard 1500 MTU.

Server end: TrueNAS 25.10, ConnectX-3 Pro 56 Gbps RDMA, inbox drivers, no OFED. The NFS target points to a... Fusion IO 6.4TB SSD/PCIe card in an x8 slot, which is much slower than your NVMe disks.

Client end: Debian 13, ConnectX-4 Lx 10 Gb RDMA, inbox drivers, no OFED.

On the client:
NFS mount = sudo mount -o rdma,port=20049,vers=4.2 10.10.3.1:/mnt/fio1/testnfs /mnt/testnfs
File copy = sudo dd if=/dev/zero of=/mnt/testnfs/testfile bs=1M count=4096 conv=fdatasync
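
It may also be worth confirming on both setups that the client really ended up on RDMA rather than some fallback. On Linux the effective options show up in /proc/mounts. The helper below is a sketch: `mount_proto` is a made-up name, and it assumes the options field carries a `proto=` entry, which is what the NFS client normally reports.

```shell
#!/bin/sh
# mount_proto MOUNTPOINT: read /proc/mounts-format lines on stdin and
# print the proto= value for that mount point (hypothetical helper).
mount_proto() {
  awk -v mp="$1" '
    $2 == mp {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++)
        if (opts[i] ~ /^proto=/) print substr(opts[i], 7)
    }'
}

# Prints nothing if the path is not currently mounted.
if [ -r /proc/mounts ]; then
  mount_proto /mnt/testnfs < /proc/mounts
fi
```

If this prints `tcp` on a mount that was requested with `rdma`, the comparison is not measuring what it seems to.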

Screenshot 2026-04-10 at 10.41.02 AM.png

This is hitting wire speed of the ConnectX-4 at 10 Gb, and the NFS share is exported as sync. Grrr....

Edit: and nconnect does make a bit of a difference.

Screenshot 2026-04-10 at 10.59.53 AM.png

Once again, total saturation of 10gb...

And more grrr.....
 

Greg_E

Is it a jumbo frames issue? Maybe worth putting everything back down to 1500 and try again, maybe fragmentation is getting in the way somehow.
 

kapone

If jumbo frames were the issue, it should affect read speeds as well, which doesn't seem to be the case.
 

TrevorH

Active Member
Oct 25, 2024
219
90
28
If the client was set to use jumbo frames and the server was set to 1500 then reads would work as it would be receiving 1500 byte packets from the server. The other way round would need retry(?) or fragmentation?
 

kapone

Hmm...that's a good point.
 

Greg_E

OP said jumbo on all ends, but is it worth trying 1500 just to see what happens? If 1500 can give the same/similar throughput to jumbo, then it might be worth staying at 1500.
 

TrevorH

Also worth using `tracepath -n 10.99.9.1` from the client (depends on the firewall on 10.99.9.1 returning an ICMP "go away" response)
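
Another way to probe for an MTU mismatch is a ping with fragmentation prohibited, sized to fill the frame exactly. The sketch below only prints the commands; `icmp_payload` is a made-up helper, and the 28-byte overhead assumes plain IPv4 with no options (20-byte IP header plus 8-byte ICMP header). It exercises the IP path only and says nothing about the RDMA-level MTU.

```shell
#!/bin/sh
# icmp_payload MTU -> largest ICMP payload that fits in one IPv4 packet.
# Hypothetical helper: MTU minus 20 (IPv4 header) minus 8 (ICMP header).
icmp_payload() {
  echo $(( $1 - 28 ))
}

# Print the Linux ping commands for standard and jumbo MTU;
# -M do sets "don't fragment" so an oversized packet fails instead of splitting.
for mtu in 1500 9000; do
  echo "ping -M do -c 3 -s $(icmp_payload "$mtu") 10.99.9.1"
done
```

If the 9000 variant fails while the 1500 variant works, something on the path is not passing jumbo frames.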
 

IamSpartacus

I went to do more testing today, but before doing so I went to throw an Arc A40 GPU into the machine so I could run some services on it with local storage while I continue to test this issue.

Well, now I'm left with this screen at boot, and no matter how many PCI cards I take out or how many times I reset the CMOS, I can't get past it. Talk about a Grrrrrrrrrrrrrrrrrrrrrrrrr.

1775875412875.png
 

TrevorH

Active Member
Oct 25, 2024
219
90
28
Did you remove the power cord from the wall and leave it out for long enough to discharge anything remaining?
 

IamSpartacus

Every time I reset the CMOS, I unplug the power, press the power button a few times to discharge, then leave it off for 5 minutes with the battery out. Think it needs to sit longer?
 

kapone

Well...

1. Did we find the culprit? :)
2. Is "Above 4G Decoding" enabled in the BIOS?
3. If there's an "MMIO" setting, set it to a higher value than the default, test.
4. Double check your Bifurcation/pcie lane allocation setup. Test.

Just thinking out loud here.