Linux Storage ZFS - Windows Clients


letniq

New Member
Jun 18, 2021
Hi all. I've learned so much over the past few years just reading the forums and want to say big THANKS to all.

I'm building a data server for the office and want to explore my options. I have about 300-400 Windows 10 machines that (mostly) read data, do their calculations, and then write the output back to the data server. The main difficulty is that the data set the machines read is sometimes around 100GB of thousands of mixed-size files. This puts big pressure on the server and disks. To try to combat this, my current setup has the following configuration.

Ubuntu Server with ZFS.

350GB of RAM for the ZFS ARC cache
960GB Intel Optane PCIe SSD as the ZFS L2ARC

Final layer: 12 HDDs in RAIDZ1 (ZFS's RAID5 analogue) for a total of 80TB.

When the machines read two or three data sets that fit in the ARC, it's fine. However, there are times when we need 10 data sets of 100GB each that don't fit in the ARC, so it has to read them from the HDDs. That's when the server is screaming for help.

The strange thing is that even under the heaviest workloads I can see the L2ARC is hit only about 5% of the time, while ARC efficiency is about 90-99%. Maybe there's a setting I'm missing to adjust how the L2ARC is used. My ARC size breakdown (MFU/MRU) is about 90/10 most of the time under load.
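For reference, hit rates like the ones quoted above can be read from `/proc/spl/kstat/zfs/arcstats` on Linux. A small sketch using made-up counter values shaped like that file (on a live server, feed the real file to awk instead of the here-doc):

```shell
# Compute ARC and L2ARC hit ratios from arcstats-style counters.
# The here-doc values below are invented to mimic the ratios in this post;
# on a real system run:  awk '...' /proc/spl/kstat/zfs/arcstats
awk '/^(hits|misses|l2_hits|l2_misses) / { v[$1] = $3 }
     END {
       printf "ARC hit ratio:   %.1f%%\n", 100 * v["hits"] / (v["hits"] + v["misses"])
       printf "L2ARC hit ratio: %.1f%%\n", 100 * v["l2_hits"] / (v["l2_hits"] + v["l2_misses"])
     }' <<'EOF'
hits 4 950000000
misses 4 50000000
l2_hits 4 2500000
l2_misses 4 47500000
EOF
```

With these sample counters the script prints roughly the ratios described above (ARC ~95%, L2ARC ~5%).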

I'm serving all this data through Samba to the Windows machines. Sometimes I feel that Samba itself is also a performance bottleneck.

Is there a better way to share the storage with all the Windows machines, so that they can all read the data and then write back their output?
 

gea

Well-Known Member
Dec 31, 2010
Some comments:

1. ARC
The ZFS ARC cache is there to cache small random reads and metadata with a read-recently/read-most-frequently (MRU/MFU) optimisation, not whole files and sequential data. This means a huge ARC RAM cache does not help much with large files.

2. L2ARC
Works the same way. With 350GB of RAM, an L2ARC is not needed or helpful. A little benefit may come if you enable read-ahead caching into the L2ARC, or persistent L2ARC with the newest ZFS.
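On OpenZFS/Linux, the L2ARC by default skips prefetched (streaming) buffers, which may explain the low hit rate seen above. A tuning sketch via module options; the parameter names are from OpenZFS on Linux, so verify them against your version with `man 4 zfs`:

```
# /etc/modprobe.d/zfs.conf -- sketch, applied at module load / boot
# Allow prefetched (streaming) buffers into the L2ARC (default 1 = skip them):
options zfs l2arc_noprefetch=0
# Keep L2ARC contents across reboots (persistent L2ARC, OpenZFS 2.0+):
options zfs l2arc_rebuild_enabled=1
# Raise the per-interval L2ARC fill rate (bytes) so the cache warms faster:
options zfs l2arc_write_max=134217728
```

Whether filling the L2ARC with streaming data actually helps depends on whether the same data sets are re-read before they are evicted.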

3. RAIDZ1/RAID5 with many disks or high-capacity disks is a bad idea.

4. Mostly your ZFS throughput is restricted by pool IOPS, as a single RAIDZ(1-3) vdev has only about the same IOPS as a single disk (around 100 IOPS). You can increase IOPS with more vdevs (IOPS scale with the number of vdevs). Use at least RAIDZ2/RAID6 in a 2x6-disk layout (200 IOPS read/write). Fastest would be 6 x mirror (600 IOPS write, 1200 IOPS read).

5. Samba is single-threaded per connection. The multithreaded kernel/ZFS-based SMB server in Oracle Solaris or the free Solaris forks, e.g. OmniOS, is often faster, especially in a multiuser environment.
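If staying on Samba, a few `smb.conf` options are commonly tried for large sequential reads to many Windows clients. This is a sketch, not a tuned configuration; option behaviour and defaults depend on your Samba version:

```
[global]
   # use the kernel sendfile() path for unencrypted reads
   use sendfile = yes
   # issue asynchronous I/O for all read/write request sizes
   aio read size = 1
   aio write size = 1
   # SMB3 multichannel can spread one client's traffic over several NICs/connections
   server multi channel support = yes
```

Benchmark before and after with a representative 100GB data-set read, since some of these options can hurt as well as help depending on the workload.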

6.
If you really want performance, use an SSD/NVMe pool, e.g. WD SS530 SAS SSDs.