Hi all. I've learned so much over the past few years just reading the forums and want to say big THANKS to all.
I'm now building a data server for the office and want to explore my options. I have about 300-400 Windows 10 machines that (mostly) read data, do their calculations, and then write the output back to a central data server. The main difficulty is that the data set the machines read is sometimes around 100GB spread across thousands of mixed-size files. This puts big pressure on the server and its disks. To try to battle this, my current setup has the following configuration.
Ubuntu Server with a ZFS file system.
350GB of RAM for the ZFS ARC cache.
960GB Intel Optane PCIe drive as the L2ARC (ZFS level 2 cache).
Final layer is 12 HDDs in RAID5 for a total of 80TB.
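For context, the pool layout above looks roughly like this (the device names and pool name below are placeholders, not my actual disks):

```shell
# Rough sketch of the setup described above -- device names are placeholders.
# 12 HDDs in a single raidz vdev (the ZFS RAID5-equivalent), ~80TB usable:
zpool create tank raidz sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl

# The Optane NVMe drive attached as an L2ARC cache device:
zpool add tank cache nvme0n1

# Cap the ARC at ~350 GB (value is in bytes):
echo 375809638400 > /sys/module/zfs/parameters/zfs_arc_max
```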
When the machines read two or three data sets that fit in the ARC, it's fine. However, there are times when we need ten data sets of ~100GB each; those don't fit in the ARC, so everything has to be read from the HDDs. That's when the server is screaming for help.
The strange thing is that even under the heaviest workloads, the L2ARC hit rate is only about 5%, while ARC efficiency is about 90-99%. Maybe there is a setting I'm missing that adjusts how the L2ARC is used. My ARC size breakdown (MFU/MRU) is about 90/10 most of the time under load.
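One possible explanation I've come across: by default OpenZFS does not feed prefetched (streaming) reads into the L2ARC, and it throttles the L2ARC fill rate, so large sequential data sets may never land on the Optane at all. These are the module tunables I've been experimenting with; the values are examples to benchmark, not known-good settings:

```shell
# Hedged sketch: OpenZFS module parameters that affect how the L2ARC fills.
# Defaults skip prefetched/streaming reads and throttle the feed rate.

# Also cache streaming (prefetched) reads in L2ARC (default is 1 = skip them):
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch

# Raise the L2ARC feed rate from the small default to 256 MB per interval:
echo 268435456 > /sys/module/zfs/parameters/l2arc_write_max

# Allow faster L2ARC warm-up right after boot, before the ARC is full:
echo 268435456 > /sys/module/zfs/parameters/l2arc_write_boost

# Watch ARC/L2ARC hit rates live while a workload runs (5-second intervals):
arcstat 5
```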
I'm serving all this data through Samba to all the Windows machines. I sometimes feel that Samba is also a bottleneck for performance.
Is there a better way to share the storage with all the Windows machines, so that every machine can read the data and then write back its output?
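For what it's worth, this is the direction of my current smb.conf tuning. These are real Samba options, but the values are guesses I'm still benchmarking, not settings I can vouch for with this many clients:

```shell
# Hedged sketch: smb.conf tuning for many parallel large-file readers.
# Option names are real Samba options; values are experiments, not recommendations.
cat >> /etc/samba/smb.conf <<'EOF'
[global]
    # Let Windows clients open multiple TCP connections per session (SMB3):
    server multi channel support = yes
    # Serve cached file data without copying through userspace:
    use sendfile = yes
    # Enable asynchronous I/O for reads/writes of any size:
    aio read size = 1
    aio write size = 1
EOF

# Reload Samba to pick up the changes:
smbcontrol all reload-config
```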