How to get the fastest possible write and read speeds on Linux?

zara654

The workload consists of generating many files as fast as possible on very fast storage (a RAID 0 of NVMe drives) and transferring that data to slow storage for long-term retention while minimizing the bottleneck. I'm trying to minimize the time the transfers to long-term storage take. When an HDD is full, it gets pulled and replaced. If possible, I'd like to sustain 16 TB of writes per day to the long-term storage.

Would ZFS be the ideal filesystem for this use case? Should the short-term fast storage be part of the ZFS filesystem? If the fast storage should be its own filesystem, would I need other fast storage dedicated to the ZFS array to maximize write speed to the HDDs? Can a zpool be rebuilt on another machine dedicated to housing the HDDs long term?

I'll be pulling 16 TB HDDs fairly often depending on the transfer rate. Is there any means of data redundancy for this use case?
 

EffrafaxOfWug

For moving stuff to the "slow" storage, you're likely going to be limited by the maximum write speed of a single hard drive - even the best platter-based HDD is unlikely to get you more than ~200MB/s for large sequential writes, and if you're able to achieve that 24/7 (and I doubt you will, especially if you're writing small files) then that's a maximum of about 17TB/day, which barely clears your 16TB/day target.
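To put rough numbers on that single-drive ceiling, here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 1,000,000 MB) and a perfectly sustained rate, which a real drive won't hold across its whole surface; the function names are just illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tb_per_day(mb_per_s: float) -> float:
    """Decimal TB written in 24 hours at a sustained rate in decimal MB/s."""
    return mb_per_s * SECONDS_PER_DAY / 1_000_000


def required_mb_per_s(tb_per_day_target: float) -> float:
    """Sustained MB/s needed to hit a daily target given in decimal TB."""
    return tb_per_day_target * 1_000_000 / SECONDS_PER_DAY


def hours_to_fill(drive_tb: float, mb_per_s: float) -> float:
    """Hours needed to fill a drive of drive_tb decimal TB at mb_per_s."""
    return drive_tb * 1_000_000 / mb_per_s / 3600


print(f"{tb_per_day(200):.2f} TB/day at 200 MB/s")         # 17.28
print(f"{required_mb_per_s(16):.1f} MB/s for 16 TB/day")   # 185.2
print(f"{hours_to_fill(16, 200):.1f} h to fill 16 TB")     # 22.2
```

So a 16TB/day target needs roughly 185MB/s sustained around the clock on one drive, and any time lost to swapping disks or to small-file overhead comes straight out of that margin.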

Aside from write coalescing, there's not much any filesystem can do to improve things in that scenario unless the data being written responds well to compression or suchlike.
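One quick way to check whether the data "responds well to compression" is to compress a sample of it and look at the ratio. The sketch below uses Python's standard zlib as a stand-in; it is an assumption that this tracks what a filesystem-level compressor (lz4 or zstd in ZFS, say) would achieve, so treat the result as a rough indicator only.

```python
import zlib


def compression_ratio(sample: bytes, level: int = 6) -> float:
    """Return original size divided by zlib-compressed size for a sample.

    A ratio near 1.0 means the data is effectively incompressible
    (already compressed, encrypted, or random-looking).
    """
    if not sample:
        return 1.0
    return len(sample) / len(zlib.compress(sample, level))


# Hypothetical usage: read the first few MiB of a representative file.
# with open("/path/to/sample.file", "rb") as f:
#     print(compression_ratio(f.read(8 * 1024 * 1024)))
```

If the ratio comes out close to 1.0, compression won't buy any extra effective write bandwidth on the HDDs.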

Is the activity on the NVMe part 24/7 or only in batches, or are you just limited by the speed with which you can stage the data off to nearline?
 

zara654

It could be in batches. I'm guessing the most efficient approach would be multiple NVMe RAID 0 arrays: writes go to one array while the others are each being transferred to a separate HDD. I think it's possible to have 3 RAID 0 arrays of ten disks on a system. A lower disk count per array, with multiple arrays working on the data set, might be better for transferring. I think the same organization between fast storage and slow storage would still apply.
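A rough sketch of the "drain several staging arrays to separate HDDs at once" idea, in Python. The mount points in the usage comment are hypothetical, and this assumes each staging array and each HDD is mounted as an ordinary directory; a real setup would more likely use rsync or similar, plus verification before the staging copy is deleted.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def drain(staging_dir: Path, hdd_dir: Path) -> int:
    """Copy every file from one staging array to one HDD; return bytes copied."""
    copied = 0
    # One file at a time per HDD keeps that drive's writes mostly sequential.
    for src in sorted(p for p in staging_dir.rglob("*") if p.is_file()):
        dst = hdd_dir / src.relative_to(staging_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied += src.stat().st_size
    return copied


def drain_all(pairs):
    """Run one drain per (staging array, HDD) pair concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(pairs))) as pool:
        return list(pool.map(lambda pair: drain(*pair), pairs))


# Hypothetical usage: two full arrays drain while a third takes new writes.
# drain_all([(Path("/mnt/nvme1"), Path("/mnt/hdd1")),
#            (Path("/mnt/nvme2"), Path("/mnt/hdd2"))])
```

Each HDD gets exactly one writer here, so the drives aren't seeking between competing streams, and total throughput to long-term storage scales with the number of HDDs being filled in parallel rather than with the speed of any single one.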