What storage approach and drives would you go for?


fatherboard

Member
Jun 15, 2025
37
1
8
Before I bark up the wrong tree, I would like your experience to point me in the right direction:

Use case:

Content 1: 500TB to 750TB to 1PB of storage, not permanent, gets deleted when the project is finished. Every file is multiple GBs in size. Speed matters.
Content 1 is made of 3 parts that can and should be on separate physical drives.
Assuming a total of 750TB:
-> 1a = 100TB
-> 1b = 300TB
-> 1c = 350TB

Content 2: rough guess 150 to 200TB, permanent, used across multiple projects, file size usually less than 1GB, perhaps a couple of hundred MBs, some much smaller. Tens of thousands of files.

Processing
Step 1: only Content 1 is being worked on (all the files). Generate 1a. Read 1a to generate 1b. Then read 1a to generate 1c.
Step 2: only Content 2 is being worked on (just a few files).
Step 3: all of Content 1 + a selected few from Content 2 are being worked on at the same time.

Users: for now just one, myself.

Notes:
- I'm allergic to the word NAS, because the network connection is a dead-slow bottleneck; it would render the great speed of the CPU and NVMe RAIDs useless.
- Content 1: RAID 0, speed matters; if it's lost it's not a big deal, I can re-run the HPC job.
- Content 2: I would rather have a backup somewhere

This storage will be built and used over time, bit by bit, drive by drive (if too expensive). The drives will be used separately as they're purchased, before I bring them all together to make the 750TB-1PB set, so they have to be coherent within each sub-group (1a, 1b, 1c and 2) and RAIDable (if that word exists), e.g. same capacity.

Why am I asking? Because over several months I will buy this storage and need to use it while it's being built up. I currently have workstation cases and a mix of workstation motherboards (ASUS® PRO WS WRX90E-SAGE SE) and server motherboards (SUPERMICRO H13SSL-N), so I'm tempted to start with M.2 drives in an ASUS Hyper M.2 x16 Gen5 card or similar temporary solutions. That could end up as a big spaghetti, not a coherent set of drives I can later bring together under one roof as the 1PB storage (not a NAS) on a different, future 48-RAM-slot motherboard, e.g. TURIN2D48G-2L+/500W.

What storage approach would you go for now, for immediate use, taking into account that it's meant to become 4 separate sets of storage (1a, 1b, 1c and 2)? What type of drives would you choose for each type of content (1a, 1b, 1c and 2): U.2, U.3, M.2, SATA SSD, HDD? And how do I keep the flexibility to move the drives around?
 
Last edited:

justincormack

New Member
Jun 5, 2025
8
2
3
This is a difficult size to run on one machine. And it will be expensive regardless. How much performance do you actually need? What's your performance tradeoff? HDD will be slower but cheaper, and your workload is roughly sequential by the sound of it, although you don't really make it clear. With HDD the network won't really be a bottleneck: you need 50 HDDs at 20TB, and 100Gb Ethernet would easily meet peak performance.

With SSD the network is a bottleneck, but unless you buy 8x of the new 122TB drives https://www.servethehome.com/solidigm-d5-p5336-122-88tb-nvme-ssd-review/ you will need enclosures/PCIe switches. You might anyway; you don't say what sort of machine you need for this work. These large drives are expensive, so that's going to cost you $130k or so, although 30TB drives won't be much cheaper; they will be faster, but it gets harder to connect them.
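A rough back-of-the-envelope sketch of the HDD-vs-network point; the per-drive throughput and drive count here are assumptions, not measurements:

```python
# Back-of-the-envelope: aggregate sequential throughput of an HDD pool
# versus a 100GbE link. Per-drive speed and drive count are assumptions.

HDD_COUNT = 50          # ~50 x 20TB drives for ~1PB raw
HDD_SEQ_MB_S = 200      # assumed sustained sequential MB/s per large HDD
NIC_GBIT_S = 100        # 100Gb Ethernet

hdd_aggregate_gbit_s = HDD_COUNT * HDD_SEQ_MB_S * 8 / 1000  # MB/s -> Gbit/s
print(f"HDD pool aggregate: ~{hdd_aggregate_gbit_s:.0f} Gbit/s")
print(f"100GbE link:        {NIC_GBIT_S} Gbit/s")
if hdd_aggregate_gbit_s <= NIC_GBIT_S:
    print("The drives, not the network, are the limit in this scenario")
else:
    print("The network would be the limit in this scenario")
```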
 

kapone

Well-Known Member
May 23, 2015
1,388
820
113
Assuming spinners are fine… and they have to be, because all-flash at that size is not exactly cheap. If this is for a business (which it doesn't seem to be), strike my prior statement.

That said, this is not difficult. Four separate RAID arrays… each with its own card (for PCI-e bandwidth)…with say 22TB HDDs… you need ~50 drives.

That's trivial to do. Get two external 24-or-more-bay SAS enclosures and one (or more… you are thinking HA, right?) head node, and you're up and running.

A 56Gbps network (uhhh, Mellanox is easy...) or better should be easily doable.

For even higher performance, build four servers with individual networking/ RAID arrays etc.

Easy peezy!
 

gea

Well-Known Member
Dec 31, 2010
3,489
1,371
113
DE
With a DAS concept (local arrays for local apps), you only need enough disk bays. Depending on the data you can assume 100-150 MB/s per HDD, 500 MB/s per SATA SSD and 1000+ MB/s per NVMe. In a RAID array, e.g. RAID-6 or ZFS Z2, data is striped over the disks, but scaling is not linear (in practice more like a factor of 1.5). As a minimum, assume for a RAID-Z2 of 10 disks (8 data disks, 2 redundancy disks) about 8 x 100 = 800 MB/s with HDDs, 4000 MB/s with SATA SSDs and 8000 MB/s with NVMe.
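As a sketch of that rule of thumb (the per-device figures are assumptions, not benchmarks):

```python
# Sketch of the rule of thumb above: a RAID-Z2 vdev stripes data over its
# data disks, so the floor estimate is (disks - parity) x per-device speed.

PER_DEVICE_MB_S = {"HDD": 100, "SATA SSD": 500, "NVMe": 1000}  # assumptions

def z2_floor_mb_s(disks: int, per_device_mb_s: int, parity: int = 2) -> int:
    """Minimum expected sequential throughput of one RAID-Z2 vdev."""
    return (disks - parity) * per_device_mb_s

for name, speed in PER_DEVICE_MB_S.items():
    print(f"10-wide Z2 of {name}: >= {z2_floor_mb_s(10, speed)} MB/s")
```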

For more than, say, 2000 MB/s throughput you need a very fast CPU and server system.
For concurrent read/write on disks, prefer SAS over SATA, as SAS has twice the bandwidth of SATA and can read and write concurrently.

If the storage and client OS must be different, or you want to separate devices, you need fast LAN connectivity with NICs > 10G in a NAS with a file-sharing service. The fastest option is SMB Direct with RDMA, which allows up to 10 GByte/s over the LAN with the lowest latency and CPU load. This is mainly an option with a Windows Server + Windows 11 clients. With a Linux server you can use ksmbd as the SMB server, which promises SMB Direct, but I have only seen success reports with Linux clients. For some clients, use DAC connections host -> client, e.g. with 1-4 x 25G NICs in a server and around 3 GByte/s to the clients.

Filesystem-wise, prefer ZFS pools made from multiple Z2 vdevs. With a disk-based pool, you may add a special vdev mirror for metadata and small files. For fast SSD or NVMe pools, enable direct I/O (it disables the extra copy to the read cache). ZFS is available on FreeBSD, Illumos, Linux, Solaris, and now OSX and Windows (beta there, nearly ready). On Windows, a Storage Spaces pool is an alternative where you can pool disks of different size or type, with redundancy, location and tiering per Space. NTFS or ReFS are slightly faster than ZFS.
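To make the multiple-Z2-vdev idea concrete, a small capacity sketch; the drive size, vdev width and vdev count are hypothetical:

```python
# Hypothetical layout for the ~750TB tier as a pool of striped RAID-Z2
# vdevs. Drive size, vdev width and vdev count are assumptions.

DRIVE_TB = 61.44     # e.g. a 61.44TB NVMe drive
VDEV_WIDTH = 8       # drives per RAID-Z2 vdev
VDEV_COUNT = 2
PARITY = 2           # RAID-Z2

raw_tb = DRIVE_TB * VDEV_WIDTH * VDEV_COUNT
usable_tb = DRIVE_TB * (VDEV_WIDTH - PARITY) * VDEV_COUNT  # ignores ZFS overhead
print(f"{VDEV_COUNT} x {VDEV_WIDTH}-wide Z2: raw {raw_tb:.0f} TB, usable ~{usable_tb:.0f} TB")
```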

For any of the OS options you can use napp-it cs as a web GUI for ZFS storage management (+ Storage Spaces on Windows).
 

fatherboard

Member
Jun 15, 2025
37
1
8
This is a difficult size to run on one machine. And it will be expensive regardless. How much performance do you actually need? What's your performance tradeoff?
The most important thing is to keep everything inside one machine, no network, and get data transferred at the maximum possible speed the drives can reach.

HDD will be slower but cheaper, and your workload is roughly sequential by the sound of it, although you don't really make it clear. With HDD the network won't really be a bottleneck: you need 50 HDDs at 20TB, and 100Gb Ethernet would easily meet peak performance.
HDDs are my least preferred. If I hear that 100Gb Ethernet would meet peak performance, it means I have chosen drives as slow as the network. HDDs could be used as a backup; they're too slow for processing.
Yes, the workload is sequential, operations are performed one after another in a specific order, with each step dependent on the completion of the previous one. A new task cannot begin until the preceding task has finished.

With SSD the network is a bottleneck, but unless you buy 8x of the new 122TB drives https://www.servethehome.com/solidigm-d5-p5336-122-88tb-nvme-ssd-review/ you will need enclosures/PCIe switches. You might anyway; you don't say what sort of machine you need for this work. These large drives are expensive, so that's going to cost you $130k or so,
I don't have a problem with the Solidigm D5-P5336 61.44TB or even the new 122TB drives, except that the latter is not available, and I would rather buy something after it has got a few independent reviews that confirm the speeds. My problem is that they all have to be 100% identical for the RAID 0. So if I go for the 122TB it's not a problem from a cost perspective, I'll take it nice and slow, but it's not there, so I can't even get started; and if I take one or two 61TB now, then I'm stuck with 61TB for the rest. The first drive defines the rest. If the 122TB is proven and my only way out to have everything in one machine, I'm OK with that.

you don't say what sort of machine you need for this work.
Currently :
AMD Epyc 9754, motherboard SUPERMICRO H13SSL-N, MSI GeForce RTX 5090 32G SUPRIM LIQUID SOC, 12 RAM sticks (in a workstation case)
AMD Epyc 9754, motherboard SUPERMICRO H13SSL-N, MSI GeForce RTX 5090 32G SUPRIM LIQUID SOC, 12 RAM sticks (in a workstation case)
In addition there is a workstation with an ASUS® PRO WS WRX90E-SAGE SE motherboard and an AMD Ryzen Threadripper PRO 7985WX 64-core (3.2GHz - 5.1GHz, 320MB cache).
These two twin machines will be "merged" later into a dual Epyc 9754 with 48 sticks, something like the TURIN2D48G-2L+/500W; I'm not sure, because ASRock forces me to take the whole barebone just to get the motherboard.

although 30TB drives won't be much cheaper; they will be faster, but it gets harder to connect them.
Are there any metrics out there that show how much slower the 61TB and the 122TB are vs the 30TB?
 

fatherboard

Member
Jun 15, 2025
37
1
8
Assuming Spinners are fine… and they have to be, because all flash at that size is not exactly cheap.
What if we take spinners out of the equation? They are my least favorite. Let's also take the price out of the equation for a while, just to get a clear technical picture; then I will reintroduce the price. We're left with flash. How would you do it with flash?
 

fatherboard

Member
Jun 15, 2025
37
1
8
Is this technically doable, or are there bottlenecks I'm creating that I don't see?

I assume the maximum number of onboard NVMe drives natively supported by a motherboard is 12 (correct?), based on the Gigabyte G494-ZB4-AAP2 as a reference.

In the meantime:
4 x Solidigm D5-P5336 61.44TB ---> GLOTRENDS PU41 Quad U.2 SSD to PCIe 4.0 x16 adapter ---> SUPERMICRO H13SSL-N motherboard
Is there a bottleneck from the U.2-to-PCIe 4.0 adapter card?

After buying all 12 drives, and once the motherboard is available on the market:
12 x Solidigm D5-P5336 61.44TB (≈737TB) ---> 12 NVMe drive bays in the case ---> ASRock TURIN2D48G-2L+/500W motherboard (does it have enough I/O?)
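A rough way to reason about the adapter question, assuming the quad U.2 card is a passive bifurcation adapter and using an approximate usable-throughput figure per PCIe 4.0 lane:

```python
# Rough check: four PCIe 4.0 x4 U.2 drives behind a passive x16 adapter
# (the motherboard slot must support x4x4x4x4 bifurcation).
# Per-lane throughput is an approximate usable figure, not a spec quote.

GEN4_LANE_GB_S = 1.9       # ~usable GB/s per PCIe 4.0 lane
DRIVES = 4
LANES_PER_DRIVE = 4

slot_gb_s = 16 * GEN4_LANE_GB_S
drives_gb_s = DRIVES * LANES_PER_DRIVE * GEN4_LANE_GB_S
print(f"x16 slot:        ~{slot_gb_s:.0f} GB/s")
print(f"4 x Gen4 drives: ~{drives_gb_s:.0f} GB/s")
# Same number: a passive card adds no bandwidth bottleneck of its own,
# so the drives themselves remain the limit.
```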
 

kapone

Well-Known Member
May 23, 2015
1,388
820
113
What if we take Spinners out of the equation, they are my least favorite. Let's also take the price out of the equation for a while, just to get a clear technical idea, then I will reintroduce the price back. We're left with flash. How would do it with flash?
But you can't. The system has to be designed one way if spinners are feasible, and a different way if not.

If you’re willing to go the path of the Solidigm 122tb drives, cost be damned…then this is even simpler. 8x of them will do the job. Each is PCI-e 4.0x4… so 32 pci-e lanes. That’s nothing.

Choose a motherboard(s), that have enough PCI-e lanes, do software RAID, call it a day?
 

BlueFox

Legendary Member Spam Hunter Extraordinaire
Oct 26, 2015
2,369
1,754
113
Your posts (not just in this thread) and lack of technical knowledge on all of this indicate to me you're not likely spending $100k+ on hardware. If you actually are serious, you need to do way more independent research instead of asking this forum to do it for you or pay for some consulting services to detail out what will fit your rather ambiguous requirements. It would be a drop in the bucket compared to hardware costs.
 
  • Like
Reactions: pimposh and SnJ9MX

Stephan

Well-Known Member
Apr 21, 2017
1,085
845
113
Germany
The specifics of the original post suggest this is a person tasked at the workplace with something over his/her head. You can probably find the same post on Reddit and L1 because, inexplicably, the AI did not give a satisfactory answer.
 
  • Like
Reactions: SnJ9MX

fatherboard

Member
Jun 15, 2025
37
1
8
Your posts (not just in this thread) and lack of technical knowledge on all of this indicate to me you're not likely spending $100k+ on hardware.
By far, this is the best compliment I’ve heard about this. It’s reassuring to be so off the charts that it becomes questionable, thank you.
Just a simple guy here who prefers simple solutions that work, no corporate stuff, no boss, no stress, no nothing.
It is as individual and personal as having the purchased hardware stored next to my socks and underwear.
But it's serious enough that in the last few months >$50K has already been spent on hardware alone; it will hit the $100K mark in hardware alone before the end of this year, and it will continue for a while.

If you actually are serious, you need to do way more independent research
I have done a lot of research but the results were ugly (corporate ugly). Hence my pursuit of simpler ideas that just work.

do way more independent research instead of asking this forum to do it for you
“I would like your experience to point me in the right direction” That was it, written up there. Haven’t asked for more, definitely not to do research for me.

or pay for some consulting services to detail out what will fit your

To end up with claustrophobic cases, a dog-slow NAS, a repeat of the corporate mainstream burden that is slowing everybody down, and still pay for it? No, thank you.


your rather ambiguous requirements.
Simple things that work, doesn’t have to be mainstream.
 
Last edited:

fatherboard

Member
Jun 15, 2025
37
1
8
The specifics of the original post suggest this is a person tasked at the work place with something over his/her head.
I'm truly flattered, I really mean it.
No I'm just a normal dude, no corporate stuff, no boss...

You can probably find the same post on Reddit and L1 because inexplicably the AI did not give a satisfactory answer.
No idea, I didn't ask on Reddit or L1; I don't even have an account.
 

fatherboard

Member
Jun 15, 2025
37
1
8
But you can’t. The system has to be designed one way, if spinners are feasible, and a different way if not.
Let's completely forget about spinners; they're too slow for the processing. The focus is on flash, only on flash.

If you’re willing to go the path of the Solidigm 122tb drives, cost be damned…then this is even simpler. 8x of them will do the job. Each is PCI-e 4.0x4… so 32 pci-e lanes. That’s nothing.
Choose a motherboard(s), that have enough PCI-e lanes, do software RAID, call it a day?
I'm willing, and U.2 is preferred so far, but:
- 122TB is impossible to find
- 61TB has a 3 months delivery lead time
- 30TB is available now
So for now I have to keep all TB options open.

8 x 122TB → 8 x 4 = 32 lanes
12 x 61TB → 12 x 4 = 48 lanes
24 x 30TB → 24 x 4 = 96 lanes (not much left of the 128)

What do I use to connect the drives to the motherboard Supermicro H13SSL-N and later to ASRock TURIN2D48G-2L+/500W?
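Before picking connectors, a quick lane-budget sketch for the three options above, assuming x4 per drive and roughly 128 usable lanes on a single-socket SP5 board (a dual-socket SP5 board typically exposes more, minus the lanes used for the socket-to-socket links):

```python
# Lane-budget sketch for the three capacity options, assuming x4 per drive
# and ~128 usable PCIe lanes on a single-socket SP5 board (approximate;
# some lanes are consumed by onboard devices, the GPU, NICs, boot drives).

TOTAL_LANES = 128
OPTIONS = {"8 x 122TB": 8, "12 x 61TB": 12, "24 x 30TB": 24}

for name, drives in OPTIONS.items():
    drive_lanes = drives * 4
    spare = TOTAL_LANES - drive_lanes
    print(f"{name}: {drive_lanes} lanes for drives, {spare} left for GPU/NIC/boot")
```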
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,096
454
83
I'd throw one more wrench here, just for the heck of it.

OP mentioned that the larger SSD (750TB) array will receive very frequent writes and rewrites.
With that in mind, if you go with QLC drives, even with the increased durability of Solidigm's 122TB drives, the endurance of each drive is only 0.6 DWPD, or about 134PB over 5 years. The unlimited RANDOM writes warranty doesn't seem to apply to your case, since your writes will be sequential.
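The figure can be reproduced from the DWPD rating alone; a minimal sketch with the stated capacity, DWPD and warranty period:

```python
# Reproducing the endurance figure above from the DWPD rating
# (capacity, DWPD and warranty period as stated; this is the whole model).

CAPACITY_TB = 122.88
DWPD = 0.6              # drive writes per day
WARRANTY_YEARS = 5

pb_written = CAPACITY_TB * DWPD * 365 * WARRANTY_YEARS / 1000
print(f"~{pb_written:.1f} PB of writes over {WARRANTY_YEARS} years")  # ~134-135 PB
```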

Do you feel like spending another $150k (hopefully less due to decreased cost per TB) on SSDs in 5 years?

p.s: Endurance gets somewhat worse on smaller QLC drives btw
 

fatherboard

Member
Jun 15, 2025
37
1
8
even with increased durability of 122TB drives by Solidigm, the endurance of each drive is only 0.6DWPD or 134PB over 5 years.
Hardware limitation
134PB over 5 years
1 PB ≈ 1 project
134 PB ≈ 134 projects

Project limitations
1 project duration ≥ 1 month → 5 years = max 60 projects
134 > 60

A bit over 11 years of projects, correct?
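The same arithmetic as a sketch, with all inputs being the assumptions above:

```python
# Project-lifetime arithmetic, spelled out (all inputs are assumptions):
# ~1 PB written per project, at most one project per month.

ENDURANCE_PB = 134        # rated writes over the warranty period
PB_PER_PROJECT = 1
PROJECTS_PER_YEAR = 12    # one project per month at most

projects = ENDURANCE_PB / PB_PER_PROJECT
years = projects / PROJECTS_PER_YEAR
print(f"~{projects:.0f} projects, ~{years:.1f} years at one project per month")
```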

Do you feel like spending another $150k (hopefully less due to decreased cost per TB) on SSDs in 5 years?
No, just once. There is a one-off circumstance that allows this mad spend to happen now; I'm not planning to do it again. But if the drives last 11 years and deliver 134 projects, that's not very bad.

p.s: Endurance gets somewhat worse on smaller QLC drives btw
This one took me off guard: I thought I heard the smaller ones are faster, so I kind of assumed they have more endurance.
I do have to ask the question though: if I ever find myself forced to take only the 30TB version (the only version available now), more than 24 of them, am I better off going with Micron (as per your link it's TLC) than with Solidigm (QLC)?
 
Last edited:

nabsltd

Well-Known Member
Jan 26, 2022
721
511
93
Most important is to keep everything inside one machine, no network, and get data transferred at the maximum possible speed that the drive can reach.
You say you want your CPU to perform both computing tasks and handle the I/O demands of at least 8 NVMe drives. Good luck with that, because one of the two will have to suffer.

On the other hand, if you drop the idea that everything has to be in one box, you could have 100% CPU utilization for your compute node and a 100Gbps (or faster) network card that is fully saturated transferring data to storage.

15 years ago, I built a similar solution where I saturated a 40Gbps NIC, with the backend storage being a gluster cluster of 6x nodes each with 10Gbps NICs and relatively low-performance (by today's standards) 20x4TB spinning disks and 2x1TB SATA SSDs. Today, it would be trivial to build a 3-node storage cluster that could feed your compute node at 100Gbps, and would be much cheaper than trying to stuff it all into one box.

134PB over 5 years
Note that the drives alone will require at least 10kW to run, plus about the same amount of power for cooling. That's assuming you go with 100TB+ drives. At 30TB, it will be closer to 40kW.
 
Last edited:
  • Like
Reactions: nexox

fatherboard

Member
Jun 15, 2025
37
1
8
You say you want your CPU to perform both computing tasks and handle the I/O demands of at least 8 NVMe drives. Good luck with that, because one of the two will have to suffer.
Right now it's just one Epyc 9754 per machine, but in a few months it will be dual Epyc 9754 per system, and hopefully next year dual Epyc 9965 (when a company brave enough dares to make a cooler for it).
How much CPU performance do I have to give up to handle the I/O demands of 12 x 61TB or 24 x 30TB NVMe drives*?
* because the 122TB doesn't exist in the market yet.

On the other hand, if you drop the idea that everything has to be in one box, you could have 100% CPU utilization for your compute node and a 100Gbps (or faster) network card that is fully saturated transferring data to storage.
15 years ago, I built a similar solution where I saturated a 40Gbps NIC, with the backend storage being a gluster cluster of 6x nodes each with 10Gbps NICs and relatively low-performance (by today's standards) 20x4TB spinning disks and 2x1TB SATA SSDs. Today, it would be trivial to build a 3-node storage cluster that could feed your compute node at 100Gbps, and would be much cheaper than trying to stuff it all into one box.
My objective is not to saturate the network, but to not have one slowing me down. So if the rest of the system is faster than the network, to hell with the network; keep it all inside one box if I can fit 750TB to 1PB.

Note that the drives alone will require at least 10KW to run, plus about the same amount of power in cooling. That's assuming you go with 100TB+ drives. At 30TB, it will be closer to 40KW.
Would a dual Corsair AX1600i be enough?
 
Last edited:

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,096
454
83
Hardware limitation
134PB over 5 years
1 PB ≈ 1 project
134 PB ≈ 134 projects

Project limitations
1 project duration ≥ 1 month → 5 years = max 60 projects
134 > 60

A bit over 11 years of projects, correct?

...

This one took me off guard: I thought I heard the smaller ones are faster, so I kind of assumed they have more endurance.
I do have to ask the question though: if I ever find myself forced to take only the 30TB version (the only version available now), more than 24 of them, am I better off going with Micron (as per your link it's TLC) than with Solidigm (QLC)?
See my second link.
The math for a 30.72TB drive is a bit different, i.e. the guaranteed endurance (PBW) is 31.5. It likely won't fail exactly at 32 PBW, but you will (very likely) lose the warranty after that.

However, since you'd be striping data across many drives, you won't be writing 1PB to each drive every month, but 1PB divided evenly by however many drives you're going to buy (assuming RAID-0). That will decrease the writes to each individual drive significantly.
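A small sketch of that point, assuming writes spread evenly across the RAID-0 set and roughly 1PB written per project:

```python
# Sketch of the striping point above: with RAID-0, each project's ~1PB of
# writes is spread (roughly evenly) across every drive in the set.

PROJECT_WRITES_TB = 1000   # ~1 PB per project, assumed
for drives in (8, 12, 24):
    per_drive = PROJECT_WRITES_TB / drives
    print(f"{drives} drives: ~{per_drive:.0f} TB written per drive per project")
```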

Still, I'd carefully read the terms and conditions of the warranty if I were in your shoes.

[attached image]
 

fatherboard

Member
Jun 15, 2025
37
1
8
Thank you for your guidance so far, almost there, some clarification questions:

1. What is the maximum number of drives a dual Epyc motherboard can natively support inside the machine?
It will determine what to buy:
If 24 → 24 x 30TB TLC (Micron 9400 Pro), available
elseif 12 → 12 x 61TB QLC (Solidigm D5-P5336 61.44TB), 3-month delivery lead time
elseif 8 → 8 x 122TB QLC (Solidigm D5-P5336 122.88TB), impossible to find

2. Which goes back to the question: what are the possible ways to connect these drives, including workarounds like a U.2-to-PCIe 4.0 adapter card…?
 

kapone

Well-Known Member
May 23, 2015
1,388
820
113
Thank you for your guidance so far, almost there, some clarification questions:

1. What is the maximum number of drives a dual Epyc motherboard can natively support inside the machine?
It will determine what to buy:
If 24 → 24 x 30TB TLC (Micron 9400 Pro), available
elseif 12 → 12 x 61TB QLC (Solidigm D5-P5336 61.44TB), 3-month delivery lead time
elseif 8 → 8 x 122TB QLC (Solidigm D5-P5336 122.88TB), impossible to find

2. Which goes back to the question: what are the possible ways to connect these drives, including workarounds like a U.2-to-PCIe 4.0 adapter card…?
Something like this...

[attached image]


It has 20x PCIe 5.0 x8 connectors, each of which can be split into two, for a total of 40x PCIe 5.0 x4 endpoints. It uses these types of cables:

[attached image]


BUT...this doesn't take into account what chassis that motherboard is going to go into, and/or what type of backplane comes into play.

No add-on cards/adapters/converters needed. Native pci-e connectivity to 40x drives.
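Counting the endpoints from that description (connector count as described above; exact board layouts vary), a small sketch:

```python
# Counting native x4 NVMe endpoints from the connector description above.

MCIO_X8_CONNECTORS = 20
ENDPOINTS_PER_CONNECTOR = 2   # each x8 connector splits into two x4 endpoints

endpoints = MCIO_X8_CONNECTORS * ENDPOINTS_PER_CONNECTOR
print(f"{endpoints} native x4 NVMe endpoints, no HBAs or PCIe switches")
for drives in (8, 12, 24):
    print(f"  {drives} drives -> {endpoints - drives} endpoints spare")
```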
 
Last edited:
  • Like
Reactions: homeserver78