Maybe you've seen some of my posts in other subforums; the short version is that I'm forced to start on a shoestring budget with a build that can be upgraded over time, preferably without switching to a completely different system and doing a big data migration with lots of exporting and importing.
The software I'm planning to use is SnapRAID. My reason for choosing it is the Drobo-like flexibility: I can upgrade any drive to a larger size at any time, or add drives at any time. (FreeNAS/ZFS wants me to buy matched sets of drives set up as single vdevs and makes upgrading piecemeal more difficult.) I can also upgrade the server-side hardware - motherboards, controllers and such - without any forced re-migration of the data, unlike what a hardware RAID change may involve. I originally planned for a scale-out of around 300TB in the future, but in reality it could be more or less. The plan is to build an 8TB test box, start at 32TB for production work, and probably upgrade that fairly quickly to 64TB. Beyond 64TB I'm not sure when it will happen, but I'm planning for it anyway; if an opportunity shows up that requires me to rapidly grow the back end to hold the data, I'd like a plan on the shelf so I know exactly what to do.
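For anyone who hasn't used it, this is roughly what that flexibility looks like in practice - a minimal snapraid.conf sketch with made-up mount points and disk names, where the data disks can all be different sizes and the main rule (as I understand it) is that the parity disk has to be at least as big as the largest data disk:

# hypothetical layout - paths and disk names are just placeholders
parity /mnt/parity1/snapraid.parity

# keep copies of the content file on more than one disk
content /var/snapraid/snapraid.content
content /mnt/disk1/snapraid.content
content /mnt/disk2/snapraid.content

# mixed-size data disks are fine; growing the array is adding another "data dN" line
data d1 /mnt/disk1/
data d2 /mnt/disk2/
data d3 /mnt/disk3/

exclude *.tmp
exclude /lost+found/

As far as I understand it, growing the array is just adding a data line and running a sync, and upgrading a disk is copying its files to the new one, repointing that data line, and syncing again - which is exactly the piecemeal behavior I'm after.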
I'm at least aware of things like SAS expanders, SAS without expanders, SATA multilane, and SATA port multipliers - not very well, mind you, but I know what exists. I mostly plan to use consumer-grade SATA hard drives now and in the future; they're good enough for Backblaze and Google. A future SSD to help cache things might need to be an enterprise-grade model (because of total lifetime bytes written, if I plan to burn through petabytes of processing), or at least something like an Intel 750 NVMe, but the questions here are mostly about the hard drive array.
My priorities, in order:
1) Drive space - until I have 'enough' space, performance is not as important.
2) Performance - once there's plenty of free space, I might upgrade to improve speed.
3) Reliability/uptime - once there's plenty of space and it's fast enough, this could mean adding redundancy (perhaps migrating the drives into some kind of SAN nodes with an entire spare node or two ready to plug in if anything goes wrong, or simply a prebuilt backup server I can swap the drives into).
4) Minimizing time spent on sysadmin duties - as close to a "set it and forget it" self-managing array as I can get. Just periodically swap drives when they die or need upgrading, and the rest takes care of itself.
So at first it's just about the minimum cost overhead per drive (which seems to go up past 20-some drives, because 8+ port controllers cost more per port), using common 2- or 4-port PCIe x1 cards and being happy saturating 1-gigabit Ethernet. Stage 2 is trying to make those same drives faster, which might involve some internal data migration, setting up RAID 0 stripes, and SSD caching, while trying to get close to swamping 10-gigabit Ethernet. Stage 3 is wanting some kind of whole-system failover or clustering option, so that something like a dying HBA or motherboard doesn't stop the work; whatever drives are already in there might have to be easily swappable into a backup system. Stage 4 is making that automatic - failure detection, automatic failover, etc. (I have no idea if that requires special controllers or hardware, but if it does, that's when it becomes relevant.)
Can anyone give me some quick and dirty guidelines about where the sweet spots are, and when certain upgrades become necessary? For instance, when should I look at SAS controllers (maybe at 16 drives and definitely at 24-48, say)? Does multilane SATA introduce reliability problems? What kinds of loads actually make an SSD useful as a cache? I realize this is kind of open-ended and wordy, but I'm not sure where else to start. I have a plan in my head (just start with 4-port PCIe x1 cards and max out a motherboard), but I know it may not work out the way I expect (issues of bus contention, shared PCIe lane bandwidth, and similar).
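For what it's worth, here's the back-of-the-envelope math I've been using as a sanity check on the PCIe x1 worry. All the numbers are rough assumptions on my part (~500 MB/s usable on a PCIe 2.0 x1 link, ~180 MB/s sequential per modern 7200rpm drive, ~118 MB/s of payload on 1GbE, ~1180 MB/s on 10GbE), so correct me if they're off:

# Rough throughput sanity check for a 4-port SATA card in a PCIe 2.0 x1 slot.
# All figures are approximations, not measurements.

PCIE2_X1_MBPS = 500      # usable bandwidth of one PCIe 2.0 x1 link, approx.
DRIVE_SEQ_MBPS = 180     # sequential throughput of one 7200rpm SATA drive, approx.
GBE_MBPS = 118           # usable payload rate of 1-gigabit Ethernet
TEN_GBE_MBPS = 1180      # usable payload rate of 10-gigabit Ethernet

drives_per_card = 4
card_demand = drives_per_card * DRIVE_SEQ_MBPS      # what the drives could push together
card_ceiling = min(card_demand, PCIE2_X1_MBPS)      # what the x1 slot actually lets through

print(f"4 drives want ~{card_demand} MB/s, the x1 slot caps them at ~{card_ceiling} MB/s")
print(f"1GbE only needs ~{GBE_MBPS} MB/s, so one x1 card already saturates it")
print(f"10GbE needs ~{TEN_GBE_MBPS} MB/s, so roughly "
      f"{-(-TEN_GBE_MBPS // PCIE2_X1_MBPS)} x1 cards of streaming reads, ignoring parity and caching")

If those assumptions hold, the x1 cards are fine for stage 1 (the slot bottlenecks the drives but still far outruns 1GbE), and it's only at the 10GbE stage that the per-slot ceiling and total lane count on the motherboard start to matter.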