I'm reworking a few things in the structure of my startup's data storage and R&D lab. I've been working, and somewhat stalling, on the super-high-performance data-serving need in the R&D lab, so I'm taking a quick break to tackle what should be a much easier challenge: my primary means of backup, data integrity, and failover in the event of a catastrophe.
For this challenge, I'm looking for expertise on how to design around and make use of what I have left over from the R&D purchases, recent server configs, etc. I have the gear available and would like to see some suggestions, along with the benefits of each design/configuration.
Where things stand:
-Super-high-performance data access is a work in progress in the R&D lab. For the time being, what I have covers the current need.
-The mission-critical data is replicated on a mirrored system, so there is a safeguard in place against failure of the R&D lab system.
-The mission-critical data is also stored in triplicate on a backup array, which is itself replicated off-site.
I'm in a decent spot, but I would like a tertiary system housed on-site, underground. For the case of catastrophe meeting catastrophe, I don't want to be at the mercy of the time it would take to recover from the off-site replica. I've also encountered significant data corruption before; I recovered quickly, but only because those were small data sets and cloud recovery was quick and unproblematic.
What I'm currently missing is protection against corruption. Snapshots are in place for some critical data, but I need to establish snapshots for the entire system. That's why I've chosen ZFS.
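The way I understand it would work (pool and host names below are placeholders, nothing I've actually built yet): one recursive snapshot covers every dataset in the pool atomically, scheduled scrubs are what actually surface silent corruption, and send/receive is how a snapshot would get pushed to the tertiary box.

    # Take one atomic, recursive snapshot of the whole pool ("tank" is a placeholder name):
    SNAP="tank@$(date +%Y%m%d-%H%M)"
    zfs snapshot -r "$SNAP"

    # Confirm it covers all child datasets:
    zfs list -t snapshot -r tank

    # Periodic scrub to detect (and, with redundancy, repair) silent corruption:
    zpool scrub tank

    # Replicating to the tertiary system would then be a send/receive
    # ("tertiary-host" and "backuppool" are placeholders):
    zfs send -R "$SNAP" | ssh tertiary-host zfs receive -dF backuppool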
ZFS is still relatively new to me, and I've brought myself to about 50% of what I need to know, but I want to test out a few configurations and work out how best to lay out the hardware. I can worry about methodology once I have the system built.
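To compare layouts before committing the real drives, my plan is to build throwaway pools on sparse files; that's fine for checking geometry and usable space, and meaningless for performance, which is all I need at this stage. Roughly (paths, sizes, and pool name are arbitrary placeholders):

    # Create stand-in "disks" as sparse files:
    for i in $(seq 1 6); do truncate -s 4G /tmp/fakedisk$i; done

    # Try a 6-wide RAIDZ2 and look at the resulting layout and usable space:
    zpool create testpool raidz2 /tmp/fakedisk1 /tmp/fakedisk2 /tmp/fakedisk3 \
                                 /tmp/fakedisk4 /tmp/fakedisk5 /tmp/fakedisk6
    zpool status testpool
    zfs list testpool

    # Tear it down and repeat with a different geometry:
    zpool destroy testpool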
Keys
-This array doesn't need to be highly scalable, but within the next year I'll need to scale on the order of petabytes. If this system can scale, great; it can be a long-term option. If it can't, the funds will be there for a managed solution if necessary. For now, I need something to complete the protection scheme between now and whenever I need to scale up, at which point I can weigh the cost of scaling this system against a managed solution.
-The system needs to hold about 100TB of cold data, 10-12TB of warm data, and roughly 3-4TB of hot data.
-I don't need the same kind of performance I need for R&D. I do, however, need throughput of at least 800MB/s; 1.0-1.5GB/s over two 10GbE connections would be ideal, but I can live with saturating a single 10GbE connection. (My rough sanity math on capacity and throughput is just below this list.)
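Back-of-the-envelope, assuming RAIDZ2 layouts over the drives listed under Materials below (decimal TB, before ZFS overhead and free-space headroom, so treat these as optimistic):

    Cold:  6 x 12TB in one RAIDZ2 vdev     -> (6-2) x 12 = 48TB usable
           12 x 6TB as two RAIDZ2 vdevs    -> 2 x (6-2) x 6 = 48TB usable
           Total                            -> ~96TB, just shy of the 100TB cold target
    Warm:  12 x 1TB SSD in one RAIDZ2 vdev -> (12-2) x 1 = 10TB usable
    Hot:   2 x 2TB NVMe mirrored           -> ~2TB (a third drive as 3-wide RAIDZ1 -> ~4TB)
    Net:   one 10GbE link is ~1.25GB/s raw, ~1.1GB/s in practice,
           so 800MB/s fits on a single link; 1.0-1.5GB/s needs both.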
Materials for use
-6x 12TB WD Gold
-12x 6TB WD Gold
-12x 1TB Intel 545 SSDs
-2 (maybe 3) 2TB Intel NVMe drives
-2x LSI 9300 HBAs
-1x LSI 9207 HBA
-1x LSI 9201 HBA
-2x Chelsio T580 NICs
-The server is a Supermicro with a Xeon Scalable Bronze and 128GB of RAM; I have extra RAM, so I could expand it, but that's likely not needed. (One possible way to arrange the drives is sketched just below.)
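For what it's worth, the configuration I've been sketching is three pools, one per tier. Device paths and pool names are placeholders (I'd use the /dev/disk/by-id names on the real box), and this is a starting point rather than a settled design:

    # Cold pool: three RAIDZ2 vdevs, so any two drives per vdev can fail,
    # and sequential reads stripe across all 18 spindles.
    zpool create cold \
        raidz2 /dev/disk/by-id/wd12t-{1..6} \
        raidz2 /dev/disk/by-id/wd6t-{1..6} \
        raidz2 /dev/disk/by-id/wd6t-{7..12}

    # Warm pool: the twelve 1TB SSDs as a single RAIDZ2 (~10TB usable).
    zpool create warm raidz2 /dev/disk/by-id/ssd545-{1..12}

    # Hot pool: the two NVMe drives mirrored (~2TB usable; a third drive
    # would allow a 3-wide RAIDZ1 for ~4TB instead).
    zpool create hot mirror /dev/disk/by-id/nvme2t-1 /dev/disk/by-id/nvme2t-2

    # Settings I'd start with on each pool (shown for "cold"):
    zfs set compression=lz4 cold
    zfs set atime=off cold

An alternative I haven't ruled out is folding the SSDs and/or NVMe into the cold pool as a special vdev or L2ARC rather than running separate warm/hot pools; that's exactly the kind of trade-off I'd like input on.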
Maybe ZFS is the right way to go, maybe not. Maybe I don't need to put all of this in one system. I'm open to design thoughts, as long as I can meet the key requirements.
With all the possible configs, I just don't know which way to go.