Super storage madness. Speccing a 1.2 Petabyte setup.


RimBlock

Active Member
Sep 18, 2011
Singapore
I have the need for stupid amounts of storage and no redundancy (don't ask, not my call).

1.2 Petabytes spread over a max of 28 units.

I have already sorted out a solution using 4U Supermicro cases, but would like to keep that as a fallback and investigate the other options available, as this project is unlikely to require delivery for another 4 months.

I came across the Backblaze pod blog a while ago and wondered what could be done around that idea. I intend this thread to be a scratchpad / brainstorming / innovation place, to see if the idea can be bettered for less than the cost of the 28x 4U servers.

Please note I am in Singapore, so prices here are very different from the US, UK and many other places. Distribution is very limited, and distributors work hard to live up to the 'charge as much as the market will allow' motto of business.

I am currently looking at DAS units for the bulk of the storage, plus separate one- or two-CPU servers (E3-1200 or E5-2600) running ESXi to host the VMs needed for the processing side.

Requirements are:
  • 1.2 Petabytes of storage
  • A low-end 4-core CPU for every 22 hard drives plus 2 spare empty slots
  • Dual LAN for every 24 drive slots (22 filled & 2 empty)
  • Each set of 24 drives (as above) to be directly controlled by its VM (VT-d)
  • Less than 112U total size (28x 4U units)
  • Each VM to run Win 7 Pro (again, out of my control).
  • Hard drives will be spanned or striped with no redundancy (I know, I know, not my call either).
  • Redundant PSUs required in all machines.
  • Less than S$250,000

The 22 populated and 2 spare (unpopulated) slots are based on 2TB drives. 3TB drives can be used, with the resulting changes in the number of populated and free slots taken into account. The drives are SATA.
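
For what it's worth, a quick back-of-envelope sketch of how the drive counts fall out for 2TB versus 3TB drives (raw decimal terabytes, filesystem overhead ignored):

```python
import math

# Back-of-envelope capacity check for the layout above.
# Assumes raw (unformatted) decimal terabytes.
TARGET_TB = 1200          # 1.2 PB expressed in TB
UNITS = 28                # maximum number of units
POPULATED_SLOTS = 22      # drives per unit, with 2 spare slots left empty

for drive_tb in (2, 3):
    drives_needed = math.ceil(TARGET_TB / drive_tb)
    units_needed = math.ceil(drives_needed / POPULATED_SLOTS)
    full_build_pb = UNITS * POPULATED_SLOTS * drive_tb / 1000
    print(f"{drive_tb} TB drives: {drives_needed} drives -> "
          f"{units_needed} units of {POPULATED_SLOTS}; "
          f"a full {UNITS}-unit build holds {full_build_pb:.3f} PB raw")
```

So 2TB drives land almost exactly on the 28-unit maximum (600 drives, ~1.23 PB raw), while 3TB drives would need only around 19 units.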

I am currently looking at a custom 4U case with a couple of Supermicro 24-drive hot-swap backplanes in the bottom, with the drives mounted vertically (tail end down). The backplanes have a SAS expander allowing dual mini-SAS connections (8 lanes total). I did look at three backplanes in the DAS case, but you would need to remove the entire top of the case to swap dead drives out, and that would be difficult in a fully populated rack.

The 4 SAS cables from the two backplanes would connect to internal-to-external adapters, which would then link to something like a HighPoint 2744 PCIe 2.0 x16 controller. This would sit in a server built on a Supermicro X9DRD-iF motherboard (I need to confirm ESXi runs on this board, but the i350 network chipset seems to be supported) with dual E5-26xx CPUs and a stack of RAM (spec as yet undecided). The SAS controller is PCIe 2.0 x16, so it can handle 8GB/s on the PCIe bus. The four SAS cables from the backplanes can handle 9.6GB/s (600MB/s x 4 lanes x 4 cables). That makes the PCIe bus the limit, but it still leaves the 48 drives around 166MB/s each (give or take), so plenty for the DAS.
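
As a sanity check, a minimal sketch of that bandwidth arithmetic (theoretical link ceilings only; real-world throughput will be well below these):

```python
# Rough bandwidth budget for one DAS box, using the figures above.
# These are theoretical link rates, not measured numbers.
PCIE2_GBPS_PER_LANE = 0.5    # PCIe 2.0: ~500 MB/s per lane
SAS2_GBPS_PER_LANE = 0.6     # 6Gb/s SAS: ~600 MB/s per lane
PCIE_LANES = 16              # the controller is PCIe 2.0 x16
SAS_CABLES = 4               # 2 backplanes x 2 mini-SAS cables each
LANES_PER_CABLE = 4
DRIVES = 48                  # two 24-slot backplanes

pcie_ceiling = PCIE2_GBPS_PER_LANE * PCIE_LANES                    # 8.0 GB/s
sas_ceiling = SAS2_GBPS_PER_LANE * LANES_PER_CABLE * SAS_CABLES    # 9.6 GB/s
per_drive = min(pcie_ceiling, sas_ceiling) * 1000 // DRIVES        # ~166 MB/s

print(f"PCIe ceiling {pcie_ceiling} GB/s, SAS ceiling {sas_ceiling} GB/s, "
      f"~{per_drive:.0f} MB/s per drive across {DRIVES} drives")
```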

After Patrick's great review of the Silicom PEG6I 6-port network card, it seems this may be ideal for the network connectivity. Six ports shared out via VT-d in pairs, plus the two on the motherboard, gives four VMs their dual LAN. Two more HighPoint 2722s will give 4 more external SAS connectors, allowing a second DAS box to be connected, so there will be enough drives for the 4 VMs. I would imagine low(ish)-end E5s should be fine.
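
To see how that scales out to the whole build, here is a rough tally; the port counts are my reading of the cards above, and the 28-set 2TB layout is assumed, so treat this as a sketch rather than a final plan:

```python
import math

# Tally of what one ESXi head node could serve under this layout.
NIC_PORTS = 2 + 6           # onboard dual i350 + Silicom PEG6I (6 ports)
SAS_CONNECTORS = 4 + 2 * 2  # 2744 (4 ports via int->ext adapters) + two 2722s
CABLES_PER_DAS = 4          # each DAS box takes 4 mini-SAS cables
SLOTS_PER_DAS = 48          # two 24-slot backplanes
SLOTS_PER_VM = 24           # 22 populated + 2 spare
NIC_PORTS_PER_VM = 2        # dual LAN per VM

das_per_node = SAS_CONNECTORS // CABLES_PER_DAS                  # 2
vms_by_disk = das_per_node * SLOTS_PER_DAS // SLOTS_PER_VM       # 4
vms_by_lan = NIC_PORTS // NIC_PORTS_PER_VM                       # 4
vms_per_node = min(vms_by_disk, vms_by_lan)                      # 4

sets_needed = 28                                                 # 2TB build
nodes = math.ceil(sets_needed / vms_per_node)                    # 7
print(f"{vms_per_node} VMs per head node -> {nodes} head nodes "
      f"and {nodes * das_per_node} DAS boxes for the full 2TB build")
```

In other words, the DAS approach works out to roughly 7 head nodes feeding 14 DAS boxes, versus the 28 standalone 4U servers in the fallback plan.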

I will split this into the different machine builds to make it clearer later, when I get home.

Notes, suggestions, stupid mistakes pointed out all welcome.

RB
 

dba

Moderator
Feb 20, 2012
San Francisco Bay Area, California, USA
Madness indeed! Sounds like fun.

With their 2.0 storage pods, Backblaze did two important (but often overlooked) things well:

1) They tested all of their components extensively.
2) They thought about vibration. Drive vibration, power supply vibration, and fan vibration.

If my needs were substantially similar to theirs, and I didn't have the time or budget to do a large amount of my own testing, I'd simply copy as much of their design as possible. Specifically:

1) Use the Hitachi Deskstar 5K3000 HDS5C3030ALA630 drives. A 1%/year failure rate (roughly 7 drives/year in your scenario; see the sketch after this list) is really good for consumer-level drives.
2) Either use their storage pod design - especially the anti-vibration drive mounts and pads - or do something very similar. Personally I would not mount 24 consumer-grade SATA drives in a standard metal server case - the vibration can be expected to reduce performance and likely reliability. Enterprise-class drives have tools to manage vibration, but even then the effect is quite noticeable when you start racking them. A rack of hundreds of spinning drives mounted metal on metal would be a nightmare - see http://storagemojo.com/2010/05/19/shock-vibe-and-awe/.
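
For scale, a quick sketch of that failure arithmetic under the drive counts from the opening post. A flat annualised failure rate is a simplification (real rates vary with age, temperature and vibration), so treat these as ballpark figures:

```python
# Expected annual drive failures at a flat 1% AFR, for both drive sizes
# from the opening post (28 units of 22 drives at 2TB, ~19 units at 3TB).
AFR = 0.01

for drive_tb, drives in ((2, 28 * 22), (3, 19 * 22)):
    expected = drives * AFR
    print(f"{drive_tb} TB build: {drives} drives x {AFR:.0%} AFR "
          f"~= {expected:.0f} failures/year")
```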

Another tip is to take manufacturer speed claims with a grain of salt. The Highpoint card you mention won't really handle 8GB/s (more like 1GB/s running best-case mostly sequential reads - far less handling VM workloads), the expanders won't handle their rated throughput, and the PCIe bus won't let you fully saturate it. Be sure to test extensively.

I'd also spend a bit of time on logistics. Get a freight forwarder so that you can buy in the US (at US prices) and ship in bulk to your location.


Patrick

Administrator
Staff member
Dec 21, 2010
I priced this out a while back. Pretty similar in cost to the big 4U double-sided Supermicro storage when all is said and done. Also, I'm not really a huge fan of relying on all of those port multipliers. I should make the trek up to Backblaze one of these days.
 

MiniKnight

Well-Known Member
Mar 30, 2012
NYC
The real question is not the 1.2PB.

Instead, it's how you get decent performance from it. How much RAM and SSD storage do you need to cache 1.2PB of data?
 

RimBlock

Active Member
Sep 18, 2011
Singapore
Performance was never the goal of this project and so was not a big issue.

If performance were a goal then a whole different approach would be needed, I believe, one which would have to take into account the bandwidth limitations of the SAS cards, expanders, PCIe bus and the hard drives themselves.

Caching would depend on usage patterns. If it were an archive system where only the most recently written data was regularly accessed, then it would be easy to work out. If, however, it were a massive patient database in a hospital at the height of a crisis, then the needs would most likely be a lot different. I agree it would be interesting to run some 'what if' scenarios though :)
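
For the archive case, the sizing really is simple. A throwaway sketch, with some made-up hot-data fractions purely for illustration:

```python
# Rough "what if": if only the most recently written slice of the 1.2 PB
# is ever read back, the cache only needs to hold that hot fraction.
# The fractions below are made-up assumptions, not measured figures.
TOTAL_TB = 1200

for hot_fraction in (0.001, 0.01, 0.05):
    print(f"{hot_fraction:.1%} of the data hot -> "
          f"~{TOTAL_TB * hot_fraction:.0f} TB of RAM/SSD cache")
```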

Just an overall update... my client bidding for the contract was outbid, so this never got off the ground. Still, an interesting topic to mull over, I think :)

RB