RAIDzilla 2.5 (128TB) upgrade

Terry Kennedy

Well-Known Member
Jun 25, 2015
New York City
www.glaver.org
Build’s Name: RAIDzilla 2.5
Operating System/ Storage Platform: FreeBSD 10.3 / ZFS
CPU: 2 * Intel Xeon E5620
Motherboard: Supermicro X8DTH-iF
Chassis: Supermicro SC836BA-R920B
Drives: 16 * HGST HUH728080AL4200 8TB (storage) + Samsung MZ-7KE256BW (OS)
RAM: 12 * HMT31GR7AFR4C-H9 (96GB total)
Add-in Cards: LSI 9201-16i (internal storage), LSI 9200-8e (for LTO library), Intel X540-T1 10GbE, OCZ Velodrive VD-HHPX8-300G PCIe SSD
Power Supply: 2 * 920W Supermicro SQ hot-swap
Other Bits: DVD drive, Dell PowerVault TL4000 48-slot LTO-4 library

Usage Profile: Backup server, video archiving (cameras on my race car), CD image storage, misc.

I had posted an earlier version of this build (RAIDzilla II) in the "show us your build" thread here. I've been working on a writeup of the new version since January and it still isn't completely done, so I figured I'd publish it now and document the build here as well.

Lots of interesting things going on - custom flat SAS cables, 4K native (not 512e) drives, and other stuff.

The initial goal was to reduce power consumption / heat without sacrificing performance. I then decided to go with a newer generation of CPU, twice the memory, and 4 times the disk capacity. I have upgraded two of my RAIDzilla II systems so far. I'll probably run with just those two right now, and keep the two non-upgraded systems powered off until I need more than 256TB of storage (hopefully, not for another 5 years).

RAIDzilla 2.5 upgrade article here. Original RAIDzilla II article here. For completeness, the first RAIDzilla (11 years ago) article here. Feel free to ask any question you'd like.

Inside:


Front:


Back:


RAIDzilla 2.5 above RAIDzilla II:
 

xnoodle

Active Member
Jan 4, 2011
Very nice write up and very clean build!

Are your three vdevs in different pools, or a single pool?
 

Terry Kennedy

Very nice write up and very clean build!
Thanks!

Are your three vdevs in different pools, or a single pool?
Single pool. This is a RAIDzilla II:
Code:
(0:2) rz1:/sysprog/terry# zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0 in 9h34m with 0 errors on Wed Feb 11 06:27:20 2015
config:

        NAME             STATE     READ WRITE CKSUM
        data             ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            label/twd0   ONLINE       0     0     0
            label/twd1   ONLINE       0     0     0
            label/twd2   ONLINE       0     0     0
            label/twd3   ONLINE       0     0     0
            label/twd4   ONLINE       0     0     0
          raidz1-1       ONLINE       0     0     0
            label/twd5   ONLINE       0     0     0
            label/twd6   ONLINE       0     0     0
            label/twd7   ONLINE       0     0     0
            label/twd8   ONLINE       0     0     0
            label/twd9   ONLINE       0     0     0
          raidz1-2       ONLINE       0     0     0
            label/twd10  ONLINE       0     0     0
            label/twd11  ONLINE       0     0     0
            label/twd12  ONLINE       0     0     0
            label/twd13  ONLINE       0     0     0
            label/twd14  ONLINE       0     0     0
        spares
          label/twd15    AVAIL

errors: No known data errors
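For anyone wanting to reproduce that layout, the pool could be created with something along these lines - a sketch using the GEOM label names from the output above, not necessarily the exact command used for this build:

```shell
# Hypothetical re-creation of the pool shown above: three 5-disk raidz1
# vdevs plus one hot spare, addressed via GEOM labels.
zpool create data \
    raidz1 label/twd0  label/twd1  label/twd2  label/twd3  label/twd4 \
    raidz1 label/twd5  label/twd6  label/twd7  label/twd8  label/twd9 \
    raidz1 label/twd10 label/twd11 label/twd12 label/twd13 label/twd14 \
    spare  label/twd15
```

Using labels rather than raw `daN` device names keeps the pool importable even if the controller renumbers the drives.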
 

StammesOpfer

Active Member
Mar 15, 2016
raidz1 across 5x 8TB drives. I suppose your offsite and tape backups will save you if a rebuild fails. That is living more dangerously than I am comfortable with.
 

Terry Kennedy

raidz1 across 5x 8TB drives. I suppose your offsite and tape backups will save you if a rebuild fails. That is living more dangerously than I am comfortable with.
I can always just switch over to the offsite system - it is connected to the house with 2 * GigE links (dark fiber, I should probably change over to 10GbE).

My experience with the 2TB drives in 5+ years is single failures, and I would expect the HGST 8TB drives to be more reliable, not less. And with a rebuild speed of 1.3Gbyte/sec, this isn't the week-long rebuild that some people have to deal with.
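Back-of-the-envelope, the quoted rate makes that concrete (assuming, as an idealization, that the 1.3GByte/sec holds for the whole rebuild of one 8TB drive):

```shell
# Rough resilver-time estimate: 8 TB rebuilt at 1.3 GByte/sec.
# Idealized - assumes the quoted rate holds for the entire resilver.
awk 'BEGIN { printf "%.1f hours\n", 8e12 / 1.3e9 / 3600 }'
# prints "1.7 hours"
```

Under two hours, versus the better part of a week at the ~15MByte/sec some arrays manage.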
 

alex1002

Member
Apr 9, 2013
I can always just switch over to the offsite system - it is connected to the house with 2 * GigE links (dark fiber, I should probably change over to 10GbE).
I can't believe you got two 10GbE connections to the house. That's beautiful.
 

Terry Kennedy

I can't believe you got two 10GbE connections to the house. That's beautiful.
They're running as GigE at the moment. I just need to get another 10GbE switch and some SFP+ parts for the far end and I'll switch it over.

It is dark fiber that I put up between my house and my (at the time) office, on pole space leased from the phone company. While I no longer work there, I trade free advice for free rack space.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
CA
I take it you're in a major city to be able to lease pole space from the phone company or have connections to someone leasing a lot more space?
 

JeffroMart

Member
Jun 27, 2014
It's actually not too difficult or expensive to lease the space from the local telco or power providers. It's just more time-consuming waiting for them to survey and check the poles you want to lease to see if there is space available, or move lines around on them to make space for you. If I remember correctly, they have to provide something like 12" of space between customers. You basically pay a per-pole, per-year fee; around this rural area of WV it's something like $12 per pole / yr.
 

Terry Kennedy

I take it you're in a major city to be able to lease pole space from the phone company or have connections to someone leasing a lot more space?
Yes to a major city, but it is the same phone company throughout the state (due to consolidation, the phone company covers several states, but when I did this they were just in the one state).

Part of my job back then was getting data to various buildings over a 3 x 5 block campus area. The phone company leased small numbers of poles for $5/pole/year with a $100/year minimum billing. Some of the runs were on paid leases. Others were trades with the phone company ("we'll let you use our conduits to get from A to B if you let us use your poles to get from C to D"). The run to my house was actually a combination - a few poles leased from the phone company, ending at a pole owned by the cable TV company. Then a conduit trade with the cable TV company for a few blocks, then back up onto phone company poles for a bit, then into one of the major campus buildings which has 48 strands of fiber back to the data center.

You need to either have an existing contract with the phone company or know somebody in order to get a pole lease - it isn't something they advertise, and the business office doesn't know anything about it. In my case, there was an existing contract going back many years, and I knew the people at the phone company.

In addition to paying the $100/year pole contract, I needed to have $1M liability insurance and a 24 x 7 x 365 maintenance contact (in case a drunk driver takes down a pole, I need to have someone onsite ASAP to relocate or disconnect my cable so a replacement pole can be installed).
 

azev

Active Member
Jan 18, 2013
Are you aware of any PCIe bottleneck issues with that motherboard?
I've had a similar board, the X8DAH, and I was getting super crappy PCIe throughput (10Gb NIC & RAID controller).
When I upgraded to an X9-series motherboard with the budget E5-2670, I was getting screaming performance from the same parts.
Anyway, the reason I ask is that I am shopping around for a budget backup system for my brother's house, and I wonder what your experience with the motherboard's performance has been.

Thanks,
 

T_Minus

Yes to a major city, but it is the same phone company throughout the state (due to consolidation, the phone company covers several states, but when I did this they were just in the one state).
Thanks for the info! I did a quick Google and it looks like in CA they not only require liability insurance but also a qualified AT&T worker, workman's comp, auto insurance, and a bunch more hoops to jump through...

Definitely something to consider if the office is near the house, though - way cheaper than a point-to-point leased line from AT&T, but maybe a huge headache.
 

cheezehead

Active Member
Sep 23, 2012
WI
Thanks for the info! I did a quick Google and it looks like in CA they not only require liability insurance but also a qualified AT&T worker, workman's comp, auto insurance, and a bunch more hoops to jump through...
Really depends on the bandwidth requirements. We've done dark aerial and buried fiber for distances of up to 10mi. Buried is more expensive but generally gets away from the monthly fees... horse-trading pipes or pairs really makes it viable. Aerial is pretty straightforward, though every state and electric company is a bit different on what the process is and who is allowed to touch "their poles". Some fiber/copper installers have "pole rights", which may make coordinating everything a bit easier.

Running 20Gb at under 1ms latency across town is just sexy :D:D
 

Terry Kennedy

Are you aware of any PCIe bottleneck issues with that motherboard?
Nope. I get wire speed (9.9Gbit/s) on the network card and 1.3GByte/s from the disk controller.
I've had a similar board, the X8DAH, and I was getting super crappy PCIe throughput (10Gb NIC & RAID controller).
Both the X8DTH-iF and the X8DAH have dual 5520 chipsets, so they should both have 72 PCIe lanes available, which should be more than enough, even counting the on-board stuff. The X8DTH-iF provides 56 lanes to the expansion slots (7 * x8 slots), while the X8DAH provides 54 lanes (in a variety of slot widths).
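The headroom is easy to sanity-check. Assuming these are PCIe 2.0 slots (5GT/s per lane with 8b/10b encoding, i.e. roughly 500MB/s of usable bandwidth per lane), an x8 slot carries about 4GB/s each way - comfortably more than either a 10GbE NIC or the 1.3GByte/s the disk controller delivers:

```shell
# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding -> ~500 MB/s usable per lane.
# Compare an x8 slot's bandwidth against what a 10GbE NIC needs.
awk 'BEGIN {
    lane = 5e9 * 8 / 10 / 8 / 1e6   # MB/s per lane after encoding overhead
    printf "x8 slot: %d MB/s; 10GbE needs: %d MB/s\n", 8 * lane, 1e10 / 8 / 1e6
}'
# prints "x8 slot: 4000 MB/s; 10GbE needs: 1250 MB/s"
```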
 

azev

Nope. I get wire speed (9.9Gbit/s) on the network card and 1.3GByte/s from the disk controller.

Both the X8DTH-iF and the X8DAH have dual 5520 chipsets, so they should both have 72 PCIe lanes available, which should be more than enough, even counting the on-board stuff. The X8DTH-iF provides 56 lanes to the expansion slots (7 * x8 slots), while the X8DAH provides 54 lanes (in a variety of slot widths).
Wow, that is interesting - I wonder, then, if I had a dud board. A while back I started a thread because I was getting crappy PCIe performance on my system. I had 2x dual-port 10Gb NICs & 1 RAID adapter, and regardless of which PCIe slot I installed them in, I was getting really bad performance.
It was so bad that you could see performance dip during iperf testing if I started any disk benchmark.
The same if I ran more than one iperf instance on different NIC ports. It seemed like the whole PCIe bus was sharing 10Gb of throughput.

Did you do any concurrent tests? (iperf and diskmark, for instance)
Was there a special driver you had to install? I was running Windows 2012 R2, if that makes a difference.
 

Terry Kennedy

Did you do any concurrent tests? (iperf and diskmark, for instance)
I did a ZFS send/receive over the 10GbE LAN (of 20TB or so) and was getting ~ 750MB/sec, so I don't think there were any issues. All of those Compellent SC40's that are being sold on eBay also used the X8DTH-iF motherboard, so I don't think there are any problems with it.
Was there a special driver you had to install? I was running Windows 2012 R2, if that makes a difference.
I'm running FreeBSD 10.x, so I can't say anything about Windows on this box.
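For reference, a transfer like that boils down to a pipeline of this shape (the hostname and dataset names below are placeholders, not from this build; mbuffer or nc can replace ssh if cipher overhead becomes the bottleneck):

```shell
# Hypothetical one-shot replication of a dataset tree to another host.
# "rz2" and "data/archive" are placeholder names, not from the build.
zfs snapshot -r data/archive@xfer
zfs send -R data/archive@xfer | ssh rz2 zfs receive -duF data
```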
 

alex1002

What cages are you using for these? I really like the cases you used. Can you post model numbers?


 

Terry Kennedy

I've been working on a writeup of the new version since January and it still isn't completely done, so I figured I'd publish it now and document the build here as well.
I have made a number of changes to the RAIDzilla 2.5 article on my site. The major change is a new "Replication and backups" section which includes the complete scripts I use for replication and backup. Replication runs at around 700MByte/sec (on a 10GbE connection).
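The general shape of such an incremental replication script is snapshot, incremental send, then retire the old snapshot. This is only a sketch with placeholder host and dataset names - not the author's actual script, which lacks the locking and error handling a production version needs:

```shell
#!/bin/sh
# Minimal incremental ZFS replication sketch. "rz2", "data/backups", and
# the state file are placeholders; a real script needs locking and error
# handling around every step.
SRC=data/backups
DST=rz2                              # hypothetical receiving host
PREV=$(cat /var/db/last-repl-snap)   # snapshot name from the previous run
NEW=repl-$(date +%Y%m%d%H%M)

zfs snapshot -r "${SRC}@${NEW}"
zfs send -R -i "@${PREV}" "${SRC}@${NEW}" | ssh "${DST}" zfs receive -duF data
zfs destroy -r "${SRC}@${PREV}"
echo "${NEW}" > /var/db/last-repl-snap
```

The incremental send (`-i`) only moves blocks changed since the previous snapshot, which is how a nightly run stays fast even on a multi-TB dataset.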
 

jfoor

Member
Feb 4, 2017
Just wanted to say thanks for the write-up(s). I especially liked looking at your first build, the original RAIDzilla. In that 11-year-old article you commented on how amazing it was to have 2GB DIMMs in such a small physical space! I guess I'm just used to it by now, but it's truly amazing how much can change over a decade in IT. Anyway, thanks for documenting your systems!

Edit:

Just saw some pics of your setup... is that rackmount clock linked to NTP? Or just a "regular" clock? Got a link for it?
 