ScaleIO Hyperconverged Cluster Build


Jake Sullivan

New Member
Oct 9, 2015
Sup crew!

New DR/DEV setup is finally rolling so I thought I'd share some of the build process. This is the first time I've posted anything like this so if there's any more information you all would like to see, just let me know!

Final Build:

4 Compute Nodes:

Operating System/ Storage Platform: ESXi 6.0 U1
CPU: 2x Intel Xeon X5560
Motherboard: X8DTN+
Chassis: 12 Bay SC826-E16
Drives: 4x HGST Deskstar NAS 4TB 7200 RPM 64MB cache and 2x Samsung 850 Pro 512GB
ADATA Premier SP550 128GB (boot media)
RAM: 256GB (16x 16GB)
Add-in Cards: LSI 9211-8i (IT mode) / 2x Mellanox MHQH29B-XTR InfiniBand
Power Supply: 2x 800W PS

And 4 Storage Nodes:

Operating System/ Storage Platform: CentOS 7
CPU: Single Intel Xeon X5560
Motherboard: X8DTN+
Chassis: 12 Bay SC826-E16
Drives: 4x HGST Deskstar NAS 4TB 7200 RPM 64MB cache and 2x Samsung 850 Pro 512GB
ADATA Premier SP550 128GB (boot media)
RAM: 32GB (8x4GB)
Add-in Cards: LSI 9211-8i (IT mode) / 1x Mellanox MHQH29B-XTR InfiniBand
Power Supply: 2x 800W PS

Current Capacity: 113TB



Backstory:

I work in the oil and gas industry, and man have things been rough over the past year. Management finally listened to our complaints about facilities and the lack of a cohesive DR strategy and gave us the go-ahead to build out a secondary system that could do double duty as a failover system and development environment (new ERP system coming :(). Thing is, it's not exactly cheap to duplicate a UCS chassis and a Nimble CS400. We also needed a substantial amount of scalability and capacity for our seismic and geology data. Short story is we ended up going with ScaleIO for the storage platform and Zerto for the replication component.

Our plan was to steadily add 4 drives per month until the system reached 256TB of capacity - we figured we would start at around 100TB to get things off the ground.
Everything excluding the drives was purchased off of eBay. I'll post up an FAQ on why we went this direction as I free up more time.
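
For anyone curious how that plan pencils out, here's a rough back-of-the-envelope sketch (assuming 4TB per added drive like the ones in the parts list, counting raw capacity only, and ignoring the SSD tier and ScaleIO's two-copy overhead):

# Rough sketch of the expansion timeline - my own assumptions, not an official sizing.
start_tb = 100          # approximate starting capacity (raw)
target_tb = 256         # capacity goal (raw)
drives_per_month = 4
tb_per_drive = 4        # matches the HGST 4TB drives in the parts list

months = 0
capacity = start_tb
while capacity < target_tb:
    capacity += drives_per_month * tb_per_drive
    months += 1

print(f"~{months} months to reach {capacity}TB")   # roughly 10 months at 16TB/month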


Now being that I wasn't entirely sure any of this would work the way I wanted, I ended up going through a couple of build phases to prove the system out before asking for more money.


Phase 1:

I wanted the system to take advantage of a meshed InfiniBand network for storage and vMotion traffic, so for the POC we ended up buying two compute nodes and a Mellanox IS5025. To see how flexible ScaleIO could be with hardware, I threw in a desktop (lol) and an older Dell PE server.





1Gb Ethernet on the client side, IB on the backend.



Using USB 2.0 drives as boot media in this setup blew...


Desktop hyyyype.





The office I was testing in got kinda warm, but storage testing and replication worked like a champ.





Phase 2:


I trashed all of my testing config in phase 1 and started from scratch with the new hardware. My plan was to build out a base 3 node cluster and expand node by node until everything made it to our rack.


Everything ordered



Three node setup (a bit cleaner this time)



The three nodes were racked and validated. By this time, I had added a second IS5025 and dual IB cards to each compute node. I'll post a diagram of the topology for those interested in how the storage fabric was configured.



Phase 3:

All racked and configured!

Part of the Nimble is shown on the very top and two UCS Fabric Interconnects are chillin on the bottom


Don't judge those labels. Accounting stole the legit printer so we had to make do :(



That's a 3750-G in the middle handling client side traffic.




IB cables were so freaking long.



Storage node on top, two compute nodes below it.


In the dark?



vCenter (ssh is enabled - haven't mass disabled the alerts yet)






vSwitch config



ScaleIO GUI











Zerto. Not fully configured but hey...dashboard!




Had a blast implementing this project - if there's any info you guys want or you have any questions at all, post away.


Have a Merry Christmas!
 

Chuntzu

Active Member
Jun 30, 2013
The Unix Surplus tag on one of those 826s looked familiar :) I am staring at mine right now! I know there are certain restrictions on what benchmarking info you can share per EMC, but are both the mass storage and SSD pools performing like you expected?
 

Jake Sullivan

New Member
Oct 9, 2015
Gotta rep Unix Surplus. They rock!

Since this system was built outside of the support bubble of EMC, I can share whatever. Lemme know what you'd like to see :)

Performance thus far has been directly related to spindle/drive count and has scaled in a linear fashion. As we obtain more drives, the numbers should gradually improve.
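
To give a feel for what "scales with drive count" looks like, here's the kind of toy estimate I use for rough planning. The per-drive IOPS figures and the efficiency factor below are placeholder assumptions on my part, not measurements from this cluster:

# Toy model of "aggregate performance scales with drive count".
# Per-drive figures are generic assumptions, NOT measurements from this system.
HDD_RANDOM_IOPS = 150      # ballpark for a 7200 RPM SATA drive
SSD_RANDOM_IOPS = 80000    # ballpark for a SATA SSD like the 850 Pro

def pool_iops_estimate(drive_count, per_drive_iops, efficiency=0.7):
    # Naive estimate: drives x per-drive IOPS, derated for mirroring/protocol overhead.
    return int(drive_count * per_drive_iops * efficiency)

print(pool_iops_estimate(32, HDD_RANDOM_IOPS))   # 8 nodes x 4 HDDs
print(pool_iops_estimate(16, SSD_RANDOM_IOPS))   # 8 nodes x 2 SSDs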

From a DR perspective - failover testing for our primary VMs has taken about three minutes from shutdown at production to up and running in the cluster.
 

dswartz

Active Member
Jul 14, 2011
Interesting posts. I have been curious about scaleio. Particularly, low-count clusters (e.g. 3 hosts). It isn't clear to me how much storage you lose in this scenario. e.g. if you have say, 9 1TB disks, with 3 on each host, what is the total usable space?
 

Jake Sullivan

New Member
Oct 9, 2015
16
23
3
33
Interesting posts. I have been curious about scaleio. Particularly, low-count clusters (e.g. 3 hosts). It isn't clear to me how much storage you lose in this scenario. e.g. if you have say, 9 1TB disks, with 3 on each host, what is the total usable space?
From my understanding of ScaleIO, two copies of data are stored in the system. So in your example, you would lose the equivalent of one of your three nodes to data redundancy. Think of it as something like a distributed RAID10 with similar benefits. As you add more nodes, you do become a bit more space efficient.

BTW, are you the dswartz that posts on the user group for ESOS?
 

cesmith9999

Well-Known Member
Mar 26, 2013
Interesting posts. I have been curious about scaleio. Particularly, low-count clusters (e.g. 3 hosts). It isn't clear to me how much storage you lose in this scenario. e.g. if you have say, 9 1TB disks, with 3 on each host, what is the total usable space?
You lose exactly half of your storage (it is effectively RAID 1).

Plus you will need to reserve at least one node's worth of storage in case a node goes down, so you are left with 3TB usable in your scenario.
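
Quick sketch of that math, assuming two-copy mirroring plus one node's worth of spare (adjust for your actual spare policy):

# Usable capacity estimate: two copies of the data, minus one node's worth of spare.
def usable_tb(nodes, tb_per_node):
    raw = nodes * tb_per_node
    spare = tb_per_node            # hold back one node's worth for rebuilds
    return (raw - spare) / 2       # everything that remains is stored twice

print(usable_tb(3, 3))   # 3 nodes x 3TB raw each -> 3.0TB usable
print(usable_tb(8, 3))   # more nodes -> proportionally less lost to the spare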

Chris
 

Patrick

Administrator
Staff member
Dec 21, 2010
Such a cool project! How is the performance at this point? I have heard ScaleIO does not get really good until the number of disks/nodes balloons.

Also something I saw:
RAM: 256GB (6x 16GB)
My math may be poor in late December but I think that might be a typo.
 

captain_fail

New Member
Dec 8, 2015
Thanks for sharing, this is highly educational! It's always cool to see what computing problems and solutions come up in other industries. A colleague of mine came from the energy sector - the most intense quant I've ever met. Our focus is consumer market research, sociology and economic research, so our idea of "big data" is a measly few million records :) But even so, the regressions and other models we run can exhaust the capabilities of a basic workstation. My MBP with 16GB of RAM will get the job done most of the time, but I need to hit the big iron every now and then.
 

Jake Sullivan

New Member
Oct 9, 2015
Such a cool project! How is the performance at this point? I have heard ScaleIO does not get really good until the number of disks/ nodes balloons.

Also something I saw:

RAM: 256GB (6x 16GB)

My math may be poor in late December but I think that might be a typo.

Haha! Yup typo. Fixed it.

Performance has been excellent thus far. From real-world spin-up and testing of dev machines, it's pretty much on par with our production side (the Nimble has been a super solid performer for us, so no hate there). Since this is DR/DEV, if there are any specific benchmarks or numbers that you want to see, just say the word! I can pretty much do whatever I want with it.

It'll be interesting to see how performance scales as we add more disks in. I went a bit overkill with 8 nodes to start, but I didn't really want to mess with the rack in the future if we really needed to scale. Seismic data for us comes in huge waves.
 

Chuntzu

Active Member
Jun 30, 2013
Haha! Yup typo. Fixed it.

Performance has been excellent thus far. From real-world spin-up and testing of dev machines, it's pretty much on par with our production side (the Nimble has been a super solid performer for us, so no hate there). Since this is DR/DEV, if there are any specific benchmarks or numbers that you want to see, just say the word! I can pretty much do whatever I want with it.

It'll be interesting to see how performance scales as we add more disks in. I went a bit overkill with 8 nodes to start, but I didn't really want to mess with the rack in the future if we really needed to scale. Seismic data for us comes in huge waves.
I would like to see sequential 1MB reads and writes from a single node, as well as random 4K reads and writes from a single node, on both the HDD and SSD pools if possible. Just in case this isn't very clear: I want to see what only a single node running an IO benchmark would see. I have seen info and pictures of the ScaleIO dashboard running lots of concurrent IO from a lot of nodes, but not what a single node would see. The reason I ask is that I know this software scales, but can a single node generate, let's say, 200,000 read IOPS all by itself? It seems there is a ceiling on how much IO one node can supply to the pool (like 200,000 IOPS even though it may be able to produce 400,000-500,000). Please and thank you for offering this up. I'm working on my own S2D setup right now and am then going to load ScaleIO on the same equipment, so it would be nice to know what kind of performance to expect.
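
If it helps, something along these lines is what I'm picturing, driven with fio from a single node (just a sketch: /dev/scinia is a placeholder for whatever ScaleIO volume you'd actually test against, and the queue depths are guesses):

# Single-node fio runs: 1MB sequential and 4K random, reads and writes.
# WARNING: writing straight to a device is destructive - point this at a scratch volume.
import subprocess

TARGET = "/dev/scinia"   # placeholder ScaleIO device path, replace with your test volume

def run_fio(name, rw, bs, iodepth):
    subprocess.run([
        "fio", f"--name={name}", f"--filename={TARGET}",
        f"--rw={rw}", f"--bs={bs}", f"--iodepth={iodepth}",
        "--direct=1", "--ioengine=libaio", "--numjobs=1",
        "--runtime=60", "--time_based", "--group_reporting",
    ], check=True)

run_fio("seq-read-1m", "read", "1m", 32)
run_fio("seq-write-1m", "write", "1m", 32)
run_fio("rand-read-4k", "randread", "4k", 32)
run_fio("rand-write-4k", "randwrite", "4k", 32)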
 

MACscr

Member
May 4, 2011
What was the performance like for your 3-node cluster? I have a small 3-node Ceph storage cluster (10Gb Ethernet, replica 2, with 2x 512GB OSDs, 24GB RAM and dual 5520 CPUs in each server) and I was a bit disappointed in the low IOPS it was able to do for both read and write. Just curious how well ScaleIO compares to Ceph at such a small scale.