Storage and Network Upgrade Advise

JustinH · Jan 21, 2015

Hi All,

Long Time Lurker, first time poster. Looking for feedback on a upgrade I'm desperately in need of! Sorry for the long post, I wanted to get the background in place, so at least you have some idea of what my workload consists of.

My Setup currently has 1 x Dell R510, 2 x Dell C6100, and a HP DS2600 DAS hooked upto the R510. Running oVirt KVM primarly for build/continuous integration services. They are all hanging off a Cisco SG200-18 Switch. - Mostly 1 * 1Gbe connection, but a few bonded 2 * 1Gbe connections (i'm pretty much out of ports on the switch when you include the rest of my home network)

I build/maintain linux images for various 3rd parties who ship their solutions out on Kiosks/Embedded Solutions. On a good day, I can have at least 10 builds running, which typically starts very IO intensive writes, then go CPU intensive, and finally, back to IO reads as its all packaged up. A typical build takes around 2 hours. The Builds are mostly automated (when I commit to a repository, the build server picks up the change and off it goes creating the image). for the type of workload a build is - Consider installing a Linux Distro, compiling a bunch of binaries and installing them, and then making a disc image of the distro with the installed software pre-configured. Typical Image sizes around around 600Mb compressed , but some go out to 2-3Gb compressed when its a multimedia kiosk etc.

My storage is a real mess. I have a big mix of drives (sizes, vendors 2.5/3.5, SAS, SATA, SSD, Platters 5.2K RPM through to 15K) etc. Most of it is local to either the nodes or the R510 (in the case of the DS2600)
Then I've got a real mix of drive setups - Some Raid 6 (on the R510), some software mdadm raid 0 (for the moderately critical stuff) and some hardware raid 0. Then a bunch of standalone drives dedicated to single VM images (for IO Ops isolation I'd call it - If one VM is thrashing the Disk while building a image - It doesn't affect other VM's).

I've pretty much hit a limit, where adding more build VM machines just slows down everything else, and obviously my storage is the limiting factor right now. My builds sometimes continue well into the night now and a few times are still going when I get up in the morning.

So I'm planning to replace my existing storage - namely the drives - My thoughts are:
1) Populate the DS2600 with at least 146G, 15K SAS drives. Data on these drives is "throwaway" VM images. - meaning if I loose it due to a HDD failure or whatnot, its not a concern. The Build VM images that are stored here are created from a template when a build is kicked off, and deleted when the build is finished. so at most, i'd just loose running builds.

What I really need here is some decent IO at a moderate price. I know I could go SSD (and will eventually when the invoices get paid!) but I need to do something in the short term.
I can't decide if I should just go for a huge 25 Spindle Array (Raid 0?), or split the drives across 3 arrays, or just go stand alone, and have 2-3 VM's dedicated to a single spindle.

Advise/Thoughts on that point?

2) For Business Data (repositories, built images) etc that does have a business impact to me, I'm planning to utilize the R510 12 bays to storage here. Here I'm currently utilizing about 6TB - I'd like to at least provision 12TB here. I was thinking 12 x 3TB WD Red, but I've seen a lot of talk about rebuild times etc here. I can't really got for ZFS here, so it would be hardware raid (via a H700 raid controller). Performance here is not really a concern - Availability is. This storage domain needs to be mounted by the build VM and where the completed images finish up, so exported via NFS (as it needs to be shared) Any Advise here?

3) for the remainder of the storage on the C6100's (they are the SFF version) - This would basically be my "infrastructure" VM's. Storage requirements here are around a modest 2 -3TB. Performance is not critical, but VM migration and availability is. I thought to setup a few nodes on the C6100 (not all) with 3 x 4TB WD Red in raid 5, and then put GlusterFS on top so I can mirror between nodes (and thus migrate VM's and shutdown nodes at night)

All up I have 71 SFF bays, and 12 LFF bays I can populate over the C6100's, R510, and D2600, but due to power concerns, I'd prefer not to have to use all of them! (Singapore is already hot enough as it is!)

All the Storage in 1 and 2 need to be mounted across VM's/Nodes, so final thing I'm looking at is Network Upgrade to 10GB. I have a opportunity to get a Cisco Nexus 5010 Switch cheaply (around $US500) - But I haven't pulled the trigger yet as I have a few concerns:
1) it seems like there are a bunch of license add-on's to enable features. Anybody have any experience with the Nexus Line? If I'm just planning todo Ethernet Switching and not get fancy with FCoE etc, I should be fine?
2) the Nexus I can get comes without any SFP's. I'd need at least 10 (9 servers/nodes + a uplink link to my SG200 Switch). A quick look shows me the SFP's would cost more than then Switch! Anybody know if there is any lockin with SFP's on this switch. I did read some spec sheet that alluded to 3rd party support, but it wasn't clear.
3) can the standard ports (the Nexus has no addon card installed) do a 1Gb SFP to the SG200? (which has 2 SFP ports, not not 10Gbe) I tried reading up the docs/specs etc, but its still not clear to me
4) Finally, I assume the 10Gb mez cards for the C6100 should be fine right?

Last Question - Backups - I have a little old LTO3 drive that barely keeps up. Should I just consider getting a few of the 6TB drives that are coming out, whack them in a cheap NAS and do disc 2 disk (I'm using LVM snapshots on my main "business" storage pool) or get a Amazon Glacier backup going?

And the obvious requirement - Cost Effective! I'd love to have a enterprise budget, but I need some real bang for the buck here. I'm also constrained by the fact that I live in Singapore, so shipping charges add up real quick...(dont ask how much it cost me to get the C6100's and the D2600 to Singapore from the US!)

I was initially budgeting around $US3K for this, but the 10Gb upgrade and Drives might push that, so I maybe could go a extra 1K if its really a compelling improvement

TuxDude · Jan 21, 2015

For #1 it depends on how much capacity you need. It sounds to me like you don't need a ton there as things are deleted at the end of the job, so only enough capacity as you need for the number of builds that run concurrently. From a $/GB standpoint SSD is expensive, but from a $/IOP point of view it is far cheaper than spinning disk. If you can get sufficient capacity by RAID-0'ing a few SSDs together then I think it would be your best option for #1.

For #2, hardware raid-6 of a bunch of 3TB drives as you suggest is probably fine. If you only need 12TB to start then you only need to purchase 6x 3TB drives to get that capacity and can grow/restripe your array as needed. The restripe jobs will take quite a while but it saves some $ up front.

I don't have much of an opinion on #3 right now, so I'm skipping it.

As to the Nexus, we have a few Nexus 5500's at work. They are ridiculously expensive to license if you need any addon features, ridiculously expensive to buy SFPs for (and yes they do verify serial numbers and lock you into Cisco SFPs), and are also picky about which direct-attach cables they work with. I don't know if they will support 1G speeds on any of their ports. The only advantage to them is that you can connect FEX's to them (Nexus 2K line) - basically like a line-card in a chassis switch, so the ports on the 5K are kind of like your backplane bandwidth, and you have a bunch of FEX's (1G RJ45, 10GbaseT, 10G optical, etc.) and end up with hundreds of ports for end user devices. FEX's are cheap because they are dumb - all traffic flows back up to the parent 5K, even if 2 ports on the same FEX are talking to each other. If you want to be able to grow to lots of ports using FEXs, then go with a Nexus. But otherwise stay away from them.

Patrick · Jan 21, 2015

Just wondering, how much data are you storing and how often does it change? I wonder if there is a way to streamline this significantly and inexpensively.

Mike · Jan 21, 2015

Depending on how big and sequential the compiliations are you could see if a ramdisk could be a solution here as they could speed up compilation a lot while Gluster will slow you down in that area.

JustinH · Jan 22, 2015

TuxDude said:
For #1 it depends on how much capacity you need. It sounds to me like you don't need a ton there as things are deleted at the end of the job, so only enough capacity as you need for the number of builds that run concurrently. From a $/GB standpoint SSD is expensive, but from a $/IOP point of view it is far cheaper than spinning disk. If you can get sufficient capacity by RAID-0'ing a few SSDs together then I think it would be your best option for #1.

Yeah - I had thought about this, but I believe the cost involved here is a bit more than I can budget right now. Let me explain my reasoning.

Sure, SSD can do a ton more IO than platters, but there would still be a upper limit. Considering 60% of the time the VM's are in pretty disc intensive operations (installing images, then compressing it all back up when finished) I'm assuming I could probably only get 3-4 VM's per Raid0 SSD. So I would want a few pools of Raid0 SSD.
1) Trouble lies here in that the way OVirt runs, a Pool of VM's can only be allocated to a single Storage Pool (ie, no dynamic selection where to start the VM's from the pool). Solution - Multiple VM pools - Unfortunately the Build Server isn't that intelligent to pick them up. Its a basic script that monitors the build queue, if more than 2 builds are queued, it makes a rest call to the oVirt Engine to to fire up a new VM from a pool. oVirt goes and does it, and once the VM is online, it registers with the build server, and it assigns the next 2 builds out.
2) Allocate More SSD's to the Raid0 to get more IO. Sure. More $$$

3) Smaller SSD's to keep the cost down -> Each VM is backed by a 50Gb VM. Copy that to the SSD pool, then all the IO, then destroy at the end of the build, I'm sure I'd run them into the write limit.

TuxDude said:
For #2, hardware raid-6 of a bunch of 3TB drives as you suggest is probably fine. If you only need 12TB to start then you only need to purchase 6x 3TB drives to get that capacity and can grow/restripe your array as needed. The restripe jobs will take quite a while but it saves some $ up front.

Yep. I just thought if the cash is still there, try to max out this storage pool. Its basically where my $$$ comes from, so i'm not shy of spending a bit more here.

TuxDude said:
I don't have much of an opinion on #3 right now, so I'm skipping it.

As to the Nexus, we have a few Nexus 5500's at work. They are ridiculously expensive to license if you need any addon features, ridiculously expensive to buy SFPs for (and yes they do verify serial numbers and lock you into Cisco SFPs), and are also picky about which direct-attach cables they work with. I don't know if they will support 1G speeds on any of their ports. The only advantage to them is that you can connect FEX's to them (Nexus 2K line) - basically like a line-card in a chassis switch, so the ports on the 5K are kind of like your backplane bandwidth, and you have a bunch of FEX's (1G RJ45, 10GbaseT, 10G optical, etc.) and end up with hundreds of ports for end user devices. FEX's are cheap because they are dumb - all traffic flows back up to the parent 5K, even if 2 ports on the same FEX are talking to each other. If you want to be able to grow to lots of ports using FEXs, then go with a Nexus. But otherwise stay away from them.

Ok. Good to know. This isn't going to grow to a massive amount of 10Gbe. 10 Ports is all I'll need, maybe if I add another build server in the future, that would jump to 14 (assuming another C6100). I thought it was a good deal, but obviously not cost effective when I look at the Cisco SFP's on ebay.

Thanks for your feedback. Much appricated.

JustinH · Jan 22, 2015

Patrick said:
Just wondering, how much data are you storing and how often does it change? I wonder if there is a way to streamline this significantly and inexpensively.

Storage Pool 1 - This is what I'm planning to use for the Build VM's. Its completely dynamic here, depending upon how many Builds are running, but I typically try to limit it to 12 VM for normal use (that about 3 VM's on a single node in 1 C6100). I only bring online the 2nd C6100 during mass rebuilds for clients - eg, the recent shell-shock bash vulnerability - that was a 3 day build window that turned out almost 400 different images). Eg Build Server VM is a 50Gb pre-allocated image file. (I tried using thin-provision, but from start to finish it would add about 45 minutes to a typical build versus just copying out pre-allocated 50Gb file from the VM Template at the start). As mentioned, here IO is king. There is no long term storage requirement here.

Storage Pool 2 - This is where the final results get copied to (and served up via HTTP or RSYNC to my clients). Depending upon the number of projects/clients I have dictates how much is used, but as mentioned above right now I'm around 6Tb. its growing by maybe 500Mb a month with my current clients, but of course, I'm always hoping business will improve - Hence, my 12Tb overprovision should be good for 12 months?
As far as changes go - Typically each project has a Monthly "Gold Image" - and I'll keep upto 3 months of them online - Older images I burn to BlueRay and store offline. Then depending upon the customer, we have ad-hoc images, that are usually updated with new content/programs etc as and when the customers sends them to me. These images are what get built the most, and usually for QA/Testing etc by the customer, and not actual deployments.

Storage Pool 3 - Basically this is where all the VM servers that drive my business as well as infra live. DNS/DHCP, Database (MySQL and Postgres - but they don't get hit very hard). The Build Server Node, and all the Templates for the Build VM's. its hovers around 2-3 TB depending upon what I'm doing. my only requirement is it would be nice for these servers to migrate between nodes (so I can shutdown the C6100 Nodes, and all the VM's move to the R510 on weekends etc). hence, local storage (so I'm not dependant upon a single server running NFS/iSCSI) but replicated between the R510 and a few C6100 nodes.

Probably to give a bit more context to all of this. I have two main sets of customers:
1) a Large 5* Hotel Chain. The Set top boxes you usually find hooked upto the TV's in the room - I'm handling the images for them. Challenge here is that each hotel has its own branding, and often even different systems providing VOD/RoomService/Guest Enquiry etc. The Larger hotels tried a stock image, and streaming all the customization from server to each set top box but bandwidth was a killer. In this particular case, the onsite IT support are not responsible for the in-room systems (its outsourced, same as the hotel internet access is in 90% of cases), hence we provide them images when they want to update (or run a new promotion) etc with stuff their marketing/PR/IT give us, and usually do a remote wipe and download a new image to the local storage via UBOOT TFTP when guests check out etc. those image sizes are around 4-5Gb usually, and I'm getting 5-6 update requests with new content etc a week. (I don't do the actual work here, its subcontracted to me via a 3rd party, I just build the images)

2) A digital advertising company. Those LCD Monitors you find in office blocks/lifts running the latest movie trailer or advertisement. I'm doing the base images for them. Some have Mobile Connections to get updates, but its slow and expensive, so they go sneakernet around all the offices/shopping centers etc with a bunch of 2.5" HDD and just replace the HDD in the LCD box with a new image. Takes all of about 2-3 minutes to do, and is quicker than downloading via the Mobile Data, or doing some sort of USB HDD copy. These images usually range 30-40Gb size, and I'm doing upwards of 50 images a week here.

I've got a few other customers mainly in the Digital Kiosk/Shopping/Office Directory Display but these are usually one off images with Branding and the actual content is loaded via USB/Text Files. those images I archive off to DVD once I get the greenlight from the customer that everything is ok.

JustinH · Jan 22, 2015

Mike said:
Depending on how big and sequential the compiliations are you could see if a ramdisk could be a solution here as they could speed up compilation a lot while Gluster will slow you down in that area.

Gluster would only be for my "infrastructure" VM's - ie - the magic glue that holds it all together. For the Image and VM Storage Pool its NFS currently, with a posibility of going iSCSI with this upgrade I'm thinking.
Ramdisk is a interesting idea though I haven't thought about, but only my R510 would have enough memory. Some images range 40-50Gb when content is included, and my C6100 Nodes only have 24Gb each.

mrkrad · Jan 22, 2015

why don't you consolidate to one large server with newer cpu's and use local SSD caching to reduce iops to the external JBOD? LSI/Adaptec with "cachecade" dealio!

JustinH · Jan 22, 2015

I outgrew my R510 about 9 months ago, and added the C6100's. CPU doesn't really seem the bottleneck so I don't think that would help much. Cachecade is a good idea though. I even read recently about bcache - a Linux software only cache solution that works with SSDs. If I could get that to work with the oVirt, it might be a winner.

Any idea if any of the LSI/Adaptec cards with cachecade could fit in a C6100 Node? It's pretty tight in there.

Search

Storage and Network Upgrade Advise

JustinH

Active Member

TuxDude

Well-Known Member

Patrick

Administrator

Mike

Member

JustinH

Active Member

JustinH

Active Member

JustinH

Active Member

mrkrad

Well-Known Member

JustinH

Active Member