If you've read some of my threads, you've probably noticed that the software we create and use runs on bare metal (or virtual machines, for that matter), and we've never been concerned about idle processing time/power, because when we're processing data (which can be 24/7, all year) we run near capacity.
However, we're now looking at expanding the processing software into 3-5 other things as well. The virtualization bug has bitten, and having played around with it the last few months and learned more, I'm wondering what you guys think of something like this.
Servers/VMs/#s = estimates
# of bare-metal servers: 30
# of servers running a hypervisor: 6 (ESXi, Xen, etc.)
If we have, say, 10TB of data to analyze, we load up the queue with our jobs, let the BM & VM 'servers' take the jobs, process them, then move on... works great, and we've been doing this for years both locally and with Rackspace and DO "clouds" (VMs).
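For context, the worker side of that queue is basically this pattern (a minimal sketch, assuming a Redis list as the queue; the host name, job format, and process_job body are placeholders, not our actual setup):

    import json
    import redis  # assumes a Redis-backed queue; swap in whatever broker you actually use

    def process_job(job):
        # placeholder for the real analysis step
        print("processing", job["id"])

    r = redis.Redis(host="queue.example.internal")  # hypothetical queue host

    while True:
        # block until a job shows up, then pop it off the shared list
        _, raw = r.blpop("jobs")
        process_job(json.loads(raw))
        # loop back and grab the next one; idle workers just sit in blpop

The queue itself effectively does the load balancing, since whichever box is free pulls the next job.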
It's rather easy to do this with a single type of 'worker job' handling load balancing / queue management, but when we start integrating other, unrelated software, all we really care about is whether a machine is in use or not; if it isn't, we can use it.
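For the "is it in use or not" check, something as simple as looking at the load average before claiming unrelated work could cover it (sketch only; the 0.5-per-core threshold is an arbitrary number for illustration, not a recommendation):

    import os

    def box_is_idle(threshold_per_core=0.5):
        # 1-minute load average divided by core count; under the threshold = treat as idle
        one_min, _, _ = os.getloadavg()
        return one_min / os.cpu_count() < threshold_per_core

    if box_is_idle():
        print("free to take unrelated work")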
This is rather simple to do with the APIs at RS, DO, Linode, and eventually VMware (I hope; still reading up on this)... we just roll out a VM from the worker image, and off we go... However, we're not going to manage and run hypervisors on every bare-metal server.
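To make the "roll out the VM from image" step concrete, here's roughly what it looks like against the DO v2 API (just a sketch; the token, region, size slug, and image ID are placeholders you'd pull from your own account):

    import requests

    DO_TOKEN = "YOUR_API_TOKEN"  # placeholder

    resp = requests.post(
        "https://api.digitalocean.com/v2/droplets",
        headers={"Authorization": f"Bearer {DO_TOKEN}"},
        json={
            "name": "worker-01",      # hypothetical droplet name
            "region": "nyc3",         # placeholder region
            "size": "s-2vcpu-4gb",    # placeholder size slug
            "image": 12345678,        # ID of your pre-built worker image (placeholder)
        },
    )
    resp.raise_for_status()
    print(resp.json()["droplet"]["id"])

RS and Linode expose similar create calls, just with different endpoints and payloads.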
So that leaves me thinking about using Docker (or ???) plus a load balancer.
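In the Docker version, "roll out the worker image" collapses to just starting a container on whatever box has spare capacity; a minimal sketch with the docker Python SDK (the image name and env var are made up for illustration):

    import docker

    client = docker.from_env()

    # start the worker image detached; pass the queue location in via the environment
    container = client.containers.run(
        "registry.example.internal/analysis-worker:latest",    # hypothetical image
        detach=True,
        environment={"QUEUE_HOST": "queue.example.internal"},  # hypothetical setting
    )
    print(container.id)

Same worker code as the VM image, just without a hypervisor underneath it.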
As you can probably tell, virtualization, Docker, etc. are still new to me, but I definitely see them as the future for soaking up as many resources as I can and saving money.
Opinions, ideas, thoughts?
All of the "workers" analyze some sort of data, be it raw text, HTML, JavaScript, images, etc., and then spit the output out to the storage server, AWS, or local storage and then a zip file, etc...
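The output step is just whichever sink the job asks for; e.g., pushing a finished result up to S3 with boto3 looks like this (bucket, key, and file paths are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # upload the finished analysis output; local disk + zip works the same way, just without this call
    s3.upload_file(
        "/tmp/output/job-123.zip",     # hypothetical local result file
        "my-analysis-results",         # placeholder bucket
        "results/job-123.zip",         # placeholder key
    )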