Help me with a high compute/high mem build


dataoscar

Member
Dec 2, 2013
68
10
8
Hello,

My wife wants to try some of her work analytics software at home. To do so, she needs a cluster that is fast and has enough memory for the processing. The cluster will be running Hadoop, Pig, and some other software as well.

Ideally we want a 3-VM cluster, with each VM having at least 4 CPUs and 15GB of RAM. I have looked at both AMD and Intel options, and so far these are my observations:

eBay has the L5639 and L5640 at great prices. These are 6-core parts that support HT, which gives each processor 12 threads. I could get a dual-CPU motherboard and expand later. The Supermicro motherboards I have seen support lots of RAM as well. Besides Supermicro, there seem to be other LGA 1366 motherboards capable of running those processors, which makes this a cost-effective solution.

As for AMD, the 12-core Opterons seem to be pricier, and so are the motherboards. This does not seem like a cost-effective alternative, but I may be wrong.

The cluster will NOT always be running, so higher power consumption could be OK. I would prefer a tower instead of a 1U or 2U rack, but I am open to ideas.

Thanks for the help/feedback.
 

Jeggs101

Well-Known Member
Dec 29, 2010
1,529
241
63
You might think I'm crazy here, but if you need 4C/15GB per VM, you could buy an E5 v2 or v3 12- or 14-core single CPU with HT, get 8x 8GB or 4x 16GB, and just have it all in one box. You'd probably get better performance by not having to hit the network.
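To make that arithmetic concrete, here is a minimal sketch of the capacity check in Python. The VM requirements come from the first post and the host figures from the 12-core, 8x 8GB option above. The helper names are hypothetical, and the sketch assumes one vCPU per hardware thread with no overcommit and ignores hypervisor overhead.

```python
# Rough capacity check: does the 3-VM cluster fit on one single-socket box?
# Assumptions (not from the thread): 1 vCPU maps to 1 hardware thread,
# no CPU/RAM overcommit, hypervisor overhead not counted.

def cluster_needs(vm_count, vcpus_per_vm, ram_gb_per_vm):
    """Return total (vCPUs, RAM in GB) required by all guest VMs."""
    return vm_count * vcpus_per_vm, vm_count * ram_gb_per_vm


def host_headroom(cores, threads_per_core, ram_gb, need_vcpus, need_ram_gb):
    """Return spare (hardware threads, RAM in GB) after placing the VMs."""
    return cores * threads_per_core - need_vcpus, ram_gb - need_ram_gb


# 3 VMs with 4 CPUs and 15 GB each, as requested in the first post.
need_vcpus, need_ram = cluster_needs(vm_count=3, vcpus_per_vm=4, ram_gb_per_vm=15)

# One 12-core CPU with HT (2 threads per core) and 8x 8GB of RAM.
spare_threads, spare_ram = host_headroom(12, 2, 8 * 8, need_vcpus, need_ram)

print(f"VMs need {need_vcpus} vCPUs and {need_ram} GB RAM")
print(f"Host would have {spare_threads} threads and {spare_ram} GB RAM to spare")
```

Under those assumptions the three VMs need 12 vCPUs and 45 GB in total, so a 24-thread, 64 GB box leaves headroom for the host OS and for growth.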
 

dataoscar

Member
Dec 2, 2013
68
10
8
What sort of analytics? Can you share the total data size, the kind of computing that will be done, etc?
It will be using a columnar database that runs on top of Hadoop. The size of the dataset is about 2 billion records (~2 TB). Typically her use cases deal with aggregation-type queries.

You might think I'm crazy here, but if you need 4C/15GB per VM, you could buy an E5 v2 or v3 12- or 14-core single CPU with HT, get 8x 8GB or 4x 16GB, and just have it all in one box. You'd probably get better performance by not having to hit the network.
I was definitely thinking the same thing. I would love to be able to use just 1 server, possibly with 1 or 2 CPUs with many cores. I thought about having many NUCs, but that does not seem like a cost-effective solution. The reason I brought up the Nehalem-based Intels is that they seem to be the best bang for the buck at the moment.
 

dba

Moderator
Feb 20, 2012
1,477
184
63
San Francisco Bay Area, California, USA
It will be using a columnar database that runs on top of Hadoop. The size of the dataset is about 2 billion records (~2 TB). Typically her use cases deal with aggregation-type queries.



I was definitely thinking the same thing. I would love to be able to use just 1 server, possibly with 1 or 2 CPUs with many cores. I thought about having many NUCs, but that does not seem like a cost-effective solution. The reason I brought up the Nehalem-based Intels is that they seem to be the best bang for the buck at the moment.
With 2TB and 2 billion records, you would very likely get the best performance from a single server with a nice Xeon E5 CPU or two plus a big pile of SSD drives. In fact, it seems very likely that disk IO will be your limiting factor, period. I'd first figure out how much SSD disk IO I could afford, and then decide how much money I had left over to buy a server.

For comparison, I have a quad Xeon E5 database server that could perform a basic aggregation query, not using indexes, on every row and column of your 2TB worth of data in around 90 seconds, or perform a single-column aggregation on all 2 billion rows in around two seconds. This server uses 80 SSD drives. Take the exact same server and the exact same data, but swap the SSD drives for, say, five 1TB non-SSD SAS drives in RAID5, and the queries would take at least 100 times longer.
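As a rough sanity check on those figures, the implied scan throughput can be worked out in a few lines of Python. This is back-of-the-envelope arithmetic, not a benchmark: it assumes 1 TB = 10^12 bytes, that the full scan reads all 2 TB once, and that the load is spread evenly across the drives. The function names are hypothetical.

```python
# Back-of-the-envelope scan throughput from the figures quoted above.
# Assumptions (not from the thread): 1 TB = 10**12 bytes, the full scan
# reads every byte exactly once, and load is spread evenly over all drives.

TB = 10**12


def scan_throughput_gb_s(data_bytes, seconds):
    """Aggregate throughput in GB/s needed to scan data_bytes in seconds."""
    return data_bytes / seconds / 10**9


def per_drive_mb_s(data_bytes, seconds, drives):
    """Throughput in MB/s each drive must sustain for the same scan."""
    return data_bytes / seconds / drives / 10**6


data = 2 * TB                      # ~2 TB dataset
records = 2_000_000_000            # ~2 billion records

ssd_total = scan_throughput_gb_s(data, 90)   # full scan in ~90 s
ssd_each = per_drive_mb_s(data, 90, 80)      # spread over 80 SSDs
hdd_seconds = 90 * 100                       # "at least 100 times longer"
hdd_total = scan_throughput_gb_s(data, hdd_seconds)
bytes_per_record = data / records

print(f"Average record size: {bytes_per_record:.0f} bytes")
print(f"SSD array: {ssd_total:.1f} GB/s total, {ssd_each:.0f} MB/s per drive")
print(f"HDD RAID5: {hdd_seconds / 3600:.1f} h per full scan, {hdd_total:.2f} GB/s")
```

Under these assumptions the 90-second scan works out to about 22 GB/s in aggregate, or roughly 280 MB/s per SSD, while the 100x slower case is a full scan of about 2.5 hours at around 0.22 GB/s.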
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,520
5,828
113
For comparison, I have a quad Xeon E5 database server... This server uses 80 SSD drives.
I saw my lights dim the other night. I think it was running a big query and it impacted the PG&E grid.

Seriously though, when I read this I was thinking disk + network I/O would be the limiting factors. If you can fit it into one system, that will give you lower latency and higher bandwidth, which will help quite a bit. 2TB (even doubling for growth) is fairly easily doable in a single machine. Lower complexity is good.
 

dataoscar

Member
Dec 2, 2013
68
10
8
Thanks all for the feedback. I will post the new build if I am lucky enough to get the chance to do it.