4-node Open Compute Cluster Build

polyfractal · Apr 17, 2016

First post here, but long time lurker. This entire build was motivated by the excellent Open Compute guide provided by NorCalm over in deals.

Fair warning: I'm a developer, not an ops guy, so apologies for improper terminology and a slightly janky setup.

Background

My primary development machine for the last few years has been a MacBook Air, which is decidedly underpowered. It was fine when I first joined the company, but as I've moved onto new projects it struggles with our builds/tests/benchmarks. We make a clustered search/analytics engine, so a lot of our tests involve spinning up multiple nodes and can be pretty resource intensive. To date I've been renting a beefy server at Hetzner for these kinds of tasks.

I was recently given budget to update my setup. Most people get a MacBook Pro or build a desktop. I opted for a third option: build a mini cluster based on E5-2670s.

The Build

Given how cheap E5-2670s are, the challenge was finding cheap motherboards. I opted for the Open Compute route, and essentially followed this STH guide. The chassis + 2x motherboard + 4x heatsinks + power supply all together was cheaper than most LGA2011 motherboards alone, which made it a hard deal to pass up.

Final price came out to $504 per node ($469 without shipping).

Since I don't really have any other home infra other than a FreeNAS tower, I wasn't worried about the non-standard rack size. I'll probably build a true homelab some day, but my current living circumstances (renting, moving soonish) prevent me from expanding. I'll deal with other rack sizes later.

I'm currently using one node as a desktop machine, and leave the other three powered off unless I need them. The BIOS claims to support suspend-to-RAM, but I'm still trying to get that to work. For now they are either physically off, or suspended to disk. Also working to get SOL and WOL working so that cluster booting is a bit more automated.
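For anyone curious what the WOL part involves, here is a minimal sketch of a Wake-on-LAN sender. It only shows the standard magic packet format (six 0xFF bytes followed by the target MAC repeated 16 times, sent as a UDP broadcast). Whether these boards actually honor it is still untested, and the MAC address, broadcast address and port 9 in the example are placeholders, not values from my setup.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN magic packet for a MAC address."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Six 0xFF bytes, then the target MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common convention)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


# Usage (hypothetical MAC address for one of the powered-off nodes):
#   wake("00:11:22:33:44:55")
```

WOL normally also has to be enabled in the BIOS and on the NIC, so the script alone is not enough.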

Node specs:

2x Xeon E5-2670
96GB DDR3 1333MHz ECC RAM (12 of the 16 slots used)
1TB 7200rpm Seagate HDD
Ubuntu 14.04.4 LTS server / desktop

The "desktop" node also has a HD6350 video card and a PCI USB expander. I purchased an SSD for it as well, but need to sort out how to mount it (the rack only has two 3.5" caddies).

Cluster stats:

64 cores, 128 with hyperthreading
384GB RAM
4TB HDD

Practical considerations

Temperatures idle around 32-35C. At 100% utilization, the temp cranks up to ~93C before the fans kick into high gear, then settles down to ~80C.
Noise is a pleasant 35dB idle, which honestly is quieter than the Thinkserver on the floor. At 100% utilization the fans go into "angry bee" mode and noise rises to around 56dB. It's loud, but not unbearable, and I'll only be using the full resources for long tests, likely overnight. While idling, the 60x60x25 fan in the PSU is far louder than the node fans, and gives off a bit of a high-pitched whine due to its thinner size.
Unsure of power consumption, since I don't own a kill-a-watt (yet). The spec sheet claims 90-300W per node depending on activity. So probably around 360-1200W when the full cluster is running, plus some overhead from the step-up transformer.
As someone who has never worked with enterprise equipment before...these units were a pleasure to assemble! Only tool required was a screwdriver to install the heatsinks, everything else is finger-accessible. Boards snap into place with a latch, baffles lock onto the sushi-boat, PSU pops out by pulling a tab, hard drive caddies clip into place. Just a really pleasant experience, I kept saying "oh, that's a nice feature" while assembling the thing.
The only confusing aspect was the boot process and what the lights mean. As NorCalm discovered, blue == power, followed after a few seconds by yellow which equals HDD activity. If you boot and it stays blue, something is wrong. In my case it was a few sticks of bad RAM.
Nodes are configured to boot in staggered sequence. The PSU turns on, fans slow down, first node powers on, about 30 seconds later the second node powers on.
There are two buttons on the board: red (power), grey (reset). I think... it's confusing because the boards like to reboot themselves if they lose power, even when you manually switch them off. The power-on preference can be changed in BIOS, default is "last state". I haven't played with it yet.

Photos!

The components all laid out.

PCI USB hub - Radeon HD6350
x4 Seagate 1TB 7200 RPM hdd (ST1000DM003)
TP-LINK TL-SG108 8-Port unmanaged GigE switch
SanDisk Ultra II 120GB SSD
x50 Kingston 8GB 2Rx4 PC3L-10600R
x8 Xeon E5-2670
ELC T-2000 2000-Watt Voltage Converter Transformer

One of the OCP nodes, unpacked before installation of components.

The final setup, including the second node. Setup is temporary for now, until I can build a rack.

The first node sits on three strips of neoprene rubber. The second node sits on top of the first, also separated by a layer of neoprene. The top is covered by some cardboard (temporary). It isn't needed for airflow, since the nodes have plastic baffles... It's just for my peace of mind, so I don't accidentally drop/spill something.

128 threads burning power

More photos of the build:

gigatexal · Apr 17, 2016

Whoa hmm. So it's quiet even on the desktop? That's insane. Welcome to the forums!

polyfractal · Apr 17, 2016

Thanks! Yeah, I was pretty surprised how quiet it is idling. It's loud when at 100% for sure (~56dB), but under transient loads, or when only some of the cores are loaded, it stays pretty quiet, as the fans don't really spin up until it hits 90C.

gigatexal · Apr 17, 2016

Out of curiosity, could you fire up a make -j 32 or 64 of the Linux kernel and see if the fans spin up? If I wasn't going to be using a full-height, double-wide GPU for CUDA I'd definitely go this route.

polyfractal · Apr 17, 2016

Just checked, no fan spinup. -j 32 peaked around 50C, -j 64 peaked around 70C (and ran right after the 32, so perhaps influenced by that). Either way the fans held steady at their "low" mode.

gigatexal · Apr 17, 2016

I need to take another look at that thread. I want that for my own rig

nickscott18 · Apr 18, 2016

Looking at those two cases stacked like that seems like it would make the ideal set-up for a tower sized, lab in a box. With it built into a wooden tower case (providing those boxes don't mind being run on their side), it would make the ideal home lab for those without a rack. Now, how much to ship a couple of those boxes to the bottom of the world . . .

polyfractal · Apr 18, 2016

nickscott18 said:
Looking at those two cases stacked like that seems like it would make the ideal set-up for a tower sized, lab in a box. With it built into a wooden tower case (providing those boxes don't mind being run on their side), it would make the ideal home lab for those without a rack. Now, how much to ship a couple of those boxes to the bottom of the world . . .

I was actually considering exactly this, since it'd be more convenient to place these on the floor next to my Thinkserver. All the components seem to lock down tight, the motherboard trays don't have any slack when latched, and the HDDs seem secure too.

I was just afraid to mount them on their side because....could you? But it seems like a good idea...

TLN · Apr 18, 2016

Wonder if you can measure power consumption, while idle and under load.
ps. subscribed, really wanna see two racks in "tower case"

nickscott18 · Apr 18, 2016

polyfractal said:
I was actually considering exactly this, since it'd be more convenient to place these on the floor next to my Thinkserver. All the components seem to lock down tight, the motherboard trays don't have any slack when latched, and the HDDs seem secure too.

I was just afraid to mount them on their side because....could you? But it seems like a good idea...

I suppose the only thing that might be affected by running it on its side is how the cooling would work. But I'm inclined to think because it's got ducts, and plenty of airflow running through it, that might not be an issue.

Also, in picture 3, it looks like the main 4 fans could be swapped out for 120mm units to reduce fan noise (i.e., working across both nodes; this may require some alterations to the chassis). And with the power supply fans, a similar thing could possibly be achieved with some ducting.

DonJon · Apr 18, 2016

Could you please tell me the power up sequence. This is what happens with my unit. I'm wondering if it is defective.

Background on my setup
1. I got the same ELC transformer to power this server.
2. I installed 4 E5-2660 on them. Installed the heat sinks.
3. Just to start, I added 4 sticks (8GB each, 1600MHz RDIMM ECC), 2 on each node. One on A0 and one on B0 slots of each node. I only had 4 in hand to test these; the rest of the modules are on order. All 4 sticks were pulled from another working server, so they should be working, no doubt about that.
4. Added an Asus Radeon 5450 video card to one node. It has VGA, HDMI and DVI output. Connected VGA and HDMI to two outputs. Normally the VGA would work by default, I have seen it on other systems.
5. Connected USB keyboard to the external port on the same node where video card is installed.

Powered it up....
a. Green LEDs blink a few times on the power supply. It was very quick and then changes to Yellow. The bottom LED near the hard drive cage also lit up Yellow.
b. On one node, Blue LED flashes and goes off.
c. On the other node, Blue LED is solidly lit. But no power to USB. No Video output.
d. After about 4 minutes, Fans start spinning on this node.
e. No fans spinning on the other node or any LEDs lit up.

What is the sequence you see on your units? I really suspect the LED on the power supply should be lit GREEN instead of YELLOW on mine.

And I have no idea why one of the nodes doesn't respond with any indication.

Is this server defective by any chance?

polyfractal · Apr 19, 2016

TLN said:
Wonder if you can measure power consumption, while idle and under load.
ps. subscribed, really wanna see two racks in "tower case"

My kill-a-watt arrived today, here are my numbers. One PSU, one node:

idle: 85W / 1amp
1 core: 126W
2 core: 156W
4 core: 177W
8 core: 218W
16 core: 292W

All four nodes running:

idle: 232W
64 cores: 1165W
64 cores and all fans at max: 1260W / 11 amps
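A quick sanity check on those readings, as a rough sketch. It computes the extra watts per additional busy core between consecutive single-node readings, the per-node share of the four-node load figure, and the line voltage implied by the last reading. The voltage figure assumes watts = volts x amps and ignores power factor, so treat it as approximate.

```python
# Single-node kill-a-watt readings from above: busy cores -> watts.
single_node_watts = {0: 85, 1: 126, 2: 156, 4: 177, 8: 218, 16: 292}


def marginal_watts_per_core(readings):
    """Extra watts per additional busy core between consecutive readings."""
    points = sorted(readings.items())
    return [
        (lo_cores, hi_cores, (hi_w - lo_w) / (hi_cores - lo_cores))
        for (lo_cores, lo_w), (hi_cores, hi_w) in zip(points, points[1:])
    ]


for lo, hi, watts in marginal_watts_per_core(single_node_watts):
    print(f"{lo:>2} -> {hi:>2} cores: {watts:.2f} W per extra core")

# Four-node readings from above.
cluster_load_watts = 1165
cluster_max_watts = 1260
cluster_max_amps = 11

per_node_load = cluster_load_watts / 4
# Assumes W = V * A (power factor ignored), so this is only approximate.
implied_volts = cluster_max_watts / cluster_max_amps

print(f"per-node share at 64 cores: {per_node_load:.2f} W")
print(f"implied line voltage: {implied_volts:.1f} V")
```

The first busy core costs far more than the later ones, and the per-node share of the 64-core figure lands within a watt of the single-node 16-core reading, so the two sets of numbers agree with each other.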

imchipz · Apr 20, 2016

Wondering how it pulls 1260W, isn't the PSU rated for 700W?

Also thanks to (what I assume is you guys) the eBay price shot from 220 to 279! I was watching it

Mkvarner · Apr 20, 2016

imchipz said:
Wondering how it pulls 1260W, isn't the PSU rated for 700W?

2 systems with 2x nodes and 1x 700W PSU each. In total 4 nodes and 2 PSUs
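The arithmetic, as a small sketch using the 700W rating and the 1260W wall reading from this thread. It assumes the load splits evenly across the two PSUs. The wall reading also includes PSU and transformer losses, so the actual DC load on each supply should be somewhat lower than the figure computed here.

```python
PSU_RATED_WATTS = 700        # rating mentioned in this thread
PSU_COUNT = 2                # one PSU per two-node chassis
MEASURED_PEAK_WATTS = 1260   # wall reading, all four nodes, fans at max

total_rated_watts = PSU_RATED_WATTS * PSU_COUNT
# Assumes an even split between the two chassis.
per_psu_wall_watts = MEASURED_PEAK_WATTS / PSU_COUNT
headroom_per_psu = PSU_RATED_WATTS - per_psu_wall_watts

print(f"total rated capacity: {total_rated_watts} W")
print(f"wall draw per PSU:    {per_psu_wall_watts:.0f} W")
print(f"headroom per PSU:     {headroom_per_psu:.0f} W")
```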

snake eyes · Jul 1, 2016

Hi guys, I'm really worried about the temperature of my server. At 100% load the temp is 85C, which is, in my opinion, incredibly hot! Please tell me, is this a normal temp for this kind of server?

RobertFontaine · Jul 1, 2016

That's certainly a nice compact rig.

polyfractal · Jul 1, 2016

ivan said:
Hi guys, I'm really worried about the temperatures. Loading 100% the temp is 85 C, is hot! Please tell me, is normal temp for this server?

It seems to be normal for my four nodes at least. I've been using them pretty regularly since this post (one all the time as a desktop, the others sporadically as needed).

Under full, 100% load, the temperature does indeed hover around 80-85C. It appears to be on purpose, as the fans are only running at maybe 70% capacity. The behavior I always see is:

1. Load pegs at 100%
2. Temperature creeps up to 80-85, fans rev to around 40-50%.
3. Temp keeps going until about 90, fans hit 100%
4. Temp quickly drops to 70-85 range, fan returns to 70%
5. Temp basically stays consistent, same with fan

So it seems the board is happy to be in this range, since there is spare cooling capacity not being used.

Also note, under "real 100% load" situations, temps are a lot lower. E.g. if a database is chewing 90% CPU and hitting some disk IO, the short pulses of inactivity are enough to let it cool down quickly.
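To make the five-step fan behavior above concrete, here is a toy model of it. This is only my guess at a curve that reproduces what I observe, not the board's actual firmware logic. The 30% idle duty is an assumption (I never measured it), 45% stands in for "around 40-50%", and the temperature trace is made up.

```python
LOW_DUTY = 30      # assumed idle fan duty in percent; not a measured value
MID_DUTY = 45      # stands in for "around 40-50%" in step 2
SETTLED_DUTY = 70  # the level the fans return to in step 4


def next_fan_duty(temp_c: float, current_duty: int) -> int:
    """Toy fan curve with hysteresis, loosely matching the observed steps."""
    if temp_c >= 90:
        return 100  # step 3: full speed once temp reaches about 90C
    if current_duty >= SETTLED_DUTY:
        # Steps 4-5: after the burst, hold 70% while still in the warm band.
        return SETTLED_DUTY if temp_c >= 70 else LOW_DUTY
    if temp_c >= 80:
        return MID_DUTY  # step 2: moderate rev in the 80-85C range
    return LOW_DUTY


def simulate(temps, start_duty=LOW_DUTY):
    """Run the toy curve over a sequence of temperatures."""
    duty = start_duty
    history = []
    for temp in temps:
        duty = next_fan_duty(temp, duty)
        history.append(duty)
    return history


made_up_trace = [35, 60, 82, 88, 91, 86, 80, 82, 81]
for temp, duty in zip(made_up_trace, simulate(made_up_trace)):
    print(f"{temp}C -> fan {duty}%")
```

The hysteresis is the point: once the fans have hit 100%, they settle at 70% rather than dropping back to the lower level, which matches the steady state in step 5.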

Rus Taliss · Sep 14, 2017

polyfractal said:
It seems to be normal for my four nodes at least. I've been using them pretty regularly since this post (one all the time as a desktop, the others sporadically as needed)...

Under 100% load, temperature does indeed hover around 80-85C...

You said above that the temp hits 93C before the fans kick in. Great article. Can we advance this to a small cluster benchmark in a system, as I posted earlier in conversations? I don't know how to add people; in fact I can't even start a thread. How opaque this blog is.
If you have the time, do you need IB switches and 1U servers to act as the PF_ring front end? Happy to send them to you (if you own your own home; no renters, please).

One saving is the compact size for 2 nodes, and the mezzanine connector. Does it fit the 341X3 NIC that I have been buying up?

polyfractal, I was about to plunge in with a rack order, but one caution hit me: the ludicrous temp peak of 93C before the fans kick to high speed. I don't want thermal fatigue ruining my 2670s, and it will, eventually. Where is the BIOS in all this? Can't you change the fan speed or the kick-in temp? As I really need a cluster, not a boiler, I bought 2 Foxconn 2650s. With the copper on those babies, I doubt temps go higher than 60C flat out. Waiting for the null modem cable delivery. Even wide geo 32nm foundry develops fatigue... Once the price comes down
