Dell C6100, Hadoop, Infiniband - A New project log.


RimBlock

Active Member
Sep 18, 2011
Singapore
Overview

I have been asked to work out a Dell C6100-based Linux / Hadoop solution.

This is intended as a record of the journey.

Hardware;
  • 1x Dell C6100 (4 node TY3 variant)
  • 2x L5520 (per node)
  • 24GB ram (per node)
  • 1x QDR Infiniband Mez card
  • 1x QDR Infiniband Switch
  • 1x LSI 9200-8i SAS controller
  • 1x Dell MD1000 DAS
  • 6 x 3TB Seagate Constellation ES 7.2k ML SATA drives (for testing Mirrored, RaidZ, RaidZ2 setups)
  • SSD (as yet not confirmed).

Software;
  • CentOS (with ZFS & Infiniband).
  • Hadoop (add-ons to be confirmed).


Configuration of the Dell 4-node C6100
  • Node 1: SAN – CentOS + ZFS
  • Node 2: Hadoop NameNode / JobTracker
  • Nodes 3 & 4: Hadoop DataNodes / TaskTrackers (a rough topology sketch follows below).
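
For reference, a minimal sketch of how that role split could be expressed in the Hadoop 1.x topology files, assuming hypothetical hostnames node2-node4 and the stock /etc/hadoop/conf layout (not the final configuration for this build):

# Hypothetical hostnames; the NameNode / JobTracker daemons run on whichever
# host start-dfs.sh / start-mapred.sh are launched from (node2 in this layout).
echo "node2" > /etc/hadoop/conf/masters            # SecondaryNameNode host(s)
printf "node3\nnode4\n" > /etc/hadoop/conf/slaves  # DataNode / TaskTracker hosts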

 

RimBlock

Active Member
Sep 18, 2011
Singapore
Tasks.

Outstanding;
  1. Set up test VLAN:
  2. Set up test LAN DHCP server:
  3. BIOS / BMC flash on all 4 nodes: 3 nodes completed, one node possibly bricked by previous owner. (Investigating fix).
  4. CentOS install on SAN node: Awaiting hard drives (ETA 1-2 days for delivery).
  5. Infiniband install on all 4 nodes (Hardware): Awaiting Infiniband equipment (ETA 7 days delivery).
  6. Infiniband install on SAN node (Software / drivers):
  7. Configure DAS / Drives: Awaiting SAS card (ETA 7 days delivery).
  8. Setup SRP / iSER targets (SAN):
  9. Build base PXE images (CentOS 6.4):
  10. Add Infiniband support to PXE images:
  11. Install Hadoop name server / job tracker (node 2 image):
  12. Install Hadoop data server / task tracker (node 3 & 4 images):
  13. Install PXE server on SAN:
  14. Configure nodes 2-4 for PXE boot:
  15. Boot and verify nodes 2-4:


Completed;


New tasks to be added as they come to light.
 

nry

Active Member
Feb 22, 2013
Subscribed!

Not heard of Hadoop before, may have to have a little read into that
 

Patrick

Administrator
Staff member
Dec 21, 2010
Subscribed!

Not heard of Hadoop before, may have to have a little read into that
In the Silicon Valley VC / startup sphere, Hadoop and similar technologies are what a lot of companies are being built on right now. If this becomes a big trend on the site, we can always make it its own sub-forum.
 

gtallan

New Member
Apr 25, 2013
Minnesota
I wonder how a C6100 will work out as a ZFS node - it doesn't seem like a single node has enough drives to take full advantage of it (unless using external drive enclosure?). Of course still great as a testing environment...

We have been setting up some C6100s with Hadoop to supplement an existing cluster of R410s. We mainly use only the HDFS component of Hadoop though, distributing the computational jobs with Condor. Sounds a bit like you are doing the opposite, so I probably don't have much useful to say, but I'll be interested to watch your updates!

I expect the C6100s should behave very similarly to the R410s. We are using regular desktop drives for the HDFS storage (WD Reds are the current purchase) and they are probably the biggest bottleneck - I don't see enough network utilization to feel strongly about going beyond 1Gbps for the interconnect (even so, I guess there may still be a latency penalty too). Based on that we just put our money into doubling the C6100 node count rather than going for any exotic interconnect. But our computational work is trivially parallel - no real I/O other than the HDFS (and some NFS) traffic.
 

RimBlock

Active Member
Sep 18, 2011
Singapore
I wonder how a C6100 will work out as a ZFS node - it doesn't seem like a single node has enough drives to take full advantage of it (unless using external drive enclosure?). Of course still great as a testing environment...
I am currently looking at a couple of possibilities: something like an M1015 (9220-8i) with the 12 internal bays wired up, which would be neater but not really what I want, or something like a 9202-16e (difficult to source as it is OEM only), or maybe the 9200-8e for connectivity to a 16-drive DAS box with an expander (HDD) or an 8-drive box without an expander (SSD).

I am hoping the Hadoop nodes can be diskless and that QDR will offset any issues with a fast enough storage backend, but I have to keep in mind that this is meant to be more of an entry-level system (albeit with Infiniband) rather than a top-of-the-line one.

This project is being sponsored by a partner (hence the QDR stuff) but is meant to be sourced from reconditioned rather than new items.

I am currently awaiting the other parts.

Current tasks;
Sort out ZFS on CentOS. Hopefully not so different from on Solaris (a rough install sketch follows below).
Sort out Infiniband on CentOS. Hopefully easier as the OS is all CentOS rather than a mixture like on my own private setup.
Sort out PXE boot for the nodes.
Install Hadoop and get the nodes talking via Infiniband (SRP most probably, but maybe iSER if supported).
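
A minimal sketch of what the ZFS-on-CentOS step might look like, assuming the ZFS on Linux yum repository has already been added per the zfsonlinux.org EPEL instructions; the pool name and device names here are purely illustrative:

# Assumes the zfs-release repo RPM from zfsonlinux.org is already installed.
yum install -y kernel-devel zfs
modprobe zfs && lsmod | grep zfs        # confirm the DKMS module built and loads
# Illustrative RAID-Z2 pool over six of the MD1000 drives (device names will differ):
zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
zfs create tank/hadoop
zpool status tank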

My knowledge of Hadoop is slowly getting up to speed, but my initial thoughts are SRP / iSER PXE boot and storage for the Hadoop nodes (as previously mentioned), and MPI over Infiniband for node communication (two switches, and the Mez cards have dual ports, so I may be able to run two Infiniband networks, one for each).

As for DBA's very valid question, the answer is err.....

This is a bit of a sticking point for me as the partner has not been able to provide any specifics so it will be difficult to tune the system to any specific use at this point.

At this stage I think we are looking to get all the parts working and then start fine-tuning them (SAN for disks, cache etc.; nodes for storage, RAM and processing power etc.).

Of course, first I have to update the BMC / BIOS on the C6100, which looks like it may be problematic (an incompatible version was reported by my partner's tech team). I have been reading the excellent info on the "Anyone bricked their C6100 yet" thread though, and should be ready to try over the weekend.

RB
 

RimBlock

Active Member
Sep 18, 2011
Singapore
Currently investigating if there is a possibility of booting the Hadoop nodes from SRP or iSER targets.

iSER is reported to be coming in the new OFED 3 package due out soon, so it may not be a possibility at this time.
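
On the SRP side, attaching a target from an initiator node generally looks something like the following once OFED is in place (a sketch only; the add_target string is a placeholder for what ibsrpdm prints, and mlx4_0 port 1 is assumed for the mezzanine card):

modprobe ib_srp
ibsrpdm -c        # prints add_target-formatted strings for SRP targets visible on the fabric
echo "id_ext=...,ioc_guid=...,dgid=...,pkey=ffff,service_id=..." \
    > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
lsblk             # the SRP LUNs should appear as new SCSI disks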

The SAS controllers will be LSI 9200-8is due to availability and the fact they were going for US$180 each on eBay yesterday (all gone now).

Looks like I will also have a Dell MD1000 DAS to play with as well. Hopefully it will have dual EMMs so I can connect with both 9200-8i ports and run the 15 disks at full speed.

RB
 

PigLover

Moderator
Jan 26, 2011
I played around trying to get PXE to work so that I could boot nodes diskless. Gave up (mostly 'cuz I was too lazy). Ended up using a card that mounts a small mSATA drive to the PCIe slot. It works really well.
 

nry

Active Member
Feb 22, 2013
I managed to get PXE booting running in about an hour from iSCSI targets.

That was done by chain loading iPXE then using that to initiate the iSCSI targets.
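
For anyone wanting to replicate it, a rough sketch of that dnsmasq + iPXE chainload flow; every IP, range, filename and iSCSI IQN below is a placeholder rather than my actual settings:

# dnsmasq serves DHCP/TFTP; plain PXE ROMs get chainloaded into iPXE, which then
# fetches a boot script over HTTP and sanboots the iSCSI target.
cat <<'EOF' >> /etc/dnsmasq.conf
dhcp-range=10.0.0.50,10.0.0.150,12h
enable-tftp
tftp-root=/var/lib/tftpboot                 # put undionly.kpxe from ipxe.org here
dhcp-match=set:ipxe,175                     # tag requests already coming from iPXE
dhcp-boot=tag:!ipxe,undionly.kpxe
dhcp-boot=tag:ipxe,http://10.0.0.1/boot.ipxe
EOF
cat <<'EOF' > /var/www/html/boot.ipxe
#!ipxe
dhcp
sanboot iscsi:10.0.0.1::::iqn.2013-05.lab.san:node3-root
EOF
service dnsmasq restart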
 

Patrick

Administrator
Staff member
Dec 21, 2010
I managed to get PXE booting running in about an hour from iSCSI targets.

That was done by chain loading iPXE then using that to initiate the iSCSI targets.
Main site guide time! This is on my list of articles I always want to experiment with and never get around to doing/ writing up.
 

RimBlock

Active Member
Sep 18, 2011
Singapore
I managed to get PXE booting running in about an hour from iSCSI targets.

That was done by chain loading iPXE then using that to initiate the iSCSI targets.
There are a couple of places with instructions on booting a PXE image with Infiniband support in the image but not booting the actual image over PXE.

At worst I will be looking at PXE over Ethernet and then mounting SRP / iSER targets.

Guess this brings me to another potential issue... volume sizing for various mounts (/var/log, /tmp, /home etc) on a Hadoop node.
 

RimBlock

Active Member
Sep 18, 2011
Singapore
Well, things have been moving on a little hardware-wise but not so much on the software side.

I now have;
4x C6100s (2 currently just spares)
1x MD1000 (15-bay DAS)
A selection of Dell H200 and HP 9200-8E SAS cards
33x 3TB Enterprise SATA drives (reconditioned HP)
An HP Procurve 3500yl-48G switch
A Cisco 2801 router.
A Voltaire 4036 QDR 36-port Infiniband switch.

Two of the C6100s are fully populated with Infiniband QDR cards.

I currently have two sets of plans.

One node SAN



One C6100 Lustre cluster (plus 2 C6100 Hadoop servers - 6 task nodes + job tracker and management node).



The biggest issue I have been having so far is CentOS sometimes not booting into the latest kernel. I have installed CentOS (6.0 minimal), run a yum update and rebooted, and it just sits there after the GRUB boot menu with just the cursor in the top right of the screen.

This has been going on all week, with multiple installs done in different ways on different disks and hardware configs for the C6100 node, until today I discovered it would boot into the old kernel.

I then tried the OS drive in a different node (without the SAS card) and the new kernel booted fine. I then added the SAS card and again it was fine. I then went back to the original node and it booted the latest kernel there as well.

This really has me banging my head against the table trying to find out what the issue is. My own install of CentOS on my own C6100 never had any issues, although I have read that a number of people have been seeing issues with 6.4.
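
For reference, a quick sketch of pinning GRUB to the older, working kernel while troubleshooting (the entry index is hypothetical; check the title lines in /boot/grub/grub.conf on the node first):

grep ^title /boot/grub/grub.conf                        # list kernel entries; 0 is the newest
sed -i 's/^default=0/default=1/' /boot/grub/grub.conf   # boot the older (second) entry by default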

Tomorrow I will put ZFS back on and install the Mellanox OFED packages.
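
The Mellanox OFED install itself is usually along these lines (a sketch only; the ISO filename/version is illustrative, and the real bundle comes from Mellanox's download page for RHEL/CentOS 6.4):

mount -o loop MLNX_OFED_LINUX-2.0-rhel6.4-x86_64.iso /mnt   # illustrative filename
/mnt/mlnxofedinstall
service openibd restart
ibstat        # each port should report State: Active once cabled to the 4036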

I need to find a way to access the Voltaire CLI. I have a USB-to-serial cable that the server can see, and I have configured the connection in Minicom with the settings from the user manual, but to no avail. It is possible that they have wired the serial plug in a funky way so as to ensure sales of their management cable, but I cannot yet confirm this. The Ethernet management connection does not seem to be registering with my DHCP server, so I am guessing the previous owner set a static IP address.

RB
 

elements

New Member
May 22, 2013
Make sure you have a null modem cable; also, the baud rate might have been changed from the default of 38400.
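
Once a null modem cable is in hand, something like this should get a console on the 4036 (the device node depends on how the USB-serial adapter enumerates; 38400 8N1 assumed per the default above):

dmesg | grep ttyUSB               # confirm which device node the adapter got
screen /dev/ttyUSB0 38400         # or: minicom -D /dev/ttyUSB0 -b 38400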
 

RimBlock

Active Member
Sep 18, 2011
Singapore
I will see if I can find a null modem cable around and give that a go at some point, thanks.

I have 4 nodes now with CentOS 6.4 installed and the Mellanox packages set up. Checking the fabric with ibnetdiscover reports all nodes, although the pingpong commands are not working (cannot find node name).
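
For what it's worth, a quick fabric sanity-check sequence (hostnames are placeholders): the ibv_rc_pingpong client takes the server node's hostname or IP as its argument, so "cannot find node name" usually just means that name does not resolve between the nodes.

ibstat | grep -E 'State|Rate'     # each port should be Active at the QDR rate
ibhosts                           # all four nodes' HCAs should show up on the fabric
# On node3 (acts as the server):
ibv_rc_pingpong
# On node2 (client) - use an IP, or a name that resolves, e.g. via an /etc/hosts entry:
ibv_rc_pingpong node3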

A couple of issues have again cropped up...
  • ZFS on Linux has not implemented iSCSI targets, which makes it tricky to share out the ZFS volumes.
  • The SCST RPM build instructions I am following are for the previous rather than the current version of SCST, so there are some tweaks that I am working through to get it running.
  • Lustre's next version should support ZFS backend filesystems, but it is not out yet, although it is being tested at Lawrence Livermore National Laboratory's Sequoia facility on a 55PB configuration.
  • One of the nodes on the C6100 has had its IPMI root password changed and I have not yet found a way to reset it to default (CMOS reset jumper, password reset jumper and CMOS battery pull do not seem to have worked - see the ipmitool sketch below).
  • The second C6100 has 3 nodes flashing orange when they are powered on... more hardware troubleshooting before I can actually carry on with the build.
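
One more avenue for the IPMI root password item above: resetting it from within the node's own OS using ipmitool (a sketch; on these Dell BMCs the root account is usually user ID 2, but verify with the user list first):

modprobe ipmi_si ipmi_devintf                 # expose the local BMC to the OS
ipmitool user list 1                          # channel 1; note the ID of the 'root' user
ipmitool user set password 2 'NewPassword'    # hypothetical new password for user ID 2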

I have had a couple of Dell MD1000s arrive for my own home setup so I am working on getting those up and running now but will return to this shortly.

RB
 

RimBlock

Active Member
Sep 18, 2011
Singapore
Current Hadoop build hardware



Top to bottom;
TP-Link router (to manage DHCP allocation and keep this on a separate subnet).
Voltaire 4036 QDR Infiniband switch
HP Procurve 3500yl-48G
Dell MD1000 DAS (populated with 3TB Enterprise SATA drives).
Two Dell C6100 XS23-TY3 servers.

The MD1000 will only allow drives up to 2TB as it stands (connected to a Dell H200 SAS controller), but I have another DAS box coming my way that will hopefully sort that issue out.

RB
 

shindo

New Member
May 7, 2013
WI, USA
RimBlock,

Have you gotten your Infiniband/ZFS squared away on CentOS? If so, do you have a writeup? I'm attempting to get them running as well, but there seems to be a lot of conflicting information out there and I keep hitting roadblocks. I, too, am running a C6100 with the mezzanine IB cards.

Thanks,
Shindo