Stepping up in the world - Upgrading the rack.


RimBlock

Active Member
Sep 18, 2011
837
28
28
Singapore
Current setup.

The rack


ML110 G7 with USB backup drive. The USB sticks have the latest HP SPP and Windows SBS 2011 Standard installs on them.


Monitor, keyboard, a 4-way KVM and a 2-way KVM (I currently need to swap cables between the two KVMs). The 4-way is for the top C6100. The cables to the left are the 8 m InfiniBand cables (5 of them).


My Norco DAS and 2x Dell C6100s


DDR Infiniband switch.
HP 1810-24G GbE switch (red cable is the internet feed, coloured cables are for the first C6100, grey for the MioTV IPTV feed, black cables for everything else).
Patch panel to the rest of the apartment's rooms.
Bottom left is the fibre box (internet / IPTV / telephone feed).
Bottom right, upper - TP-Link 8-port switch.
Bottom right, lower - service provider's ADSL modem.
Below that are my tools (hammers, saws, drills etc.).


Note the liberal use of green gardener's Velcro (much cheaper than the black clothing Velcro where I am).

RB
 
Last edited:

Patrick

Administrator
Staff member
Dec 21, 2010
12,519
5,826
113
Great pictures! Are those infiniband cables next to the monitor because you got longer ones inexpensively?
 

RimBlock

Active Member
Sep 18, 2011
837
28
28
Singapore
They sure are.

Big chunky heavy cables. Luckily I had the space to 'hang' them.

The rack is on wheels and I can still just about move it out and get around the back or to the sides. I could do with a longer fibre cable (bottom white box -> thin yellow cable to its left) as it is fairly tight, and that is part of the reason the switches are in the middle rather than at the top of the rack. The other reason is to have the ML110 G7 next to the cabinet's top fans to try to get the heat away from it as quickly as possible. The fans ramp up at 32°C and the ambient temperature here is around 32°C a lot of the time. It does not quite work, but it was worth a try :D.

The Norco has two drive sleds missing. One was crumpled :eek:; not sure where the other has gone :confused:. Kids, I suspect.

RB
 

RimBlock

Active Member
Sep 18, 2011
837
28
28
Singapore
Thanks for the great deal. It seems to be working very well, give or take my failed attempts to get SRP running.

How / what are you connecting together with your InfiniBand network (SRP / Solaris?)?
 

dba

Moderator
Feb 20, 2012
1,477
184
63
San Francisco Bay Area, California, USA
Right now, my IB-connected nodes are all running Windows - and working well - with a Mellanox QDR switch. I do have plans for Solaris, but using the Oracle pile of protocols - iDB/RDSv3/ZDP.

 

RimBlock

Active Member
Sep 18, 2011
837
28
28
Singapore
It seems my flint-flashed A1-revision InfiniBand cards are now reporting possibly incorrectly set mkeys. I am investigating. My rev A2 cards are working fine without any flashing. I got 4 cards and it turns out two were rev A1 and two were rev A2 :(.
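
For anyone wanting to check what they have before buying or flashing, this is roughly how I query the cards from Linux with mstflint (the PCI address below is just an example; find yours with lspci). The firmware version and PSID come back in the query output; the A1 / A2 board revision itself is printed on the card label.

# Find the Mellanox HCA's PCI address
lspci | grep -i mellanox
# Query firmware version, PSID and GUIDs (02:00.0 is an example address)
mstflint -d 02:00.0 query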

RB
 

RimBlock

Active Member
Sep 18, 2011
837
28
28
Singapore
So after a bit of testing, this is what I have found.

The cards are arranged like so (ports are the switch port numbers).
Port 1 - Solaris 11.1 - ConnectX-2 (SAN)
Port 2 - ESXi 5.1: ConnectX A1 (flashed)
Port 3 - ESXi 5.1: ConnectX A2 (unflashed)
Port 4 - CentOS 6.4 (OpenSM): ConnectX A2 (unflashed)
Port 5 - ESXi 5.1: ConnectX A1 (flashed)

The server on port 2 could see the targets presented by the server on port 1, but after changing the card to an A1 (I suspect it was previously an A2) it cannot.

The server on port 5 has never been able to see the targets made available by the server on port 1. It is also the server where bare-metal Windows was installed and where I could not get the Windows SRP target working.

On investigation I happened to look at the OpenSM log files, and they were reporting IB_Timeouts on port 2 (the server on port 5 was turned off). The error stated there could be an issue with the mkeys on the HCA:

sm_mad_ctrl_send_err_cb: ERR 3120 Timeout while getting attribute 0x15 (PortInfo); Possible mis-set mkey?
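
For reference, this is roughly how I poke at the fabric from the OpenSM box using the infiniband-diags tools; the LID below is just a placeholder, use whatever iblinkinfo reports for the suspect port.

# Show every switch port, the LID and the node connected to it
iblinkinfo
# Query PortInfo (attribute 0x15 from the error above) directly from the suspect HCA; 4 is an example LID, 1 the port
smpquery portinfo 4 1
# On the suspect host itself, check the port state, LID and firmware level
ibstat
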
I then changed the cards around.
Port 1 - Solaris 11.1 - ConnectX-2 (SAN)
Port 2 - ESXi 5.1: ConnectX A2 (unflashed)
Port 3 - ESXi 5.1: none
Port 4 - CentOS 6.4 (OpenSM): ConnectX A1 (flashed)
Port 5 - ESXi 5.1: ConnectX A2 (unflashed)

Now both the servers on ports 2 and 5 can see the targets presented by the server on port 1, and there are no errors in the OpenSM logs. The fact that OpenSM is now running on an A1 card seems to make no difference, although I suspect that if I tried to mount the targets through an A1 card it may well fail.

The lesson is to make sure you get the A2 or newer revision of the MHGH28-XTC cards.

Interestingly, the port 5 server's ESXi install did not see the rev A2 card after I swapped out the A1 card (which it did see), and I had to remove and re-install the Mellanox VIB for it to appear, which is a bit of a pain. There may be an easier way to get it to 'refresh', but this seemed a fairly good bet to work so I went that way.
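
Roughly what I did on the port 5 host, for anyone hitting the same thing; the VIB and bundle names below are examples rather than the exact ones I used (check esxcli software vib list for yours), and a reboot is usually required after the remove/install.

# See which Mellanox VIBs are installed
esxcli software vib list | grep -i mlx
# Remove the driver VIB and re-install it from the offline bundle (names/paths are examples)
esxcli software vib remove -n net-mlx4-ib
esxcli software vib install -d /vmfs/volumes/datastore1/MLNX-OFED-ESX-1.8.1.0.zip
# After the reboot, rescan the storage adapters so the SRP targets show up again
esxcli storage core adapter rescan --all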

The target appeared on the port 5 server and, more surprisingly, the datastore also appeared without needing to import it (it is also mounted on the port 2 server). I should now be able to start VMs stored on that datastore from either server (haven't tried it yet though ;), and trying it from both at the same time would probably be a bad thing :D).

I also found that the rev A1 cards will not work with ESXi 5.1 passthrough, even after applying the latest patch to ESXi. After doing all the passthrough setup, reboot and assignment to the VM, the VM errors on boot and won't start. No PSOD thankfully, but still not usable this way. I have not tried the A2 cards but expect the same issue.
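
If anyone wants to double-check their own setup, this is the sort of thing I look at on the host before blaming the card; nothing clever, just confirming the HCA is visible and whether the mlx4 modules have claimed it.

# List the Mellanox device and note its PCI address / vendor:device IDs
esxcli hardware pci list | grep -i -A 12 mellanox
# Check whether the mlx4 driver modules are loaded (a device marked for passthrough should not be claimed by them)
esxcli system module list | grep mlx4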

I would need to reinstall Windows SBS 2011 Essentials bare metal on the server in order to try out the Windows SRP function. After the many Windows Server reinstalls and domain leave/rejoin cycles I have inflicted on my wife and kids, I am fairly loath to go through it all again.

I may pick up another ConnectX-2 card or two, as the Dell C6100 mezzanine cards are going for around US$150 each now, which is a steal given standard PCIe ConnectX-2 cards are around US$250+ on the second-hand market. Cables could double the price though :(.

Update: Ok, well I have just nabbed a Mellanox MHQH29C-XTR ConnectX-2 for US$189, so when it arrives I think I will put it in the ML110 G7 (port 5) and see if I can get Windows Server to work with SRP. I did consider NFSoRDMA, but the Windows server is the only machine that needs access to the data and has an InfiniBand connection; all the other machines are desktops without InfiniBand cards. I therefore believe it is better to take the space as block storage and then share it from the Windows server via GbE.

RB
 
Last edited:

RimBlock

Active Member
Sep 18, 2011
837
28
28
Singapore
After a fair amount of testing by myself and Chuckleb, the answer seems to be that anything newer than v2.7 firmware on the ConnectX cards does not work with the ESXi InfiniBand VIB (drivers).

Someone has also mentioned that the issue seems to be related to VT-d (passthrough): if VT-d is disabled then v2.9 of the ConnectX firmware works.

Mellanox no longer supports the ConnectX cards (only the ConnectX-2 and ConnectX-3 cards), so there is unlikely to be a fix.
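
For anyone wanting to drop back to a 2.7.000 image, this is the general shape of it with mstflint; the PCI address and firmware filename are examples only, the image has to match the card's PSID, and flashing is obviously at your own risk.

# Check the current firmware version and PSID
mstflint -d 02:00.0 query
# Burn the older image (filename is an example; use the 2.7.000 build that matches your PSID)
mstflint -d 02:00.0 -i fw-25408-2_7_000-MHGH28-XTC_A2.bin burn
# Reboot (or reload the driver) for the new firmware to take effect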

I have finally managed to get a couple of Dell MD1000 DAS units, which came in around US$400 each with drive caddies. The cables were not cheap as they are the older SFF-8470 to SFF-8644 (mini-SAS HD) type. I also managed to get a 17" KVM drawer with a built-in 8-port Belkin KVM, plus a 16-port KVM expander, for US$250, which was a bit of a bargain. There was no uplink cable between the KVM and the expander, but it came with KVM cables for up to 16 machines.

I will do some new pics when the equipment is a bit more settled in.
 

RimBlock

Active Member
Sep 18, 2011
837
28
28
Singapore
Nuts, wrong thread; this is the Hadoop cluster I am working on :D. I meant to post it in the Hadoop thread.

Top to bottom:
TP-Link router (to manage DHCP allocation and keep this on a separate subnet).
Voltaire 4036 QDR InfiniBand switch.
HP ProCurve 3500yl-48G.
Dell MD1000 DAS (populated with 3TB enterprise SATA drives).
Two Dell C6100 XS23-TY3 servers.

The MD1000 will only allow drives up to 2TB as it stands (connected to a Dell H200 SAS controller), but I have another DAS box coming my way, hopefully, that will sort that issue out.

Ok, my personal current setup.



My new KVM drawer (US$250; 17", including an 8-port KVM, and a 16-port KVM extender with cables for around 10 ports).



Everything below the KVM drawer is the same as previous photos (Infiniband DDR switch, HP Procurve switch, patch panel, ISP equipment).

Some may also notice the HP ML110 G7 has gone. I sold it off as it really was too noisy. Now that I have the C6100, I can do everything I like with it.

RB
 
Last edited:

RimBlock

Active Member
Sep 18, 2011
837
28
28
Singapore
Oh, the Cisco router is a 2800 Series router. It was for the Hadoop cluster, but rather than learn how to configure it (so much work has piled up at the moment) I got the S$80 TP-Link router, which does just what I need, so the Cisco is sitting idle and unused.

I took the chance and got some new backplanes for the Norco (without any help from Norco on what to buy that would be compatible) and, after a bit of 'adjustment', they fitted and worked fine (with one casualty of war). Unfortunately they are not compatible with the Intel expander I have in that chassis. I am now using the Norco for SSDs directly attached to the Solaris SAN's 9202-16e on two connectors (8 lanes), while the MD1000s take the hard drives (only the bottom one is populated at the moment) with one connector to each (4 lanes).
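
If it is useful to anyone, this is roughly how I check what each connector is actually running on the Solaris box; sasinfo is part of the Solaris 11 SAS management tooling and cfgadm is standard, so treat this as a sketch rather than exact output.

# Show each HBA port on the 9202-16e and the phys (lanes) active on it
sasinfo hba-port -v
# List attached controllers and disks
cfgadm -al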

RB
 
Last edited:

RimBlock

Active Member
Sep 18, 2011
837
28
28
Singapore
Well, I am seeing a pattern with my storage now. It is not good...

On average, every 5 days my Windows Server 2012 Essentials VM hangs. The VM cannot be cleanly shut down via the vSphere client or via the ESXi CLI. The ESXi console keyboard also stops responding (both on the directly connected KVM rack console and via IPMI). I have to reboot via vSphere, or sometimes hard reboot from the server power switch, to get the server up and running correctly again. Both of my Linux VMs on the same server have not hung. Data transfer also pauses every now and then when copying or reading large amounts of data to or from the Windows server.

The VMs' drives are as follows (a rough sketch of how the RDM mappings are created is after the list):
Win Server 2012 - VM files (local), 500GB boot (VMFS - remote SRP), 4TB media (RDM - remote SRP)
Linux CentOS 6.4 - VM files (local), 8GB boot (VMFS - remote SRP), 1TB media (RDM - remote SRP)
Linux CentOS 6.4 - VM files (local), 8GB boot (RDM SSD - remote SRP), 50GB Minecraft (RDM SSD - remote SRP)
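
For reference, the RDMs above were created along these lines from the ESXi shell; the device identifier and paths are placeholders, and -z gives a physical-compatibility mapping.

# Find the SRP-backed device's naa identifier
esxcli storage core device list
# Create a physical-compatibility RDM pointer on a local datastore (ID and paths are examples)
vmkfstools -z /vmfs/devices/disks/naa.600144f0xxxxxxxx /vmfs/volumes/datastore1/WinServer2012/media_rdm.vmdk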

SAN (Solaris 11)
RAID-Z1 zpools
5x 2TB Seagate Barracudas (120GB Cache SSD & 120GB Log SSD) - Healthy.
4x 1.5TB 7200.11 Seagate Barracudas (120GB Cache SSD & 120GB Log SSD) - Degraded (one drive failed).

Mirrored Zpool
2x 60GB OCZ Vertex II.

From the logs on the ESXi server, it is the 2TB Barracudas that are timing out, which shows up because they hold the Windows media share. The 1.5TB 7200.11s are running fine even whilst degraded, with a little hiccup now and then but no lasting visible effect. The SSDs have not reported any sort of latency and the VM boots as if it has an SSD directly attached to it. Everything is over DDR InfiniBand.
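
For anyone following along, this is roughly where I am looking; the log path is the ESXi 5.x default and the Solaris side is just standard iostat / zpool output.

# On the ESXi host: look for command aborts / timeouts against the SRP devices
grep -iE "abort|timeout" /var/log/vmkernel.log | tail -50
# On the Solaris SAN: watch per-disk service times and queue depths while copying to the media share
iostat -xn 5
# And keep an eye on the pools themselves
zpool status -v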

I have some used 2TB enterprise SATA drives (Ultrastars) coming in tonight and will be swapping my desktop drives out. I may use the desktop drives as backup drives in a RAID 10 config, or pool them with MS Storage Spaces. Not sure at the moment, but I am not confident they are coping well with ZFS. One of the 2TB Seagate drives has already failed and been replaced.
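
Swapping the desktop drives for the Ultrastars should just be a series of zpool replace operations; the pool and device names below are made up, and each resilver needs to finish before pulling the next drive.

# Replace one desktop drive with a new Ultrastar (pool/device names are examples)
zpool replace tank c7t2d0 c7t6d0
# Watch the resilver progress before moving on to the next disk
zpool status tank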

Enterprise-grade drives are not that expensive now, especially if you look out for them on eBay. Avoiding having to shuffle 10+ TB of data from one place to another with limited resources (spare disks, cash etc.) is worth the extra cost for me.

RB
 

xnoodle

Active Member
Jan 4, 2011
258
48
28
Is anything else set up to use the 2TB Barracuda zpool? Can you try perf runs on it, or on the individual drives (after you rotate them out)?
 

RimBlock

Active Member
Sep 18, 2011
837
28
28
Singapore
Not sure.

I need to trace the SRP targets back to the LUNs and then to the zpools :(.
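
When I get to it, the mapping should all be visible from COMSTAR on the Solaris box; something along these lines (the LU GUID below is just a placeholder).

# SRP targets and the initiators logged into them
stmfadm list-target -v
# Logical units with their GUIDs and backing zvols
stmfadm list-lu -v
sbdadm list-lu
# Which views map a given LU to which host/target groups
stmfadm list-view -l 600144F0XXXXXXXXXXXXXXXXXXXXXXXX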

I am actually pretty happy that the degraded zpool is still running very well; sure, there is no redundancy, but it still seems reasonably fast.

Having SSDs in the SAN and being able to use them as remote boot drives is great: you get SSD speed over IB to multiple machines from a small number of SSDs, with redundancy.

I actually have IOZone compiled so I will eventually get round to some benchmarks.
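
Probably something simple to start with, along these lines; the file path and sizes are just examples, and the test file should be bigger than RAM/ARC to keep the cache honest.

# Sequential write/rewrite and read/reread, 128k records, 8 GB test file
iozone -i 0 -i 1 -r 128k -s 8g -f /tank/iozone.tmp
# Random read/write mix on the same file
iozone -i 0 -i 2 -r 128k -s 8g -f /tank/iozone.tmp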

RB