Mellanox IS5030 - Managed QDR Infiniband Switch Write-up

Dajinn

Active Member
Jun 2, 2015
512
78
28
31
Hi folks. I wanted to take the time to do a write-up for the InfiniBand switch I got. I'm aware there is official documentation for the product, but I wanted to walk through the process as I receive my hardware and set everything up. Hopefully this helps others who implement InfiniBand at work or at home, and those who buy the same switch and are confused about how to set it up or configure it.

I would prefer to split the different sections into their own posts, so I'll reserve the next 3 posts for that, update each accordingly, and delete any unused posts.

First things first: the switch.

The switch I'll be using for this is specifically the MIS5030Q-1SRC, otherwise known as the InfiniScale IV IS5030.






The rear of the unit houses the power supplies (max of 2) and the fan unit. My switch only came with one power supply. Both are hot-swappable and easy to remove.

The fan unit:






The PSU:




After receiving the switch, the first thing I did was power it on and connect my RJ45 to RS232 cable. I found these 1 dollar Cisco cables on eBay and bought 2 in case one was busted. Not bad for 2 bucks, and the first one I tried worked perfectly.

New Cisco Console Cable RJ45 to DB9 Cable Switch Router SHIP Today 744664241835 | eBay

When you first get a switch that is either brand new or has been reset to factory settings, the Mellanox documentation states you must use the console port for the initial setup before you can use the "Mgmt" port. The Mgmt port is how you connect to the web interface of the switch and manage it.

Since I did this on a Windows machine I found HyperTerminal immensely useful and efficient for connecting to the switch's serial interface. I don't remember at what exact point in Microsoft's OS lineup they began removing HyperTerminal from the add/remove features but luckily I found a standalone download somewhere on the net that works just fine in Server 2012 R2 DataCenter.

Dropbox - HyperTerminal.zip

Connecting to the interface is very simple. If you've never used a serial (COM port) interface before, just use these settings on the initial popup you get when HyperTerminal opens:



After you click Apply and OK, the switch will, after a few seconds, print information to the console and ask whether you want to run the wizard for the initial configuration. Type 'y' here. **If you get a login prompt instead and suspect the seller reset the switch before shipping, the default login is username admin, password admin.

Additionally, some sellers may not reset switches at all and simply sell "working/tested pulls". If you need to reset the switch, there is a very small, paper-clip-sized reset hole on the front-left of the switch housing. The reset switch inside performs two types of reset:

- A quick, 1 second press brings all ports down and then brings them back up.

- Holding the reset switch for 15 seconds restarts the entire switch and deletes the login password. You will then be able to log in without a password and set a new one*.

*Based on Mellanox's documentation it is unclear whether this procedure also resets all of the other settings of the managed switch software. Further testing is needed.

The following information is taken directly from the IS5030 installation guide and serves to explain all of the possible prompts and outcomes you get when configuring the switch via the wizard.

Wizard Prompts:




Other setup configs:

a) zeroconf config -

b) static ip config -



Once your switch is properly configured you should be able to access the web management interface from any browser using the IP it was assigned.

The IS5030 and IS5035 switches are both internally managed. I believe this means the switch can act as the subnet manager on its own, instead of relying on the user to run OpenSM on one of the clients to bring up the fabric and assign addresses to the InfiniBand ports (a subnet manager hands out LIDs, not IPs). I'll confirm this once I have more of my equipment.

In most, if not all, cases the 5030/5035 switches use FabricIT as their internal management software. Nothing additional needs to be installed on the client side. However, I believe FabricIT is a premium software offering from Mellanox and may require licensing when used separately or with switches that don't natively include it. This switch's documentation states that the switch includes a license for this internally managed firmware/OS.

It does indeed include a license: it is both embedded in the firmware and physically located underneath the pull-out information tab under the USB port (see the above pics), in case you need to reinstall the key or reflash FabricIT for whatever reason.

This is the default login page you get when accessing the web interface:



After logging in you get a quick summary page with quite a decent amount of options both up top and to the side.



The summary views for things like the temperature and fans are actually quite informative and nice to look at.



Surprisingly there are a lot of different settings you can change. Frankly I didn't expect to get this level of configuration.

Port Configuration:



Now this next bit is interesting. If you haven't already noticed, upon initial configuration of the switch the subnet manager is not actually set to run by default and this is made clear by the message in the top right hand corner stating "subnet manager is not running". Under Fabric MGMT is where your options for the subnet manager reside.

I'm not exactly clear at this point on whether these settings are optimal, but I read them as follows: enable the subnet manager on this node (dv-is5030) and set the SM priority for this node to the highest value. That means that when other SM-capable nodes are present, this particular unit will act as the subnet manager, and the role trickles down through the priority list on hardware failures.
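The priority behaviour described above can be illustrated with a small Python sketch. It assumes the usual InfiniBand convention that the live candidate with the highest priority becomes master, with the lowest port GUID breaking ties. Every node name other than dv-is5030, the priority values, and the GUIDs are hypothetical; nothing here is taken from the switch UI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SmCandidate:
    name: str           # node name (hypothetical, except dv-is5030)
    priority: int       # SM priority, higher wins (0-15 range assumed)
    guid: int           # port GUID, only used to break ties (made-up values)
    alive: bool = True  # False simulates a failed node


def elect_master(candidates: List[SmCandidate]) -> Optional[SmCandidate]:
    """Return the live candidate with the highest priority (lowest GUID on ties)."""
    live = [c for c in candidates if c.alive]
    if not live:
        return None
    return min(live, key=lambda c: (-c.priority, c.guid))


fabric = [
    SmCandidate("dv-is5030", priority=15, guid=0x0002C90200000001),
    SmCandidate("server-a", priority=8, guid=0x0002C90300000002),
    SmCandidate("server-b", priority=8, guid=0x0002C90300000003),
]

# With everything up, the switch has the highest priority and wins.
print(elect_master(fabric).name)

# Simulate the switch failing: the role trickles down the priority list.
fabric[0] = SmCandidate("dv-is5030", priority=15, guid=0x0002C90200000001, alive=False)
print(elect_master(fabric).name)
```

This only models the selection rule, not the actual handover protocol between subnet managers.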



The subnet settings get even more advanced under "Advanced"(go figure) and "Expert".

This pretty much wraps up where I'm at right now. Shout out in a reply if you want to see a specific image from one of the configuration pages in the web interface and I'd be happy to help out. Or if you want me to check for anything specific(such as a feature).

The next steps in my project will be outlined in the following reserved posts. I'm waiting on my QSFP cables and ConnectX-2 VPI cards to come in and then I'll be installing them and configuring them in a Server 2012 R2 environment.
 

Dajinn

My QSFP cables arrived! Now I just need my ConnectX-2 cards.



Got 18 3m cables. Don't need that many yet but I will. Just hope all of them work.

The web interface shows the correct identification for the cables, which I find interesting. It matches the information tag on the cables themselves: IBM - Amphenol. I plugged each end into a different port on the switch just to make sure both ends of this one random cable worked and to see how it would show up in the management UI.

 

Dajinn

Update for the installation of the cards and software installs.

So I already ran into an unexpected hiccup, something I had no idea about until I got around to updating the drivers for these cards.

On the Mellanox website the latest driver package is version 4.9 for WinOF. There's a slight problem here. This driver package updates the drivers but doesn't update the adapter firmware. It errors out at the end of the install with code 1005 saying that there weren't any adapters available for update.

Ramifications:

1. You need a firmware higher than 2.9.8350 for RDMA capability
2. The most recent firmware version available for download through the Mellanox site is 2.9.1000.
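The gap between those two version numbers is easy to misread, so here is a minimal Python sketch that compares dotted firmware versions field by field. The two version strings are the ones quoted above; treating the threshold as inclusive is an assumption, and the function names are made up for illustration.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted firmware version such as '2.9.1000' into (2, 9, 1000)."""
    return tuple(int(part) for part in text.strip().split("."))


RDMA_MINIMUM = parse_version("2.9.8350")  # threshold quoted in the post


def supports_rdma(firmware: str) -> bool:
    # Inclusive comparison is an assumption; the post only says "higher than".
    return parse_version(firmware) >= RDMA_MINIMUM


for fw in ("2.9.1000", "2.9.8350"):
    print(fw, supports_rdma(fw))
```

The official 2.9.1000 download fails the check because its last field, 1000, is smaller than 8350, even though the first two fields match.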

I already saw the guide on this site about burning a custom firmware of the most recent version, but so far I haven't been able to get hold of the firmware file, as the more recent installers don't extract it into the temp folders.

update:

I located an installer package that had a high enough version of the firmware and built/burned the firmware image. RDMA now shows as "true" for the capability of the adapter.

Update 9/19:

I installed all of my VPI cards into my servers. Windows detected them without an issue and used its own drivers but we just can't have that.

The next steps for any prospective IB users are to update the drivers and update the firmware so you get RDMA capability.

Go ahead and download the latest (as of this post) ConnectX-2 WinOF VPI drivers:

http://www.mellanox.com/downloads/WinOF/MLNX_VPI_WinOF-4_95_All_win2012R2_x64.exe

Run the installer and let it complete. Unless your IB adapter came straight from Mellanox, you'll likely run into an error at the end of the installer saying the firmware could not be burned to the card (error 1005). Don't worry about this; we will tackle it next.

Now that the drivers are up to date we'll update the firmware to a version that enables the RDMA capability. For this you'll need two files, a firmware file(.mlx) and an .ini file.

Download all 3 of the files below (the two files above plus the Mellanox firmware tools) and put them somewhere easy to reach from the command prompt.

.ini file - Dropbox - MT_0FC0110009.ini
.mlx firmware (10.2.720) - Dropbox - fw-ConnectX2-rel.mlx
Firmware tools - Dropbox - WinMFT_x64_4_1_0_34.exe

1. Install the Mellanox firmware tools using the download provided above.

2. Verify that you don't need to reboot by opening a command prompt and typing mst status. If you get an unrecognized command, you'll need to reboot for the updated environment variable in Windows to take effect. Otherwise, run mst status and observe the output.

Take note of the first line in the output. Your results may vary but in this case we're interested in mt26428_pci_cr0



3. Next, run the command flint -d <pci_id> query. Fill in the pci_id, without the enclosing symbols, with whatever output you got from the previous command. In this case the final command would be flint -d mt26428_pci_cr0 query



Take note of the output on the line PSID. Your results may vary but in this case we're interested in MT_0FC0110009.

**I've read a resource that suggests you need to find your "board ID", but as you can see, running the command does not return one. However, since the board ID and the PSID are usually the same, it's a non-issue.

4. Now you're almost ready to burn the firmware. Take the .ini file above and do two things to it:

4a. Rename the file to be the same as your PSID/Board ID. In this case the .ini file will be named MT_0FC0110009.ini

4b. Open the .ini file and adjust 2 things and save the file: the name under [PS_INFO] and the PSID under [ADAPTER], see the screenshot below:



5. Now you're ready to burn the firmware to the adapter. Make sure you're in an elevated command prompt and navigate to the directory that contains both the Mellanox firmware file and the .ini file. Run the command mlxburn.exe -dev mt26428_pci_cr0 -fw fw-ConnectX2-rel.mlx

6. The "custom" firmware will be prepared and burned to the adapter. Once the process finishes, reboot and verify that the firmware now reflects the newer version, 10.2.720, by running flint -d <pci_id> query or by checking the details in Device Manager > Network Adapters > right-click (Mellanox ConnectX-2 IPoIB Adapter) > Properties > Information tab.

6a. You can also verify that the adapter is now RDMA capable by running the PowerShell cmdlet Get-SmbServerNetworkInterface in an elevated prompt.
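For anyone who would rather script steps 3 through 4b than edit by hand, here is a rough Python sketch. It pulls the PSID out of captured flint query output, derives the .ini filename from it, and sets the two fields in the .ini. Both text samples are illustrative stand-ins rather than real tool output or the real file. The key spellings (Name, PSID) are assumptions, and so is setting the name to the PSID, since the value used in the original screenshot is not recoverable from the text.

```python
import configparser
import io
import re

# Illustrative stand-in for `flint -d mt26428_pci_cr0 query` output.
# Only the PSID line is used; the other lines are assumed layout.
FLINT_QUERY_OUTPUT = """\
FW Version:      2.9.1000
Device ID:       26428
PSID:            MT_0FC0110009
"""

# Illustrative stand-in for the downloaded .ini (key names are assumptions).
INI_TEXT = """\
[PS_INFO]
Name = SOME_OTHER_BOARD
Description = placeholder description

[ADAPTER]
PSID = SOME_OTHER_PSID
"""


def extract_psid(query_output: str) -> str:
    """Return the value on the 'PSID:' line of the query output."""
    match = re.search(r"^PSID:\s*(\S+)", query_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no PSID line found")
    return match.group(1)


def patch_ini(ini_text: str, name: str, psid: str) -> str:
    """Set the name under [PS_INFO] and the PSID under [ADAPTER]."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key capitalisation as written
    parser.read_string(ini_text)
    parser["PS_INFO"]["Name"] = name
    parser["ADAPTER"]["PSID"] = psid
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


psid = extract_psid(FLINT_QUERY_OUTPUT)
ini_filename = f"{psid}.ini"  # step 4a: file named after the PSID
# Step 4b: using the PSID as the name is an assumption, not from the post.
patched = patch_ini(INI_TEXT, name=psid, psid=psid)

print(ini_filename)
print(patched)
```

Note that configparser drops comments when it writes a file back out, so on a real Mellanox .ini a plain text editor is probably the safer choice; the sketch is mainly there to show which two values change.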

Now sit back and enjoy some high speed networking.



I'd like to give credit to JEFF SHUKIS of ServeTheHome for the initial article he wrote detailing the process for locating/burning the custom firmware to the Mellanox adapters. I was not able to follow his process step by step, as I had to hunt down the firmware via other methods, but the process is more or less 98% the same. It is not my intent to steal his thunder, only to keep alive a process which more users are sure to go through. Thanks Jeff for the awesome write-up.
 

Chuckleb

Moderator
Mar 5, 2013
1,017
331
83
Minnesota
This is great! It's also really timely since we are about to do some IB cleanup so I just sent the link to my staff for them to follow through! Thanks!
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,292
1,756
113
CA
I hope the IB Switch I received has a nice web management interface like that, wow impressive!
 

Dajinn

The real challenge will be making sure that I get all of my transactions over IB and not GbE!
 

Dajinn

It's not too loud. As chuckleb said in another thread it does sound like a jet engine on power-on and then gets silent. I still can't hear anything else over my damn 1200W gold supermicro PSU. Unfortunately I don't have a decibel meter so I can't give you any quantifying answers...yet. If I had to guess I'd say it's just a bit louder than my Dell switch which is just kind of like a quiet, yet annoying, idle low hum. Funnily enough I would say out of my complete stack of hardware so far my R610s are the quietest...

Updated second post.
 

Dajinn

Another update...the next hurdles of this are to find out why transfers always plummet at the exact same time and how to keep networks separated.

 

neo

Well-Known Member
Mar 18, 2015
672
362
63
Now you've got me thinking of playing around with Infiniband in my home lab. I wonder what the differences are between this switch and a Mellanox MTS3600Q - which I see for cheap on eBay.
 

bds1904

Active Member
Aug 30, 2013
271
76
28
Another update...the next hurdles of this are to find out why transfers always plummet at the exact same time and how to keep networks separated.
How's iperf testing? I've run into issues on file transfers before, they are caused by a drive's inability to maintain writes over a period of time. Iperf and testing to a ramdrive is the way to go.
 

Dajinn

How's iperf testing? I've run into issues on file transfers before, they are caused by a drive's inability to maintain writes over a period of time. Iperf and testing to a ramdrive is the way to go.
Neither really yield great results which is a bit surprising.

Here's just iperf testing, bound to the IP addresses on my IB adapters, and testing from client to server and server to client transfers.



RAM Disk Test - 35GB RAM Disks, 48GB RAM capacity on the servers, perhaps that's an issue?



edit/update:

I reduced the size of the ram disks to ~20GB so that my systems would have some breathing room and performed a transfer on a 19GB file. It seems to be more consistent but unfortunately RAM disks right now aren't a reliable way to gauge file transfer throughput since I can't dedicate a large amount to large files without running into that RAM wall where transfer speeds plummet. And there's not enough time in smaller file transfers to tell if it's going to dip.
 

bds1904

It definitely looks like it's filling a buffer and when it is full the transfer just caps at a certain speed.

That being said your speeds aren't anything to be ashamed of.:cool:

4.5-6.0 Gb/sec is about the max you will see on an mtu of 1500. That's the limit where latency becomes the deciding factor. An mtu of 9000 and a test running at 9000mtu should yield 8.2Gb/sec or so. Because of the TCP overhead you most likely won't see much higher than that.
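One way to see why MTU matters here is to count packets: at a fixed throughput, a smaller MTU means more packets per second, each with its own processing cost. The Python sketch below is a rough illustration only. It assumes 40 bytes of IPv4 plus TCP headers per packet with no options and ignores link-layer framing, and the 6 Gb/s figure is just the upper end of the range quoted above, not a measurement.

```python
IP_TCP_HEADER_BYTES = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def packets_per_second(throughput_gbps: float, mtu: int) -> float:
    """Full-size packets per second needed to carry the given IP-level rate."""
    return throughput_gbps * 1e9 / (mtu * 8)


def payload_efficiency(mtu: int) -> float:
    """Fraction of each packet that is TCP payload rather than headers."""
    return (mtu - IP_TCP_HEADER_BYTES) / mtu


for mtu in (1500, 4096, 9000):
    pps = packets_per_second(6.0, mtu)
    print(f"MTU {mtu}: {pps:,.0f} packets/s at 6 Gb/s, "
          f"{payload_efficiency(mtu):.1%} payload")
```

Header overhead by itself changes little between these MTUs (roughly 97% to over 99% payload), so under these assumptions most of the difference comes from the packet rate the hosts have to sustain.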
 

Dajinn

It definitely looks like it's filling a buffer and when it is full the transfer just caps at a certain speed.

That being said your speeds aren't anything to be ashamed of.:cool:

4.5-6.0 Gb/sec is about the max you will see on an mtu of 1500. That's the limit where latency becomes the deciding factor. An mtu of 9000 and a test running at 9000mtu should yield 8.2Gb/sec or so. Because of the TCP overhead you most likely won't see much higher than that.
How do I increase the MTU on these IB cards? I've been doing some research and see some talk about connected mode but I don't think that's supported via IPoIB and the max MTU I can set is 4096 at the card and switch level.

Also, just to play devil's advocate, IMO the speeds aren't worth the money spent, I could've loaded all my rigs up with 4-6 port GbE cards and gotten the same results over SMB multi channel, lol. :cool:
 

bds1904

Sounds like that's your MTU limit, set it to that on everything and try:

Iperf3 -c 172.35.255.1 -M 4056 -m
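The 4056 passed to -M above is presumably the TCP maximum segment size for a 4096-byte MTU. A minimal Python sketch of that arithmetic follows, assuming IPv4 and TCP headers without options. The 4-byte IPoIB encapsulation header comes from RFC 4391; whether the 4096 setting on the card is the IP-level or the InfiniBand-level MTU is not stated in the thread, so both cases are shown.

```python
IPV4_HEADER = 20  # bytes, without options
TCP_HEADER = 20   # bytes, without options
IPOIB_ENCAP = 4   # bytes of IPoIB encapsulation header (RFC 4391)


def mss_for_ip_mtu(ip_mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


# If 4096 is the IP-level MTU, this gives the value used with -M above.
print(mss_for_ip_mtu(4096))

# If 4096 is the InfiniBand MTU, IPoIB leaves 4 bytes less for IP.
print(mss_for_ip_mtu(4096 - IPOIB_ENCAP))
```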

If you were looking just for fast SMB xfers then IB wasn't the cheapest choice by far. IB really shines as shared iSCSI or NFS storage. The reliable latency and sustained speed IB delivers in those applications is why IB is a solid choice for them.

That being said I still prefer FC to this day for shared storage. I can get 8Gb throughput on $15 cards, a $100 switch (transceivers included) and $5 fiber patch cables.
 

Stanza

Active Member
Jan 11, 2014
205
41
28
What do you get with

iperf -c 172.35.255.1 -i 2 -t 20 -w 1024k -P 6

Iperf
update interval every 2 seconds
running for 20 seconds
TCP window size 1024k
running 6 parallel streams

.
 

Dajinn

Sounds like that's your MTU limit, set it to that on everything and try:

Iperf3 -c 172.35.255.1 -M 4056 -m

If you were looking just for fast SMB xfers then IB wasn't the cheapest choice by far. IB really shines as shared iSCSI or NFS storage. The reliable latency and sustained speed IB delivers in those applications is why IB is a solid choice for them.

That being said I still prefer FC to this day for shared storage. I can get 8Gb throughput on $15 cards, a $100 switch (transceivers included) and $5 fiber patch cables.
The result to that command is this:



About the same if I do (to client - from sender) or (from client - to sender).
 

Dajinn

Interesting, running the iperf base command with no switches/options seems to reveal slightly better speeds...



Shouldn't I still be getting like at least 20 Gbit?
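For reference, the ceiling a QDR link can offer is below the headline 40 Gb/s. This is a back-of-the-envelope Python sketch, assuming a standard 4x QDR port with 8b/10b line coding; the lane count is an assumption, since the thread does not state it. It only gives the raw link data rate and says nothing about what IPoIB over TCP will actually reach.

```python
LANES = 4                      # 4x link width (assumed)
SIGNAL_GBPS_PER_LANE = 10      # QDR signalling rate per lane
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b line coding

signalling_gbps = LANES * SIGNAL_GBPS_PER_LANE
data_rate_gbps = signalling_gbps * ENCODING_EFFICIENCY

print(f"signalling: {signalling_gbps} Gb/s, usable data rate: {data_rate_gbps:.0f} Gb/s")
```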