Dell C6100 and Windows Server 2012 no RDMA

Toddh

Member
Jan 30, 2013
120
8
18
I installed ESOS on the C6100 the other day, looking to use it as a SAN. The install went well, and ESOS even recognized the InfiniBand card. This led me to research connecting Win12k to ESOS and what protocols the Win12k drivers support. In the process I discovered that the MCQH29-XDR firmware does not support RDMA under Win12k; firmware version 2.9.8350 or higher is required.

Minimum version of Mellanox firmware required for running SMB Direct in Windows Server 2012 - Jose Barreto's Blog - Site Home - TechNet Blogs
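For what it's worth, the check Jose Barreto's post describes reduces to a numeric comparison of the dotted firmware string; a plain string comparison breaks once a component reaches two digits (e.g. 2.10.x). A minimal sketch (the helper name is mine; the version numbers are the ones from this thread):

```python
# Compare dotted firmware version strings numerically, the way the
# 2.9.8350 requirement from the blog post would be checked.
def fw_at_least(installed: str, required: str) -> bool:
    return tuple(map(int, installed.split("."))) >= tuple(map(int, required.split(".")))

# The Dell mezz cards in this thread ship 2.9.1000, which falls short:
print(fw_at_least("2.9.1000", "2.9.8350"))   # False
print(fw_at_least("2.9.8350", "2.9.8350"))   # True
```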

Since I am using Win12k, this was disappointing news. Not a killer, as the short-term plan has nothing that uses RDMA, but still unexpected.



 

cactus

Moderator
Jan 25, 2011
829
77
28
CA
I am lost. The TechNet blog just states you need firmware newer than 2.9.8350 for SMB Direct. Since ESOS looks to be Linux, and Linux uses Samba, there is no SMB Direct. SRP is your next best bet for raw performance.

Edit: Can you do an ibstat on one of the Dell cards? I think you found this, but the MCQH29 looks to be an old ASIC (MT25408). I believe it is a QDR ConnectX chip with VPI, not a true ConnectX-2.
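If it helps, the relevant ibstat fields can be pulled out mechanically. A sketch, assuming typical ibstat output; the sample text is illustrative (the MT26428 device name matches the flint target used later in this thread), not captured from an actual C6100:

```python
# Pull the CA type and firmware version out of ibstat output.
import re

# Illustrative sample only -- run ibstat on the actual card to get real values.
SAMPLE = """\
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.9.1000
"""

def parse_ibstat(text: str) -> dict:
    fields = {}
    for key in ("CA type", "Firmware version"):
        m = re.search(rf"{key}:\s*(\S+)", text)
        if m:
            fields[key] = m.group(1)
    return fields

print(parse_ibstat(SAMPLE))  # {'CA type': 'MT26428', 'Firmware version': '2.9.1000'}
```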
 

cactus

Moderator
Jan 25, 2011
829
77
28
CA
OT: ESOS looks cool. I was thinking of doing something similar just out of a minimal Linux install.
 

Toddh

Member
Jan 30, 2013
120
8
18
ESOS is cool. An SCST-based iSCSI SAN with SAN replication built in. A simple, clean iSCSI target. The project is under active development and changes are added regularly.

 

Toddh

Member
Jan 30, 2013
120
8
18
You are right on, cactus. Often my reading leads me all over the place; every page I read seems to lead to two or three others, so I often end up in a totally different place than where I started.

So ESOS runs over SRP. Reading up on the Win12k drivers, it does not appear that SRP is supported the way it was in the OFED 3.2 drivers for Win2k8. I am not positive about this, but SRP is not referenced in the 4.2 driver release notes the way it is for OFED 3.2.

I am lost. The technet blog just states you need a newer than 2.9.8350 firmware for SMB direct. Since ESOS looks to be Linux and Linux uses Samba, there is no SMB Direct. SRP is your next best bet for raw performance.

Edit: Can you do an ibstat on one of the Dell cards? I think you found this, but the MCQH29 looks to be an old ASIC. (MT25408) I believe it is a QDR ConnectX chip with VPI and not a true ConnectX-2.
In any case, I read the post to indicate that RDMA was not supported (although the blog is about Win12k). Correct me if I am wrong, but Win12k would need RDMA to do SRP (SCSI RDMA Protocol)?

My goal is an HA storage solution for a Hyper-V 2012 cluster. Windows SMB 3.0 is an option, but I am not 100% sold. Besides, I like open source.


 

cactus

Moderator
Jan 25, 2011
829
77
28
CA
From my travels over the last year, SRP never really gained traction. It is cool because it is fast, but the spec was never completed and it would be hard to manage at large scale. iSER (iSCSI using RDMA for data transfer and Ethernet for management) seems to be a better solution, but there is no iSER on Windows. :confused: So it seems we are left with iSCSI, or NFS (does Win2k12 have an NFS client?), over IPoIB.

SMB Direct is a really cool transport, but we won't see it included in Samba anytime soon.
 

Toddh

Member
Jan 30, 2013
120
8
18
I'm not looking at too large a scale: 8-12 Hyper-V nodes. To cluster Hyper-V, my understanding is you need iSCSI or SMB 3.0. iSCSI over IPoIB is supported in Win12k. From there it gets unclear. There is no mention of SRP in the 4.2 driver docs, and I don't see it in the driver install options. Same with SDP, although there is mention of an SDPConnect tool. I have not used either before, so I am not entirely sure what to look for.

IPoIB seems to be the de facto fallback that everyone uses, but it is less than ideal: it has no RDMA and is slow compared to other methods. My best efforts so far between Win12k and Linux are about 300 MB/s reads and 600 MB/s writes. It does have the advantage that it can be used for cluster communications.
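For perspective on how far those figures sit below the wire: QDR InfiniBand signals at 40 Gb/s but uses 8b/10b encoding, so only 32 Gb/s (roughly 4 GB/s) carries data. A back-of-the-envelope comparison against the numbers above:

```python
# QDR InfiniBand: 40 Gb/s signaling with 8b/10b encoding -> 80% carries data
signal_gbps = 40
data_gbps = signal_gbps * 8 / 10        # 32 Gb/s of payload bits
ceiling_mb_s = data_gbps * 1000 / 8     # 4000 MB/s (decimal megabytes)

# The IPoIB results reported above, as a fraction of that ceiling
for label, observed_mb_s in [("reads", 300), ("writes", 600)]:
    print(f"{label}: {observed_mb_s} MB/s, {100 * observed_mb_s / ceiling_mb_s:.1f}% of ceiling")
```

Even the better number is well under a fifth of what the link can carry, which is the usual complaint about IPoIB without RDMA.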



 

seang86s

Member
Feb 19, 2013
147
12
18
I was wondering how you guys made out with your InfiniBand implementation? What method did you use, and what kind of performance did you get without RDMA?

I'm not looking at too large a scale: 8-12 Hyper-V nodes. To cluster Hyper-V, my understanding is you need iSCSI or SMB 3.0. iSCSI over IPoIB is supported in Win12k. From there it gets unclear. There is no mention of SRP in the 4.2 driver docs, and I don't see it in the driver install options. Same with SDP, although there is mention of an SDPConnect tool. I have not used either before, so I am not entirely sure what to look for.

IPoIB seems to be the de facto fallback that everyone uses, but it is less than ideal: it has no RDMA and is slow compared to other methods. My best efforts so far between Win12k and Linux are about 300 MB/s reads and 600 MB/s writes. It does have the advantage that it can be used for cluster communications.

 

cactus

Moderator
Jan 25, 2011
829
77
28
CA
RDMA is used for everything over InfiniBand. Windows 2012 seems not to support RDMA-native storage protocols like SRP, SMB Direct, and iSER with ConnectX and older ASICs. Like I said before, the Dell MCQH29 is not a true ConnectX-2 ASIC and thus does not support native RDMA storage protocols. Unless you have found otherwise, Todd; I have seen you have been active on Mellanox's site, but I haven't been keeping up.
 

Toddh

Member
Jan 30, 2013
120
8
18
According to Mellanox, the cards do not support RDMA with the 2.9.1000 firmware. They have hinted they may fix that, but nothing firm.

Since I am using Win12k for the cluster, and most likely a Linux SAN, it will have to be IPoIB for now. I have not totally ruled out using Win12k for the first CSV. I am mostly a Windows guy, but I have worked a bit with Linux. There are a couple of universal truths with Windows: there will always be updates, and there will always be reboots. As much as I would like to toy with all the new technologies in Win12k, I dread the idea of having to shut down a group of VMs because the SAN needs a restart.



 

RimBlock

Member
Sep 18, 2011
788
8
18
Singapore
Interestingly, I have got what I believe to be SRP working for ESXi. I have not tested speed, but since I am not listing any iSCSI targets on the Solaris box and have a mapped LUN on the ESXi box from the Solaris box, it looks positive. I may have to VM my Windows Server 2012 install as a temporary workaround.

I also read that some people have downgraded some of the drivers to mix Win 2012 & 2008 R2 in order to reactivate SRP on Windows, but that was for the pre-ConnectX HBAs, and it didn't go well when I tried it on my ConnectX.

I am just trying out Win Server 2011 Essentials now, but Microsoft's Partner site is down so I can't get my key :(.

Am wiring up the Supermicro low noise fans for my C6100 now to see if they make a difference. Initial results are good so far.

RB
 

britinpdx

Active Member
Feb 8, 2013
355
159
43
Portland OR
I got my Mellanox InfiniBand mezz cards installed and operational tonight, running Server 2012 on 2 nodes of a C6100 (each node 2x L5520, 24GB RAM). I used the guide from a post on Jose Barreto's blog.

This is a simple direct connection, Peer to Peer test of basic connectivity. I didn't attempt to get into the whole Hyper-V over SMB thing.

From memory my installation steps were as follows ( I bet I missed something ) ....

On both nodes ..

1) installed hardware & powered up
2) installed the Server 2012 WinOF VPI from here
The software complained about the firmware being old (or something like that). When going through the same steps on a different test box with a Mellanox MHQH19B card, the latest firmware was automatically downloaded and installed. Not so in this case.
3) Downloaded the Dell C6100 specific firmware from here
4) Reboot
5) From a CMD shell, manually installed the fw "flint -d mt26428_pciconf0 -i fw-ConnectX2-rel-2_9_1000-059MP7.bin burn"
6) Reboot

On node #1
7) Using PowerShell, set up OpenSM as a service ..
SC.EXE delete OpenSM
New-Service -Name "OpenSM" -BinaryPathName "`"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe`" --service -L 128" -DisplayName "OpenSM" -Description "OpenSM" -StartupType Automatic
Start-Service OpenSM


On both nodes
8) Control Panel -> device manager -> system devices -> Mellanox ConnectX -> Properties -> Port Protocol > select IB for both ports
9) Disable IPV6 and manually set the IPV4 IP addresses to a subnet other than Ethernet (192.168.1.x). Set Mellanox Adapter IP to 192.168.10.1 and 192.168.10.2 on one node, 192.168.10.3 and 192.168.10.4 on the other node.
10) may have done a reboot here ... can't remember:confused:

11) Connected a single QSFP/QSFP cable between one port on each node.
12) ping to verify connection over the 192.168.10.x subnet
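The addressing in steps 9-12 is essentially "keep the IB fabric on its own subnet"; a quick sanity check of the plan, using the addresses given in step 9:

```python
import ipaddress

# Subnets and addresses from step 9 of the write-up above
ethernet_net = ipaddress.ip_network("192.168.1.0/24")
ib_net = ipaddress.ip_network("192.168.10.0/24")
ib_addrs = ["192.168.10.1", "192.168.10.2", "192.168.10.3", "192.168.10.4"]

# Every IB address must fall inside the IB subnet and outside the Ethernet one
for addr in ib_addrs:
    ip = ipaddress.ip_address(addr)
    assert ip in ib_net and ip not in ethernet_net

print("IB addresses are on their own subnet")
```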

At this point I had basic connectivity, green and yellow LED's on the mezz cards ....

13) Port information from Control Panel -> device manager -> network adapters -> Mellanox ConnectX-2 -> Properties -> Information reports as follows ( gotta love that link speed !! ) ..


14) Here's the interesting thing, Get-NetAdapterRDMA reports that RDMA is enabled ..


15) Setup a StarWind RamDisk on each node, setup a share for each RamDisk so that each node could access local and remote RamDisks.

16) On Node 1, run Atto on the local RamDisk ..


17) On node 1, run Atto on the remote RamDisk (mapped as "Z"). At this time, nodes 1 and 2 are connected over both Ethernet and IB. SMB 3.0 is apparently smart enough to figure out that the IB path is the fastest and use it over Ethernet...


18) Just for giggles, pull the QSFP cable and run Atto again on the remote RamDisk ..

There's my old friend: the ~120 MB/s bandwidth limit that Gigabit Ethernet imposes on array-to-array backups. It's pretty neat, though, that SMB 3.0 simply fell back to the slower connection automatically.
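That wall is just Gigabit Ethernet arithmetic: 1 Gb/s is 125 MB/s of raw line rate, and Ethernet/IP/TCP framing eats the last few percent, which lands right around the observed figure:

```python
# Gigabit Ethernet ceiling: 1 Gb/s line rate over 8 bits per byte
line_rate_mb_s = 1000 / 8            # 125 MB/s before protocol overhead
observed_mb_s = 120                  # the figure reported above
overhead_pct = 100 * (1 - observed_mb_s / line_rate_mb_s)
print(f"line rate {line_rate_mb_s:.0f} MB/s, overhead about {overhead_pct:.0f}%")
```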

Plugged the QSFP cable back in, and after a few secs the connection is alive again.

So RamDisk to RamDisk copies are just crazy fast over IB, copying 1.5GB sized .mv4 files peaking at about 950MB/s ...


Probably old news to a lot of folks on this forum, but this is my first dabble with IB and I can't believe that I've missed something this fast for so long.

I'm hooked, but in the foreseeable future I'm never really going to be able to saturate this kind of a link :cool:
 

dba

Moderator
Feb 20, 2012
1,478
181
63
San Francisco Bay Area, California, USA
Gotta love it when you need to talk about throughput in gigaBYTES per second!

Just for fun, try setting up four ramdisks on your node instead of one. In my testing, that provided even better throughput than one ramdisk, and since we want to test IB and not StarWind, it's a fair test.

...
15) Setup a StarWind RamDisk on each node, setup a share for each RamDisk so that each node could access local and remote RamDisks.
...
 

RimBlock

Member
Sep 18, 2011
788
8
18
Singapore
So RamDisk to RamDisk copies are just crazy fast over IB, copying 1.5GB sized .mv4 files peaking at about 950MB/s ...
Thanks for the detailed instructions. I am also looking to document when / if I can get SRP running on Windows.

So at that speed you are doing roughly 10Gbps, so I guess it is using IPoIB, unless you are being hampered by RAM speed :).

I am currently failing to get the SRP miniport driver up on Windows 2008 R2, but I have an SRP connection between Solaris and ESXi 5.1 which flies.

If anyone would like to suggest the best way to benchmark the speed between Solaris and ESXi over InfiniBand SRP, please let me know and I will stick up some figures.
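Lacking a common native tool on both ends, one crude but portable option is timing large sequential writes against a file on the SRP-backed datastore (dd or fio would be the more usual choice where available). A sketch; the path in the usage comment is a placeholder:

```python
import os
import time

def seq_write_mb_s(path: str, total_mb: int = 256, block_mb: int = 4) -> float:
    """Time large sequential writes to `path`; returns throughput in MB/s."""
    block = os.urandom(block_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb // block_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())      # make sure the data actually hit the target
    elapsed = time.perf_counter() - start
    os.remove(path)
    return total_mb / elapsed

# Point this at a file on the SRP-backed datastore, e.g.:
# print(seq_write_mb_s("/vmfs/volumes/srp-lun/bench.tmp"))
```

Running the same function against a local disk first gives a baseline for what the backing store alone can do; anything the SRP path loses from that baseline is transport overhead.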

@dba

Well, if I can load up a game from a Solaris ZFS cache made of 7 SATA III SSDs a bit faster than others, then that is bragging rights, that is :D.

More seriously, 3 servers with multiple VMs pulling from a single storage server and having no local disks can eat into the bandwidth fairly fast, as long as the storage server is scaled to deliver that volume of data, of course ;).
 

markpower28

Active Member
Apr 9, 2013
405
103
43
britinpdx:

Thanks for the instructions. I just got a Mellanox dual-port QDR InfiniBand daughter card for my C6100 as well, but I am having trouble installing the card. Unlike the expansion slot, which has a cover we can remove for a PCIe card, I am not sure how I can fit the mezz card into the server.

Any help is greatly appreciated.
 

britinpdx

Active Member
Feb 8, 2013
355
159
43
Portland OR
I am not sure how I can fit the mezz card into the server.
Glad to help .. this gave me a chance to test out my new speedlites (photography is my other, and way more expensive, hobby!).

It's not too difficult, a small cross head screwdriver is all you need.

Here's the sled before the mezz card install..



The 5 screws indicated in the red circles are the targets for removal. It's easier to start with the two at the top right and then remove the 3 back screws.

With all 5 screws removed you can now remove the blanking plate ..



Here's the mezz card ready to be installed into the sled. You can see the 3 threaded holes on the back plate that will be used for mounting ..



Note that the small PCIe adapter card is installed into the mezz card prior to install ..



Using a very slight pull, I lifted the edge of the PCIe x16 slot retainer in order to carefully slide the edge of the mezz card underneath. This is probably the only step where care is needed to get things aligned. You can see the QSFP connectors on the mezz card fit into openings on the sled shield.



You also need to take care to get the pcie adapter card aligned ..



Once everything is in place, a gentle push down on the top of the mezz card to seat the connector is all that is needed. Now to button it all down.

retaining screw on the side ..



Replace the 5 screws that you removed in the earlier step and you're done ..

 

Smalldog

Member
Mar 18, 2013
62
2
8
Goodyear, AZ
britinpdx:

Thanks for the detailed procedure and pictures. Looks like I have a version of the C6100 that does not allow me to install the daughter card: https://docs.google.com/file/d/0BynPQojkIX8rLUpmd2Z5MnZic1k/edit?usp=sharing

I need to figure something out :(.

Thanks again!

Is this the 2.5" 24-bay chassis? I bought some of the IB mezz cards and just realized, after looking at your picture, that I am in the same boat, but mine is the 24-bay model. And I think it's older than the two 12-bay models I have, both of which allow the IB mezz card.

Jeff
 

Smalldog

Member
Mar 18, 2013
62
2
8
Goodyear, AZ
Aside from eBay, where would be a good, inexpensive source for the cables to go with this card? Does the mezz card only take QSFP cables, and are QSFP+ and QSFP compatible with each other?

I am a noob when it comes to this. I picked up a 10-pack of single-port IB DDR cards on eBay for a decent price, and they work great. I also scooped up a few of these and wanted to get them up and running, but I am stuck on which cables to buy.

Thanks,
Jeff
 

markpower28

Active Member
Apr 9, 2013
405
103
43
Mine is the 12-bay chassis. I think there may be different versions of the node; maybe some models only allow a SAS card.