Hi all
I'm new to this forum but have been reading it for a while, and I have this forum to thank (much to the missus' disgust lol) for all the money I've just spent on a basic IB setup for home. This is what I have purchased:
1. Voltaire GridDirector 9024D (not M) - I plan on getting a QDR switch after xmas when I've gained a bit more knowledge
2. 2 x MHGH28-XTC REV X1 (old revision) flashed to 2.7000 firmware from 2.6
3. VMware ESXi 5.1
4. 2 x CX4 Cables
5. 2 x custom-built desktop machines housing the HCAs; 1 x 8-core AMD system with 32 GB RAM and 2 x Intel 335 180GB SSDs (still working on this - buying 2 more at the end of the month)
OK, so far this is what I have done to set up InfiniBand in VMware ESXi 5.1:
1. Install the Mellanox / VMware drivers
esxcli software vib install -d /tmp/mlx4_en-mlnx-1.6.1.2-offline_bundle-471530.zip
esxcli software vib install -d /tmp/MLNX-OFED-ESX-1.8.2.0.zip
2. Installed the OpenSM VIB
esxcli software acceptance set --level=CommunitySupported
esxcli software vib install -v /tmp/ib-opensm-3.3.16.x86_64.vib
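To double-check that all three VIBs actually took before rebooting, something like this should list them (I'm assuming the package names contain "mlx" and "opensm" - adjust the grep if yours differ):
esxcli software vib list | grep -i -E 'mlx|opensm'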
3. Reboot
4. Change MTU and create partitions.conf
vi /tmp/partitions.conf
Default=0x7fff,ipoib,mtu=5:ALL=full;
cp /tmp/partitions.conf /scratch/opensm/0x001a4bffff0c1399/
cp /tmp/partitions.conf /scratch/opensm/0x001a4bffff0c139a/
esxcli system module parameters set -m mlx4_core -p mtu_4k=1
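(To confirm the mtu_4k parameter sticks after the next reboot, I believe this should show it:
esxcli system module parameters list -m mlx4_core | grep mtu_4k)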
5. Reboot again just to make sure
6. Create a Virtual Network vswitch using the vmnic_ib0 interface (shows as 20000 Full)
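For reference, I think the equivalent from the shell is roughly this (the vSwitch name is just what I'm using as an example, and the 4092 MTU is what I'd eventually like to reach):
esxcli network vswitch standard add -v vSwitch1
esxcli network vswitch standard uplink add -u vmnic_ib0 -v vSwitch1
esxcli network vswitch standard set -v vSwitch1 -m 4092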
7. Built two Windows 7 VMs, one on each of the 2 SSDs, both sitting on the 20000 Full vSwitch, so theoretically I should be able to get close to 500MB/s copying data between them (maybe?) - the interface shows as a 10G NIC in each Windows VM.
8. Copied a file between them on the IB network; the copy speed hits around 290MB/s (15 secs to copy a 3.6GB ISO).
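(To work out whether it's the network or the disks holding that number back, my plan is to run a raw TCP test between the two VMs with iperf - a file copy mixes SMB and SSD speed into the result, whereas iperf only measures the IPoIB path. Something along these lines, IP address just an example:
On VM 1 (receiver): iperf -s
On VM 2 (sender): iperf -c 192.168.10.1 -P 4 -t 30
If iperf also tops out around 2-3Gbps then it's the IPoIB/driver side; if it goes much higher, the SSDs or the copy method are the limit.)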
Some notes:
1. I realise that both machines share that single 20Gbps interconnect (which the guests see as a 10Gb NIC), but I should still get more than 290MB/s, right? That is a poor speed considering the potential bandwidth - I expected to max out the SSDs. (I also realise IPoIB is not fully optimised right now and a V2 of the driver is coming from Mellanox.)
2. I cannot raise the MTU any higher than 2000; it errors out telling me the uplink rejected the higher value (I think because the cards are old).
3. Is there anything else I can do to push the SSDs to the max - I expected to be able to do this - or is the age of the HCA cards holding me back?
4. I was considering using a spare machine as a Linux-based SAN and running SRP between Linux and ESXi. Would I be better off performance-wise with this, using RAID across the 2 (soon to be 4, maybe 5) SSDs? I know I only have 2 SSDs right now, but my goal is to get as close to an SSD-based SAN as possible (rough sketch of the RAID side below).
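The RAID side of that SAN idea would just be mdadm striping the SSDs together, with SCST or LIO providing the SRP target on top. A rough sketch of the mdadm part, assuming the SSDs show up as /dev/sdb and /dev/sdc:
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
cat /proc/mdstat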
Would really appreciate the input of the IB gurus on here, thanks!
PS: the Windows vNICs are VMXNET3, using netsh to set the custom MTU to 2k.
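For anyone wanting the exact netsh bit, it was along these lines (the interface name will differ on your box):
netsh interface ipv4 set subinterface "Local Area Connection" mtu=2000 store=persistent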