Infiniband/opensm on Debian strange behaviour


solon

Member
I have an InfiniBand network providing a SAN to my workstation: a Debian 10 server (which runs opensm) and an Ubuntu 20.04 workstation.

Without a partitions.conf file everything works and IPoIB pings fine, except that ibdiagnet gives an error that the group rate is only 10Gb while the node rate is 40Gb.
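(For reference, one way to see what rate the fabric actually negotiated is with the standard infiniband-diags tools; a rough sketch, assuming they're installed on a node in the fabric:)

Code:
# dump the multicast member records, which include the group MTU and rate
saquery -g
# the local port's own active rate, for comparison
ibstatus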

Searching around (from here), I found that making a partitions.conf file with this line:
Code:
Default=0x7fff, ipoib, mtu=5, rate=7, defmember=full : ALL=full, ALL_SWITCHES=full, SELF=full;
Then restarting opensm makes the ibdiagnet warning go away.

All fine so far.

However, at boot it's not possible to ping a different node over IPoIB. The way to get everything working again is to disable the partitions.conf config, restart opensm, ping another node (which then becomes possible), put the conf file back, and restart opensm again; after that everything works without errors.

I've made a few scripts to expedite this and messed with every permutation of the settings line I could come up with, but I cannot get the system to boot directly into a state where it both runs at 40Gb and is capable of IPoIB communication.
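(A minimal sketch of that manual workaround as a script - assuming opensm reads /etc/opensm/partitions.conf, runs as the "opensm" systemd service, and with a placeholder for the peer's IPoIB address:)

Code:
#!/bin/sh
# run as root: temporarily disable the partition config so IPoIB comes up
mv /etc/opensm/partitions.conf /etc/opensm/partitions.conf.disabled
systemctl restart opensm
# wait until a peer answers over IPoIB
until ping -c1 -W2 <peer-ipoib-address>; do sleep 2; done
# put the partition config back and restart opensm once more
mv /etc/opensm/partitions.conf.disabled /etc/opensm/partitions.conf
systemctl restart opensm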

My experience with infiniband is limited, and I'm hoping someone has some idea what's going wrong.
 

necr

Active Member
The rate setting in partitions.conf is essentially a rate limiter on the IPoIB multicast group (the rate is declared, but going beyond 30Gb/s in practice is a challenge). If you use native RDMA for your SAN, like SRP, you can leave that partition at its default values; IPoIB is not used then. If you want to experiment, I'd suggest keeping the default partition at its defaults with just an increased MTU - otherwise managed switches will generate errors.

Code:
Default=0x7fff, ipoib, mtu=5 : ALL=full;

and an extra partition with the following:

Code:
Storage=0x7ffa, ipoib, mtu=5, rate=7 : ALL_CAS=full;

For a secondary partition you don't need the switches, you just need the cards (ALL_CAS), and you'll need to create a new sub-interface on your hosts.
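(A rough example of creating that sub-interface on a host - assuming the port is ib0; the 0x7ffa pkey gets the full-membership bit set, giving 0xfffa, and the address is just a placeholder:)

Code:
# create a child IPoIB interface for the Storage partition
echo 0xfffa > /sys/class/net/ib0/create_child
# or, equivalently, with iproute2:
# ip link add link ib0 name ib0.fffa type ipoib pkey 0xfffa
ip addr add 10.10.10.2/24 dev ib0.fffa
ip link set ib0.fffa up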
 

solon

Member
Thanks for the suggestion. I'll give this a go. I'm using iSCSI/iSER for my SAN, which I think does use IPoIB to establish the connection? I'm not sure the IPoIB side will matter for the data speeds though, as it seems to hand the actual data transfer over to pure InfiniBand?

In the meantime I've also noticed that the Windows boxes, which I want to be able to use SMB Direct with the Debian server, have no connectivity with the partitions.conf I'm currently running, and in my forays into the settings there I haven't even been able to find where to specify MTU or connected mode. Is it me, or is the WinOF manual from Mellanox a little lacking?

I have a simple IS5022 switch which doesn't have a subnet manager. I may change it in the future, but the IS5022 was easy to make silent and uses a lot less power than a 5023 or a 6-series switch.

I'll give these settings a go and see if the interfaces actually have connectivity at boot. I assume I'm also going to have to specify which partition to use in the iser/iscsi settings somewhere.
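(With open-iscsi the transport is selected per node record, so something along these lines should switch a session to iSER - the target name and portal here are just placeholders:)

Code:
# tell open-iscsi to use the iSER transport for this target, then log in
iscsiadm -m node -T iqn.2021-04.example:storage -p 10.10.10.1 \
    --op update -n iface.transport_name -v iser
iscsiadm -m node -T iqn.2021-04.example:storage -p 10.10.10.1 --login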
 

solon

Member
With these settings, though, ibdiagnet reports a suboptimal rate for the group again. I'll fiddle around a bit and see what I can come up with.
 

solon

Member
These settings seem to make ibdiagnet happy, and everybody, including the Windows boxes, can be pinged. I still have to see if it all works at boot without restarting opensm; I'll report findings then.

Code:
Default=0x7fff, ipoib, mtu=5, rate=7 : ALL=full;
Storage=0x7ffa, ipoib, mtu=5, rate=7 : ALL_CAS=full;
 

solon

Member
After some more testing it seems the mtu=5 setting is what's causing the lack of connectivity. If I remove that, everything works at boot and ibdiagnet is happy.

I assume everything is in order that way; I've set the MTU on the interface to 65520 with ifconfig and have no real reason to assume it isn't working as expected, but I'll have to do some performance tests to be sure.
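(For what it's worth, a 65520 MTU is only available in connected mode; a rough sketch of setting that explicitly, assuming the interface is ib0:)

Code:
# switch the IPoIB interface to connected mode, then raise the MTU
echo connected > /sys/class/net/ib0/mode
ip link set ib0 mtu 65520
cat /sys/class/net/ib0/mode   # should print "connected"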
 

solon

Member
Should anyone still be reading this: I think I'm understanding opensm less and less. With the latest set of settings in partitions.conf as described above, InfiniBand connectivity is fine at boot and ibdiagnet reports full speed across all nodes.

However, opensm still intermittently requires a restart on the server when my workstation powers up in order to make iSCSI/iSER accessible, even though ssh communication and ping are possible without restarting opensm. The opensm status and journalctl simply report that the iSCSI connection couldn't be established.

That makes absolutely no sense to me.
 

necr

Active Member
The general rule of thumb is to have an SM running as a daemon somewhere in the fabric. SRP, for example, won't establish new connections if the SM is down, and iSER wants to open an RDMA session, which requires the SM as well. Do you run some weird SM, or does it run once and exit?
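(A quick way to check, assuming opensm runs as a normal systemd service and infiniband-diags is installed:)

Code:
# is the opensm service actually still running?
systemctl is-active opensm
journalctl -u opensm --since today
# ask the fabric which SM is master right now
sminfo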
 

solon

Member
I run standard opensm as a service on my Debian rig, which is the server for most things on the network. Or at least, I start the service... your post makes me wonder whether it might actually only be running once... I'll check.

If I shut down my workstation in the evening and start it back up, there's about a 50/50 chance it can't reconnect the iSER/iSCSI shares, which seems to be related to trying to do so too quickly. Rebooting the same rig has so far worked the second time 100% of the time, or it also works if I restart opensm on the server and open-iscsi on the workstation.
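(One hedged idea, assuming the standard Debian/Ubuntu service names: make sure opensm is enabled as a persistent daemon on the server, and give the initiator more login retries at boot so it doesn't give up before the fabric is ready. The retry knob is a standard iscsid.conf setting; 16 is just an example value.)

Code:
# on the Debian server: keep the subnet manager running permanently
systemctl enable --now opensm

# on the workstation, in /etc/iscsi/iscsid.conf:
#   node.session.initial_login_retry_max = 16
systemctl restart iscsid open-iscsi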
 

solon

Member
The plot thickens somewhat. The most recent bout of not being able to connect to the iSCSI/iSER target seems to have been caused by targetcli randomly losing all of its configuration. I restored a backup and everything works again. I haven't taken the time to search for a cause yet.
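(In case it's useful to anyone: on Debian, targetcli-fb keeps its saved config in /etc/rtslib-fb-target/saveconfig.json and writes dated backups next to it, so restoring looks roughly like this - a sketch, not gospel:)

Code:
# save the running target config to disk
targetcli saveconfig
# reload the saved config into the kernel target (what target.service does at boot)
targetctl restore /etc/rtslib-fb-target/saveconfig.json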