This is very helpful. So it sounds like the worker nodes do not really need local storage; it's only the master node that periodically writes out a ~5 GB solution file. In that case Gluster is not really needed (as opposed to NFS or just local storage on the master node), and network bandwidth might not be a critical issue either (although latency could be, depending on the frequency and nature of inter-node MPI communication). Algebraic solvers tend to be very CPU-intensive, secondarily RAM-bound, then network, and lastly storage.

The pi test is just a simple "can I get the basics of MPI to work" test. The real use case is solving large systems of equations with a software package called PETSc. Solving the system of equations typically takes ~80% of the CPU time in serial, and these calculations scale very well on shared-memory machines and optimized distributed-memory clusters. So the workflow is: the master node spawns an MPI calculation across multiple nodes, each node builds its local chunk of the system (typically ~20-150 GB of RAM per node), and the system is solved with inter-node communication. When this is done, a 1-5 GB solution file is saved to disk and read back by the main program running on the master node. I am currently using a GlusterFS volume as the network file system from which the MPI code is executed. I am open to alternatives for spawning parallel MPI calculations and am only using GlusterFS because the examples above suggested it as a better alternative to NFS.
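For anyone following along, here is a minimal sketch of what that kind of PETSc run looks like (not my actual solver; the toy 1-D Laplacian, the problem size, and the `solution.dat` filename are just placeholders): each rank assembles its own block of rows, KSP solves the distributed system, and a collective binary write produces the single solution file that the main program on the master reads back.

```c
/* Minimal sketch only: toy matrix, placeholder size and filename.
 * Shows the shape of the workflow: distributed assembly -> KSP solve
 * -> single binary solution file. */
static char help[] = "Distributed KSP solve writing a binary solution file.\n";

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         x, b;
  KSP         ksp;
  PetscViewer viewer;
  PetscInt    n = 1000, Istart, Iend, i;

  PetscCall(PetscInitialize(&argc, &argv, NULL, help));

  /* Each rank owns a contiguous block of rows (its "local chunk"). */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A));
  PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));

  /* Toy 1-D Laplacian stencil; a real application fills in its own entries. */
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* Krylov solve; solver and preconditioner are chosen at run time
   * via -ksp_type / -pc_type command-line options. */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  /* Collective binary write: one solution file regardless of rank count. */
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "solution.dat", FILE_MODE_WRITE, &viewer));
  PetscCall(VecView(x, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}
```

The launch would then be something along the lines of `mpirun -np 16 --hostfile hosts ./solver -ksp_type cg` from the shared GlusterFS/NFS directory (node count and solver options are just examples). As far as I know, PETSc's binary viewer funnels the write through rank 0 by default, which lines up with the point that only the master node really needs to touch storage.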
I do not have huge storage needs, since this cluster is really about performing calculations that involve large amounts of RAM as quickly as possible. After calculations are completed, I copy the results (~100-300 GB) over the network to my Threadripper workstation for visualization and post-processing. This cluster is used by a very small team, so there is no need to worry about managing many users, etc.
Have you considered offloading BLAS operations to GPUs?