...
MILC profile based on a 32-node Helios cluster.
MPI Communication
...
MPI Message Sizes
91% of MPI communication time is spent in MPI_Wait, while 5% is spent in MPI_Allreduce with 8-byte messages. In addition, we see asynchronous (nonblocking) send and receive MPI communication.
...
MPI time
A load imbalance of about 20% can be seen in the application.
...
MPI time among the 256 ranks, sorted by time spent in MPI, shows a load imbalance of about 20% between the ranks that spend the most and the least time in MPI:
...
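One common way to quantify this figure is the relative gap between the ranks with the most and least MPI time. A minimal sketch in Python: the 256-rank count comes from the profile, but the per-rank timings below are synthetic, chosen only to illustrate a spread of roughly 20%.

```python
# Load imbalance across MPI ranks, sketched with synthetic data.
# The 256-rank count matches the profile; the timing values are made up.
import random

random.seed(0)
# Hypothetical per-rank MPI times in seconds, spread of roughly 20%.
mpi_time = [random.uniform(8.0, 10.0) for _ in range(256)]

t_max, t_min = max(mpi_time), min(mpi_time)
# Relative gap between the rank busiest in MPI and the least busy one.
imbalance = (t_max - t_min) / t_max
print(f"MPI-time imbalance: {imbalance:.0%}")
```

Other imbalance metrics (e.g. max over mean) are also common; the max-versus-min gap is used here because that is what the sorted-by-MPI-time chart makes visible.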
MPI time ordered by rank shows an imbalance between sockets (4 MPI ranks per socket, 5 OpenMP threads per MPI rank).
...
Communication Matrix
...
Memory Footprint
...
Summary
95% scaling efficiency was achieved from 16 to 32 nodes for the medium benchmark on the Helios cluster, using an HDR InfiniBand network. A 3% performance difference was observed between runs with 5 and 10 OpenMP threads per MPI rank on Helios.
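The 95% figure corresponds to parallel efficiency when the node count doubles: the observed speedup of the 32-node run over the 16-node run, divided by the ideal factor of 2. A small sketch with hypothetical wall-clock times chosen to reproduce that efficiency (the actual benchmark times are not given in this report):

```python
# Strong-scaling efficiency from 16 to 32 nodes.
# Both times are hypothetical, chosen to illustrate ~95% efficiency.
t_16_nodes = 1000.0   # seconds on 16 nodes (illustrative value)
t_32_nodes = 526.3    # seconds on 32 nodes (illustrative value)

speedup = t_16_nodes / t_32_nodes   # observed speedup
efficiency = speedup / (32 / 16)    # relative to the ideal 2x speedup
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

Doubling the nodes ideally halves the runtime; anything above roughly 90% efficiency at this scale indicates the HDR InfiniBand interconnect is not yet the bottleneck for this problem size.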
...