...
2. Log in to the head node and load the relevant modules.
Code Block |
---|
$ module load gcc/11 |
$ module load hpcx/2.19 |
3. Make sure that both directories (HPCX_SHARP_DIR, OMPI_HOME) are available
Code Block |
---|
$ echo $HPCX_SHARP_DIR |
/global/software/rocky-9.x86_64/modules/gcc/11/hpcx/2.19/sharp |
$ echo $OMPI_HOME |
/global/software/rocky-9.x86_64/modules/gcc/11/hpcx/2.19/ompi |
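
The same check can be scripted so that a missing module is caught before any daemon is installed. This is a minimal sketch, assuming a POSIX-like shell; the helper name `check_dir_var` is hypothetical and not part of HPC-X.

```shell
# check_dir_var NAME: succeed only if the environment variable called NAME
# is set and points at an existing directory.
# (Hypothetical helper for illustration, not part of HPC-X.)
check_dir_var() {
    local name="$1" value
    eval "value=\${$name:-}"      # read the variable whose name is in $name
    if [ -z "$value" ]; then
        echo "$name is not set" >&2
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value" >&2
        return 1
    fi
    echo "$name=$value"
}

# Report on both variables; keep going even if one of them fails.
for var in HPCX_SHARP_DIR OMPI_HOME; do
    check_dir_var "$var" || true
done
```

If either variable is reported as unset, the hpcx module was probably not loaded in the current shell.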
4. Set the parameters sharp_enabled 2 and routing_engine ftree,updn (or any other routing algorithm) in the opensm configuration file /etc/opensm/opensm.conf (or any other location)
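
After editing, the configuration file is expected to contain the lines `sharp_enabled 2` and `routing_engine ftree,updn`. The following sketch prints the values that are currently in effect. It assumes the usual `KEY VALUE` line layout of opensm.conf; the helper `conf_value` and the `OPENSM_CONF` override variable are hypothetical names used only for this example.

```shell
# conf_value FILE KEY: print the value of the last "KEY VALUE" line in FILE.
# Commented-out lines are skipped because their first field is not KEY.
conf_value() {
    awk -v key="$2" '$1 == key { value = $2 } END { print value }' "$1"
}

# Default path from step 4; set OPENSM_CONF (hypothetical) for another location.
conf="${OPENSM_CONF:-/etc/opensm/opensm.conf}"
if [ -r "$conf" ]; then
    echo "sharp_enabled:  $(conf_value "$conf" sharp_enabled)"
    echo "routing_engine: $(conf_value "$conf" routing_engine)"
fi
```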
...
Code Block |
---|
$ sudo $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh -s -d sharp_am |
Copying /global/software/rocky-9.x86_64/modules/gcc/11/hpcx/2.19/sharp/systemd/system/sharp_am.service to /etc/systemd/system/sharp_am.service |
Service sharp_am is installed |
...
2. Start SHARP AM service
Code Block |
---|
$ sudo systemctl start sharp_am |
$ systemctl status sharp_am |
● sharp_am.service - SHARP Aggregation Manager (sharp_am) |
Version: 3.7.0 |
Loaded: loaded (/etc/systemd/system/sharp_am.service; enabled; preset: disabled) |
Drop-In: /etc/systemd/system/sharp_am.service.d |
└─Service.conf |
Active: active (running) |
# /global/software/centos-7/modules/gcc/4.8.5/hpcx/2.7.0/sharp/bin/sharp_am -O /etc/sysconfig/sharp_am.cfg -B |
3. Set up sharpd on all cluster nodes (using pdsh or any other method)
Code Block |
---|
$ sudo $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh -s -d sharpd |
Copying /global/software/centos-7/modules/intel/2018.1.163/hpcx/2.1.0/sharp/systemd/system/sharpd.service to /etc/systemd/system/sharpd.service |
Service sharpd is installed |
4. Start the SHARP daemon on all cluster nodes
Code Block |
---|
$ sudo service sharpd start |
Redirecting to /bin/systemctl start sharpd.service |
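
Steps 3 and 4 have to run on every node, and pdsh is one way to do that. The sketch below is illustrative only: the node names are placeholders, the `NODES` and `RUN` variables and the function name are hypothetical, and it assumes passwordless sudo on the nodes and an HPC-X installation mounted at the same path everywhere. By default it only prints the commands it would run.

```shell
# Placeholder node list; replace with the cluster's own host names.
NODES="${NODES:-node01,node02}"
# Dry run by default: commands are printed, not executed.
# Set RUN to an empty string (RUN=) to execute them for real.
RUN="${RUN-echo}"

sharpd_on_all_nodes() {
    # HPCX_SHARP_DIR is expanded on the head node, so the path must be
    # identical on every cluster node (for example a shared file system).
    local dir="${HPCX_SHARP_DIR:?load the hpcx module first}"
    $RUN pdsh -w "$NODES" "sudo $dir/sbin/sharp_daemons_setup.sh -s -d sharpd"
    $RUN pdsh -w "$NODES" "sudo service sharpd start"
}
```

Calling `sharpd_on_all_nodes` with the defaults shows the two pdsh command lines, which makes it easy to review the host list before anything is changed on the nodes.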
SHARP Parameters
1. hcoll must be enabled to use SHARP. Add -mca coll_hcoll_enable 1 to your mpirun command line.
...
Code Block |
---|
$ mpirun -np 32 -npernode 1 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 osu_barrier -i 100000 |
# OSU MPI Barrier Latency Test v5.4.1 |
# Avg Latency(us) |
4.17 |
$ mpirun -np 32 -npernode 1 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -mca coll_fca_enable 0 -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx5_0:1 osu_barrier -i 100000 |
# OSU MPI Barrier Latency Test v5.4.1 |
# Avg Latency(us) |
3.29 |
$ mpirun -np 32 -npernode 1 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -mca coll_fca_enable 0 -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx5_0:1 -x HCOLL_ENABLE_SHARP=1 osu_barrier -i 100000 |
# OSU MPI Barrier Latency Test v5.4.1 |
# Avg Latency(us) |
1.64 |
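
To put the three barrier latencies side by side, the ratios can be computed directly from the reported averages (4.17, 3.29 and 1.64 us). This is a small sketch with a hypothetical helper name, `speedup`. The figures are single runs on one cluster, so the ratios are indicative only.

```shell
# speedup BASE CUR: print BASE / CUR rounded to two decimals.
speedup() {
    awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.2f\n", base / cur }'
}

speedup 4.17 3.29   # hcoll enabled vs. hcoll disabled   -> 1.27
speedup 3.29 1.64   # hcoll + SHARP vs. hcoll only       -> 2.01
speedup 4.17 1.64   # hcoll + SHARP vs. hcoll disabled   -> 2.54
```

In these runs, enabling SHARP roughly halves the barrier latency compared with hcoll alone.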
...