...

2. Log in to the head node and load the relevant modules.

Code Block
$ module load gcc/11
$ module load hpcx/2.19


3. Make sure that both environment variables (HPCX_SHARP_DIR, OMPI_HOME) are set and point to available directories.

Code Block
$ echo $HPCX_SHARP_DIR
/global/software/rocky-9.x86_64/modules/gcc/11/hpcx/2.19/sharp
$ echo $OMPI_HOME
/global/software/rocky-9.x86_64/modules/gcc/11/hpcx/2.19/ompi
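
The same check can be scripted. The following is a minimal sketch, assuming a POSIX shell; check_dir_var is a hypothetical helper name and is not part of HPC-X.

```shell
# Sketch: verify that a variable names an existing directory.
# check_dir_var is a hypothetical helper, not part of HPC-X.
check_dir_var() {
    name=$1
    # Look up the value of the variable whose name was passed in.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name OK: $value"
}
```

After loading the modules, run check_dir_var HPCX_SHARP_DIR and check_dir_var OMPI_HOME; a non-zero return status means the variable is unset or the directory is missing.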

4. Set the parameters sharp_enabled 2 and routing_engine ftree,updn (or any other routing algorithm) in the OpenSM configuration file /etc/opensm/opensm.conf (or wherever your configuration file is located).
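
To confirm that both parameters are present, the configuration file can be searched with grep. This is a minimal sketch; show_sharp_settings is a hypothetical helper name, and the default path is the one named in the step above.

```shell
# Sketch: print the SHARP-related settings found in an OpenSM configuration file.
# show_sharp_settings is a hypothetical helper; pass another path if your
# configuration file lives elsewhere.
show_sharp_settings() {
    conf=${1:-/etc/opensm/opensm.conf}
    # Match only active (uncommented) lines that start with either parameter.
    grep -E '^[[:space:]]*(sharp_enabled|routing_engine)[[:space:]]' "$conf"
}
```

With the settings from the step above, the output should include the lines sharp_enabled 2 and routing_engine ftree,updn.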

...

Code Block
$ sudo $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh -s -d sharp_am
Copying /global/software/rocky-9.x86_64/modules/gcc/11/hpcx/2.19/sharp/systemd/system/sharp_am.service to /etc/systemd/system/sharp_am.service
Service sharp_am is installed

...

2. Start the SHARP AM service and check its status.

Code Block
$ systemctl start sharp_am
$ systemctl status sharp_am
● sharp_am.service - SHARP Aggregation Manager (sharp_am). Version: 3.7.0
     Loaded: loaded (/etc/systemd/system/sharp_am.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/sharp_am.service.d
             └─Service.conf
     Active: active (running)
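
Job scripts may want to confirm that sharp_am is running before launching MPI jobs. The following is a minimal sketch built on systemctl is-active; require_active is a hypothetical helper name.

```shell
# Sketch: fail early if a systemd unit is not running.
# require_active is a hypothetical helper built on `systemctl is-active`.
require_active() {
    unit=$1
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active"
    else
        echo "$unit is not active" >&2
        return 1
    fi
}
```

For example, require_active sharp_am || exit 1 stops a script when the aggregation manager is down.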


SHARP Parameters 

1. HCOLL must be enabled to use SHARP. Add -mca coll_hcoll_enable 1 to your mpirun command line.

...

Code Block
$ mpirun -np 32 -npernode 1 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 osu_barrier -i 100000

# OSU MPI Barrier Latency Test v5.4.1
# Avg Latency(us)
             4.17

$ mpirun -np 32 -npernode 1 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -mca coll_fca_enable 0 -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx5_0:1 osu_barrier -i 100000

# OSU MPI Barrier Latency Test v5.4.1
# Avg Latency(us)
             3.29

$ mpirun -np 32 -npernode 1 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -mca coll_fca_enable 0 -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx5_0:1 -x HCOLL_ENABLE_SHARP=1 osu_barrier -i 100000

# OSU MPI Barrier Latency Test v5.4.1
# Avg Latency(us)
             1.64

Note: In the tests above, the first command runs osu_barrier without HCOLL, the second runs it with HCOLL but without SHARP, and the third runs it with both HCOLL and SHARP.
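
With the latencies reported above, the run with HCOLL and SHARP (1.64 us) is about 2.5 times faster than the run without HCOLL (4.17 us). The following is a minimal sketch for computing such ratios; avg_latency and speedup are hypothetical helper names, and the parsing assumes the output format shown above (comment lines starting with #, followed by the value).

```shell
# Sketch: extract the average latency from osu_barrier output and compare runs.
# avg_latency and speedup are hypothetical helper names.
avg_latency() {
    # Skip blank lines and '#' comment lines; print the first remaining field.
    awk '!/^#/ && NF { print $1; exit }'
}

speedup() {
    # Ratio of baseline latency to improved latency, two decimal places.
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}
```

For example, piping the output of an mpirun command into avg_latency yields the latency value, and speedup 4.17 1.64 prints 2.54.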

...