Set up the cluster and OpenSM
1. Make sure that your cluster meets the minimum requirements; see the deployment guide here.
2. Log in to the head node and load the relevant modules.
$ module load intel/2018.1.163
$ module load hpcx/2.1.0
3. Make sure that both directories (HPCX_SHARP_DIR, OMPI_HOME) are set and available.
$ echo $HPCX_SHARP_DIR
/global/software/centos-7/modules/intel/2018.1.163/hpcx/2.1.0/sharp
$ echo $OMPI_HOME
/global/software/centos-7/modules/intel/2018.1.163/hpcx/2.1.0/ompi
4. Set the parameters sharp_enabled and virt_enabled to 2 in the OpenSM configuration file at /etc/opensm/opensm.conf (or at any other location you use).
sharp_enabled 2
virt_enabled 2
5. Start OpenSM using this file (you can add other flags as well, e.g. priority).
$ sudo opensm -F /etc/opensm/opensm.conf -p 3
-------------------------------------------------
OpenSM 5.0.0.MLNX20180219.c610c42
Config file is `/etc/opensm/opensm.conf`:
Reading Cached Option File: /etc/opensm/opensm.conf
Loading Cached Option:sharp_enabled = 2
Loading Cached Option:virt_enabled = 2
Command Line Arguments:
Priority = 3
Log File: /var/log/opensm.log
-------------------------------------------------
OpenSM 5.0.0.MLNX20180219.c610c42
Using default GUID 0x248a070300964da6
Entering DISCOVERING state
Entering MASTER state
6. Make sure that this is the SM running on the cluster and that no other SM is running.
$ sudo sminfo
sminfo: sm lid 11 sm guid 0x248a070300964da6, activity count 5303 priority 14 state 3 SMINFO_MASTER
7. Make sure that the Aggregation Nodes were activated by OpenSM.
$ sudo ibnetdiscover | grep "Agg"
[37] "H-7cfe900300a5a2c8"[1](7cfe900300a5a2c8) # "Mellanox Technologies Aggregation Node" lid 73 4xEDR
[37] "H-ec0d9a03001c7068"[1](ec0d9a03001c7068) # "Mellanox Technologies Aggregation Node" lid 70 4xEDR
Ca 1 "H-7cfe900300a5a2c8" # "Mellanox Technologies Aggregation Node"
Ca 1 "H-ec0d9a03001c7068" # "Mellanox Technologies Aggregation Node"
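Steps 6 and 7 can also be checked from a script rather than by eye. The following is a minimal sketch, assuming the output formats shown above; the function names sm_is_master and count_agg_nodes are hypothetical helpers, not part of OpenSM or the InfiniBand diagnostic tools.

```shell
# Hypothetical helpers (not shipped with any Mellanox package).
# Both read command output on stdin, so they can be tested offline.

# Succeeds if the sminfo output on stdin reports the GUID passed as $1
# and the SMINFO_MASTER state.
sm_is_master() {
    expected_guid=$1
    grep "sm guid $expected_guid" | grep -q 'SMINFO_MASTER'
}

# Prints the number of distinct Aggregation Node names ("H-<guid>")
# found in ibnetdiscover output on stdin.
count_agg_nodes() {
    grep 'Aggregation Node' |
        sed -n 's/.*"\(H-[0-9a-f]*\)".*/\1/p' |
        sort -u |
        wc -l |
        tr -d ' '
}

# Example usage on the head node (GUID taken from the OpenSM log above):
#   sudo sminfo | sm_is_master 0x248a070300964da6 && echo "SM is master"
#   sudo ibnetdiscover | count_agg_nodes
```

The expected Aggregation Node count depends on the number of SHARP-capable switches in your fabric, so compare the result against your own topology.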
Note: With OpenSM v4.9 or later, fat-tree topologies require no special configuration in the Aggregation Manager. For other topologies or older OpenSM versions, refer to the deployment guide.
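The environment and configuration checks from steps 3 and 4 of this section can be scripted as a pre-flight test before starting OpenSM. This is a sketch under stated assumptions: dir_var_ok and conf_has_option are hypothetical helper names, and the configuration file is assumed to use the plain "key value" layout shown in step 4.

```shell
# Hypothetical pre-flight helpers; names are illustrative only.

# Succeeds if the variable whose NAME is passed as $1 is set and points
# to an existing directory. The name is expanded with eval, so pass only
# trusted, literal variable names.
dir_var_ok() {
    eval "value=\${$1:-}"
    [ -n "$value" ] && [ -d "$value" ]
}

# Succeeds if config file $1 contains an uncommented line "<key> <value>",
# where key is $2 and value is $3.
conf_has_option() {
    awk -v key="$2" -v val="$3" '
        $1 == key && $2 == val { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example usage:
#   for v in HPCX_SHARP_DIR OMPI_HOME; do
#       dir_var_ok "$v" || echo "$v is unset or not a directory"
#   done
#   for k in sharp_enabled virt_enabled; do
#       conf_has_option /etc/opensm/opensm.conf "$k" 2 || echo "$k is not set to 2"
#   done
```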
Enable SHARP Daemons
1. Set up the SHARP Aggregation Manager (sharp_am) on the OpenSM node.
$ sudo $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh -s -d sharp_am
Copying /global/software/centos-7/modules/intel/2018.1.163/hpcx/2.1.0/sharp/systemd/system/sharp_am.service to /etc/systemd/system/sharp_am.service
Service sharp_am is installed
2. Start the SHARP AM service.
$ sudo service sharp_am start
Redirecting to /bin/systemctl start sharp_am.service
3. Set up sharpd on all cluster nodes (using pdsh or any other method).
$ sudo $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh -s -d sharpd
Copying /global/software/centos-7/modules/intel/2018.1.163/hpcx/2.1.0/sharp/systemd/system/sharpd.service to /etc/systemd/system/sharpd.service
Service sharpd is installed
4. Start the SHARP daemon on all cluster nodes
$ sudo service sharpd start
Redirecting to /bin/systemctl start sharpd.service
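Steps 3 and 4 have to run on every node, so a small wrapper around pdsh can help. The sketch below is one possible approach, not a vendor-provided procedure. It assumes that pdsh is installed, that $HPCX_SHARP_DIR resolves to the same shared path on every node, and that sudo works non-interactively there. The helper names on_nodes and deploy_sharpd, the DRY_RUN variable, and the host list node[01-04] are all hypothetical.

```shell
# Hypothetical helper: runs a command on every host in a pdsh host list,
# or only prints the pdsh command line when DRY_RUN=1.
on_nodes() {
    hosts=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "pdsh -w $hosts $*"
    else
        pdsh -w "$hosts" "$@"
    fi
}

# Installs the sharpd service, starts it, and checks that it is active
# on the given hosts. Stops at the first failing stage.
deploy_sharpd() {
    hosts=$1
    on_nodes "$hosts" sudo "$HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh" -s -d sharpd &&
        on_nodes "$hosts" sudo service sharpd start &&
        on_nodes "$hosts" systemctl is-active sharpd
}

# Example usage (host list is a placeholder for your own node names):
#   DRY_RUN=1 deploy_sharpd 'node[01-04]'   # preview the commands
#   deploy_sharpd 'node[01-04]'             # run them
```

Previewing with DRY_RUN=1 first makes it easy to confirm the expanded path to sharp_daemons_setup.sh before anything is changed on the nodes.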