Basic Slurm Script Example

Here is a basic example of a Slurm script that runs NAMD over InfiniBand EDR.


In this example, we assume that the application (e.g. NAMD) has already been compiled properly and that a module is available to be used.
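
If you want to confirm the module is there before submitting anything, a quick check with the module command (assuming the cluster uses environment modules or Lmod; the module name is the one loaded in the script below) looks like this:

$ module avail namd
$ module show md/namd/2.12-hpcx-2.0.0-intel-2018.1.163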

Start by logging in to one of the login nodes (head nodes) of the lab.


Make sure you have some basic knowledge of Slurm and shell scripting before you start.
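
For a quick orientation, here are a few standard Slurm commands (generic Slurm client commands, not specific to this lab):

$ sinfo                  # list partitions and node states (e.g. the thor partition)
$ squeue -u $USER        # show your own pending and running jobs
$ scancel <jobid>        # cancel a job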


1. Create a Slurm script file similar to this one:

#!/bin/bash
#
#SBATCH --job-name=namd2
#SBATCH --output=results.txt
#
#SBATCH --time=10:00
#SBATCH --partition=thor
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32

INPUT_FILE=/global/home/groups/allhands/applications/namd/apoa1/apoa1.namd
EXE=/global/software/centos-7/modules/apps/md/namd/2.12-hpcx-2.0.0-intel-2018.1.163/bin/namd2

# Print the process map, topology and core bindings for easier debugging
MPI_FLAGS="--display-map --map-by node --display-topo --report-bindings "
# Use the UCX PML over port 1 of the mlx5_0 InfiniBand HCA
UCX_FLAGS="-mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_TLS=all"
# Enable HCOLL hierarchical collectives over the same HCA
HCOLL_FLAGS="-mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx5_0:1 "

module load md/namd/2.12-hpcx-2.0.0-intel-2018.1.163

mpirun $MPI_FLAGS $UCX_FLAGS $HCOLL_FLAGS $EXE $INPUT_FILE
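
The script assumes the EDR HCA is mlx5_0 and that port 1 is the active port (see UCX_NET_DEVICES and HCOLL_MAIN_IB above). If your nodes expose a different device name, you can verify it on a compute node, for example with (assuming the InfiniBand diagnostics and UCX tools are installed):

$ ibstat                      # list InfiniBand HCAs and port states
$ ucx_info -d | grep Device   # devices visible to UCX (e.g. mlx5_0:1)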


Check the HPCAC Best Practices Git repository for the latest updates.


In this example, we are using the apoa1 input file for NAMD.


2. Submit the Slurm script (filename: slurm_namd2) with sbatch:


$ sbatch slurm_namd2
Submitted batch job 203
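
sbatch options given on the command line override the matching #SBATCH directives, so you can experiment without editing the script; for example, to try the same job on two nodes:

$ sbatch --nodes=2 --ntasks-per-node=32 slurm_namd2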


3. Check the status of the job using squeue:

$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
203 thor namd2 ophirm R 0:02 4 thor[005-008]
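
For more detail on a specific job (allocated nodes, working directory, requested resources), scontrol can be used with the job ID reported by sbatch:

$ scontrol show job 203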


4. Once the job completes, the results will be available in the results.txt file, as specified by the --output directive in the script above.

$ ls
apoa1 results.txt slurm_namd2
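
results.txt will contain both the mpirun mapping/binding report and the NAMD output. A quick way to follow progress and find the final timings (the exact NAMD summary lines may vary between versions) is:

$ tail -f results.txt                # follow the output while the job runs
$ grep -i wallclock results.txt      # final NAMD timing summary, once the job is done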


5. Additional mpirun flags and batch directives can be added to the script to make use of other modules or accelerators; an illustrative sketch is shown below.
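
As a purely illustrative sketch (the GRES name, GPU count and binding choices below are assumptions that depend on your cluster and NAMD build, not part of this example), a GPU-enabled variant would typically add a GRES request to the batch directives, and the mpirun mapping/binding can be tuned as well:

#SBATCH --gres=gpu:1    # request one GPU per node (name and count depend on the cluster)

MPI_FLAGS="--map-by core --bind-to core --report-bindings"    # bind one rank per core instead of mapping by node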


References