Cluster access request

To request access to the Thea clusters, fill out this form.

Once you have a username and can access the login nodes, follow the example in Getting Started with HPC-AI AC Clusters to allocate GH nodes.

Connect to the lab

Once you have your username, log in to the clusters:

$ ssh <userid>@gw.hpcadvisorycouncil.com
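If you log in often, you can store the connection details in your OpenSSH client configuration. The bash sketch below prints a `Host` stanza for `~/.ssh/config`; only the gateway host name comes from this page, while the alias `thea` and the function name `print_thea_ssh_config` are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper: print an OpenSSH client config stanza for the
# HPC-AI AC gateway. The alias "thea" is an illustrative choice.
print_thea_ssh_config() {
  local userid=$1
  if [ -z "$userid" ]; then
    echo "usage: print_thea_ssh_config <userid>" >&2
    return 1
  fi
  cat <<EOF
Host thea
    HostName gw.hpcadvisorycouncil.com
    User ${userid}
EOF
}
```

For example, `print_thea_ssh_config <userid> >> ~/.ssh/config` appends the stanza, after which `ssh thea` is equivalent to the command above.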

To check which GH nodes are available, use Slurm commands such as sinfo:

$ sinfo -p gh
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gh         up   infinite      8   idle  gh[001-008]
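The NODES and STATE columns of the default sinfo output can be tallied in a script. The bash sketch below (the function name `count_idle_nodes` is hypothetical) sums the node counts of rows whose state is exactly `idle`; states with suffixes such as `idle*` (not responding) are deliberately not counted.

```shell
#!/bin/bash
# Hypothetical helper: read default-format `sinfo` output on stdin
# (PARTITION AVAIL TIMELIMIT NODES STATE NODELIST) and print the total
# number of nodes whose STATE is exactly "idle". The header line is skipped.
count_idle_nodes() {
  awk 'NR > 1 && $5 == "idle" { total += $4 } END { print total + 0 }'
}
```

With the output shown above, `sinfo -p gh | count_idle_nodes` prints 8. Slurm can also do the filtering itself, for example with `sinfo -p gh -h -t idle -o %D`.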

Allocate nodes

Slurm is the job scheduler on this system. Each job has a maximum walltime of 12 hours, and by default nodes are allocated in exclusive mode: a node is always allocated in full to one user and is never shared. The GPU is always visible to a job once it has been allocated a node, so there is no need to use any --gres options.
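The 12-hour limit corresponds to 12 × 3600 = 43200 seconds. As a client-side convenience, a requested time limit can be checked against it before submission. The bash sketch below is hypothetical (the function name and the restriction to the H:MM:SS form used in the examples on this page are assumptions); it is not part of Slurm.

```shell
#!/bin/bash
# 12-hour per-job maximum stated on this page, in seconds.
MAX_WALLTIME_SECONDS=$((12 * 3600))

# Hypothetical pre-submission check. Accepts only the H:MM:SS form used in
# the examples on this page; other Slurm time formats (e.g. D-HH:MM:SS or
# plain minutes) are rejected with status 2 rather than interpreted.
walltime_ok() {
  local h m s total
  if ! [[ $1 =~ ^([0-9]+):([0-5][0-9]):([0-5][0-9])$ ]]; then
    echo "unsupported time format: $1" >&2
    return 2
  fi
  h=${BASH_REMATCH[1]} m=${BASH_REMATCH[2]} s=${BASH_REMATCH[3]}
  # 10# forces base 10 so fields such as "08" are not read as octal.
  total=$((10#$h * 3600 + 10#$m * 60 + 10#$s))
  [ "$total" -le "$MAX_WALLTIME_SECONDS" ]
}
```

For example, `walltime_ok 1:00:00 && salloc -n 72 -N 1 -p gh -t 1:00:00` only requests the node if the limit is within 12 hours.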

Because the resources are shared among many users, please avoid allocating nodes interactively where possible, and keep the time limit short when you do.

Allocation examples

How to allocate one GH200 node:

salloc -n 72 -N 1 -p gh -t 1:00:00

How to allocate two GH200 nodes:

salloc -n 144 -N 2 -p gh -t 1:00:00

How to allocate one specific GH200 node:

salloc -n 72 -N 1 -p gh -w gh004 -t 1:00:00

How to allocate two specific GH200 nodes:

salloc -n 144 -N 2 -p gh -w gh002,gh004 -t 1:00:00

How to allocate four GH200 nodes while excluding a specific node (gh001):

salloc -n 288 -N 4 -p gh -x gh001 -t 1:00:00

How to allocate one Grace-only node:

salloc -n 144 -N 1 -p gg -t 1:00:00

How to allocate four Grace-only nodes:

salloc -n 576 -N 4 -p gg -t 1:00:00
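In every allocation example above, -n equals the node count times the cores per node, so that one task lands on each core: 72 per GH200 node (partition gh) and 144 per Grace-only node (partition gg), as the single-node examples imply. The bash sketch below captures that arithmetic; `cores_per_node` and `alloc_cmd` are hypothetical helper names, and the default time limit of 1:00:00 simply mirrors the examples.

```shell
#!/bin/bash
# Cores per node, as implied by the salloc examples on this page:
# 72 for a GH200 node (partition gh), 144 for a Grace-only node (partition gg).
cores_per_node() {
  case "$1" in
    gh) echo 72 ;;
    gg) echo 144 ;;
    *)  echo "unknown partition: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print an salloc command requesting one task per core
# on N full nodes of the given partition. Arguments: partition nodes [time]
alloc_cmd() {
  local partition=$1 nodes=$2 limit=${3:-1:00:00}
  local cores
  cores=$(cores_per_node "$partition") || return 1
  echo "salloc -n $((cores * nodes)) -N $nodes -p $partition -t $limit"
}
```

For example, `alloc_cmd gg 4` prints `salloc -n 576 -N 4 -p gg -t 1:00:00`, the four-node Grace-only example above. Options such as -w or -x can be appended to the printed command.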

How to submit a batch job:

$ sbatch -N 4 -p gh --time=1:00:00 <slurm script>
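On success, sbatch prints a confirmation line of the form `Submitted batch job <jobid>`. When a script needs the job ID afterwards (for example to query it with `squeue -j`), it can be extracted from that line. The bash sketch below does so; the function name is hypothetical. Alternatively, `sbatch --parsable` prints only the job ID (followed by `;<cluster>` on multi-cluster setups).

```shell
#!/bin/bash
# Hypothetical helper: read sbatch output on stdin and print the job ID from
# the "Submitted batch job <jobid>" line. Prints nothing if submission failed.
job_id_from_sbatch() {
  awk '/^Submitted batch job [0-9]+$/ { print $4 }'
}
```

For example, with a hypothetical script name: `jobid=$(sbatch -N 4 -p gh --time=1:00:00 job.sh | job_id_from_sbatch)` followed by `squeue -j "$jobid"`.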

Batch job

Example batch script running on 2 GH200 nodes with 2 tasks per node, launched via mpirun:

#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
#SBATCH --nodes=2
#SBATCH --partition=gh
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd

mpirun -np 4 --map-by ppr:2:node:PE=36 \
   --report-bindings uname -a
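The numbers in this script must agree with each other: --ntasks equals --nodes times --ntasks-per-node, `ppr:2:node` tells Open MPI to place 2 processes per node, the `PE=36` modifier binds each process to 36 cores, and 2 × 36 = 72 fills one GH200 node. The bash sketch below checks that relation and prints the matching mpirun options; the function name and argument order are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: validate a per-node task layout and print the
# matching Open MPI options.
# Arguments: nodes ntasks-per-node cpus-per-task cores-per-node
mpirun_layout() {
  local nodes=$1 per_node=$2 cpus=$3 cores=$4
  local used=$((per_node * cpus))
  if [ "$used" -gt "$cores" ]; then
    echo "layout needs $used cores per node, but a node has only $cores" >&2
    return 1
  fi
  # -np is the total rank count (nodes x ranks per node), which should equal
  # --ntasks; ppr:<n>:node places n ranks per node, PE=<c> binds c cores each.
  printf '%s\n' "-np $((nodes * per_node)) --map-by ppr:${per_node}:node:PE=${cpus}"
}
```

For this script, `mpirun_layout 2 2 36 72` prints `-np 4 --map-by ppr:2:node:PE=36`, the options used above.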

Example batch script running on 2 GH200 nodes with 2 tasks per node, launched via srun:

#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
#SBATCH --nodes=2
#SBATCH --partition=gh
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge

srun --mpi=pmi2 uname -a
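With srun, the number and placement of tasks come from the #SBATCH directives, so no -n is needed on the command line. To see which task ran where, each task can print the per-task variables Slurm sets for it: SLURM_PROCID (global task rank), SLURM_LOCALID (rank within its node) and SLURM_NTASKS. The bash sketch below formats them; the function name is hypothetical, and it prints `?` for any variable that is missing, for example outside a Slurm job.

```shell
#!/bin/bash
# Hypothetical helper: describe the current Slurm task. Slurm sets
# SLURM_PROCID, SLURM_LOCALID and SLURM_NTASKS for every task launched by
# srun; "?" is printed when a variable is missing.
report_task() {
  printf 'task %s of %s (local %s) on %s\n' \
    "${SLURM_PROCID:-?}" "${SLURM_NTASKS:-?}" "${SLURM_LOCALID:-?}" \
    "${HOSTNAME:-$(uname -n)}"
}
```

Saved in a script that ends with a call to `report_task` (for example a hypothetical `report_task.sh`), it can replace `uname -a` as `srun --mpi=pmi2 ./report_task.sh`.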

Example batch script running on 2 Grace-only nodes in pure MPI mode (one rank per core), launched via mpirun:

#!/bin/bash -l
#SBATCH --ntasks=288
#SBATCH --ntasks-per-node=144
#SBATCH --cpus-per-task=1
#SBATCH --nodes=2
#SBATCH --partition=gg
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd

mpirun -np 288 --map-by ppr:144:node:PE=1 \
   --report-bindings uname -a
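To avoid repeating the task counts on the mpirun line, they can be read from the environment variables Slurm sets inside a batch job: SLURM_NTASKS, SLURM_NTASKS_PER_NODE (set only when --ntasks-per-node is given) and SLURM_CPUS_PER_TASK (set only when --cpus-per-task is given). The bash sketch below builds the same options as this script from those variables; the function name is hypothetical, and it refuses to guess when a variable is missing.

```shell
#!/bin/bash
# Hypothetical helper: build Open MPI mapping options from the Slurm job
# environment instead of hard-coding them in the script.
mpirun_args_from_slurm() {
  if [ -z "$SLURM_NTASKS" ] || [ -z "$SLURM_NTASKS_PER_NODE" ] || [ -z "$SLURM_CPUS_PER_TASK" ]; then
    echo "SLURM_NTASKS, SLURM_NTASKS_PER_NODE and SLURM_CPUS_PER_TASK must be set" >&2
    return 1
  fi
  printf '%s\n' "-np $SLURM_NTASKS --map-by ppr:${SLURM_NTASKS_PER_NODE}:node:PE=${SLURM_CPUS_PER_TASK}"
}
```

Inside this job it would print `-np 288 --map-by ppr:144:node:PE=1`, so the launch line could read `mpirun $(mpirun_args_from_slurm) --report-bindings uname -a` (left unquoted on purpose so the options split into separate words).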

Example batch script running on 4 Grace-only nodes with hybrid MPI+OpenMP, launched via mpirun:

#!/bin/bash -l
#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=9
#SBATCH --nodes=4
#SBATCH --partition=gg
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd

export OMP_NUM_THREADS=9
mpirun -np 64 --map-by ppr:16:node:PE=9 \
   --report-bindings uname -a
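Here OMP_NUM_THREADS matches both --cpus-per-task and the PE=9 binding, and 16 ranks × 9 threads = 144 cores fills each Grace-only node. One way to keep these values from drifting apart is to derive the thread count from SLURM_CPUS_PER_TASK. The bash sketch below does that; the function name is hypothetical, and the OMP_PLACES/OMP_PROC_BIND settings (standard OpenMP environment variables) are an optional addition that the original script does not use.

```shell
#!/bin/bash
# Hypothetical helper: size the OpenMP thread pool from the Slurm allocation.
# SLURM_CPUS_PER_TASK is set inside the job when --cpus-per-task is given;
# fall back to a single thread otherwise.
set_omp_env() {
  export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
  # Optional (assumption, not in the original script): one place per core,
  # with threads packed onto places close to the primary thread.
  export OMP_PLACES=cores
  export OMP_PROC_BIND=close
}
```

Calling `set_omp_env` in place of `export OMP_NUM_THREADS=9` gives 9 threads per rank in this job.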

Storage

When you log in, you are in your $HOME directory. Extra scratch space is also available.

Please run jobs from the scratch filesystem. It is a Lustre filesystem mounted over InfiniBand.

NFS home -> /global/home/users/$USER/
Lustre scratch -> /global/scratch/users/$USER/
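A batch script can switch to scratch before starting the computation. The bash sketch below creates a per-job working directory under the Lustre scratch path listed above and prints it; the `job-<id>` naming scheme and the function name are hypothetical choices.

```shell
#!/bin/bash
# Lustre scratch root listed on this page; $USER is set by the login environment.
SCRATCH_ROOT="/global/scratch/users/${USER}"

# Hypothetical helper: create (if needed) and print a per-job working
# directory under a base directory (default: the scratch root).
# SLURM_JOB_ID is set inside Slurm jobs; "interactive" is used when missing.
make_job_dir() {
  local base=${1:-$SCRATCH_ROOT}
  local dir="${base}/job-${SLURM_JOB_ID:-interactive}"
  mkdir -p "$dir" && printf '%s\n' "$dir"
}
```

Near the top of a batch script, `cd "$(make_job_dir)" || exit 1` moves the job into its own directory on scratch.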
