Getting Started with Thea Clusters
Cluster access request
To request access to the Thea clusters, fill out this form.
Once you have a username and can access the login nodes, follow the example in Getting Started with HPC-AI AC Clusters to allocate GH nodes.
Connect to the lab
Once you have your username, log in to the clusters:
$ ssh <userid>@gw.hpcadvisorycouncil.com
To check the available GH nodes, use Slurm commands:
$ sinfo -p gh
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
gh up infinite 8 idle gh[001-008]
Running Jobs
Slurm is the system job scheduler. Each job has a maximum walltime of 12 hours, and nodes are allocated in exclusive mode by default (a user always allocates a full node; no sharing). The GPU is always visible once a job has allocated a node, so there is no need to use any --gres options.
Please avoid allocating nodes interactively if possible, or keep the time limit short, because the resources are shared among multiple users.
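For example, to check the state of your own jobs and free an allocation you no longer need, the standard Slurm commands apply (the job ID below is illustrative):
$ squeue -u $USER
$ scancel 12345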
Allocation Examples
How to allocate one GH200 node:
salloc -n 72 -N 1 -p gh -t 1:00:00
How to allocate two GH200 nodes:
salloc -n 144 -N 2 -p gh -t 1:00:00
How to allocate one specific GH200 node:
salloc -n 72 -N 1 -p gh -w gh004 -t 1:00:00
How to allocate two specific GH200 nodes:
salloc -n 144 -N 2 -p gh -w gh002,gh004 -t 1:00:00
How to allocate 4 GH200 nodes but force to exclude a specific one (gh001):
salloc -n 288 -N 4 -p gh -x gh001 -t 1:00:00
How to allocate one Grace-only node:
salloc -n 144 -N 1 -p gg -t 1:00:00
How to allocate four Grace-only nodes:
salloc -n 576 -N 4 -p gg -t 1:00:00
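Once an allocation is granted, a quick way to verify which nodes you received is to launch a trivial command on all of them with srun:
$ srun hostname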
How to submit a batch job
$ sbatch -N 4 -p gh --time=1:00:00 <slurm script>
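Slurm prints the job ID at submission time and, by default, writes the job output to slurm-<jobid>.out in the submission directory; you can follow it while the job runs (job ID illustrative):
$ tail -f slurm-12345.out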
Batch job
Example of a batch job script running on 2 GH200 nodes with 2 tasks per node via mpirun
#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
#SBATCH --nodes=2
#SBATCH --partition=gh
#SBATCH --time=1:00:00
#SBATCH --exclusive
. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd
mpirun -np 4 --map-by ppr:2:node:PE=36 \
--report-bindings uname -a
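Here --map-by ppr:2:node:PE=36 asks Open MPI to place 2 processes per node and bind each one to 36 processing elements (cores), matching the --cpus-per-task=36 setting above, while --report-bindings prints the resulting bindings so the placement can be verified.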
Example of a batch job script running on 2 GH200 nodes with 2 tasks per node via srun
#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
#SBATCH --nodes=2
#SBATCH --partition=gh
#SBATCH --time=1:00:00
#SBATCH --exclusive
. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
srun --mpi=pmi2 uname -a
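With --mpi=pmi2, srun launches and wires up the MPI ranks itself through the PMI2 interface, so no mpirun invocation is needed inside the job script.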
Example of a batch job script running on 2 Grace-only nodes with a full MPI-only layout via mpirun
#!/bin/bash -l
#SBATCH --ntasks=288
#SBATCH --ntasks-per-node=144
#SBATCH --cpus-per-task=1
#SBATCH --nodes=2
#SBATCH --partition=gg
#SBATCH --time=1:00:00
#SBATCH --exclusive
. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd
mpirun -np 288 --map-by ppr:144:node:PE=1 \
--report-bindings uname -a
Example of a batch job script running on 4 Grace-only nodes with an MPI+OpenMP combination via mpirun
#!/bin/bash -l
#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=9
#SBATCH --nodes=4
#SBATCH --partition=gg
#SBATCH --time=1:00:00
#SBATCH --exclusive
. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd
export OMP_NUM_THREADS=9
mpirun -np 64 --map-by ppr:16:node:PE=9 \
--report-bindings uname -a
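As a sanity check on the geometry: 16 ranks per node × 9 OpenMP threads per rank = 144 cores, which fills one Grace node, and 4 nodes × 16 ranks give the 64 total tasks requested with -np 64.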
Working with Singularity containers
Singularity is the only container engine available at the moment; Docker or enroot workflows need to be adapted to run (as a regular user) on Thea.
Example 1: Run pre-staged Singularity containers interactively
1. Allocate an interactive node
salloc -n 1 -N 1 -p gh -t 1:00:00
2. Select the container and invoke singularity run
export CONT="/global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif"
singularity run --nv "${CONT}"
NOTE - Accessing a SIF container is usually fast enough even when the file is located on the Lustre filesystem; copying it to /local improves the bootstrap time only marginally.
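To quickly verify that the GPU is visible from inside the container, you can run a one-off command with singularity exec (the Python check below assumes the PyTorch image used above):
singularity exec --nv "${CONT}" python -c 'import torch; print(torch.cuda.is_available())'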
Example 2: Run pre-staged Singularity containers non-interactively via srun
export CONT="/global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif"
srun --mpi=pmi2 -N 1 -n 1 --ntasks-per-node=1 -p gh -t 4:00:00 \
singularity -v run --nv "${CONT}" python my_benchmark_script.py
NOTE - The current working directory from which srun and singularity are executed is automatically exposed inside the container.
Example 3: How to squash an NGC container into a new read-only Singularity image
TIP - Building a container is a very I/O-intensive operation; it is better to leverage /local when possible, but remember to copy your SIF image or sandbox folder back to ${SCRATCH} before the job completes, otherwise all files are lost.
1. Allocate an interactive node
salloc -n 1 -N 1 -p gh -t 1:00:00
2. Set additional environment variables
Make sure singularity pull operates entirely from /local, for performance reasons and capacity constraints:
mkdir /local/tmp_singularity
mkdir /local/tmp_singularity_cache
export APPTAINER_TMPDIR=/local/tmp_singularity
export APPTAINER_CACHEDIR=/local/tmp_singularity_cache
3. Pull the Singularity image locally
singularity pull pytorch-23.12-py3.sif docker://nvcr.io/nvidia/pytorch:23.12-py3
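If you run the pull from a directory under /local, remember to copy the resulting image back to persistent storage before the allocation ends (see the TIP above), for example:
cp pytorch-23.12-py3.sif /global/scratch/users/$USER/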
Example 4: How to create a Singularity sandbox and run or repackage a new container image
1. Grab one node in interactive mode
salloc -n 1 -N 1 -p gh -t 2:00:00
2. Identify which container to extend via a sandbox and prep the environment
export CONT_DIR=/global/scratch/groups/gh/sif_images
export CONT_NAME="pytorch-23.12-py3.sif"
mkdir /local/$SLURM_JOBID
export APPTAINER_TMPDIR=/local/$SLURM_JOBID/_tmp_singularity
export APPTAINER_CACHEDIR=/local/$SLURM_JOBID/_cache_singularity
rm -rf ${APPTAINER_TMPDIR} && mkdir -p ${APPTAINER_TMPDIR}
rm -rf ${APPTAINER_CACHEDIR} && mkdir -p ${APPTAINER_CACHEDIR}
3. Make a copy of the base container, since reading and verifying it is faster on local disk
cp ${CONT_DIR}/${CONT_NAME} /local/$SLURM_JOBID/
4. Create a Singularity definition file
Start with the original NGC container as the base image and add extra packages in the %post phase:
cat > custom-pytorch.def << EOF
Bootstrap: localimage
From: /local/${SLURM_JOBID}/${CONT_NAME}
%post
apt-get update
apt-get -y install python3-venv
pip install --upgrade pip
pip install transformers accelerate huggingface_hub
EOF
After this there are two options:
5A. Create the sandbox on persistent storage
TIP - Use this method if you want to customise your image by building software manually or by debugging a failing pip command.
cd /global/scratch/users/$USER
singularity build --sandbox custom-python-sandbox custom-pytorch.def
When completed, open a shell inside the sandbox on an interactive node via
singularity shell --nv --writable custom-python-sandbox
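Once the sandbox contains everything you need, it can be repackaged into a read-only SIF image (output name illustrative):
singularity build custom-python.sif custom-python-sandbox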
5B. Create a new SIF image
TIP - Use this method if you want to create a read-only image to run workloads and you are confident all %post
steps can run successfully without manual intervention.
cd /global/scratch/users/$USER
singularity build custom-python.sif custom-pytorch.def
When completed, run on an interactive node via
singularity run --nv custom-python.sif
Storage
When you log in you are in your $HOME directory. There is also extra scratch space.
Please run jobs from the scratch partition: it is a Lustre filesystem mounted over InfiniBand.
NFS home -> /global/home/users/$USER/
Lustre scratch -> /global/scratch/users/$USER/
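To check how much scratch space you are using (assuming quotas are enabled on the Lustre filesystem), the standard Lustre client tool can be used:
lfs quota -h -u $USER /global/scratch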