Cluster access request
To request access to the Thea clusters, fill in this form.
Once you have a username and can access the login nodes, follow the example in Getting Started with HPC-AI AC Clusters to allocate GH nodes.
Connect to the lab
Once you have your username, log in to the clusters:
$ ssh <userid>@gw.hpcadvisorycouncil.com
Check the available GH nodes using Slurm commands:
$ sinfo -p gh
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
gh        up    infinite  8     idle  gh[001-008]
Running Jobs
Slurm is the system job scheduler. Each job has a maximum walltime of 12 hours, and nodes are allocated in exclusive mode by default (one user always allocates a full node, with no sharing). The GPU is always visible once a job has allocated a node, so there is no need to use any gres options.
Please avoid allocating nodes interactively if possible, or set a short time limit, because the resources are shared among multiple users.
Allocating Examples
How to allocate one GH200 node:
salloc -n 72 -N 1 -p gh -t 1:00:00
How to allocate two GH200 nodes:
salloc -n 144 -N 2 -p gh -t 1:00:00
How to allocate one specific GH200 node:
salloc -n 72 -N 1 -p gh -w gh004 -t 1:00:00
How to allocate two specific GH200 nodes:
salloc -n 144 -N 2 -p gh -w gh002,gh004 -t 1:00:00
How to allocate four GH200 nodes while excluding a specific one (gh001):
salloc -n 288 -N 4 -p gh -x gh001 -t 1:00:00
How to allocate one Grace-only node:
salloc -n 144 -N 1 -p gg -t 1:00:00
How to allocate four Grace-only nodes:
salloc -n 576 -N 4 -p gg -t 1:00:00
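The allocation examples above all request full nodes, with -n equal to the node count times 72 on the gh partition and 144 on the gg partition. The following is a minimal sketch of a helper that derives -n and prints the matching salloc command. The function name thea_salloc_cmd is hypothetical, and the per-node core counts are inferred from the examples rather than queried from Slurm.

```shell
# Sketch: build an salloc command line for a full-node allocation.
# Assumption: 72 cores per gh node and 144 per gg node, as in the examples above.
thea_salloc_cmd() {
    local partition=$1
    local nodes=$2
    local walltime=${3:-1:00:00}
    local cores_per_node
    case "$partition" in
        gh) cores_per_node=72 ;;
        gg) cores_per_node=144 ;;
        *)  echo "unknown partition: $partition" >&2; return 1 ;;
    esac
    echo "salloc -n $((nodes * cores_per_node)) -N $nodes -p $partition -t $walltime"
}

# Print the commands instead of running them, so they can be reviewed first.
thea_salloc_cmd gh 2
thea_salloc_cmd gg 4 0:30:00
```

Printing rather than executing keeps the helper safe to try on a login node; the printed line can then be pasted or passed to eval once it looks right.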
How to submit a batch job:
$ sbatch -N 4 -p gh --time=1:00:00 <slurm script>
Batch job
Example of a batch job script running on 2 GH200 nodes with 2 tasks per node via mpirun
#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
#SBATCH --nodes=2
#SBATCH --partition=gh
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd

mpirun -np 4 --map-by ppr:2:node:PE=36 \
    --report-bindings uname -a
Example of a batch job script running on 2 GH200 nodes with 2 tasks per node via srun
#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
#SBATCH --nodes=2
#SBATCH --partition=gh
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge

srun --mpi=pmi2 uname -a
Example of a batch job script running on 2 Grace-only nodes in MPI-only mode via mpirun
#!/bin/bash -l
#SBATCH --ntasks=288
#SBATCH --ntasks-per-node=144
#SBATCH --cpus-per-task=1
#SBATCH --nodes=2
#SBATCH --partition=gg
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd

mpirun -np 288 --map-by ppr:144:node:PE=1 \
    --report-bindings uname -a
Example of a batch job script running on 4 Grace-only nodes with an MPI+OpenMP combination via mpirun
#!/bin/bash -l
#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=9
#SBATCH --nodes=4
#SBATCH --partition=gg
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd

export OMP_NUM_THREADS=9
mpirun -np 64 --map-by ppr:16:node:PE=9 \
    --report-bindings uname -a
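In each batch script above, --ntasks equals --nodes times --ntasks-per-node, and tasks per node times --cpus-per-task fills the node exactly (72 cores on gh, 144 on gg). The following is a minimal sketch of a check that can be run before editing the #SBATCH lines of a new script. The function name check_geometry is hypothetical, and the cores-per-node value is supplied by the caller, not detected.

```shell
# Sketch: sanity-check a task geometry before writing the #SBATCH lines.
# Arguments: nodes, tasks per node, CPUs per task, cores per node.
check_geometry() {
    local nodes=$1 tasks_per_node=$2 cpus_per_task=$3 cores_per_node=$4
    local used=$((tasks_per_node * cpus_per_task))
    if [ "$used" -gt "$cores_per_node" ]; then
        echo "oversubscribed: $used cores requested, $cores_per_node available per node" >&2
        return 1
    fi
    # Report the value to use for --ntasks and how many cores each node will use.
    echo "ntasks=$((nodes * tasks_per_node)) cores_used_per_node=$used"
}

check_geometry 2 2 36 72     # geometry of the first GH200 example
check_geometry 4 16 9 144    # geometry of the Grace-only MPI+OpenMP example
```

The same numbers feed the mpirun mapping: ppr:<tasks per node>:node:PE=<CPUs per task>.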
Working with Singularity containers
Singularity is the only container engine present at the moment. Docker or enroot workflows need to be adapted to run (as user) on Thea.
Example 1: Run interactively pre-staged Singularity containers
(1) Allocate an interactive node
salloc -n 1 -N 1 -p gh -t 1:00:00
(2) Select container and invoke singularity run
export CONT="/global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif"
singularity run --nv "${CONT}"
NOTE - Accessing a SIF container is usually fast enough even when the file is located on the Lustre filesystem. Copying it to /local will improve the bootstrap time only marginally.
Example 2: Run pre-staged Singularity containers via srun
export CONT="/global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif"
srun --mpi=pmi2 -N 1 -n 1 --ntasks-per-node=1 -p gh -t 4:00:00 \
    singularity -v run --nv "${CONT}" python my_benchmark_script.py
NOTE - The current path where srun and singularity are executed is automatically exposed inside the container.
Example 3: How to squash an NGC container into a new read-only Singularity image and run it
TIP - Building a container is a very I/O-intensive operation, so it is better to use /local when possible. Remember to copy your SIF image or sandbox folder back to ${SCRATCH} before the job completes, otherwise all files are lost.
1. Allocate an interactive node
salloc -n 1 -N 1 -p gh -t 1:00:00
2. Set additional env variables
Make sure singularity pull operates entirely from /local, for performance reasons and because of capacity constraints.
mkdir /local/tmp_singularity
mkdir /local/tmp_singularity_cache
export APPTAINER_TMPDIR=/local/tmp_singularity
export APPTAINER_CACHEDIR=/local/tmp_singularity_cache
3. Pull the Singularity image locally
singularity pull pytorch-23.12-py3.sif docker://nvcr.io/nvidia/pytorch:23.12-py3
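Files left on /local are lost when the job ends, so an image pulled there still has to be copied to persistent storage. The following is a minimal sketch of that copy step. The function name save_image is hypothetical, and the destination defaults to ${SCRATCH} on the assumption that this variable is set in your environment; pass a directory explicitly otherwise.

```shell
# Sketch: copy a freshly built image from node-local storage to persistent storage.
# Assumption: ${SCRATCH} points at your scratch directory; override with a second argument.
save_image() {
    local image=$1
    local dest=${2:-${SCRATCH:-}}
    [ -f "$image" ] || { echo "no such image: $image" >&2; return 1; }
    [ -n "$dest" ] || { echo "destination not set" >&2; return 1; }
    mkdir -p "$dest" && cp "$image" "$dest/" && echo "saved $dest/$(basename "$image")"
}

# Example call on the compute node after the pull has finished:
# save_image pytorch-23.12-py3.sif
```

Run the copy before the allocation expires; the function reports the final path only when the copy succeeded.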
Example 4: How to create a Singularity Sandbox and run / repackage a new container image
1. Grab one node in interactive mode
salloc -n 1 -N 1 -p gh -t 2:00:00
2. Identify which container to extend via a sandbox and prep the environment
export CONT_DIR=/global/scratch/groups/gh/sif_images
export CONT_NAME="pytorch-23.12-py3.sif"
mkdir /local/$SLURM_JOBID
export APPTAINER_TMPDIR=/local/$SLURM_JOBID/_tmp_singularity
export APPTAINER_CACHEDIR=/local/$SLURM_JOBID/_cache_singularity
rm -rf ${APPTAINER_TMPDIR} && mkdir -p ${APPTAINER_TMPDIR}
rm -rf ${APPTAINER_CACHEDIR} && mkdir -p ${APPTAINER_CACHEDIR}
3. Make a copy of the base container, since reading and verifying it is faster on local disk
cp ${CONT_DIR}/${CONT_NAME} /local/$SLURM_JOBID/
4. Create a Singularity definition file
Start with the original NGC container as the base image and add extra packages in the %post phase.
cat > custom-pytorch.def << EOF
Bootstrap: localimage
From: /local/${SLURM_JOBID}/${CONT_NAME}

%post
    apt-get update
    apt-get -y install python3-venv
    pip install --upgrade pip
    pip install transformers accelerate huggingface_hub
EOF
After this there are two options:
5A. Create the sandbox on the persistent storage
TIP - Use this method if you want to customise your image by building software manually or debugging a failing pip command.
cd /global/scratch/users/$USER
singularity build --sandbox custom-python-sandbox custom-pytorch.def
When completed, run on an interactive node via
singularity run --nv custom-python-sandbox -bash-command /bin/bash
5B. Create a new SIF image
TIP - Use this method if you want to create a read-only image to run workloads and you are confident all %post steps can run successfully without manual intervention.
cd /global/scratch/users/$USER
singularity build custom-python.sif custom-pytorch.def
When completed, run on an interactive node via
singularity run --nv custom-python.sif
Storage
When you log in, you are in $HOME. There is also extra scratch space.
Please run jobs from the scratch partition. It is a Lustre filesystem and it is mounted over InfiniBand.
NFS home -> /global/home/users/$USER/
Lustre -> /global/scratch/users/$USER/
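Since jobs should run from the scratch filesystem, it can help to check the working directory before submitting. The following is a minimal sketch. The function name on_scratch is hypothetical, and the /global/scratch prefix is taken from the paths listed above.

```shell
# Sketch: test whether a directory (default: the current one) is under the scratch tree.
# Assumption: everything on Lustre scratch lives under /global/scratch.
on_scratch() {
    case "${1:-$PWD}" in
        /global/scratch/*) return 0 ;;
        *) return 1 ;;
    esac
}

if on_scratch; then
    echo "OK: running from scratch"
else
    echo "Warning: $PWD is not on scratch; consider changing to /global/scratch/users/<userid>" >&2
fi
```

The check is a plain path-prefix match, so it does not verify that the filesystem is actually mounted.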