
Cluster access request

Customer Request Access Form: https://forms.gle/uhDUGg6uemdER7wv8

  • Used to qualify external customers for access to the system for benchmarking and evaluation purposes, with a view toward a procurement.

 

Developer Request Access Form: https://forms.gle/gZivCptmPY1p9xEw6 

...

To request access to the Thea clusters, fill in this form.

Once you have a username and can access the login nodes, follow the example in Getting Started with HPC-AI AC Clusters to allocate GH nodes.

...

Code Block
$ sinfo -p gh
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gh         up   infinite      8   idle  gh[001-008]

...

Running Jobs

Slurm is the system job scheduler. Each job has a maximum walltime of 12 hours, and nodes are allocated in exclusive mode by default (a user always allocates a full node; no sharing). The GPU is always visible once a job has allocated a node, so there is no need to use any gres options.

Please avoid allocating nodes interactively if possible, or keep the time limit short, since the resources are shared among multiple users.

Code Block
$ salloc -N 2 -p gh --time=1:00:00 
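
To verify that the GPU is visible without any gres options, a quick check can be run through the scheduler (a minimal sketch, assuming nvidia-smi is on the default PATH of the GH nodes):

Code Block
$ srun -p gh -N 1 -t 0:05:00 nvidia-smi -L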


Allocating Examples

How to allocate one GH200 node:

Code Block
salloc -n 72 -N 1 -p gh -t 1:00:00

How to allocate two GH200 nodes:

Code Block
salloc -n 144 -N 2 -p gh -t 1:00:00

How to allocate one specific GH200 node:

Code Block
salloc -n 72 -N 1 -p gh -w gh004 -t 1:00:00

How to allocate two specific GH200 nodes:

Code Block
salloc -n 144 -N 2 -p gh -w gh002,gh004 -t 1:00:00

How to allocate 4 GH200 nodes but force to exclude a specific one (gh001):

Code Block
salloc -n 288 -N 4 -p gh -x gh001 -t 1:00:00

How to allocate one Grace-only node:

Code Block
salloc -n 144 -N 1 -p gg -t 1:00:00

How to allocate four Grace-only nodes:

Code Block
salloc -n 576 -N 4 -p gg -t 1:00:00

How to submit a batch job

Code Block
$ sbatch -N 4 -p gh --time=1:00:00 <slurm script>
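
After submission, the job can be monitored and, if necessary, cancelled with the standard Slurm commands (the job ID below is a placeholder):

Code Block
$ squeue -u $USER      # list your pending and running jobs
$ scancel <jobid>      # cancel a job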

Batch job

Example of a batch job script running on 2 GH200 nodes with 2 tasks per node via mpirun

Code Block
#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
#SBATCH --nodes=2
#SBATCH --partition=gh
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd

mpirun -np 4 --map-by ppr:2:node:PE=36 \
   --report-bindings uname -a

Example of a batch job script running on 2 GH200 nodes with 2 tasks per node via srun

Code Block
#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
#SBATCH --nodes=2
#SBATCH --partition=gh
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge

srun --mpi=pmi2 uname -a

Example of a batch job script running on 2 Grace-only nodes, fully populated with MPI ranks (MPI-only), via mpirun

Code Block
#!/bin/bash -l
#SBATCH --ntasks=288
#SBATCH --ntasks-per-node=144
#SBATCH --cpus-per-task=1
#SBATCH --nodes=2
#SBATCH --partition=gg
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd

mpirun -np 288 --map-by ppr:144:node:PE=1 \
   --report-bindings uname -a

Example of a batch job script running on 4 Grace-only nodes with an MPI+OpenMP combination via mpirun

Code Block
#!/bin/bash -l
#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=9
#SBATCH --nodes=4
#SBATCH --partition=gg
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd

export OMP_NUM_THREADS=9
mpirun -np 64 --map-by ppr:16:node:PE=9 \
   --report-bindings uname -a
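
For the hybrid MPI+OpenMP case it can also help to pin the OpenMP threads explicitly. A minimal sketch using standard OpenMP environment variables (the values are assumptions to be tuned per application), placed before the mpirun line above:

Code Block
export OMP_NUM_THREADS=9
export OMP_PLACES=cores      # one place per physical core
export OMP_PROC_BIND=close   # keep threads close to their parent MPI rank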

Working with Singularity containers

Singularity is the only container engine available at the moment. Docker or enroot workflows need to be adapted to run (as an unprivileged user) on Thea.
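
As a rough guide, a typical docker run invocation maps onto Singularity as sketched below (the image, bind path, and script are placeholders, not a Thea-specific recipe):

Code Block
# Docker-style invocation (not available on Thea):
#   docker run --gpus all -v /data:/data nvcr.io/nvidia/pytorch:23.12-py3 python train.py
# Approximate Singularity equivalent, run as a normal user:
singularity exec --nv --bind /data:/data \
    docker://nvcr.io/nvidia/pytorch:23.12-py3 python train.py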

Example 1: Run a pre-staged Singularity container interactively

(1) Allocate an interactive node

Code Block
salloc -n 1 -N 1 -p gh -t 1:00:00

(2) Select container and invoke singularity run

Code Block
export CONT="/global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif"
singularity run --nv "${CONT}"

NOTE - Accessing a SIF container is usually fast enough even when the file is located on the Lustre filesystem. Copying it to /local will only marginally improve the startup time.

Example 2: Run a pre-staged Singularity container via srun

Code Block
export CONT="/global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif"
srun --mpi=pmi2 -N 1 -n 1 --ntasks-per-node=1 -p gh -t 4:00:00 \
    singularity -v run --nv "${CONT}" python my_benchmark_script.sh

NOTE - The current working directory from which srun and singularity are executed is automatically exposed inside the container.
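
Directories outside the defaults (home, current working directory, and a few system paths) are not mounted automatically; they can be exposed explicitly with --bind (the paths below are only an illustration):

Code Block
singularity run --nv --bind /global/scratch/users/$USER:/workspace "${CONT}"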

Example 3: How to squash an NGC container into a new read-only Singularity image and run it

TIP - Building a container is a very I/O-intensive operation, so it is better to use /local when possible, but remember to copy your SIF image or sandbox folder back to ${SCRATCH} before the job completes, otherwise all files are lost.

1. Allocate an interactive node

Code Block
salloc -n 1 -N 1 -p gh -t 1:00:00

2. Set additional environment variables

Make sure singularity pull operates entirely from /local, for performance reasons and because of capacity constraints.

Code Block
mkdir /local/tmp_singularity
mkdir /local/tmp_singularity_cache
export APPTAINER_TMPDIR=/local/tmp_singularity
export APPTAINER_CACHEDIR=/local/tmp_singularity_cache

3. Pull the Singularity image locally

Code Block
singularity pull pytorch-23.12-py3.sif docker://nvcr.io/nvidia/pytorch:23.12-py3
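
Since files on /local are lost when the job ends (see the TIP above), copy the resulting image back to persistent storage before releasing the node; the destination below follows the scratch path used elsewhere on this page:

Code Block
cp pytorch-23.12-py3.sif /global/scratch/users/$USER/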

Example 4: How to create a Singularity Sandbox and run / repackage a new container image

1. Grab one node in interactive mode

Code Block
salloc -n 1 -N 1 -p gh -t 2:00:00

2. Identify which container to extend via a sandbox and prep the environment

Code Block
export CONT_DIR=/global/scratch/groups/gh/sif_images
export CONT_NAME="pytorch-23.12-py3.sif"
mkdir /local/$SLURM_JOBID
export APPTAINER_TMPDIR=/local/$SLURM_JOBID/_tmp_singularity
export APPTAINER_CACHEDIR=/local/$SLURM_JOBID/_cache_singularity
rm -rf ${APPTAINER_TMPDIR} && mkdir -p ${APPTAINER_TMPDIR}
rm -rf ${APPTAINER_CACHEDIR} && mkdir -p ${APPTAINER_CACHEDIR}

3. Make a copy of the base container, since reading and verifying it is faster on local disk

Code Block
cp ${CONT_DIR}/${CONT_NAME} /local/$SLURM_JOBID/

4. Create a Singularity definition file

Start with the original NGC container as the base image and add extra packages in the %post phase.

Code Block
cat > custom-pytorch.def << EOF
Bootstrap: localimage
From: /local/${SLURM_JOBID}/${CONT_NAME}
 
%post
    apt-get update
    apt-get -y install python3-venv
    pip install --upgrade pip
    pip install transformers accelerate huggingface_hub
EOF

After this there are two options:

5A. Create the sandbox on the persistent storage

TIP - Use this method if you want to customise your image by manually building software or by debugging a failing pip command.

Code Block
cd /global/scratch/users/$USER
singularity build --sandbox custom-python-sandbox custom-pytorch.def

When completed, run on an interactive node via

Code Block
singularity exec --nv --writable custom-python-sandbox /bin/bash
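
If the manual changes made inside the sandbox should be preserved as a read-only image, the sandbox directory can afterwards be repackaged into a SIF (a minimal sketch; file names follow the ones used above):

Code Block
singularity build custom-python.sif custom-python-sandbox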

5B. Create a new SIF image

TIP - Use this method if you want to create a read-only image to run workloads and you are confident all %post steps can run successfully without manual intervention.

Code Block
cd /global/scratch/users/$USER
singularity build custom-python.sif custom-pytorch.def

When completed, run on an interactive node via

Code Block
singularity run --nv custom-python.sif

Storage

When you log in, you start in your $HOME directory. There is also extra scratch space.
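
A quick way to check both locations (the scratch path below matches the one used in the container examples above):

Code Block
echo $HOME
ls /global/scratch/users/$USER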

...