...
```bash
#!/bin/bash -l
#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=9
#SBATCH --nodes=4
#SBATCH --partition=gg
#SBATCH --time=1:00:00
#SBATCH --exclusive

. /global/scratch/groups/gh/bootstrap-gh-env.sh
module purge
module load openmpi/4.1.6-gcc-12.3.0-wftkmyd

export OMP_NUM_THREADS=9

mpirun -np 64 --map-by ppr:16:node:PE=9 \
    --report-bindings uname -a
```
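To use the script, save it to a file and submit it with sbatch; the filename below is only an illustrative placeholder. A minimal sketch:

```bash
# Submit the batch script (hypothetical filename) and check its state in the queue.
sbatch my_mpi_job.sh
squeue -u $USER
```

The binding report from `--report-bindings` ends up in the job's output file, so it can be inspected after the job starts.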
Working with Singularity containers
Singularity is the only container engine available at the moment. Docker or enroot workflows need to be adapted to run (as a regular user) on Thea.
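As a rough guide, a typical docker run invocation maps onto singularity as sketched below; the image name and the mounted path are examples only, and volume mounts translate to --bind:

```bash
# Docker-style workflow (not available on Thea):
#   docker run --gpus all -v /data:/data nvcr.io/nvidia/pytorch:23.12-py3
# Approximate Singularity equivalent:
singularity run --nv --bind /data:/data docker://nvcr.io/nvidia/pytorch:23.12-py3
```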
Run pre-staged Singularity containers interactively
1. Allocate an interactive node
```bash
salloc -n 1 -N 1 -p gh -t 1:00:00
```
2. Select the container and invoke singularity run
```bash
export CONT="/global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif"
singularity run --nv "${CONT}"
```
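Once inside the container, a quick way to confirm the GPU is visible is a one-line check; this snippet is just an illustrative sanity test, not part of the official workflow:

```bash
# Inside the container: verify that PyTorch sees the GPU.
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```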
NOTE - Accessing a SIF container is usually fast enough even when the file is located on the Lustre filesystem. Copying it to /local will only improve the bootstrap time marginally.
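If you do want to stage the image on /local first, a minimal sketch (reusing the pre-staged image above) looks like:

```bash
# Stage the SIF on node-local storage and point CONT at the copy.
cp /global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif /local/
export CONT="/local/pytorch-23.12-py3.sif"
singularity run --nv "${CONT}"
```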
Run pre-staged Singularity containers non-interactively
```bash
export CONT="/global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif"
srun --mpi=pmi2 -N 1 -n 1 --ntasks-per-node=1 -p gh -t 4:00:00 \
    singularity -v run --nv "${CONT}" python my_benchmark_script.py
```
NOTE - The directory from which srun and singularity are executed is automatically exposed inside the container.
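Other host paths may or may not be visible depending on the site configuration; they can always be bind-mounted explicitly with --bind. A sketch, assuming a hypothetical dataset directory on scratch:

```bash
# Explicitly expose an extra path inside the container (paths are examples).
singularity run --nv --bind /global/scratch/users/$USER/datasets:/datasets \
    "${CONT}" python my_benchmark_script.py
```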
How to squash an NGC container into a new read-only Singularity image and run it
TIP - Building a container is a very I/O-intensive operation; it is better to leverage /local when possible, but remember to copy your SIF image or sandbox folder back to ${SCRATCH} before the job completes, otherwise all files are lost.
1. Allocate an interactive node
```bash
salloc -n 1 -N 1 -p gh -t 1:00:00
```
2. Set additional environment variables
Make sure singularity pull operates entirely from /local, for performance reasons and capacity constraints.
```bash
mkdir /local/tmp_singularity
mkdir /local/tmp_singularity_cache
export APPTAINER_TMPDIR=/local/tmp_singularity
export APPTAINER_CACHEDIR=/local/tmp_singularity_cache
```
3. Pull the Singularity image locally
```bash
singularity pull pytorch-23.12-py3.sif docker://nvcr.io/nvidia/pytorch:23.12-py3
```
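As noted in the tip above, /local does not persist past the job, so copy the freshly pulled image back to persistent storage before the allocation ends. A minimal sketch, using the per-user scratch path that appears later on this page:

```bash
# /local is wiped when the job ends: save the image to persistent storage.
cp pytorch-23.12-py3.sif /global/scratch/users/$USER/
```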
How to create a Singularity sandbox and run or repackage a new container image
1. Grab one node in interactive mode
```bash
salloc -n 1 -N 1 -p gh -t 2:00:00
```
2. Identify which container to extend via a sandbox and prep the environment
```bash
export CONT_DIR=/global/scratch/groups/gh/sif_images
export CONT_NAME="pytorch-23.12-py3.sif"
mkdir /local/$SLURM_JOBID
export APPTAINER_TMPDIR=/local/$SLURM_JOBID/_tmp_singularity
export APPTAINER_CACHEDIR=/local/$SLURM_JOBID/_cache_singularity
rm -rf ${APPTAINER_TMPDIR} && mkdir -p ${APPTAINER_TMPDIR}
rm -rf ${APPTAINER_CACHEDIR} && mkdir -p ${APPTAINER_CACHEDIR}
```
3. Make a copy of the base container, since reading and verifying it is faster on local disk
```bash
cp ${CONT_DIR}/${CONT_NAME} /local/$SLURM_JOBID/
```
4. Create a Singularity definition file
Start with the original NGC container as the base image and add extra packages in the %post phase.
```bash
cat > custom-pytorch.def << EOF
Bootstrap: localimage
From: /local/${SLURM_JOBID}/${CONT_NAME}

%post
    apt-get update
    apt-get -y install python3-venv
    pip install --upgrade pip
    pip install transformers accelerate huggingface_hub
EOF
```
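Because the here-document delimiter is unquoted, ${SLURM_JOBID} and ${CONT_NAME} are expanded when the file is written, so the From: line ends up containing the concrete local path. You can verify the result before building:

```bash
# Confirm that the variables were expanded into the definition file.
cat custom-pytorch.def
```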
After this there are two options:
5A. Create the sandbox on the persistent storage
TIP - Use this method if you want to customise your image by building software manually or debugging a failing pip command.
```bash
cd /global/scratch/users/$USER
singularity build --sandbox custom-python-sandbox custom-pytorch.def
```
When completed, run on an interactive node via
```bash
singularity shell --nv custom-python-sandbox
```
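To actually modify the sandbox (the point of this method), it can be entered with write access and later repackaged into a read-only SIF; a sketch, assuming the sandbox built above:

```bash
# Enter the sandbox with write access to install or debug software.
singularity shell --writable custom-python-sandbox
# ...make changes, exit, then repackage the sandbox into a read-only SIF:
singularity build custom-python.sif custom-python-sandbox
```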
5B. Create a new SIF image
TIP - Use this method if you want to create a read-only image to run workloads and you are confident all %post steps can run successfully without manual intervention.
```bash
cd /global/scratch/users/$USER
singularity build custom-python.sif custom-pytorch.def
```
When completed, run on an interactive node via
```bash
singularity run --nv custom-python.sif
```
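To check that the %post additions made it into the image, a quick illustrative test could be:

```bash
# Verify that a package installed in %post is importable.
singularity exec custom-python.sif python -c "import transformers; print(transformers.__version__)"
```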
Storage
When you log in, you start in your $HOME directory. There is also extra scratch space available.
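For example, using the per-user scratch layout seen earlier on this page:

```bash
# Home directory and per-user scratch space.
echo $HOME
ls /global/scratch/users/$USER
```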
...