...
Singularity is currently the only container engine available. Docker or enroot workflows must be adapted to run (as a regular user) on Thea.
Example 1: Run pre-staged Singularity containers interactively
(1) Allocate an interactive node
...
NOTE - Accessing a SIF container is usually fast enough even when the file is located on the Lustre filesystem. Copying it to /local
improves the bootstrap time only marginally.
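If the marginal gain still matters, for example when many short runs start from the same image, staging could look like the sketch below. The helper name `stage_sif` is hypothetical, and falling back to the original path when the local directory is unusable is an assumption about the desired behaviour, not a site policy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a SIF image to node-local storage and print the
# path that should be used afterwards. If the local directory is missing or
# not writable, the original path is printed instead.
stage_sif() {
    local src="$1"
    local dest_dir="${2:-/local}"
    local dest="${dest_dir}/$(basename "${src}")"

    if [ ! -f "${src}" ]; then
        echo "stage_sif: image not found: ${src}" >&2
        return 1
    fi
    if [ -d "${dest_dir}" ] && [ -w "${dest_dir}" ] && cp "${src}" "${dest}"; then
        echo "${dest}"
    else
        echo "${src}"
    fi
}

# Usage inside an allocation:
#   CONT="$(stage_sif /global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif)"
#   singularity run --nv "${CONT}"
```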
Example 2: Run pre-staged Singularity containers through srun
Code Block |
---|
export CONT="/global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif" |
srun --mpi=pmi2 -N 1 -n 1 --ntasks-per-node=1 -p gh -t 4:00:00 singularity -v run --nv "${CONT}" python my_benchmark_script.py |
NOTE - The directory from which srun and singularity are executed is automatically exposed inside the container.
Example 3: How to squash an NGC container into a new read-only Singularity image and run it
TIP - Building a container is a very I/O-intensive operation, so it is better to use /local
when possible. Remember to copy your SIF image or sandbox folder back to ${SCRATCH}
before the job completes, otherwise all files are lost.
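One way to avoid forgetting the copy-back step is to register it with a shell `trap`, as in the sketch below. The helper name `save_results`, the `*-sandbox` naming pattern, and the `${SCRATCH}/containers` destination are assumptions for illustration. The trap runs when the shell exits; it cannot help if the process is killed with SIGKILL, so leave enough time before the allocation expires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every .sif image and every directory named
# *-sandbox from a node-local work directory to persistent storage.
save_results() {
    local work_dir="$1"
    local dest_dir="$2"
    local item
    local status=0

    mkdir -p "${dest_dir}" || return 1
    for item in "${work_dir}"/*.sif "${work_dir}"/*-sandbox; do
        # Skip glob patterns that matched nothing.
        [ -e "${item}" ] || continue
        cp -a "${item}" "${dest_dir}/" || status=1
    done
    return "${status}"
}

# Register the copy-back only inside a Slurm job with SCRATCH defined.
# The /local/${SLURM_JOBID} layout is an assumed convention.
if [ -n "${SLURM_JOBID:-}" ] && [ -n "${SCRATCH:-}" ]; then
    trap 'save_results "/local/${SLURM_JOBID}" "${SCRATCH}/containers"' EXIT
fi
```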
1. Allocate an interactive node
Code Block |
---|
salloc -n 1 -N 1 -p gh -t 1:00:00 |
2. Set additional environment variables
Make sure singularity pull operates entirely from /local
for performance reasons and because of capacity constraints
Code Block |
---|
mkdir /local/tmp_singularity |
mkdir /local/tmp_singularity_cache |
export APPTAINER_TMPDIR=/local/tmp_singularity |
export APPTAINER_CACHEDIR=/local/tmp_singularity_cache |
3. Pull the Singularity image locally
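Because capacity on /local is limited, it can be worth checking the free space before pulling a large image. The following is a minimal sketch; the helper names `free_kib` and `has_free_gib` are hypothetical, and the 40 GiB threshold in the usage comment is an illustrative guess, not a measured requirement for any particular image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the free space, in KiB, of the filesystem that
# holds the given directory. Relies on the POSIX output format of df (-P).
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: succeed only if the directory has at least the
# requested number of GiB available.
has_free_gib() {
    local dir="$1"
    local need_gib="$2"
    local avail_kib

    avail_kib="$(free_kib "${dir}")"
    [ -n "${avail_kib}" ] || return 1
    [ "${avail_kib}" -ge $(( need_gib * 1024 * 1024 )) ]
}

# Usage, after exporting APPTAINER_TMPDIR:
#   has_free_gib "${APPTAINER_TMPDIR}" 40 || echo "not enough space on /local" >&2
```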
Code Block |
---|
singularity pull pytorch-23.12-py3.sif docker://nvcr.io/nvidia/pytorch:23.12-py3 |
Example 4: How to create a Singularity Sandbox and run / repackage a new container image
1. Grab one node in interactive mode
Code Block |
---|
salloc -n 1 -N 1 -p gh -t 2:00:00 |
2. Identify which container to extend via a sandbox and prep the environment
Code Block |
---|
export CONT_DIR=/global/scratch/groups/gh/sif_images |
export CONT_NAME="pytorch-23.12-py3.sif" |
mkdir /local/$SLURM_JOBID |
export APPTAINER_TMPDIR=/local/$SLURM_JOBID/_tmp_singularity |
export APPTAINER_CACHEDIR=/local/$SLURM_JOBID/_cache_singularity |
rm -rf ${APPTAINER_TMPDIR} && mkdir -p ${APPTAINER_TMPDIR} |
rm -rf ${APPTAINER_CACHEDIR} && mkdir -p ${APPTAINER_CACHEDIR} |
3. Copy the base container to local disk, where reading and verifying it is faster
Code Block |
---|
cp ${CONT_DIR}/${CONT_NAME} /local/$SLURM_JOBID/ |
4. Create a Singularity definition file
Start with the original NGC container as the base image and add extra packages in the %post phase
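As a rough illustration of this step, the sketch below writes a minimal definition file that uses a local SIF image as its base. The helper name `write_def`, the file name `custom-python.def`, and the two pip package names are placeholders, not the packages used on Thea. `Bootstrap: localimage` with a `From:` path is the standard way to start from an existing SIF file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal definition file that extends a local
# SIF image. Arguments: path to the base image, path of the file to write.
write_def() {
    local base_image="$1"
    local def_file="$2"

    cat > "${def_file}" <<EOF
Bootstrap: localimage
From: ${base_image}

%post
    # Placeholder packages: replace with what your workload needs.
    pip install --no-cache-dir example-package-a example-package-b
EOF
}

# Usage with the variables set in step 2:
#   write_def "/local/${SLURM_JOBID}/${CONT_NAME}" custom-python.def
```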
...
After this there are two options:
5A. Create the sandbox on the persistent storage
TIP - Use this method if you want to customise your image by building software manually or by debugging a failing pip
command.
...
Code Block |
---|
singularity run --nv custom-python-sandbox -bash-command /bin/bash |
5B. Create a new SIF image
TIP - Use this method if you want to create a read-only image to run workloads and you are confident all %post
steps can run successfully without manual intervention.
...