...

Singularity is currently the only container engine available on Thea. Docker or enroot workflows must be adapted to run (as an unprivileged user) on Thea.

Example 1: Run pre-staged Singularity containers interactively

(1) Allocate an interactive node

...

NOTE - Accessing a SIF container is usually fast enough even when the file is located on the Lustre filesystem. Copying it to /local improves the bootstrap time only marginally.
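If the marginal gain is still wanted, the copy can be scripted with a fallback to the original path. The following is a minimal sketch, not part of the Thea setup: the helper name `stage_sif` is hypothetical, and it assumes /local is writable from inside the allocation.

```shell
# Hypothetical helper: copy a SIF image to node-local storage and print the
# path to use. Falls back to the original path if the copy is not possible.
stage_sif() {
    src="$1"                      # SIF image on the shared filesystem
    local_dir="${2:-/local}"      # node-local directory (assumed writable)
    dest="${local_dir}/$(basename "${src}")"
    if [ -d "${local_dir}" ] && [ -w "${local_dir}" ] \
        && cp "${src}" "${dest}" 2>/dev/null; then
        printf '%s\n' "${dest}"
    else
        printf '%s\n' "${src}"
    fi
}

# Usage (inside an allocation):
#   CONT="$(stage_sif /global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif)"
```

Because the helper prints the Lustre path when the copy fails, the rest of a job script can use `${CONT}` either way.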

Example 2: Run pre-staged Singularity containers via srun

Code Block
export CONT="/global/scratch/groups/gh/sif_images/pytorch-23.12-py3.sif"
srun --mpi=pmi2 -N 1 -n 1 --ntasks-per-node=1 -p gh -t 4:00:00 \
    singularity -v run --nv "${CONT}" python my_benchmark_script.sh

NOTE - The current path where srun and singularity are executed is automatically exposed inside the container.

Example 3: How to squash an NGC container into a new read-only Singularity image and run it

TIP - Building a container is a very I/O-intensive operation. Use /local when possible, but remember to copy your SIF image or sandbox folder back to ${SCRATCH} before the job completes; otherwise all files are lost.
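One way to avoid forgetting the copy is to register it once at the start of the session. This is a sketch under stated assumptions: the helper name `stage_out`, the `*-sandbox` directory naming, and the `${SCRATCH}/containers` destination are all hypothetical choices, not site conventions.

```shell
# Hypothetical helper: copy build artifacts from node-local storage back to
# persistent storage. SCRATCH is assumed to be set by the site environment.
stage_out() {
    src_dir="$1"                  # e.g. /local/${SLURM_JOBID}
    dest_dir="$2"                 # e.g. ${SCRATCH}/containers
    mkdir -p "${dest_dir}" || return 1
    # Copy SIF images and sandbox directories; skip patterns that match nothing.
    for item in "${src_dir}"/*.sif "${src_dir}"/*-sandbox; do
        [ -e "${item}" ] || continue
        cp -a "${item}" "${dest_dir}/" || return 1
    done
}

# Usage: register the copy so it runs when the interactive shell exits.
#   trap 'stage_out "/local/${SLURM_JOBID}" "${SCRATCH}/containers"' EXIT
```

The copy itself takes time, so leave some headroom before the allocation ends.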

1. Allocate an interactive node

Code Block
salloc -n 1 -N 1 -p gh -t 1:00:00

2. Set additional env variables

Make sure singularity pull operates entirely from /local, for performance reasons and because of capacity constraints.

Code Block
mkdir -p /local/tmp_singularity
mkdir -p /local/tmp_singularity_cache
export APPTAINER_TMPDIR=/local/tmp_singularity
export APPTAINER_CACHEDIR=/local/tmp_singularity_cache
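Since capacity is one of the reasons for redirecting these directories, it can help to check the free space before pulling a large image. A minimal sketch follows; the helper name `has_free_space` is hypothetical and the threshold in the usage line is an arbitrary placeholder, not a measured requirement.

```shell
# Hypothetical helper: succeed only if DIR has at least MIN_GB gigabytes free.
has_free_space() {
    dir="$1"
    min_gb="$2"
    # df -P prints one POSIX-format line per filesystem; with -k, column 4 is
    # the available space in 1024-byte blocks.
    avail_kb="$(df -Pk "${dir}" | awk 'NR == 2 { print $4 }')"
    [ -n "${avail_kb}" ] && [ "${avail_kb}" -ge $((min_gb * 1024 * 1024)) ]
}

# Usage:
#   has_free_space /local 50 || echo "not enough space on /local" >&2
```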

3. Pull the Singularity image locally

Code Block
singularity pull pytorch-23.12-py3.sif docker://nvcr.io/nvidia/pytorch:23.12-py3
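The output file in the pull command is named after the image and its tag. When pulling several images, that name can be derived from the image reference instead of typed by hand. A small sketch, where the helper name `sif_name` is hypothetical and the naming scheme simply mirrors the one used on this page:

```shell
# Hypothetical helper: derive a SIF file name from a Docker image reference,
# following the <image>-<tag>.sif convention used on this page.
sif_name() {
    ref="${1##*/}"                # drop registry and namespace
    printf '%s.sif\n' "$(printf '%s' "${ref}" | tr ':' '-')"
}

# Usage:
#   IMG="nvcr.io/nvidia/pytorch:23.12-py3"
#   singularity pull "$(sif_name "${IMG}")" "docker://${IMG}"
```

For the reference shown in the usage comment, `sif_name` prints `pytorch-23.12-py3.sif`.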

Example 4: How to create a Singularity Sandbox and run / repackage a new container image

1. Grab one node in interactive mode

Code Block
salloc -n 1 -N 1 -p gh -t 2:00:00

2. Identify which container to extend via a sandbox and prep the environment

Code Block
export CONT_DIR=/global/scratch/groups/gh/sif_images
export CONT_NAME="pytorch-23.12-py3.sif"
mkdir /local/$SLURM_JOBID
export APPTAINER_TMPDIR=/local/$SLURM_JOBID/_tmp_singularity
export APPTAINER_CACHEDIR=/local/$SLURM_JOBID/_cache_singularity
rm -rf ${APPTAINER_TMPDIR} && mkdir -p ${APPTAINER_TMPDIR}
rm -rf ${APPTAINER_CACHEDIR} && mkdir -p ${APPTAINER_CACHEDIR}

3. Copy the base container to the local disk, where reading and verifying it is faster

Code Block
cp ${CONT_DIR}/${CONT_NAME} /local/$SLURM_JOBID/

4. Create a Singularity definition file

Start with the original NGC container as the base image and add extra packages in the %post phase.

...

After this there are two options:

5A. Create the sandbox on the persistent storage

TIP - Use this method if you want to customise your image by building software manually or by debugging a failing pip command.

...

Code Block
singularity run --nv custom-python-sandbox -bash-command /bin/bash

5B. Create a new SIF image

TIP - Use this method if you want to create a read-only image to run workloads and you are confident all %post steps can run successfully without manual intervention.

...