This post will help you get started with the clusters in the HPCAC-AI cluster center. We are using the Helios cluster in this document.

...

Code Block
$ ssh <userid>@gw.hpcadvisorycouncil.com
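
If you connect often, an entry in your local ~/.ssh/config saves retyping the gateway address. A minimal sketch (the Host alias hpcac is just an example name, not something defined by the cluster):

Code Block
# ~/.ssh/config on your workstation (optional convenience)
Host hpcac
    HostName gw.hpcadvisorycouncil.com
    User <userid>
# afterwards you can simply run: ssh hpcac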

To check available Helios nodes using Slurm commands:

Code Block
$ sinfo -p helios
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
helios       up   infinite      4  alloc helios[011-012],heliosbf2a[011-012]
helios       up   infinite     76   idle helios[001-010,013-032],heliosbf2a[001-010,013-032],thorbf3a[001-016]
$ squeue -p helios
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
494552    helios    interact   ...    R      25:51      4 helios[011-012],heliosbf2a[011-012]
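
For more detail than the summary above, the standard Slurm queries work as usual; for example (helios001 is just one of the host names from the listing above):

Code Block
$ sinfo -p helios -N -l            # one line per node, with state and reason
$ scontrol show node helios001     # full details for a single node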

To allocate nodes interactively:

...

Code Block
# CPU nodes only
$ salloc -N 2 -p helios --time=1:00:00 -w helios001,helios002
# CPU and BlueField nodes
$ salloc -N 4 -p helios --time=1:00:00 -w helios00[1-2],heliosbf2a00[1-2]
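
salloc drops you into a shell that holds the allocation; from there, srun launches commands on the allocated nodes. A quick sanity check might look like this (hostname is only an illustration):

Code Block
# inside the shell started by salloc
$ srun hostname     # runs one task per allocated node by default
$ exit              # leaving the shell releases the allocation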

To submit a batch job:

Code Block
# CPU nodes only
$ sbatch -N 4 -p helios --time=1:00:00 -w helios00[1-4] <slurm script>
# CPU and BlueField nodes
$ sbatch -N 4 -p helios --time=1:00:00 -w helios00[1-2],heliosbf2a00[1-2] <slurm script>
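
The <slurm script> above is an ordinary shell script with #SBATCH directives. A minimal sketch of such a script, with placeholder values rather than anything specific to this cluster:

Code Block
#!/bin/bash
#SBATCH --job-name=test          # placeholder job name
#SBATCH --partition=helios
#SBATCH --nodes=2
#SBATCH --time=1:00:00
#SBATCH --output=%x-%j.out       # job name and job ID in the output file name

# load your environment/modules here, then launch the application
srun hostname

Options passed on the sbatch command line (such as -N, -p and -w above) take precedence over the #SBATCH directives inside the script.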

Note: The Helios cluster has NVIDIA BlueField-2 & BlueField-3 cards with ARM processors on them. These adapters also appear in Slurm as “nodes”, named heliosbf2a0[01-32] & thorbf3a0[01-16], while the hosts themselves are named helios[001-032].
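
Since the BlueField adapters are scheduled like ordinary nodes, you can run a command on one of them to confirm you are on the ARM side; for example (heliosbf2a001 is just an example node name):

Code Block
$ srun -p helios -N 1 -w heliosbf2a001 uname -m   # should report aarch64 on the BlueField ARM cores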

Storage

When you log in, you start in your $HOME directory. There is also extra scratch space.

Code Block
nfs home -> /global/home/users/$USER/
Lustre   -> /global/scratch/users/$USER/

Please run jobs from the scratch space. It is a Lustre filesystem and it is mounted over InfiniBand (IB) on every compute node.
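
For example, a job could be staged and submitted from scratch like this (myjob is a placeholder directory name):

Code Block
$ mkdir -p /global/scratch/users/$USER/myjob
$ cd /global/scratch/users/$USER/myjob
$ sbatch -N 4 -p helios --time=1:00:00 <slurm script>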

...