This post will help you get started with the clusters at the HPC-AI Advisory Council (HPCAC-AI) cluster center. We are using the Thor cluster in this document.
...
Code Block
$ ssh <userid>@gw.hpcadvisorycouncil.com
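If you need to copy files from your workstation to the cluster, scp through the same gateway works. A minimal sketch, assuming a hypothetical archive name mycode.tar.gz; the home path is the one described in the Storage section below:

Code Block
$ scp mycode.tar.gz <userid>@gw.hpcadvisorycouncil.com:/global/home/users/<userid>/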
To check the available Thor nodes, use the following Slurm commands.
Code Block
$ sinfo -p thor
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
thor         up   infinite      4  alloc thor[011-012],thorbf2a[011-012]
thor         up   infinite     76   idle thor[001-010,013-032],thorbf2a[001-010,013-032],thorbf3a[001-016]

$ squeue -p thor
 JOBID PARTITION     NAME USER ST   TIME  NODES NODELIST(REASON)
494553      thor interact  ...  R   1:07      4 thor[001-002],thorbf3a[001-002]
494552      thor interact  ...  R  25:51      4 thor[011-012],thorbf2a[011-012]
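If you want more detail about a single node (its state, features, memory, or the reason it is drained), scontrol can show it. This is a generic Slurm command; thor001 below is only an example node name:

Code Block
$ scontrol show node thor001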
To allocate nodes interactively
...
Code Block
# CPU nodes only
$ salloc -N 2 -p thor --time=1:00:00 -w thor001,thor002

# CPU and BlueField nodes
$ salloc -N 4 -p thor --time=1:00:00 -w thor00[1-2],thorbf3a00[1-2]
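Once salloc returns, you are in a shell that holds the allocation. A quick sanity check (a sketch, not part of the official workflow) is to run a command across the allocated nodes and confirm the hostnames you received:

Code Block
$ srun hostname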
To submit a batch job
Code Block
# CPU nodes only
$ sbatch -N 4 -p thor --time=1:00:00 -w thor00[1-4] <slurm script>

# CPU and BlueField nodes
$ sbatch -N 4 -p thor --time=1:00:00 -w thor00[1-2],thorbf2a00[1-2] <slurm script>
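The <slurm script> placeholder above is an ordinary Slurm batch file. Below is a minimal sketch of one; the job name, output file pattern, and the srun command are placeholders you would replace with your real workload:

Code Block
#!/bin/bash
#SBATCH --job-name=example          # placeholder job name
#SBATCH --partition=thor            # Thor partition
#SBATCH --nodes=4                   # number of nodes requested
#SBATCH --time=1:00:00              # walltime limit
#SBATCH --output=%x-%j.out          # stdout/stderr file (job name + job ID)

# Placeholder workload: print the hostname on every allocated node.
srun hostname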
Note: The Thor cluster has NVIDIA BlueField-2 & BlueField-3 cards with ARM processors on them. Those adapters can also be seen in Slurm as "nodes", named thorbf2a0[01-32] & thorbf3a0[01-16], while the host servers are named thor[001-032].
Storage
When you log in you start in your $HOME directory. There is also extra scratch space:

Code Block
NFS home -> /global/home/users/$USER/
Lustre   -> /global/scratch/users/$USER/

Please run jobs from the scratch partition. It is a Lustre filesystem and it is mounted over InfiniBand on every compute node.
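For example, a common pattern is to create a per-job directory on the scratch filesystem and submit from there. The directory name myjob below is purely illustrative:

Code Block
$ mkdir -p /global/scratch/users/$USER/myjob
$ cd /global/scratch/users/$USER/myjob
$ sbatch -N 4 -p thor --time=1:00:00 <slurm script>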
...