Getting Started with HPC-AI AC Clusters
This post will help you get started with the clusters at the HPC-AI Advisory Council (HPCAC) cluster center. We use the helios cluster in this document.
Once you have received your username, log in to the cluster:
$ ssh <userid>@gw.hpcadvisorycouncil.com
To check the available helios nodes, use Slurm commands:
$ sinfo -p helios
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
helios up infinite 4 alloc helios[011-012],heliosbf2a[011-012]
helios up infinite 76 idle helios[001-010,013-032],heliosbf2a[001-012,013-016]
$ squeue -p helios
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
494552 helios interact ... R 25:51 4 helios[011-012],heliosbf2a[011-012]
To allocate nodes interactively
Please avoid allocating nodes interactively if possible, or set a short time limit, because the resources are shared among multiple users.
# CPU nodes only
$ salloc -N 2 -p helios --time=1:00:00 -w helios001,helios002
# CPU and BlueField nodes
$ salloc -N 4 -p helios --time=1:00:00 -w helios00[1-2],heliosbf2a00[1-2]
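Once the allocation is granted, you can launch commands across the allocated nodes with srun from inside the salloc session, for example:

```shell
# Verify the allocation by running `hostname` on every allocated node
# (run inside the salloc session started above)
$ srun hostname
```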
To submit a batch job
Note: the helios cluster has NVIDIA BlueField-2 cards with Arm processors on them. These adapters also appear in Slurm as "nodes" named heliosbf2a[001-016], while the hosts are named helios[001-032].
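A minimal batch script could look like the following sketch; the job name and the `srun hostname` payload are placeholders for your own application:

```shell
#!/bin/bash
#SBATCH -p helios           # partition
#SBATCH -N 2                # number of nodes
#SBATCH --time=0:30:00      # keep the time limit short on shared resources
#SBATCH -J hello            # job name (placeholder)

# Replace with your actual workload
srun hostname
```

Save it as, say, hello.sbatch and submit it with `$ sbatch hello.sbatch`; check its status with `$ squeue -p helios`.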
Storage
Basic environment
Depending on the order in which modules are loaded and unloaded, additional modules become visible to the user. Remember to load compilers first.
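The standard environment-modules commands apply for exploring what is available:

```shell
$ module avail     # list modules visible given the currently loaded compiler
$ module list      # show currently loaded modules
$ module purge     # unload everything and start clean
```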
Loading HPC-X with the Intel compiler 2022
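A sketch of the load order, assuming the Intel 2022 compiler and HPC-X are provided as modules (the exact module names are assumptions; verify them with `module avail`):

```shell
# Load the compiler first, then HPC-X (module names are hypothetical)
$ module load intel/2022
$ module load hpcx
```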
Loading HPC-X with GNU compiler
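Similarly for the GNU toolchain (again, module names are assumptions; check `module avail` for the versions installed on helios):

```shell
# Load the GNU compiler first, then HPC-X (module names are hypothetical)
$ module load gcc
$ module load hpcx
```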
Running OSU latency using HPC-X
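A typical two-node OSU latency run with the HPC-X environment loaded might look like the sketch below. `$HPCX_OSU_DIR` is set by the HPC-X module in many installations, but the exact variable and benchmark path on helios are assumptions; adjust to where the OSU micro-benchmarks are installed on your system.

```shell
# Two MPI ranks, one per node, measuring point-to-point latency
$ mpirun -np 2 --map-by node -H helios001,helios002 \
    $HPCX_OSU_DIR/osu_latency
```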