Getting Started with HPC-AI AC Clusters

This post will help you get started with the clusters at the HPC-AI Advisory Council cluster center. We use the helios cluster throughout this document.

Once you have your username, log in to the cluster:

$ ssh <userid>@gw.hpcadvisorycouncil.com


To check the available helios nodes, use the Slurm commands:

$ sinfo -p helios
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
helios       up   infinite      4  alloc helios[011-012],heliosbf2a[011-012]
helios       up   infinite     76   idle helios[001-010,013-032],heliosbf2a[001-012,013-016]

$ squeue -p helios
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
494552    helios interact      ...  R      25:51      4 helios[011-012],heliosbf2a[011-012]

To allocate nodes interactively

Please avoid allocating nodes interactively if possible, or set a short time limit, because the resources are shared among multiple users.

# CPU nodes only
$ salloc -N 2 -p helios --time=1:00:00 -w helios001,helios002

# CPU and BlueField nodes
$ salloc -N 4 -p helios --time=1:00:00 -w helios00[1-2],heliosbf2a00[1-2]

To submit a batch job

Note: the helios cluster has NVIDIA BlueField-2 cards with on-board Arm processors. These adapters also appear in Slurm as "nodes", named heliosbf2a[001-016], while the hosts themselves are named helios[001-032].
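A minimal batch-script sketch for this partition (the script name, module choices, and output filename are assumptions; adjust them to your workflow):

```shell
#!/bin/bash
#SBATCH -p helios               # partition
#SBATCH -N 2                    # number of nodes
#SBATCH --time=0:30:00          # keep the time limit short on shared clusters
#SBATCH -o job.%j.out           # stdout/stderr file, %j expands to the job ID

# commands to run on the allocated nodes
srun hostname
```

Submit the script with `sbatch job.sh` and monitor it with `squeue -p helios`.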

Storage

Basic environment

Depending on the order in which modules are loaded and unloaded, additional modules become visible to the user. Remember to load compilers first.

  • Loading HPC-X with the Intel 2022 compiler

  • Loading HPC-X with GNU compiler
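As a sketch of the two cases above (the exact module names and versions are assumptions; run `module avail` on the cluster to see what is installed):

```shell
# Load the Intel 2022 compiler first, then the matching HPC-X build
# (module names are assumptions; check `module avail` for exact versions)
$ module load intel/2022
$ module load hpcx

# Or start clean and load the GNU toolchain, then HPC-X
$ module purge
$ module load gcc
$ module load hpcx
```

Loading the compiler first matters because the HPC-X modules visible to you can depend on which compiler module is already loaded.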

Running OSU latency using HPC-X
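A minimal sketch of running the OSU point-to-point latency benchmark with HPC-X across two hosts (the hostnames, module names, and the `HPCX_OSU_DIR` path are assumptions; verify them in your HPC-X environment):

```shell
# Load a compiler and HPC-X, then run osu_latency between two nodes.
# HPCX_OSU_DIR is assumed to point at the OSU binaries shipped with HPC-X;
# adjust the path if your installation differs.
$ module load gcc hpcx
$ mpirun -np 2 --host helios001,helios002 \
    $HPCX_OSU_DIR/osu_latency
```

The test runs one MPI rank per host and prints the measured latency for a range of message sizes.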