For the ISC22 Student Cluster Competition, we will have a coding challenge for the participating teams!
...
Slides: see the files attached to this page.
...
```bash
#!/bin/bash -l
#SBATCH -p thor
#SBATCH --nodes=16
#SBATCH -J osu
#SBATCH --time=15:00
#SBATCH --exclusive

module load gcc/8.3.1 mvapich2-dpu/2021.08

srun -l hostname -s | awk '{print $2}' | grep -v bf | sort > hostfile
srun -l hostname -s | awk '{print $2}' | grep bf | sort | uniq > dpufile

NPROC=$(cat hostfile | wc -l)
EXE=$MVAPICH2_DPU_DIR/libexec/osu-micro-benchmarks/osu_ialltoall

# No DPU offload
mpirun_rsh -np $NPROC -hostfile hostfile MV2_USE_DPU=0 $EXE

# DPU offload
mpirun_rsh -np $NPROC -hostfile hostfile -dpufile dpufile $EXE
```
Keep in mind that we are not running processes directly on the DPUs, but on the hosts; MVAPICH2-DPU takes care of the DPU offloading.
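For illustration, with the node set used in the submission command below, the two generated files would look roughly as follows (the exact hostnames are an assumption based on that node list). Every line of hostfile corresponds to one MPI rank on an x86 host, while dpufile only lists the BlueField-2 cards that MVAPICH2-DPU uses for the offload:

```text
# hostfile -- one line per MPI rank, host nodes only (8 entries per node here)
thor025
thor025
...
thor032

# dpufile -- one line per BlueField-2 DPU
thor-bf25
thor-bf26
...
thor-bf32
```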
Job submission:
```bash
sbatch -N 16 -w thor0[25-32],thor-bf[25-32] --ntasks-per-node=8 RUN-osu.slurm
```
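Once the job completes, a quick way to compare the two runs could look like the sketch below. The output file name and the assumption that the overlap percentage is the last column of the osu_ialltoall table should be checked against the actual job output before relying on the field numbers.

```bash
# Post-processing sketch: print message size (first column) and overlap
# percentage (assumed to be the last column) for every numeric data row.
# Both tables -- no offload first, then DPU offload -- land in the same
# Slurm output file, in that order. Replace JOBID with the actual job id.
awk '$1 ~ /^[0-9]+$/ {printf "%-12s %s\n", $1, $NF}' slurm-JOBID.out
```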
...
[contribution to total score 30%] – Run the original and the modified xcompact3d application using the cylinder input case (/global/home/groups/isc_scc/coding-challenge/input.i3d). You are not allowed to change the problem size, but you should adjust the “Domain decomposition” settings in the input file. Obtain performance measurements on 8 nodes with and without the DPU adapter (note: the Thor servers are equipped with two adapters, ConnectX-6 and BlueField-2; mlx5_2 should be used on the host), and make sure to vary the PPN (4, 8, 16, 32). Run an MPI profiler (mpiP or IPM) to understand whether MPI overlap is happening and how the parallel behaviour of the application has changed. A sketch of a run script for this sweep is shown below.
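As a starting point, a run script along the following lines could drive the PPN sweep. This is a minimal sketch: the module names, the xcompact3d binary path and the way the input file is passed on the command line, the mpiP library location, the output file names, and the use of MV2_IBA_HCA to force the host-side ConnectX-6 port (mlx5_2) are all assumptions that must be adapted to the actual build and cluster environment. mpiP is assumed to be used dynamically via LD_PRELOAD of its shared library; IPM could be preloaded the same way.

```bash
#!/bin/bash -l
#SBATCH -p thor
#SBATCH --nodes=16
#SBATCH -J xc3d
#SBATCH --time=01:00:00        # adjust as needed
#SBATCH --exclusive

# Assumed module names -- match them to what is installed on the system.
module load gcc/8.3.1 mvapich2-dpu/2021.08

# Unique host and DPU lists: 8 x86 hosts plus their 8 BlueField-2 cards.
srun -l hostname -s | awk '{print $2}' | grep -v bf | sort | uniq > hosts
srun -l hostname -s | awk '{print $2}' | grep bf    | sort | uniq > dpufile

EXE=$HOME/xcompact3d/bin/xcompact3d                          # assumed install path
INPUT=/global/home/groups/isc_scc/coding-challenge/input.i3d
MPIP=$HOME/mpiP/lib/libmpiP.so                               # assumed mpiP location

for PPN in 4 8 16 32; do
    # mpirun_rsh places one rank per hostfile line, so replicate each
    # host PPN times to get the desired ranks-per-node.
    awk -v n=$PPN '{for (i = 0; i < n; i++) print}' hosts > hostfile.$PPN
    NPROC=$(cat hostfile.$PPN | wc -l)

    # Host-only run (no DPU offload), profiled with mpiP.
    mpirun_rsh -np $NPROC -hostfile hostfile.$PPN \
        MV2_USE_DPU=0 MV2_IBA_HCA=mlx5_2 LD_PRELOAD=$MPIP \
        $EXE $INPUT > xc3d.ppn$PPN.nodpu.log 2>&1

    # Same run with BlueField-2 offload enabled.
    mpirun_rsh -np $NPROC -hostfile hostfile.$PPN -dpufile dpufile \
        MV2_IBA_HCA=mlx5_2 LD_PRELOAD=$MPIP \
        $EXE $INPUT > xc3d.ppn$PPN.dpu.log 2>&1
done
```

Submission would mirror the benchmark job above, e.g. allocating the same thor0[25-32],thor-bf[25-32] node set so that both the hosts and their BlueField-2 cards are available to the job.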
...