Getting Started with the Coding Challenge for ISC24 SCC
Overview
The ICON (ICOsahedral Nonhydrostatic) earth system model is a unified next-generation global numerical weather prediction and climate modelling system. To prepare the model for the exascale era of supercomputing systems, ICON is currently undergoing a major refactoring. Given the heterogeneous hardware of such systems, performance portability is crucial. For this purpose, the code base is being converted from a monolithic code into a modularized, scalable and flexible one.
In the new ICON Consolidated (ICON-C) software design, the model consists of several encapsulated modules, and each module can be independently ported to new architectures using different parallel programming paradigms.
Note: This page may change until the competition starts, so make sure to check back until the opening ceremony.
Coding Challenge presentation to the teams:
Presentation file:
The Task
Your task is to parallelize and optimize the micro-physics (μphys) standalone module extracted from the ICON model. Starting from a serial C++ implementation, extend it using a directive-based programming API to target heterogeneous platforms, and optimise it to achieve the best execution times.
The implementation must use either OpenACC* or OpenMP**, and should target the following platforms:
| Implementation | Platforms | Compilers |
|---|---|---|
| OpenACC | x86_64 CPU, NVIDIA GPU | nvhpc++, GNU/g++ |
| OpenMP | x86_64 CPU, NVIDIA GPU | GNU/g++, LLVM/clang++ |
Optimisations could include, but are not restricted to:
- new data formats (e.g. AoS/SoA)
- different cache configurations
- different compilation flags
- different CPU-GPU communication patterns
- GPU-to-GPU communication via MPI
- configurable workload distribution per thread/block
- etc.
Cluster Access
The team captain of each team will need to register for an account at DKRZ here:
In case your e-mail is not accepted, you should write an email to support@dkrz.de. We will then activate your e-mail address so that you can register.
After your account has been set up, you can log in and request membership of project 1273 (ICON at student cluster competition isc24) at the following link.
In the field Project*, enter 1273; the message should contain the names of the captain and the team (see below).
As a member of project 1273, you can access the source code on GitLab here.
Levante Hardware for the coding challenge
- 1 GPU node for development
- 4 GPU nodes for testing, allocated through SLURM jobs of max. 30 min
Prerequisites
Levante nodes have all the dependencies available to the users. In case teams prefer to develop on their laptops, the following tools/libs are needed:
- C++ compiler & libc; to use a GCC compiler that is able to do offloading, you can either use your spack clone or use one of the following installed compilers:
    module load gcc/.12.3.0-gcc-11.2.0-nvptx
    module load gcc/.13.2.0-gcc-11.2.0-nvptx
    module load nvhpc/22.5-gcc-11.2.0
- NETCDF; this can be installed in several ways:
  - from sources: https://downloads.unidata.ucar.edu/netcdf-c/4.9.2/netcdf-c-4.9.2.tar.gz
  - [for MacOS]: using https://formulae.brew.sh/formula/netcdf-cxx
  - using spack: `spack install netcdf-cxx4`
  - on Levante, make use of the pre-installed NETCDF lib, which can be loaded with spack: `spack load netcdf-cxx4@4.3.1`
- CDO:
  - [for MacOS]: using https://code.mpimet.mpg.de/projects/cdo/wiki/MacOS_Platform
  - using spack: `spack install cdo`
  - on Levante, make use of the pre-installed CDO, which can be loaded with spack
Tasks
Each team has a programming task and an optimisation task; extra points are awarded for an optional task (only if the first two are at least partially addressed).
Programming task (50%)
- the code compiles on all supported platforms
- all existing unit tests pass
- code readability, following C++ code guidelines
- the results are numerically CORRECT (compared to a benchmark)
- the input files are in the `tasks/` folder; a larger file for optimisation purposes can be found [here]
- in the file `path-to-muphys-cpp/reference_results/sequential_double_output.nc`, results from the sequential run are provided. Use the `cdo infon -sub` command to compare your results (written by the μphys application to the file `output.nc`) with the reference results. For the comparison between CPU and GPU runs we expect the differences seen below: in the column Mean, the average difference for each field (Parameter_name) should be less than or equal to 1e-11.
- for CPU-only runs we expect bit-identical results, which can be checked with the `cdo diffn` command; its output should look like this:

    cdo diffn: Processed 156800 values from 14 variables over 2 timesteps [0.05s 24MB]
Optimisation task (50%)
- wall-clock runtime is faster than the benchmarks (provided at the beginning)
- results from different profiling runs are visualized and explained
- efficient run configurations (e.g. for a V100, use N blocks x M threads x P registers/thread)
(optional) Maintainability task (for extra points, up to 20%)
- the code is covered by unit tests
- the code is well documented in Doxygen style
- the final solution is more portable than expected
- an experience report about working with OpenMP/OpenACC
Submissions
For evaluation, each team needs to submit the following in their associated working directory on Levante:
- Path to the implementation (e.g. the impl/openX folder)
- Scripts to build the CPU/GPU executables for correctness/performance (e.g. O0 vs O3)
For checking the correctness of the results, your implementation compiled with GCC (and the -O0 flag) should produce bit-identical (or close to bit-identical) results to `scc_at_isc24/implementations/seq/graupel.cpp` (which was provided at the beginning) in both single and double precision, when run on a CPU node. For this, you should provide the scripts which produce these two builds, similar to this:

For performance, you can make ANY source code changes and use ANY compiler and ANY compile flags, as long as the results are within the threshold (1e-11) with respect to the CPU results. You need to provide the associated scripts, similar to the one above, but with your own setup (e.g. compiler & flags). These should run on a GPU node of Levante, because we will only time the runs on the GPU. Use the files from the scripts folder to guide you.
- Slurm logs to confirm results
- Summary list of optimizations performed
- Plots to confirm performance results
- (opt) Profiler analysis output & interpretation
- (opt) Experience report for using OpenMP/OpenACC
* OpenACC version: min 2.6
** OpenMP version: min 4.5, recommended: 5.0