Nektar++
The version of Nektar++ that will be used for the Student Cluster Competition at ISC18 is available from GitLab (https://gitlab.nektar.info/nektar/nektar).
Here is a step-by-step example of building Nektar++ for use on a cluster with x86_64 processors. Replace all absolute paths with whatever is appropriate for your system.
Create a directory where you want to do your build, and change into it:
mkdir -p /global/home/users/gerardo/benchmarks/isc18scc
cd /global/home/users/gerardo/benchmarks/isc18scc
Set your environment to use the appropriate compilers and MPI implementation (in this example, Intel compilers and HPC-X):
module purge
module load intel/2018.2.199 hpcx/2.1.0 mkl/2018.2.199
module load cmake
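The module names and versions above are specific to this example system; yours will differ. As an optional sanity check (assuming HPC-X provides the Open MPI compiler wrappers), you can confirm that the expected compilers are on your PATH and that the MPI wrappers call them:

# Optional check; module and wrapper names are assumptions from this example environment
which icc icpc ifort mpicc
icc --version
mpicc --showme    # Open MPI wrappers print the underlying compiler command line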
Get the Nektar++ tarball from GitLab, extract its contents, and change into the nektar-master directory:
wget https://gitlab.nektar.info/nektar/nektar/-/archive/master/nektar-master.tar.gz
tar zxvf nektar-master.tar.gz
cd nektar-master
Make a build directory, change into it, and configure the build by running cmake (this will take some time):
mkdir build_i18h21 ; cd build_i18h21
cmake -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc \
  -DCMAKE_Fortran_COMPILER=ifort \
  -DCMAKE_INSTALL_PREFIX=/global/home/groups/hpcperf/centos-7/modules/nektar++/master-hpcx-2.1.0-intel-2018.2.199 \
  -DNEKTAR_USE_MPI=ON \
  -DNEKTAR_USE_MKL=ON -DNEKTAR_USE_METIS=ON \
  .. 2>&1 | tee ../cmake_i18h21.log
Run make to actually build the application:
make -j20 2>&1 | tee ../make_i18h21.log
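Once the build finishes, it can optionally be verified and then installed into the CMAKE_INSTALL_PREFIX given to cmake above. The commands below are a sketch: the ctest selection regex is an assumption (a full ctest run takes considerably longer), and the install step simply places the binaries at the configured prefix.

# Optional: run a subset of the Nektar++ test suite from the build directory
ctest -R IncNavierStokesSolver --output-on-failure

# Install into the prefix configured above so the solver binaries are on a shared path
make install 2>&1 | tee ../make_install_i18h21.log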
Create a directory in which to run the benchmark, get the data tarball (T106AP3.tgz), and extract it there:
cd ..
mkdir isc_scc ; cd isc_scc
# get the tarball
tar zxvf T106AP3.tgz
cd p3
The subdirectory p3 contains the test case data and a sample batch script, submit.slurm. Edit the latter for your paths and system, then submit it to your batch scheduler or run it interactively. Here is a sample run command (a sketch of a batch script follows below):
mpirun -np 320 --display-map --report-bindings --map-by node --bind-to core \
       -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 \
       -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 \
       IncNavierStokesSolver --npz 8 --use-metis t106a.xml
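For reference, here is a minimal sketch of what a Slurm batch script wrapping that command might look like. The partition name, walltime, and the assumption that IncNavierStokesSolver is on the PATH are placeholders to adapt; the submit.slurm shipped in p3 remains the authoritative starting point.

#!/bin/bash
#SBATCH --job-name=nektar_t106a
#SBATCH --nodes=8                # 8 nodes x 40 cores = 320 MPI ranks
#SBATCH --ntasks-per-node=40
#SBATCH --time=02:00:00          # placeholder walltime
#SBATCH --partition=compute      # placeholder partition name

# Same environment as used for the build (module names from the example above)
module purge
module load intel/2018.2.199 hpcx/2.1.0 mkl/2018.2.199

cd $SLURM_SUBMIT_DIR

mpirun -np $SLURM_NTASKS --map-by node --bind-to core \
       -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 \
       -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 \
       IncNavierStokesSolver --npz 8 --use-metis t106a.xml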
The '--use-metis' option is important, because the default mesh partitioner, Scotch, fails at larger core counts.
The measure of performance is the "Total Computation Time" reported by the application near the end of its output. For example, on eight nodes with 40 Skylake (SKL) 2.0 GHz cores each:
Total Computation Time = 1337s
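If you capture the solver output to a log file (run.log below is a hypothetical name; redirect or tee the mpirun output into it), the figure of merit can be pulled out with a simple grep:

grep "Total Computation Time" run.log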