
Create a directory where you want to do your build, and change into it:
Code Block
mkdir -p /global/home/users/gerardo/benchmarks/isc18scc
cd /global/home/users/gerardo/benchmarks/isc18scc
Set your environment to use the appropriate compilers and MPI implementation (in this example, Intel compilers and HPC-X):
Code Block
module purge
module load intel/2018.2.199 hpcx/2.1.0 mkl/2018.2.199
module load cmake
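
Before configuring, it can help to confirm that the loaded modules actually put the expected tools on your PATH. The following is a minimal sketch; the function name check_tools is hypothetical and not part of any module system.

```shell
# Report where each named tool resolves on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool -> $path"
        else
            echo "$tool -> NOT FOUND"
            missing=1
        fi
    done
    return "$missing"
}
```

For the toolchain used on this page, it could be called as `check_tools icc icpc ifort mpirun cmake`.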

Get the Nektar++ tarball from the Nektar++ GitLab server, extract its contents, and change into the nektar-master directory:

Code Block
wget https://gitlab.nektar.info/nektar/nektar/-/archive/master/nektar-master.tar.gz
tar zxvf nektar-master.tar.gz
cd nektar-master

Make a build directory, go down into it and configure the build by running cmake (this will take some time):

Code Block

mkdir build_i18h21; cd build_i18h21
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc \
 -DCMAKE_Fortran_COMPILER=ifort \
 -DCMAKE_INSTALL_PREFIX=/global/home/groups/hpcperf/centos-7/modules/nektar++/master-hpcx-2.1.0-intel-2018.2.199 -DNEKTAR_USE_MPI=ON \
 -DNEKTAR_USE_MKL=ON -DNEKTAR_USE_METIS=ON  .. 2>&1 | tee ../cmake_i18h21.log
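
CMake records the configured options in CMakeCache.txt in the build directory, one entry per line in the form NAME:TYPE=VALUE. As a quick sanity check that the options requested above took effect, a small helper can read a value back. This is only a sketch, and the function name cmake_cache_value is hypothetical.

```shell
# Print the value of one cache entry (for example NEKTAR_USE_MPI)
# from a CMakeCache.txt file. Prints nothing if the entry is absent.
cmake_cache_value() {
    sed -n "s/^$1:[A-Z]*=//p" "$2"
}
```

Run from the build directory, `cmake_cache_value NEKTAR_USE_MPI CMakeCache.txt` should print ON if the option was accepted.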

Run make to actually build the application:
Code Block
make -j20 2>&1 | tee ../make_i18h21.log
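
Because the make output is piped through tee, the exit status the shell reports is that of tee rather than make (unless a pipefail option is enabled). One way to check the build afterwards is to scan the log for the "***" markers that GNU make prints when a target fails. A minimal sketch, with a hypothetical function name:

```shell
# Scan a make log for GNU make failure markers such as
# "make[2]: *** [target] Error 1". Returns non-zero if any are found.
check_make_log() {
    if grep -q '^make.*\*\*\*' "$1"; then
        echo "build errors found in $1"
        return 1
    fi
    echo "no make errors found in $1"
}
```

With the log name used above, this would be called as `check_make_log ../make_i18h21.log`.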

Create a directory in which to run the benchmark, get the data tarball (T106AP3.tgz), and extract it there:

Code Block
cd ..
mkdir isc_scc ; cd isc_scc
# get the tarball
tar zxvf T106AP3.tgz
cd p3

The subdirectory p3 contains the test case data and a sample batch script, submit.slurm. Edit the script for your paths and system, then submit it to your batch scheduler or run it directly. Here is a sample run command:

Code Block

mpirun -np 320 --display-map --report-bindings --map-by node --bind-to core \
 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 \
 -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 \
 IncNavierStokesSolver --npz 8 --use-metis t106a.xml

The '--use-metis' option is important, because the default mesh partitioner, Scotch, fails at larger core counts.
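
The value passed to -np depends on your allocation. Assuming one MPI rank per core, which matches the 320-rank sample command on eight 40-core nodes, the count can be derived rather than hard-coded. The variable names below are illustrative only.

```shell
NODES=8             # nodes in the allocation (example value)
CORES_PER_NODE=40   # cores per node (example value)

# One MPI rank per core.
NP=$((NODES * CORES_PER_NODE))
echo "$NP"
```

The result can then be substituted into the run command as `mpirun -np "$NP"` followed by the remaining options.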

The measure of performance is the "Total Computation Time" written by the application near the end of the run. For example, on eight nodes having 40 SKL 2.0GHz cores each:
Code Block
Total Computation Time = 1337s
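
If the solver output is saved to a file, the figure of merit can be pulled out automatically. The sketch below assumes the line has the form "Total Computation Time = <number>s" as in the example; the function name and the log file name are hypothetical.

```shell
# Print the numeric value from the last "Total Computation Time" line
# in a saved solver log, without the trailing "s".
total_computation_time() {
    grep 'Total Computation Time' "$1" | tail -n 1 \
        | sed 's/.*= *\([0-9.]*\)s.*/\1/'
}
```

For example, after appending `2>&1 | tee run.log` to the mpirun command, `total_computation_time run.log` prints just the number of seconds.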
