...
Create a directory where you want to do your build, and change into it:
Code Block
mkdir -p /global/home/users/gerardo/benchmarks/isc18scc
cd /global/home/users/gerardo/benchmarks/isc18scc
Set your environment to use the appropriate compilers and MPI implementation (in this example, Intel compilers and HPC-X):
Code Block
module purge
module load intel/2018.2.199 hpcx/2.1.0 mkl/2018.2.199
module load cmake
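Before configuring, it can help to confirm that the loaded modules actually put the expected tools on the PATH. The following is a minimal sketch; the helper name check_tools is hypothetical, and the tool names in the usage comment are taken from the cmake and run steps on this page.

```shell
#!/bin/sh
# Hypothetical helper: report every listed tool that is not on PATH.
# Returns 0 if all tools are found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage after loading the modules:
#   check_tools icc icpc ifort mpirun cmake
```

If a tool is reported missing, re-check the module names and versions available on your system before running cmake.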
Get the Nektar++ tarball from the Nektar++ GitLab server, extract its contents, and change into the nektar-master directory:
Code Block
wget https://gitlab.nektar.info/nektar/nektar/-/archive/master/nektar-master.tar.gz
tar zxvf nektar-master.tar.gz
cd nektar-master
Make a build directory, go down into it and configure the build by running cmake (this will take some time):
Code Block
mkdir build_i18h21; cd build_i18h21
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc \
  -DCMAKE_Fortran_COMPILER=ifort \
  -DCMAKE_INSTALL_PREFIX=/global/home/groups/hpcperf/centos-7/modules/nektar++/master-hpcx-2.1.0-intel-2018.2.199 -DNEKTAR_USE_MPI=ON \
  -DNEKTAR_USE_MKL=ON -DNEKTAR_USE_METIS=ON .. 2>&1 | tee ../cmake_i18h21.log
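Once cmake finishes, the options it recorded can be read back from the CMakeCache.txt file that CMake writes into the build directory. The sketch below assumes the usual NAME:TYPE=VALUE cache format; the helper name cache_value is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one CMake cache variable.
#   $1 = variable name, $2 = path to CMakeCache.txt
# Cache entries have the form NAME:TYPE=VALUE, e.g. NEKTAR_USE_MPI:BOOL=ON.
cache_value() {
    sed -n 's/^'"$1"':[A-Z]*=\(.*\)$/\1/p' "$2"
}

# Usage from inside the build directory:
#   cache_value NEKTAR_USE_MPI CMakeCache.txt
#   cache_value NEKTAR_USE_METIS CMakeCache.txt
```

If a variable prints OFF or nothing at all, the corresponding -D option probably did not reach cmake, which is worth checking in the cmake log before starting the build.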
Run make to actually build the application:
Code Block
make -j20 2>&1 | tee ../make_i18h21.log
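One caveat with piping the build through tee: the exit status of the pipeline is that of tee, so a failed make can go unnoticed in a script. A possible workaround, sketched here in portable sh, is to record the command's own status in a temporary file; the helper name run_logged is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command, copy its output to a log via tee,
# and return the command's own exit status rather than tee's.
#   $1 = log file, remaining arguments = command to run
run_logged() {
    log=$1
    shift
    status_file=$(mktemp)
    # Left side of the pipe records the command's status before exiting.
    { rc=0; "$@" 2>&1 || rc=$?; echo "$rc" > "$status_file"; } | tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}

# Usage from inside the build directory:
#   run_logged ../make_i18h21.log make -j20 || echo "build failed" >&2
```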
Create a directory in which to run the benchmark, get the data tarball (T106AP3.tgz) and extract it there:
Code Block
cd ..
mkdir isc_scc ; cd isc_scc
# get the tarball
tar zxvf T106AP3.tgz
cd p3
The subdirectory p3 contains the test case data and a sample batch script, submit.slurm. Edit the script for your paths and system, then submit it to your batch scheduler or run it directly. Here is a sample run command:
Code Block
mpirun -np 320 --display-map --report-bindings --map-by node --bind-to core \
  -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 \
  -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 \
  IncNavierStokesSolver --npz 8 --use-metis t106a.xml
The '--use-metis' option is important, because the default mesh partitioner, Scotch, fails at larger core counts.
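The -np value in the sample run command is consistent with eight nodes of 40 cores each (8 x 40 = 320). When adapting the command to a different allocation, the rank count can be derived in the same way. This is a small sketch; the helper name ranks and the variables NODES and CORES_PER_NODE are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: total MPI ranks = nodes * cores per node.
ranks() {
    echo $(( $1 * $2 ))
}

NODES=8             # number of nodes in the allocation (assumed)
CORES_PER_NODE=40   # cores used on each node (assumed)
NP=$(ranks "$NODES" "$CORES_PER_NODE")

# Print the start of the resulting run command instead of executing it.
echo "mpirun -np $NP ..."
```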
The measure of performance is the "Total Computation Time" written by the application near the end of the run. For example, on eight nodes, each with 40 SKL 2.0GHz cores:
Code Block
Total Computation Time = 1337s
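To collect the result from a saved run log, the reported time can be extracted with sed. This sketch assumes the line has the form shown above (a number followed by "s") and that the solver output was redirected to a file; the helper name total_time and the log file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the number of seconds from the last
# "Total Computation Time = <number>s" line found in a log file.
#   $1 = path to the run log
total_time() {
    sed -n 's/.*Total Computation Time *= *\([0-9.][0-9.]*\)s.*/\1/p' "$1" | tail -n 1
}

# Usage, assuming the run output was saved to run.log:
#   total_time run.log
```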