Overview
CONQUEST is a DFT code designed for large-scale calculations, with excellent parallelisation. It offers an exact diagonalisation approach for systems from 1 to 10,000+ atoms, and makes linear scaling calculations possible on over 1,000,000 atoms. In this task, you will use the linear scaling approach, which can show perfect weak scaling to thousands of cores.
Note: This page may change until the competition starts; make sure to check back for updates until the opening ceremony.
Building and Running example
Download v1.2 from https://github.com/OrderN/CONQUEST-release.git.
wget https://github.com/OrderN/CONQUEST-release/releases/download/v1.2/CONQUEST-release-1.2.tar.gz
Download libxc 6.2.2 from https://www.tddft.org/programs/libxc/download/.
Prerequisites
FFTW/MKL package
SCALAPACK
Build libxc:
# Load intel compilers and mpi modules
cd libxc-6.2.2
./configure --prefix=<path> CC=mpicc FC=mpif90
make
make install
Build Conquest:
# Load intel compilers and mpi modules
cd CONQUEST-release/src
# Edit system.make for XC lib and include paths, and FFT & blas libraries.
# Add correct flag (-qopenmp for Intel) for OpenMP to compile and link arguments
# Set MULT_KERN to ompGemm
make
Sample build script for libxc and Conquest on PSC:
#!/bin/bash
BASE=$PWD
source /jet/packages/oneapi/v2023.2.0/compiler/2023.2.1/env/vars.sh
rm -rf libxc-6.2.2
tar xfp libxc-6.2.2.tar.gz
cd libxc-6.2.2
# Select the MPI stack; the last assignment wins (hpcx-2.18 here)
MPI=impi-2021.10.0
MPI=hpcx-2.18
if [[ "$MPI" =~ ^impi ]]; then
    source /jet/packages/oneapi/v2023.2.0//mpi/2021.10.0/env/vars.sh
    export MPIFC=mpiifort
    export CC=mpiicc
    export FC=$MPIFC
elif [[ "$MPI" =~ ^hpcx ]]; then
    module use $HOME/tools/$MPI/modulefiles
    module load hpcx
    export OMPI_CC=icc
    export OMPI_CXX=icpc
    export OMPI_FC=ifort
    export OMPI_F90=ifort
    export MPIFC=mpif90
    export CC=mpicc
    export FC=mpif90
fi
rm -rf $BASE/libxc-6.2.2-$MPI
./configure --prefix=$BASE/libxc-6.2.2-$MPI
make -j 16 install
Modify src/system.make under the Conquest source directory:
#
# Set compilers
FC=$(MPIFC)
F77=$(FC)
# Linking flags
LINKFLAGS= -L/usr/local/lib
ARFLAGS=
# Compilation flags
# NB for gcc10 you need to add -fallow-argument-mismatch
COMPFLAGS= -O3 $(XC_COMPFLAGS)
COMPFLAGS_F77= $(COMPFLAGS)
# Set BLAS and LAPACK libraries
# MacOS X
# BLAS= -lvecLibFort
# Intel MKL use the Intel tool
# Generic
# BLAS= -llapack -lblas
# Full library call; remove scalapack if using dummy diag module
LIBS= -qmkl=sequential -lmkl_scalapack_lp64 -lmkl_blacs_$(WHICHMPI)_lp64 $(XC_LIB)
# LIBS= $(FFT_LIB) $(XC_LIB) -lscalapack $(BLAS)
# LibXC compatibility (LibXC below) or Conquest XC library
# Conquest XC library
#XC_LIBRARY = CQ
#XC_LIB =
#XC_COMPFLAGS =
# LibXC compatibility
# Choose LibXC version: v4 (deprecated) or v5/6 (v5 and v6 have the same interface)
# XC_LIBRARY = LibXC_v4
XC_DIR = <path>/libxc-6.2.2-$(MPI)
XC_LIBRARY = LibXC_v5
XC_LIB = -L$(XC_DIR)/lib -lxcf90 -lxc
XC_COMPFLAGS = -I$(XC_DIR)/include
# Set FFT library
FFT_LIB=-lfftw3
FFT_OBJ=fft_fftw3.o
# Matrix multiplication kernel type
MULT_KERN = default
# Use dummy DiagModule or not
DIAG_DUMMY =
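The build notes ask for an OpenMP flag (-qopenmp for Intel) on the compile and link arguments and for MULT_KERN to be set to ompGemm. A minimal sketch of applying those edits with sed follows. The helper name enable_openmp is hypothetical, and the sketch assumes the file uses the COMPFLAGS=, LINKFLAGS= and MULT_KERN lines shown in the listing; check the result by hand before building.

```shell
#!/bin/bash
# Sketch only: enable_openmp is an illustrative helper, not part of Conquest.
# Usage: enable_openmp <path to system.make> [openmp flag]
enable_openmp() {
    local makefile="$1"
    local flag="${2:--qopenmp}"   # -qopenmp is the Intel flag named in the build notes
    # Append the flag to the compile and link lines and switch the
    # multiplication kernel to ompGemm. sed keeps a .bak copy of the file.
    # COMPFLAGS_F77 is left alone because it already expands $(COMPFLAGS).
    sed -i.bak \
        -e "s|^COMPFLAGS=\(.*\)|COMPFLAGS=\1 ${flag}|" \
        -e "s|^LINKFLAGS=\(.*\)|LINKFLAGS=\1 ${flag}|" \
        -e "s|^MULT_KERN *=.*|MULT_KERN = ompGemm|" \
        "$makefile"
}
```

Calling enable_openmp src/system.make once is enough; running it a second time would append the flag again, so restore the .bak copy first if you need to redo the edit.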
Build Conquest:
#!/bin/bash
BASE=$PWD
source /jet/packages/oneapi/v2023.2.0/compiler/2023.2.1/env/vars.sh
# Select the MPI stack; the last assignment wins (hpcx-2.18 here)
MPI=impi-2021.10.0
MPI=hpcx-2.18
if [[ "$MPI" =~ ^impi ]]; then
    source /jet/packages/oneapi/v2023.2.0//mpi/2021.10.0/env/vars.sh
    export MPIFC=mpiifort
    export CC=mpiicc
    export FC=$MPIFC
    export WHICHMPI=intelmpi
elif [[ "$MPI" =~ ^hpcx ]]; then
    module use $HOME/tools/$MPI/modulefiles
    module load hpcx
    export OMPI_CC=icc
    export OMPI_CXX=icpc
    export OMPI_FC=ifort
    export OMPI_F90=ifort
    export MPIFC=mpif90
    export CC=mpicc
    export FC=mpif90
    export WHICHMPI=openmpi
fi
cd src
# system.make reads $(MPI) to locate the matching libxc build
export MPI
make clean
make
cd $BASE/bin
mv Conquest Conquest-$MPI
Running Conquest:
You will need to set the number of threads per process for OpenMP as well as the number of MPI processes.
export OMP_NUM_THREADS=XX
mpirun -np YY path/to/Conquest
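To make the relation between the two values concrete, here is a small sketch that derives the MPI process count from a node layout so that processes times threads fills every core. The helper name launch_lines and its three arguments are illustrative assumptions; it only prints the two commands, so nothing is launched until you copy them.

```shell
#!/bin/bash
# Sketch only: launch_lines is an illustrative helper, not part of Conquest or MPI.
# Usage: launch_lines <nodes> <cores per node> <OpenMP threads per process>
launch_lines() {
    local nodes="$1" cores_per_node="$2" threads="$3"
    # Require whole processes per node so no process straddles two nodes.
    if [ $(( cores_per_node % threads )) -ne 0 ]; then
        echo "threads per process must divide cores per node" >&2
        return 1
    fi
    local ranks=$(( nodes * cores_per_node / threads ))
    # Printed rather than executed, so the layout can be checked first.
    echo "export OMP_NUM_THREADS=$threads"
    echo "mpirun -np $ranks path/to/Conquest"
}
```

For example, launch_lines 4 32 4 prints a line setting 4 threads and an mpirun line with 32 processes. The 32 cores per node is a placeholder; use the core count of the actual cluster nodes.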
The application metric is the wall time, reported as “Total run time”.
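A sketch for pulling that number out of a run's output is shown here. It assumes the output was written to a file named Conquest_out and that the line containing "Total run time" carries the time in seconds as a plain number; both are assumptions to verify against your own output. The helper name total_run_time is hypothetical.

```shell
#!/bin/bash
# Sketch only: total_run_time is an illustrative helper.
# Assumes the output file name (default Conquest_out) and that the
# "Total run time" line contains the time as a plain number.
total_run_time() {
    local outfile="${1:-Conquest_out}"
    # Take the last matching line and print its last purely numeric field.
    grep "Total run time" "$outfile" | tail -n 1 | awk '{
        for (i = NF; i > 0; i--)
            if ($i ~ /^[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}
```

Called as total_run_time path/to/Conquest_out, it prints only the number, which is convenient when collecting timings from several runs into one table.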
Tasks & Submissions
Input:
The virtual task involves performing linear scaling calculations on samples of bulk silicon with different numbers of atoms. Conquest weak scaling is seen when the number of atoms per MPI process is kept fixed, and the number of processes is scaled with the system size (number of atoms). You have been provided with three inputs, with 512 atoms (si_444.xtl), 1728 atoms (si_666.xtl) and 4096 atoms (si_888.xtl). The minimum number of atoms per MPI process is 8; the maximum will be dictated by memory limitations. The simplest way to examine weak scaling is to keep the product of MPI processes and OpenMP threads per process constant, and vary system size. You might also explore the effect of under-populating nodes where that is possible.
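The weak scaling setup can be sketched as a small calculation: fix the number of atoms per MPI process and derive the process count for each input. The helper name procs_for is hypothetical, and 64 atoms per process is only an illustrative choice, not a recommendation. For simplicity the helper rejects values that do not divide the atom count evenly.

```shell
#!/bin/bash
# Sketch only: procs_for is an illustrative helper, not part of Conquest.
# Usage: procs_for <atoms> <atoms per MPI process>
procs_for() {
    local atoms="$1" per_proc="$2"
    if [ "$per_proc" -lt 8 ]; then
        echo "minimum is 8 atoms per MPI process" >&2
        return 1
    fi
    if [ $(( atoms % per_proc )) -ne 0 ]; then
        echo "$per_proc does not divide $atoms atoms evenly" >&2
        return 1
    fi
    echo $(( atoms / per_proc ))
}

# Atom counts are the ones given for the three provided inputs.
for pair in si_444.xtl:512 si_666.xtl:1728 si_888.xtl:4096; do
    name="${pair%%:*}"
    atoms="${pair##*:}"
    printf '%s %d atoms -> %d processes\n' "$name" "$atoms" "$(procs_for "$atoms" 64)"
done
```

With 64 atoms per process this gives 8, 27 and 64 processes. The three inputs are in the ratio 8:27:64, so the middle case needs a process count that is a multiple of 27, which may not map neatly onto full nodes.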
The smaller inputs are only for practice, not for submissions. The only input for submission is si_888.xtl.
Find the best balance between OpenMP threads and MPI processes, and show your work in the team’s interview presentation. Investigate how the weak scaling changes as the MPI/OpenMP balance is changed.
Run CONQUEST on 4 nodes and submit the results to the team’s folder.
Run an IPM profile, or any other MPI profiler, on 4 nodes and find the 3 most used MPI calls. Show your work in the team’s interview presentation.
Run the application on 1, 2 and 4 nodes (for the si_888.xtl input) and present a strong scaling graph in the team’s interview presentation.
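For the strong scaling graph, the usual quantities are the speedup S(N) = T(1) / T(N) and the parallel efficiency S(N) / N, where T(N) is the run time on N nodes. A minimal sketch of that arithmetic follows; the helper name scaling_row is hypothetical and the timings are passed in by you.

```shell
#!/bin/bash
# Sketch only: scaling_row is an illustrative helper.
# Usage: scaling_row <nodes> <time on 1 node> <time on that many nodes>
# Prints: nodes, speedup T(1)/T(N), efficiency speedup/N
scaling_row() {
    awk -v n="$1" -v t1="$2" -v tn="$3" 'BEGIN {
        speedup = t1 / tn
        printf "%d %.2f %.2f\n", n, speedup, speedup / n
    }'
}
```

Calling it once per node count with your measured times yields rows that can be plotted directly. A perfectly scaling run would show an efficiency of 1.00 on every row.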