tar xfz gromacs-2020.2.tar.gz
cd gromacs-2020.2
module load intel/2019.5.281
module load mkl/2019.5.281
module load gcc/8.4.0
module load cmake/3.13.4
module load hpcx/2.6.0
mkdir build 
mkdir install run
cd build
cmake .. \
        -DMKL_INCLUDE_DIR=$MKLROOT/include \
        -DGMX_SIMD=AVX2_256 \
        -DGMX_MPI=ON \
        -DBUILD_SHARED_LIBS=on \
        -DGMX_HWLOC=off \
        -DCMAKE_INSTALL_PREFIX=../install
make -j 16 install


To build Gromacs with GPU
tar xfz gromacs-2020.2.tar.gz
cd gromacs-2020.2
module load intel/2019.5.281
module load mkl/2019.5.281
module load gcc/8.4.0
module load cmake/3.13.4
module load hpcx/2.6.0
module load cuda/10.1
mkdir build 
mkdir install run
cd build
cmake .. \
        -DMKL_INCLUDE_DIR=$MKLROOT/include \
        -DGMX_SIMD=AVX2_256 \
        -DGMX_MPI=ON \
        -DGMX_GPU=ON \
        -DBUILD_SHARED_LIBS=on \
        -DGMX_HWLOC=off \
        -DCMAKE_INSTALL_PREFIX=../install
make -j 16 install
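
Before launching a job, it can help to confirm that the install step produced an MPI-enabled binary. The following is a minimal sketch and not part of the original procedure: the helper name find_gmx_binary is hypothetical, and the candidate names mdrun_mpi and gmx_mpi are assumptions (the run command on this page uses mdrun_mpi; which name is actually installed depends on the build options).

```shell
#!/bin/sh
# Hypothetical helper: print the first MPI-enabled GROMACS binary found
# under <prefix>/bin. The candidate names are assumptions, not taken from
# the build output.
find_gmx_binary() {
    prefix="$1"
    for name in mdrun_mpi gmx_mpi; do
        if [ -x "$prefix/bin/$name" ]; then
            printf '%s\n' "$prefix/bin/$name"
            return 0
        fi
    done
    echo "no MPI-enabled GROMACS binary under $prefix/bin" >&2
    return 1
}
```

From the build directory this would be called as find_gmx_binary ../install, since the install prefix above is ../install.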


To run Gromacs with GPU

% export OMP_NUM_THREADS=2
% export KMP_AFFINITY=verbose,compact
% mpirun -np 4 -x UCX_NET_DEVICES=mlx5_0:1 -bind-to none --map-by node:PE=$OMP_NUM_THREADS \
mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb gpu -pin on

Command line:
  mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb gpu -pin on

Reading file stmv.tpr, VERSION 2018.1 (single precision)
Note: file tpx version 112, software tpx version 119
Overriding nsteps with value passed on the command line: 10000 steps, 20 ps
Changing nstlist from 10 to 100, rlist from 1.2 to 1.339

On host 2 GPUs selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the CPU
Using 4 MPI processes
Using 2 OpenMP threads per MPI process
imb F  0% step 9900, remaining wall clock time:     3 s
imb F  0% step 10000, remaining wall clock time:     0 s

Dynamic load balancing report:
 DLB was off during the run due to low measured imbalance.
 Average load imbalance: 0.2%.
 The balanceable part of the MD step is 59%, load imbalance is computed from this.
 Part of the total run time spent waiting due to load imbalance: 0.1%.

               Core t (s)   Wall t (s)        (%)
       Time:     2581.428      322.681      800.0
                 (ns/day)    (hour/ns)
Performance:        5.356        4.481
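
The figures in the performance summary are related by simple arithmetic, which the following sketch reproduces. The function names are illustrative only. The 2 fs timestep is inferred from the log line "10000 steps, 20 ps".

```shell
#!/bin/sh
# Sketch: recompute the mdrun performance summary from its inputs.
# Function names are illustrative, not part of GROMACS.

# ns_per_day <steps> <timestep_fs> <wall_seconds>
# Simulated time in ns (1 fs = 1e-6 ns) per wall-clock day (86400 s).
ns_per_day() {
    awk -v n="$1" -v dt="$2" -v w="$3" \
        'BEGIN { printf "%.3f\n", (n * dt * 1e-6) / w * 86400 }'
}

# hours_per_ns <ns_per_day>
# Reciprocal of the above, expressed in hours.
hours_per_ns() {
    awk -v p="$1" 'BEGIN { printf "%.3f\n", 24 / p }'
}

# cpu_percent <core_seconds> <wall_seconds>
# Core time over wall time; 100% per fully busy thread.
cpu_percent() {
    awk -v c="$1" -v w="$2" 'BEGIN { printf "%.1f\n", 100 * c / w }'
}

ns_per_day 10000 2 322.681
hours_per_ns 5.356
cpu_percent 2581.428 322.681
```

With the values from the log, ns_per_day gives about 5.355, close to the reported 5.356. The small difference presumably comes from how mdrun counts steps. The 800.0% figure matches 4 MPI ranks times 2 OpenMP threads, each fully busy.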
