GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions many groups are also using it for research on non-biological systems.
Some test cases to start with can be found at ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz.
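For example, to fetch and unpack the benchmark archive:

```
wget ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz
tar xfz gmxbench-3.0.tar.gz
```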
Download the source code
```
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-2020.2.tar.gz
```
Build GROMACS for CPU only
```
tar xfz gromacs-2020.2.tar.gz
cd gromacs-2020.2
# load Intel compilers and HPC-X
mkdir build
cd build
cmake .. -DGMX_FFT_LIBRARY=mkl -DMKL_LIBRARIES=-mkl \
         -DMKL_INCLUDE_DIR=$MKLROOT/include \
         -DGMX_SIMD=AVX2_256 \
         -DGMX_MPI=ON \
         -DGMX_BUILD_MDRUN_ONLY=on \
         -DBUILD_SHARED_LIBS=on \
         -DCMAKE_INSTALL_PREFIX=<install path> \
         -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx
make -j 16 install
```
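As a quick sanity check of the installed binary (a sketch, assuming the `<install path>` chosen above), `-version` prints the build configuration, which should reflect the CMake options used, e.g. the SIMD level, MPI support, and FFT library:

```
# verify the build configuration; the binary is MPI-enabled, so launch it under mpirun
mpirun -np 1 <install path>/bin/mdrun_mpi -version
```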
Build GROMACS with GPU support
```
# in the same source tree, from a clean build directory;
# GMX_GPU=ON builds the CUDA back end by default, so the CUDA toolkit must be loaded
cmake .. -DGMX_FFT_LIBRARY=mkl -DMKL_LIBRARIES=-mkl \
         -DMKL_INCLUDE_DIR=$MKLROOT/include \
         -DGMX_SIMD=AVX2_256 \
         -DGMX_MPI=ON \
         -DGMX_GPU=ON \
         -DGMX_BUILD_MDRUN_ONLY=on \
         -DBUILD_SHARED_LIBS=on \
         -DCMAKE_INSTALL_PREFIX=<install path> \
         -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx
make -j 16 install
```
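The same check confirms that GPU acceleration was compiled in; for this build the version header should report CUDA:

```
mpirun -np 1 <install path>/bin/mdrun_mpi -version | grep "GPU support"
# expected output: GPU support:        CUDA
```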
To run GROMACS on CPUs only, using the stmv case
```
% mpirun -np 40 -x UCX_NET_DEVICES=mlx5_0:1 -bind-to core -report-bindings \
    mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb cpu -pin on

Command line:
  mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb cpu -pin on

Reading file stmv.tpr, VERSION 2018.1 (single precision)
Note: file tpx version 112, software tpx version 119
Overriding nsteps with value passed on the command line: 10000 steps, 20 ps
Changing nstlist from 10 to 80, rlist from 1.2 to 1.316
Using 40 MPI processes
Using 1 OpenMP thread per MPI process
...
step 9900, remaining wall clock time:     6 s
vol 0.96  imb F  1% pme/F 0.81 step 10000, remaining wall clock time:     0 s

Dynamic load balancing report:
 DLB was turned on during the run due to measured imbalance.
 Average load imbalance: 0.9%.
 The balanceable part of the MD step is 94%, load imbalance is computed from this.
 Part of the total run time spent waiting due to load imbalance: 0.8%.
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
 Average PME mesh/force load: 0.790
 Part of the total run time spent waiting due to PP/PME imbalance: 2.0 %

               Core t (s)   Wall t (s)        (%)
       Time:    27556.777      688.921     4000.0
                 (ns/day)    (hour/ns)
Performance:        2.509        9.567
```
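As a cross-check, the reported throughput follows from the run length and the wall time: 10000 steps at 2 fs per step is 20 ps, and 0.020 ns in 688.921 s extrapolates to the reported figure. The Core t column is simply wall time multiplied by the 40 cores in use (4000%).

```
echo "0.020 * 86400 / 688.921" | bc -l    # ≈ 2.51 ns/day, matching the report
```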
To run GROMACS with GPUs
```
% mpirun -np 4 -x UCX_NET_DEVICES=mlx5_0:1 -bind-to none --map-by node:PE=2 \
    mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb gpu -pin on

Command line:
  mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb gpu -pin on

Reading file stmv.tpr, VERSION 2018.1 (single precision)
Note: file tpx version 112, software tpx version 119
Overriding nsteps with value passed on the command line: 10000 steps, 20 ps
Changing nstlist from 10 to 100, rlist from 1.2 to 1.339

On host ops003.hpcadvisorycouncil.com 2 GPUs selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
  PP:0,PP:1
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the CPU
Using 4 MPI processes
Using 2 OpenMP threads per MPI process
...
imb F  0% step 9900, remaining wall clock time:     3 s
imb F  0% step 10000, remaining wall clock time:     0 s

Dynamic load balancing report:
 DLB was off during the run due to low measured imbalance.
 Average load imbalance: 0.2%.
 The balanceable part of the MD step is 59%, load imbalance is computed from this.
 Part of the total run time spent waiting due to load imbalance: 0.1%.

               Core t (s)   Wall t (s)        (%)
       Time:     2581.428      322.681      800.0
                 (ns/day)    (hour/ns)
Performance:        5.356        4.481
```
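For comparison, the 4-rank GPU run reaches 5.356 ns/day versus 2.509 ns/day for the 40-rank CPU-only run, roughly a 2.1x speedup on this case:

```
echo "5.356 / 2.509" | bc -l    # ≈ 2.13
```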