...
Some test cases to start with can be found at ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz.
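For example, the archive can be downloaded and unpacked as follows (a minimal sketch; it assumes wget is available and that outbound FTP access is allowed from the build host):
Code Block |
---|
# Download and unpack the GROMACS benchmark suite (assumes wget and FTP access)
wget ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz
tar xfz gmxbench-3.0.tar.gz
ls                      # each benchmark case is extracted into its own directory |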
References
Build and Install Gromacs
...
Code Block |
---|
tar xfz gromacs-2020.2.tar.gz
cd gromacs-2020.2
module load intel/2019.5.281
module load mkl/2019.5.281
module load gcc/8.4.0
module load cmake/3.13.4
module load hpcx/2.6.0
mkdir build
mkdir install run
cd build
cmake .. -DGMX_FFT_LIBRARY=mkl -DMKL_LIBRARIES=-mkl \
-DMKL_INCLUDE_DIR=$MKLROOT/include \
-DGMX_SIMD=AVX2_256 \
-DGMX_MPI=ON \
-DGMX_BUILD_MDRUN_ONLY=on \
-DBUILD_SHARED_LIBS=on \
-DGMX_HWLOC=off \
-DCMAKE_INSTALL_PREFIX=../install \
-DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx
make -j 16 install |
|
Note: if you get an hwloc error, comment out the following lines in src/gromacs/hardware/hardwaretopology.cpp (a sed one-liner that does this is sketched after the snippet):
Code Block |
---|
//# if GMX_HWLOC_API_VERSION < 0x00020000
//# error "HWLOC library major version set during configuration is 1, but currently using version 2 headers"
//# endif |
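If you prefer not to edit the file by hand, a one-liner like the sketch below can comment out the check (it assumes GNU sed and that the file still matches the snippet above; review the change before rebuilding):
Code Block |
---|
# Comment out the hwloc API version check plus the two lines that follow it (GNU sed)
sed -i '/GMX_HWLOC_API_VERSION < 0x00020000/,+2 s|^|//|' \
    src/gromacs/hardware/hardwaretopology.cpp
# Inspect the result before re-running make
grep -n -A 2 'GMX_HWLOC_API_VERSION < 0x00020000' src/gromacs/hardware/hardwaretopology.cpp |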
Check the install directory for the mdrun_mpi binary (a PATH setup sketch follows the listing):
Code Block |
---|
$ ls ../install/
bin include lib64 share
$ ls ../install/bin
gmx-completion-mdrun_mpi.bash mdrun_mpi |
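So that the run commands later on this page can call mdrun_mpi directly, the install bin directory can be added to PATH (a minimal sketch; the prefix below is an assumption, adjust it to wherever CMAKE_INSTALL_PREFIX points on your system):
Code Block |
---|
# Put mdrun_mpi on PATH (the prefix is illustrative; use your actual install location)
export PATH=$HOME/gromacs-2020.2/install/bin:$PATH
mdrun_mpi -version          # should print the GROMACS version and build details |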
Build Gromacs with GPU support
Code Block |
---|
tar xfz gromacs-2020.2.tar.gz
cd gromacs-2020.2
module load intel/2019.5.281
module load mkl/2019.5.281
module load gcc/8.4.0
module load cmake/3.13.4
module load hpcx/2.6.0
module load cuda/10.1
mkdir build install run
cd build
cmake .. -DGMX_FFT_LIBRARY=mkl -DMKL_LIBRARIES=-mkl \
-DMKL_INCLUDE_DIR=$MKLROOT/include \
-DGMX_SIMD=AVX2_256 \
-DGMX_MPI=ON \
-DGMX_GPU=ON \
-DGMX_BUILD_MDRUN_ONLY=on \
-DBUILD_SHARED_LIBS=on \
-DGMX_HWLOC=off \
-DCMAKE_INSTALL_PREFIX=../install \
-DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx
make -j 16 install |
|
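Before running, it is worth confirming that this build really has CUDA support compiled in (a quick sanity-check sketch; the exact wording of the -version output can differ between GROMACS releases):
Code Block |
---|
# Check for CUDA support in the build and list the GPUs visible on this node
../install/bin/mdrun_mpi -version | grep -i "gpu support"
nvidia-smi -L |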
Run Gromacs
To run Gromacs on CPUs only using the stmv case:
Code Block |
---|
% mpirun -np 40 -x UCX_NET_DEVICES=mlx5_0:1 -bind-to core -report-bindings \
mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb cpu -pin on
Command line:
mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb cpu -pin on
Reading file stmv.tpr, VERSION 2018.1 (single precision)
Note: file tpx version 112, software tpx version 119
Overriding nsteps with value passed on the command line: 10000 steps, 20 ps
Changing nstlist from 10 to 80, rlist from 1.2 to 1.316
Using 40 MPI processes
Using 1 OpenMP thread per MPI process
...
step 9900, remaining wall clock time: 6 s
vol 0.96 imb F 1% pme/F 0.81 step 10000, remaining wall clock time: 0 s
Dynamic load balancing report:
DLB was turned on during the run due to measured imbalance.
Average load imbalance: 0.9%.
The balanceable part of the MD step is 94%, load imbalance is computed from this.
Part of the total run time spent waiting due to load imbalance: 0.8%.
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
Average PME mesh/force load: 0.790
Part of the total run time spent waiting due to PP/PME imbalance: 2.0 %
Core t (s) Wall t (s) (%)
Time: 27556.777 688.921 4000.0
(ns/day) (hour/ns)
Performance: 2.509 9.567 |
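The same CPU run can also be submitted through a batch scheduler; the sketch below assumes a Slurm cluster with 40 cores per node, and the job name, partition, and node count are placeholders:
Code Block |
---|
#!/bin/bash
#SBATCH -J stmv-cpu                 # placeholder job name
#SBATCH -N 1                        # placeholder node count
#SBATCH --ntasks-per-node=40
#SBATCH -p compute                  # placeholder partition name
module load intel/2019.5.281 mkl/2019.5.281 gcc/8.4.0 hpcx/2.6.0
export PATH=$HOME/gromacs-2020.2/install/bin:$PATH   # adjust to your install prefix
mpirun -np $SLURM_NTASKS -x UCX_NET_DEVICES=mlx5_0:1 -bind-to core -report-bindings \
    mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb cpu -pin on |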
To run Gromacs with GPUs using the stmv case:
Code Block |
---|
% export OMP_NUM_THREADS=2
% export KMP_AFFINITY=verbose,compact
% mpirun -np 4 -x UCX_NET_DEVICES=mlx5_0:1 -map-by node:PE=$OMP_NUM_THREADS \
mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb gpu -pin on \
-ntomp $OMP_NUM_THREADS
Command line:
mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -nb gpu -pin on
Reading file stmv.tpr, VERSION 2018.1 (single precision)
Note: file tpx version 112, software tpx version 119
Overriding nsteps with value passed on the command line: 10000 steps, 20 ps
Changing nstlist from 10 to 100, rlist from 1.2 to 1.339
On host ops003.hpcadvisorycouncil.com 2 GPUs selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
PP:0,PP:1
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the CPU
Using 4 MPI processes
Using 2 OpenMP threads per MPI process
...
imb F 0% step 9900, remaining wall clock time: 3 s
imb F 0% step 10000, remaining wall clock time: 0 s
Dynamic load balancing report:
DLB was off during the run due to low measured imbalance.
Average load imbalance: 0.2%.
The balanceable part of the MD step is 59%, load imbalance is computed from this.
Part of the total run time spent waiting due to load imbalance: 0.1%.
Core t (s) Wall t (s) (%)
Time: 2581.428 322.681 800.0
(ns/day) (hour/ns)
Performance: 5.356 4.481 |
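Beyond offloading only the non-bonded work, the PME and bonded calculations can also be moved to the GPU with the flags listed in the next section; a hedged variant of the run above is sketched below (with -pme gpu a single separate PME rank is used, and whether this is faster depends on the case and on the node's CPU/GPU balance):
Code Block |
---|
# Offload PME (on one dedicated rank) and bonded interactions to the GPU as well
export OMP_NUM_THREADS=2
mpirun -np 4 -x UCX_NET_DEVICES=mlx5_0:1 -map-by node:PE=$OMP_NUM_THREADS \
    mdrun_mpi -v -s stmv.tpr -nsteps 10000 -noconfout -pin on -ntomp $OMP_NUM_THREADS \
    -nb gpu -pme gpu -npme 1 -bonded gpu |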
Other command options
Code Block |
---|
OPTIONS
Options to specify input files:
-s [<.tpr>] (topol.tpr)
Portable xdr run input file
-cpi [<.cpt>] (state.cpt) (Opt.)
Checkpoint file
-table [<.xvg>] (table.xvg) (Opt.)
xvgr/xmgr file
-tablep [<.xvg>] (tablep.xvg) (Opt.)
xvgr/xmgr file
-tableb [<.xvg> [...]] (table.xvg) (Opt.)
xvgr/xmgr file
-rerun [<.xtc/.trr/...>] (rerun.xtc) (Opt.)
Trajectory: xtc trr cpt gro g96 pdb tng
-ei [<.edi>] (sam.edi) (Opt.)
ED sampling input
-multidir [<dir> [...]] (rundir) (Opt.)
Run directory
-awh [<.xvg>] (awhinit.xvg) (Opt.)
xvgr/xmgr file
-membed [<.dat>] (membed.dat) (Opt.)
Generic data file
-mp [<.top>] (membed.top) (Opt.)
Topology file
-mn [<.ndx>] (membed.ndx) (Opt.)
Index file
Options to specify output files:
-o [<.trr/.cpt/...>] (traj.trr)
Full precision trajectory: trr cpt tng
-x [<.xtc/.tng>] (traj_comp.xtc) (Opt.)
Compressed trajectory (tng format or portable xdr format)
-cpo [<.cpt>] (state.cpt) (Opt.)
Checkpoint file
-c [<.gro/.g96/...>] (confout.gro)
Structure file: gro g96 pdb brk ent esp
-e [<.edr>] (ener.edr)
Energy file
-g [<.log>] (md.log)
Log file
-dhdl [<.xvg>] (dhdl.xvg) (Opt.)
xvgr/xmgr file
-field [<.xvg>] (field.xvg) (Opt.)
xvgr/xmgr file
-tpi [<.xvg>] (tpi.xvg) (Opt.)
xvgr/xmgr file
-tpid [<.xvg>] (tpidist.xvg) (Opt.)
xvgr/xmgr file
-eo [<.xvg>] (edsam.xvg) (Opt.)
xvgr/xmgr file
-px [<.xvg>] (pullx.xvg) (Opt.)
xvgr/xmgr file
-pf [<.xvg>] (pullf.xvg) (Opt.)
xvgr/xmgr file
-ro [<.xvg>] (rotation.xvg) (Opt.)
xvgr/xmgr file
-ra [<.log>] (rotangles.log) (Opt.)
Log file
-rs [<.log>] (rotslabs.log) (Opt.)
Log file
-rt [<.log>] (rottorque.log) (Opt.)
Log file
-mtx [<.mtx>] (nm.mtx) (Opt.)
Hessian matrix
-if [<.xvg>] (imdforces.xvg) (Opt.)
xvgr/xmgr file
-swap [<.xvg>] (swapions.xvg) (Opt.)
xvgr/xmgr file
Other options:
-deffnm <string>
Set the default filename for all file options
-xvg <enum> (xmgrace)
xvg plot formatting: xmgrace, xmgr, none
-dd <vector> (0 0 0)
Domain decomposition grid, 0 is optimize
-ddorder <enum> (interleave)
DD rank order: interleave, pp_pme, cartesian
-npme <int> (-1)
Number of separate ranks to be used for PME, -1 is guess
-nt <int> (0)
Total number of threads to start (0 is guess)
-ntmpi <int> (0)
Number of thread-MPI ranks to start (0 is guess)
-ntomp <int> (0)
Number of OpenMP threads per MPI rank to start (0 is guess)
-ntomp_pme <int> (0)
Number of OpenMP threads per MPI rank to start (0 is -ntomp)
-pin <enum> (auto)
Whether mdrun should try to set thread affinities: auto, on, off
-pinoffset <int> (0)
The lowest logical core number to which mdrun should pin the first
thread
-pinstride <int> (0)
Pinning distance in logical cores for threads, use 0 to minimize
the number of threads per physical core
-gpu_id <string>
List of unique GPU device IDs available to use
-gputasks <string>
List of GPU device IDs, mapping each PP task on each node to a
device
-[no]ddcheck (yes)
Check for all bonded interactions with DD
-rdd <real> (0)
The maximum distance for bonded interactions with DD (nm), 0 is
determine from initial coordinates
-rcon <real> (0)
Maximum distance for P-LINCS (nm), 0 is estimate
-dlb <enum> (auto)
Dynamic load balancing (with DD): auto, no, yes
-dds <real> (0.8)
Fraction in (0,1) by whose reciprocal the initial DD cell size will
be increased in order to provide a margin in which dynamic load
balancing can act while preserving the minimum cell size.
-nb <enum> (auto)
Calculate non-bonded interactions on: auto, cpu, gpu
-nstlist <int> (0)
Set nstlist when using a Verlet buffer tolerance (0 is guess)
-[no]tunepme (yes)
Optimize PME load between PP/PME ranks or GPU/CPU
-pme <enum> (auto)
Perform PME calculations on: auto, cpu, gpu
-pmefft <enum> (auto)
Perform PME FFT calculations on: auto, cpu, gpu
-bonded <enum> (auto)
Perform bonded calculations on: auto, cpu, gpu
-update <enum> (auto)
Perform update and constraints on: auto, cpu, gpu
-[no]v (no)
Be loud and noisy
-pforce <real> (-1)
Print all forces larger than this (kJ/mol nm)
-[no]reprod (no)
Try to avoid optimizations that affect binary reproducibility
-cpt <real> (15)
Checkpoint interval (minutes)
-[no]cpnum (no)
Keep and number checkpoint files
-[no]append (yes)
Append to previous output files when continuing from checkpoint
instead of adding the simulation part number to all file names
-nsteps <int> (-2)
Run this number of steps (-1 means infinite, -2 means use mdp
option, smaller is invalid)
-maxh <real> (-1)
Terminate after 0.99 times this time (hours)
-replex <int> (0)
Attempt replica exchange periodically with this period (steps)
-nex <int> (0)
Number of random exchanges to carry out each exchange interval (N^3
is one suggestion). -nex zero or not specified gives neighbor
replica exchange.
-reseed <int> (-1)
Seed for replica exchange, -1 is generate a seed |
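As a worked example of a few of these options, the sketch below uses -deffnm to name all outputs, checkpoints every 5 minutes, stops cleanly before a one-hour wall-time limit, and then restarts from the checkpoint (the file names are illustrative):
Code Block |
---|
# First leg: checkpoint every 5 minutes and stop before the 1 hour limit
mpirun -np 40 mdrun_mpi -s stmv.tpr -deffnm stmv_run -cpt 5 -maxh 1.0
# Continuation: read the checkpoint and append to the existing output files
mpirun -np 40 mdrun_mpi -s stmv.tpr -deffnm stmv_run -cpi stmv_run.cpt -append |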