LAMMPS for ISC21

Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a classical molecular dynamics code. LAMMPS has potentials for solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial decomposition of the simulation domain. LAMMPS is distributed as open-source code under the terms of the GPL. More information on LAMMPS can be found at the LAMMPS web site: http://lammps.sandia.gov.

 

 

Introduction to LAMMPS

 

Slides are here:

 

Build LAMMPS for CPUs

This is an example that uses HPC-X MPI.

  1. Load the relevant modules

module load intel/2020.4.304
module load mkl/2020.4.304
module load tbb/2020.4.304
module load hpcx-2.7.0

2. Make

# Download the latest stable version from https://lammps.sandia.gov/download.html.
tar xf lammps-stable.tar.gz
cd lammps-29Oct20/src
TARGET=intel_cpu_openmpi
sed -e "s/xHost/xCORE-AVX512/g" -i MAKE/OPTIONS/Makefile.$TARGET
make clean-all
make no-all
make no-lib
make yes-manybody yes-molecule yes-replica yes-kspace yes-asphere yes-rigid yes-snap yes-user-omp yes-user-reaxc
make yes-user-intel
make -j 32 $TARGET
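
After the build completes, a quick sanity check is to print the help text, which also lists the packages that were compiled in (the binary name below is assumed from the TARGET above; adjust it if your target differs):

./lmp_intel_cpu_openmpi -h    # prints the version, command-line options, and installed packages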

Input example

For the “3d Lennard-Jones melt” benchmark, see https://lammps.sandia.gov/bench/in.lj.txt
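
The benchmark input can be fetched directly; the local file name below is just a convenient choice:

wget https://lammps.sandia.gov/bench/in.lj.txt -O in.lj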

 

Run example

mpirun -np 160 lammps/lmp_mpi-hpcx-2.7.0.AVX512 -in in.lj.all.inp -pk intel 0 omp 1 -sf intel
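
For batch submission, a minimal Slurm sketch wrapping the same command is shown below; the node count, tasks per node, walltime, and module names are assumptions that need adapting to the cluster:

#!/bin/bash
#SBATCH -N 4                        # assumed node count for this run
#SBATCH --ntasks-per-node=40        # assumed cores per node; adjust to the cluster
#SBATCH -t 01:00:00
module load intel/2020.4.304 mkl/2020.4.304 tbb/2020.4.304 hpcx-2.7.0
mpirun -np 160 lammps/lmp_mpi-hpcx-2.7.0.AVX512 -in in.lj.all.inp -pk intel 0 omp 1 -sf intel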

Output example

We will be looking at the performance reported by LAMMPS, i.e. the tau/day and timesteps/s values.
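
These values appear on the "Performance:" line near the end of the LAMMPS log, so a quick way to pull them out is:

grep "Performance:" log.lammps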

 

For GPUs

LAMMPS Accelerator Package Documentation:

https://lammps.sandia.gov/doc/Speed_packages.html
https://lammps.sandia.gov/doc/Build_extras.html

On GPUs, the timing breakdown won't be accurate unless CUDA_LAUNCH_BLOCKING=1 is set (this will slow down the simulation, though).

By default with Kokkos, KSpace (including the FFTs) runs on the GPU, but it can be changed to run on the CPU and overlap with the bonded force computations.
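
As a sketch only (the binary name and GPU count are assumptions), a Kokkos run on 4 GPUs could look as follows; the kk suffix selects the Kokkos variants of the styles, and CUDA_LAUNCH_BLOCKING=1 can be set when an accurate timing breakdown is needed:

export CUDA_LAUNCH_BLOCKING=1    # only for accurate timing breakdown; slows the simulation
mpirun -np 4 ./lmp_kokkos_cuda -k on g 4 -sf kk -pk kokkos neigh half newton on -in in.lj
# To run KSpace on the CPU instead, the kk/host suffix can be applied to the kspace style in the input.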

Configuration Changes allowed

  • Number of MPI ranks

  • MPI and thread affinity

  • Number of OpenMP threads per MPI rank (see the sketch after this list)

  • Compiler optimization flags

  • Can compile with “-default-stream per-thread”

  • FFT library

  • MPI library

  • Compiler version

  • CUDA version

  • CUDA-aware flags

  • CUDA MPS on/off

  • Can use any LAMMPS accelerator package

  • Any package option (see https://lammps.sandia.gov/doc/package.html), except precision

  • Coulomb cutoff

  • Can use atom sorting

  • Newton flag on/off

  • Can add load balancing

  • Can use LAMMPS “processors” command

  • Can turn off tabulation in the pair style (e.g. “pair_modify table 0”)

  • Can use multiple Kokkos backends (e.g. CUDA + OpenMP)

  • Can use “kk/device” or “kk/host” suffix for any kernel to run on CPU or GPU
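
To illustrate how a few of these knobs combine, here is a hedged example for the CPU build (the rank/thread counts, mapping, and binary name are assumptions, not recommendations):

export OMP_NUM_THREADS=4
# 8 MPI ranks per node, 4 OpenMP threads per rank, ranks pinned to cores (Open MPI / HPC-X syntax)
mpirun -np 32 --map-by ppr:8:node:PE=4 --bind-to core \
    ./lmp_intel_cpu_openmpi -in in.lj -sf omp -pk omp 4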

 

Configuration Changes not allowed

  • Modifying any style: pair, fix, kspace, etc.

  • Number of atoms

  • Timestep value

  • Number of timesteps

  • Neighborlist parameters (except binsize)

  • Changing precision (must use double precision FP64)

  • LJ charmm cutoff

Visualizing the results

 

Here is an example:

 
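One way to produce an image or movie directly from the run is LAMMPS' own "dump image" / "dump movie" styles. The sketch below appends an image dump to the input with a shell heredoc; the dump ID, interval, and file names are placeholders, and "dump image" requires a JPEG/PNG-enabled build ("dump movie" additionally needs FFmpeg):

cat >> in.lj <<'EOF'
dump viz all image 250 snap.*.jpg type type
EOF
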

Input Files

LJ and Rhodo input files can be downloaded here.

 

Tasks and Submissions

 

Note: You need to run this on both clusters, using up to 4 CPU nodes or 4 GPUs, and supply two results, one per cluster.

 

  1. Run LAMMPS with the given inputs. On both the Niagara and NSCC clusters, you can use up to 4 nodes or 4 GPUs for this run. Submit the results to the OneDrive team folder. Change the tunable parameters and see what gives you the best performance.

  2. Run an IPM profile of LAMMPS on one of the clusters on 4 nodes and identify the main MPI calls used. Submit the profile results in PDF format; one profile per input is needed (see the IPM sketch after this list).

  3. Visualize the input files. Generate a video or image out of the run.

  4. For teams with a Twitter account, tweet your video or image, tagged with your team name/university, with the hashtags #ISC21 #ISC21_SCC #LAMMPS
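
A minimal IPM sketch for task 2 (the module name and library path are assumptions for whichever cluster is used):

module load ipm                      # assumed module name
export IPM_REPORT=full
LD_PRELOAD=$IPM_HOME/lib/libipm.so \
    mpirun -np 160 lammps/lmp_mpi-hpcx-2.7.0.AVX512 -in in.lj.all.inp -pk intel 0 omp 1 -sf intel
ipm_parse -html <ipm_xml_file>       # generates an HTML report that can be printed to PDF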