LAMMPS for ISC21
Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a classical molecular dynamics code. LAMMPS has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. LAMMPS is distributed as an open source code under the terms of the GPL. More information on LAMMPS can be found at the LAMMPS web site: http://lammps.sandia.gov.
Introduction to LAMMPS
Slides are here:
Build LAMMPS for CPUs
This example builds LAMMPS with HPC-X MPI; since HPC-X is based on Open MPI, the intel_cpu_openmpi makefile target is used below.
1. Load the relevant modules
module load intel/2020.4.304
module load mkl/2020.4.304
module load tbb/2020.4.304
module load hpcx-2.7.0
2. Make
# Download latest stable version from https://lammps.sandia.gov/download.html.
tar xf lammps-stable.tar.gz
cd lammps-29Oct20/src
TARGET=intel_cpu_openmpi
sed -e "s/xHost/xCORE-AVX512/g" -i MAKE/OPTIONS/Makefile.$TARGET
make clean-all
make no-all
make no-lib
make yes-manybody yes-molecule yes-replica yes-kspace yes-asphere yes-rigid yes-snap yes-user-omp yes-user-reaxc
make yes-user-intel
make -j 32 $TARGET
Input example
For the “3d Lennard-Jones melt” benchmark, see https://lammps.sandia.gov/bench/in.lj.txt
Run example
mpirun -np 160 lammps/lmp_mpi-hpcx-2.7.0.AVX512 -in in.lj.all.inp -pk intel 0 omp 1 -sf intel
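The number of MPI ranks, OpenMP threads per rank, and process binding are all tunable (see the allowed changes below). A variant of the run line with an explicit thread count and core binding, using Open MPI/HPC-X options (the rank and thread counts are illustrative and assume 4 nodes with 40 cores each):
# illustrative: 80 ranks x 2 OpenMP threads = 160 cores across 4 nodes
mpirun -np 80 -x OMP_NUM_THREADS=2 --map-by ppr:20:node:pe=2 --bind-to core lammps/lmp_mpi-hpcx-2.7.0.AVX512 -in in.lj.all.inp -pk intel 0 omp 2 -sf intel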
Output example
We will be looking at the tau/day and timesteps/s performance values reported near the end of the log.
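For the LJ benchmark, the relevant log lines look like the following (the numbers are placeholders for illustration, not measured results):
Loop time of 2.010 on 160 procs for 100 steps with 32000 atoms
Performance: 21492.537 tau/day, 49.751 timesteps/s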
For GPUs
LAMMPS Accelerator Package Documentation:
https://lammps.sandia.gov/doc/Speed_packages.html
https://lammps.sandia.gov/doc/Build_extras.html
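A minimal GPU build sketch using the KOKKOS package and the bundled kokkos_cuda_mpi makefile (the KOKKOS_ARCH value is an assumption; set it to match the cluster's GPUs):
cd lammps-29Oct20/src
make yes-kokkos
# Volta70 assumes V100 GPUs; adjust KOKKOS_ARCH for your hardware
make -j 32 kokkos_cuda_mpi KOKKOS_ARCH=Volta70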
On GPUs, the timing breakdown will not be accurate unless CUDA_LAUNCH_BLOCKING=1 is set (though this will slow down the simulation).
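For example, with Open MPI/HPC-X the variable can be exported to all ranks (the executable name assumes the Kokkos build sketched above; GPU and rank counts are illustrative):
mpirun -np 4 -x CUDA_LAUNCH_BLOCKING=1 ./lmp_kokkos_cuda_mpi -k on g 4 -sf kk -in in.lj.txt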
By default with Kokkos, KSpace (including the FFTs) runs on the GPU, but it can be changed to run on the CPU and overlap with the bonded force computation.
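A sketch of that change in the input script, using the explicit kk/host suffix (the accuracy value is illustrative):
# run long-range electrostatics on the host CPU instead of the GPU
kspace_style pppm/kk/host 1.0e-4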
Configuration Changes allowed
Number of MPI ranks
MPI and thread affinity
Number of OpenMP threads per MPI rank
Compiler optimization flags
Can compile with “-default-stream per-thread”
FFT library
MPI library
Compiler version
CUDA version
CUDA-aware flags
CUDA MPS on/off (see the sketch after this list)
Can use any LAMMPS accelerator package
Any package option (see https://lammps.sandia.gov/doc/package.html), except precision
Coulomb cutoff
Can use atom sorting
Newton flag on/off
Can add load balancing
Can use LAMMPS “processors” command
Can turn off tabulation in pair_style (e.g. “pair_modify table 0”)
Can use multiple Kokkos backends (e.g. CUDA + OpenMP)
Can use “kk/device” or “kk/host” suffix for any kernel to run on CPU or GPU
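For instance, CUDA MPS can be toggled as follows (daemon availability and permissions vary by system):
nvidia-cuda-mps-control -d            # start the MPS control daemon
# ... run LAMMPS ...
echo quit | nvidia-cuda-mps-control   # shut MPS down again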
Configuration Changes not allowed
Modifying any style: pair, fix, kspace, etc.
Number of atoms
Timestep value
Number of timesteps
Neighborlist parameters (except binsize)
Changing precision (must use double precision FP64)
LJ CHARMM cutoff
Visualizing the results
LAMMPS “dump image” command: https://lammps.sandia.gov/doc/dump_image.html
Here is a minimal sketch of adding image dumps to an input script (the dump ID, output interval, filenames, and keyword values are illustrative):
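# illustrative: write one JPEG every 250 steps, atoms colored and sized by type
dump        img all image 250 snap.*.jpg type type zoom 1.6
dump_modify img pad 5
The resulting frames can then be stitched into a video with a tool such as ffmpeg.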
Input Files
LJ and Rhodo input files can be downloaded here.
Tasks and Submissions
Note: you need to run this on both clusters, using up to 4 CPU nodes or 4 GPUs, and supply two results, one per cluster.
Run LAMMPS with the given inputs. On both the Niagara and NSCC clusters, you can use up to 4 nodes or 4 GPUs for this run. Submit the results to the OneDrive team folder. Change the tunable parameters and see what gives you the best performance.
Run an IPM profile of LAMMPS on one of the clusters on 4 nodes. What are the main MPI calls used? Submit the profile results in PDF format; one profile per input is needed (see the IPM sketch at the end of this section).
Visualize the input files and generate a video or image from the run.
For teams with a Twitter account, tweet your video or image, tagged with your team name/university and the hashtags #ISC21 #ISC21_SCC #LAMMPS.
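For the IPM profiling task, a minimal sketch assuming a preinstalled IPM library (the module name and library path are assumptions; adjust for your cluster):
module load ipm                       # assumed module name; or point LD_PRELOAD at your IPM build
export IPM_REPORT=full
export IPM_LOG=full
LD_PRELOAD=$IPM_ROOT/lib/libipm.so mpirun -np 160 lammps/lmp_mpi-hpcx-2.7.0.AVX512 -in in.lj.all.inp
# ipm_parse turns the XML log written at exit into an HTML report, which can be saved as PDF
ipm_parse -html <ipm_xml_file>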