The MILC Code is a body of high-performance research software written in C (with some C++) for SU(3) lattice gauge theory calculations on high-performance computers as well as single-processor workstations.

Compiling

...

MILC (CPU only)

Code Block
git clone --branch develop https://github.com/milc-qcd/milc_qcd.git
cd milc_qcd/ks_imp_rhmc
cp ../Makefile .
# Edit compile_su3_rhmd_hisq_quda.sh
# Remove QUDA, CUDA, GPU flags and set compilers, arch etc.
#
module load intel/2022.3.1 compiler mkl
module load hpcx/2.14.0
...


export OMPI_MPICC=icx
./compile_su3_rhmd_hisq_quda.sh
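
Before launching the build script, it can help to confirm that the MPI compiler wrapper really picks up the Intel compiler. The sketch below assumes an Open MPI based stack (HPC-X ships Open MPI), where `OMPI_MPICC` selects the underlying C compiler and `mpicc --showme:command` prints it. The helper name `show_mpi_cc` is hypothetical and not part of MILC.

```shell
#!/bin/sh
# Assumption: Open MPI wrapper compilers (for example from an HPC-X module).
export OMPI_MPICC=icx

# Hypothetical helper: print the compiler command that mpicc will invoke.
show_mpi_cc() {
    if command -v mpicc >/dev/null 2>&1; then
        mpicc --showme:command
    else
        echo "mpicc not found; load the MPI module first" >&2
        return 1
    fi
}

# Do not abort the shell if the MPI module is not loaded yet.
show_mpi_cc || true
```

If the printed command is not `icx`, the variable was probably exported after the wrapper was invoked, or a different MPI stack is on the `PATH`.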

Compiling MILC (GPU support)

See the following link for instructions on building MILC with QUDA.

https://github.com/lattice/quda/wiki/MILC-with-QUDA

Running MILC

Input file and instructions are available at https://github.com/lattice/quda/wiki/Running-the-NERSC-MILC-Benchmarks

We will use the medium benchmark with the 36x36x36x72.chklat lattice for the competition.

Code Block
wget https://portal.nersc.gov/project/m888/apex/MILC_160413.tgz
tar xvzf MILC_160413.tgz
cd MILC-apex/benchmarks/medium
wget https://portal.nersc.gov/project/m888/apex/MILC_lattices/36x36x36x72.chklat
# Edit and execute run_medium.sh 
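
When choosing the number of MPI ranks in run_medium.sh, a quick sanity check is whether the 36x36x36x72 lattice splits evenly across them. The following is a minimal sketch that assumes only an even split of sites per rank. MILC's own layout code may impose stricter constraints, and the helper name `sites_per_rank` is hypothetical.

```shell
#!/bin/sh
# Lattice dimensions of the medium benchmark (36x36x36x72).
NX=36; NY=36; NZ=36; NT=72

# Hypothetical helper: print the number of lattice sites per MPI rank,
# or fail if the global volume does not divide evenly.
sites_per_rank() {
    nproc=$1
    volume=$((NX * NY * NZ * NT))
    if [ $((volume % nproc)) -ne 0 ]; then
        echo "volume $volume is not divisible by $nproc ranks" >&2
        return 1
    fi
    echo $((volume / nproc))
}

sites_per_rank 4   # prints 839808
```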

Sample output:

Code Block
Running "mpirun -np 1 -x UCX_NET_DEVICES=mlx5_0:1  ./su3_rhmd_hisq"
Ignoring PCI device with non-16bit domain.
Pass --enable-32bits-pci-domain to configure to support such devices
(warning: it would break the library ABI, don't enable unless really needed).
com_qmp: set thread-safety level to 0
SU3 with improved KS action
Microcanonical simulation with refreshing
Rational function hybrid Monte Carlo algorithm
MIMD version 3be2-dirty
Machine = QMP (portable), with 1 nodes
...
Options selected...
Generic double precision
C_GLOBAL_INLINE
FEWSUMS
KS_MULTICG=HYBRID
KS_MULTIFF=FNMAT
VECLENGTH=4
INT_ALG=INT_3G1F
HISQ_REUNIT_ALLOW_SVD
HISQ_REUNIT_SVD_REL_ERROR = 1e-08
HISQ_REUNIT_SVD_ABS_ERROR = 1e-08
HISQ_FORCE_FILTER = 5e-05
HISQ_FF_MULTI_WRAPPER is ON
type 0 for no prompts, 1 for prompts, or 2 for proofreading
nx 36
ny 36
nz 36
nt 72
...
       initQuda-endQuda Total time =  2806.835 secs

                   QUDA Total time =  2458.793 secs
                 download     =   133.950 secs (  5.448%),       with     6136 calls at 2.183e+04 us per call
                   upload     =   143.992 secs (  5.856%),       with     3255 calls at 4.424e+04 us per call
                     init     =    47.168 secs (  1.918%),       with    10661 calls at 4.424e+03 us per call
                 preamble     =     0.252 secs (  0.010%),       with     3174 calls at 7.952e+01 us per call
                  compute     =  2098.426 secs ( 85.344%),       with     9173 calls at 2.288e+05 us per call
                    comms     =     1.242 secs (  0.051%),       with      861 calls at 1.443e+03 us per call
                 epilogue     =    20.609 secs (  0.838%),       with     3180 calls at 6.481e+03 us per call
                     free     =    11.711 secs (  0.476%),       with     6848 calls at 1.710e+03 us per call
        total accounted       =  2457.351 secs ( 99.941%)
        total missing         =     1.442 secs (  0.059%)

Device memory used = 30996.2 MiB
Pinned device memory used = 0.0 MiB
Managed memory used = 5689.6 MiB
Shmem memory used = 0.0 MiB
Page-locked host memory used = 20174.4 MiB
Total host memory used >= 26043.5 MiB
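
To compare runs, the total time can be pulled out of the log rather than read by eye. This sketch assumes a QUDA-enabled run whose output contains the `initQuda-endQuda Total time` line shown in the sample output. The helper name `quda_total_secs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the seconds reported on the
# "initQuda-endQuda Total time" line of a MILC/QUDA log.
# Reads the files given as arguments, or standard input if none are given.
quda_total_secs() {
    awk -F'=' '/initQuda-endQuda Total time/ { split($2, f, " "); print f[1] }' "$@"
}

# Example using the line from the sample output:
printf '       initQuda-endQuda Total time =  2806.835 secs\n' | quda_total_secs
# prints 2806.835
```

On a real run, pass the captured output file as an argument instead of piping a single line.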

Submissions

Submit your build and run scripts along with the output file.
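
If a single archive is convenient for handing in the files, they can be bundled as sketched below. The archive format, the helper name `make_submission`, and all file names are assumptions for illustration. The competition may prescribe a different submission format.

```shell
#!/bin/sh
# Hypothetical helper: bundle the given files into a gzip-compressed tarball.
# Usage: make_submission <archive.tar.gz> <file>...
make_submission() {
    archive=$1
    shift
    # Refuse to create an incomplete archive.
    for f in "$@"; do
        [ -f "$f" ] || { echo "missing file: $f" >&2; return 1; }
    done
    tar czf "$archive" "$@"
}

# Example (file names are placeholders):
# make_submission milc_submission.tar.gz build.sh run.sh milc_output.log
```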