Input file and instructions are available at https://github.com/lattice/quda/wiki/Running-the-NERSC-MILC-Benchmarks

We will be using the medium benchmark, with the lattice file 36x36x36x72.chklat, for the competition.

Code Block
wget https://portal.nersc.gov/project/m888/apex/MILC_160413.tgz
tar xvzf MILC_160413.tgz
cd MILC-apex/benchmarks/medium
wget https://portal.nersc.gov/project/m888/apex/MILC_lattices/36x36x36x72.chklat
# Edit and execute run_medium.sh

Sample output:

Code Block
Running "mpirun -np 1 -x UCX_NET_DEVICES=mlx5_0:1  ./su3_rhmd_hisq"
Ignoring PCI device with non-16bit domain.
Pass --enable-32bits-pci-domain to configure to support such devices
(warning: it would break the library ABI, don't enable unless really needed).
com_qmp: set thread-safety level to 0
SU3 with improved KS action
Microcanonical simulation with refreshing
Rational function hybrid Monte Carlo algorithm
MIMD version 3be2-dirty
Machine = QMP (portable), with 1 nodes
...
Options selected...
Generic double precision
C_GLOBAL_INLINE
FEWSUMS
KS_MULTICG=HYBRID
KS_MULTIFF=FNMAT
VECLENGTH=4
INT_ALG=INT_3G1F
HISQ_REUNIT_ALLOW_SVD
HISQ_REUNIT_SVD_REL_ERROR = 1e-08
HISQ_REUNIT_SVD_ABS_ERROR = 1e-08
HISQ_FORCE_FILTER = 5e-05
HISQ_FF_MULTI_WRAPPER is ON
type 0 for no prompts, 1 for prompts, or 2 for proofreading
nx 36
ny 36
nz 36
nt 72
...
       initQuda-endQuda Total time =  2806.835 secs

                   QUDA Total time =  2458.793 secs
                 download     =   133.950 secs (  5.448%),       with     6136 calls at 2.183e+04 us per call
                   upload     =   143.992 secs (  5.856%),       with     3255 calls at 4.424e+04 us per call
                     init     =    47.168 secs (  1.918%),       with    10661 calls at 4.424e+03 us per call
                 preamble     =     0.252 secs (  0.010%),       with     3174 calls at 7.952e+01 us per call
                  compute     =  2098.426 secs ( 85.344%),       with     9173 calls at 2.288e+05 us per call
                    comms     =     1.242 secs (  0.051%),       with      861 calls at 1.443e+03 us per call
                 epilogue     =    20.609 secs (  0.838%),       with     3180 calls at 6.481e+03 us per call
                     free     =    11.711 secs (  0.476%),       with     6848 calls at 1.710e+03 us per call
        total accounted       =  2457.351 secs ( 99.941%)
        total missing         =     1.442 secs (  0.059%)

Device memory used = 30996.2 MiB
Pinned device memory used = 0.0 MiB
Managed memory used = 5689.6 MiB
Shmem memory used = 0.0 MiB
Page-locked host memory used = 20174.4 MiB
Total host memory used >= 26043.5 MiB
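
The two "Total time" lines in the sample output are the easiest figures to compare between runs. A minimal sketch for pulling them out of a saved output file is shown here; the function name `quda_time` and the log file name in the usage comment are hypothetical, and the only assumption about the format is that each line looks like the sample above (`<label> Total time = <number> secs`).

```shell
# Hypothetical helper, not part of MILC or QUDA.
# Usage: quda_time <label> <logfile>
#   quda_time "QUDA" milc_medium.out              # QUDA Total time
#   quda_time "initQuda-endQuda" milc_medium.out  # initQuda-endQuda Total time
quda_time() {
    # Find the first line containing "<label> Total time" (case-sensitive)
    # and print the field that immediately precedes "secs".
    awk -v label="$1" '
        index($0, label " Total time") {
            for (i = 2; i <= NF; i++) {
                if ($i == "secs") { print $(i - 1); exit }
            }
        }' "$2"
}
```

The match is case-sensitive, so the label "QUDA" selects only the `QUDA Total time` line and not `initQuda-endQuda Total time`.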

Submissions

Submit your build and run scripts and the output file.
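
One way to produce the output file is to capture everything the run prints while still watching it on the terminal. The sketch below is only an illustration: `run_and_log` and the file name `milc_medium.out` are hypothetical, and any other method of saving the output works equally well.

```shell
# Hypothetical helper: run a command, show its output, and save a copy.
# Usage: run_and_log <logfile> <command> [args...]
#   run_and_log milc_medium.out ./run_medium.sh
run_and_log() {
    log_file="$1"
    shift
    # Merge stderr into stdout so warnings land in the same file,
    # then duplicate the stream to the terminal and to the log.
    # Note: the exit status returned is that of tee, not of the command.
    "$@" 2>&1 | tee "$log_file"
}
```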