...
Input file and instructions are available at https://github.com/lattice/quda/wiki/Running-the-NERSC-MILC-Benchmarks
We will be using the medium benchmark, 36x36x36x72.chklat, for the competition.
Code Block |
---|
wget https://portal.nersc.gov/project/m888/apex/MILC_160413.tgz
tar xvzf MILC_160413.tgz
cd MILC-apex/benchmarks/medium
wget https://portal.nersc.gov/project/m888/apex/MILC_lattices/36x36x36x72.chklat
# Edit and execute run_medium.sh, adjusting parameters as needed |
Sample output:
Code Block |
---|
Running "mpirun -np 1 -x UCX_NET_DEVICES=mlx5_0:1 ./su3_rhmd_hisq"
Ignoring PCI device with non-16bit domain.
Pass --enable-32bits-pci-domain to configure to support such devices
(warning: it would break the library ABI, don't enable unless really needed).
com_qmp: set thread-safety level to 0
SU3 with improved KS action
Microcanonical simulation with refreshing
Rational function hybrid Monte Carlo algorithm
MIMD version 3be2-dirty
Machine = QMP (portable), with 1 nodes
...
Options selected...
Generic double precision
C_GLOBAL_INLINE
FEWSUMS
KS_MULTICG=HYBRID
KS_MULTIFF=FNMAT
VECLENGTH=4
INT_ALG=INT_3G1F
HISQ_REUNIT_ALLOW_SVD
HISQ_REUNIT_SVD_REL_ERROR = 1e-08
HISQ_REUNIT_SVD_ABS_ERROR = 1e-08
HISQ_FORCE_FILTER = 5e-05
HISQ_FF_MULTI_WRAPPER is ON
type 0 for no prompts, 1 for prompts, or 2 for proofreading
nx 36
ny 36
nz 36
nt 72
...
initQuda-endQuda Total time = 2806.835 secs
QUDA Total time = 2458.793 secs
download = 133.950 secs ( 5.448%), with 6136 calls at 2.183e+04 us per call
upload = 143.992 secs ( 5.856%), with 3255 calls at 4.424e+04 us per call
init = 47.168 secs ( 1.918%), with 10661 calls at 4.424e+03 us per call
preamble = 0.252 secs ( 0.010%), with 3174 calls at 7.952e+01 us per call
compute = 2098.426 secs ( 85.344%), with 9173 calls at 2.288e+05 us per call
comms = 1.242 secs ( 0.051%), with 861 calls at 1.443e+03 us per call
epilogue = 20.609 secs ( 0.838%), with 3180 calls at 6.481e+03 us per call
free = 11.711 secs ( 0.476%), with 6848 calls at 1.710e+03 us per call
total accounted = 2457.351 secs ( 99.941%)
total missing = 1.442 secs ( 0.059%)
Device memory used = 30996.2 MiB
Pinned device memory used = 0.0 MiB
Managed memory used = 5689.6 MiB
Shmem memory used = 0.0 MiB
Page-locked host memory used = 20174.4 MiB
Total host memory used >= 26043.5 MiB |
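When comparing runs, it can help to pull the headline timings out of a saved output file. The following is a minimal sketch, not part of the benchmark itself: the function name `quda_summary` is hypothetical, and it assumes the log contains `QUDA Total time` and `compute` lines formatted as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MILC or QUDA): print the total QUDA
# time and the compute-phase time found in a saved benchmark log.
# Assumes the line layout shown in the sample output above.
quda_summary() {
    awk '
        # Matches "QUDA Total time = <secs> secs" but not the
        # "initQuda-endQuda Total time" line, which starts differently.
        /^[ \t]*QUDA Total time/ { print "quda_total_secs=" $(NF-1) }
        # Matches "compute = <secs> secs ( <pct>%), with ..."
        /^[ \t]*compute[ \t]*=/  { print "compute_secs=" $3 }
    ' "$1"
}
```

Calling `quda_summary` with the path to your saved output file prints one `name=value` line per timing found. If your QUDA build formats its profile lines differently, the patterns will need adjusting.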
Submissions
Submit your build and run scripts and the output file.