Overview

This page provides only the additional information needed to run WRF with the input data provided for the actual competition. For information on WRF itself, and on how to build and run it, please review the WRF single-domain case instructions. It is strongly recommended that you work through the practice WRF case first.

Introduction to WRF and 3-domain case

Given by Software Engineer David Gill, National Center for Atmospheric Research (NCAR)

Slides:

Video:

Running WRF with the 3-domain data

The input files for the WRF competition run can be downloaded from here (the uncompressed tar file is about 6.7 GB). For reference, the design of the 3-domain input was kindly provided by Xu Zhen, as acknowledged in Jian-Wen Bao's presentation at the 2020 Joint WRF/MPAS Users' Workshop. Slide 24 of that presentation has a figure depicting the locations of the three nested computational domains that WRF uses.

After you have downloaded the tarball and extracted the files, you should have a WRF_ISC21_3_DOMAIN directory. This directory contains two types of files: those that are required to run the simulation (input data), and those files that are required for the validation step. The contents of this tar file are similar to the single domain practice case for WRF, but not identical.
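
If you downloaded a compressed tarball, extraction is one command (the file name below is an assumption; use whatever name your download actually has):

$ tar -xzf WRF_ISC21_3_DOMAIN.tar.gz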

$ cd WRF_ISC21_3_DOMAIN
$ ls -R 
.:
anova.f90       fields_d02.txt  namelist.input   timing.csh            wrfinput_d01
EXEMPLAR        fields_d03.txt  qr_acr_qgV2.dat  validate.csh          wrfinput_d02
f2p.py          freezeH2O.dat   qr_acr_qsV2.dat  wrfbdy_d01            wrfinput_d03
fields_d01.txt  Makefile        RUN.slurm        wrf_bench_checker.py

./EXEMPLAR:
ISC2021_WRF_VALIDATION_OUTPUT_02_2016-08-29_09:00:00  ISC2021_WRF_VALIDATION_OUTPUT_02_2016-08-29_09:03:00
ISC2021_WRF_VALIDATION_OUTPUT_02_2016-08-29_09:01:00  ISC2021_WRF_VALIDATION_OUTPUT_02_2016-08-29_09:04:00
ISC2021_WRF_VALIDATION_OUTPUT_02_2016-08-29_09:02:00  ISC2021_WRF_VALIDATION_OUTPUT_02_2016-08-29_09:05:00

Files required for validating the simulation results: anova.f90, Makefile, f2p.py, wrf_bench_checker.py, fields_d01.txt, fields_d02.txt, fields_d03.txt, validate.csh, and the EXEMPLAR directory (timing.csh is used for the separate timing step).

Files required for running the actual WRF simulation: namelist.input, wrfinput_d01, wrfinput_d02, wrfinput_d03, wrfbdy_d01, RUN.slurm, and the microphysics lookup tables freezeH2O.dat, qr_acr_qgV2.dat, and qr_acr_qsV2.dat.

To populate the run directory, link in the remaining run-time files from your WRF build (the ../WRF_ISC21/run path assumes the layout from the practice case), taking care to keep this case's namelist.input:

# Keep the competition namelist safe before linking.
cp namelist.input namelist.input.ORIGINAL
for i in ../WRF_ISC21/run/* ; do ln -sf $i . ; done
# The loop replaced namelist.input with a link; restore the competition copy.
rm namelist.input
cp namelist.input.ORIGINAL namelist.input

Performance Testing

A single run of the 3-domain WRF case takes about 25 minutes before any of the timing can be measured, because all of the validation output is written at the beginning of the simulation. For TEMPORARY performance tuning, you can disable the output of the history data by editing the namelist.input file (in effect, these settings say "DO NOT OUTPUT ANY GRIDDED DATA!"):

&time_control
   io_form_auxhist24 = 0
   auxhist24_interval_s = 00, 00, 00,

This allows you to run through various compiler options (all of which require the clean, configure, compile steps) and the run-time options (such as fine-tuning the interplay between OpenMP and MPI); a sketch of a hybrid launch follows the namelist below. Remember to set these back to the default values when generating the final output for the validation step:

&time_control
   io_form_auxhist24 = 2
   auxhist24_interval_s = 60, 60, 60,
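
As one concrete illustration of the run-time tuning mentioned above, a hybrid MPI/OpenMP launch on 4 Niagara nodes (40 cores each) might look like the following; the 20x2 rank/thread split is an assumption to experiment with, not a recommended setting:

$ export OMP_NUM_THREADS=2
$ srun --nodes=4 --ntasks-per-node=20 --cpus-per-task=2 ./wrf.exe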

Validation and Timing

Two scripts are provided for the competition WRF runs. The first is a simple timing script that reports how long the model takes to run; it completes in about one second. The other is a validation script that involves some python3 and Fortran; it takes about one minute to complete. Both are run on a single processor, and both produce simple text output that needs to be captured and used as part of the competition metrics.

Timing Script

The timing script takes the generated rsl.out.0000 file as input and uses the usual Linux text-processing tools: grep, cat, awk, and sed. The script skips past timing data that is contaminated by initialization and I/O. Two separate types of times are reported, covering different sets of physical processes (radiation and non-radiation steps); separating them keeps the standard deviations fairly small (less than 1%). A sketch of the underlying idea follows the sample output below. Note the trailing "." argument to the timing command: it is the Linux shorthand for "the current working directory", and it is required as the directory argument for this command.

$ ./timing.csh .

WRF RUN COMPLETED
MPI ranks used: 10 x 16 = 160
Domain size :
 ids,ide,jds,jde            1         793           1         853
 ids,ide,jds,jde            1         805           1         805
 ids,ide,jds,jde            1        1001           1        1001
WRF NUMBER OF TILES FROM OMP_GET_MAX_THREADS =   1
Average Time for radiation:      33.1845 ± 0.0183004 s (7 times)
Average Time for non-radiation:  16.6742 ± 0.0197135 s (88 times)
Total Time:                    1699.62 s (95 times)
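
The precise bookkeeping lives in timing.csh, but the idea can be sketched with the same tools the script uses. WRF writes one "Timing for main" line per model step to rsl.out.0000; the one-liner below (the field position and the number of skipped start-up steps are assumptions, not the script's actual logic) averages the elapsed seconds after discarding the contaminated early steps:

$ grep "Timing for main" rsl.out.0000 | \
    awk '{ n++; if (n > 5) { t = $(NF-2); s += t; ss += t*t; m++ } }
         END { mean = s/m; printf "mean %.4f +/- %.4f s (%d times)\n", mean, sqrt(ss/m - mean*mean), m }'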

Validation Script

The validation script compares the WRF output that you generated against exemplar data that we generated. The comparison uses an ANOVA (ANalysis Of VAriance); a note on interpreting the reported p-value follows the sample output below. As with the timing script, the two command-line arguments are directories: the first (EXEMPLAR) is where the exemplar data that we generated is stored, and the second ("." the current working directory) is where the data you generated is located.

$ ./validate.csh EXEMPLAR .

running python script to construct differences, takes about 1 minute

compiler comparison for theta
Input, F-statistic: 0.002717691300108771
Input, df factor: 1
Input, df error: 2736
p-value probability = 1.0 means 100% reject null hypothesis that means are same
p-value probability = 0.04157226537922372

We are pretty darn confident that the vendor vs exemplar comparisons are OK
 
            ▕▔▔▔╲ 
             ▏  ▕ 
             ▏  ▕ 
             ▏  ▕ 
             ▏  ▕▂▂▂▂
      ▂▂▂▂▂▂╱┈▕      ▏
      ▉▉▉▉▉┈┈┈▕▂▂▂▂▂▂▏
      ▉▉▉▉▉┈┈┈▕      ▏
      ▉▉▉▉▉┈┈┈▕▂▂▂▂▂▂▏
      ▉▉▉▉▉┈┈┈▕      ▏
      ▉▉▉▉▉┈┈┈▕▂▂▂▂▂▂▏
      ▉▉▉▉▉╲┈┈▕      ▏
      ▔▔▔▔▔▔╲▂▕▂▂▂▂▂▂▏
 
 

compiler comparison for qv
Input, F-statistic: 1.893986548387782e-08
Input, df factor: 1
Input, df error: 2736
p-value probability = 1.0 means 100% reject null hypothesis that means are same
p-value probability = 0.00010979658231962368

We are pretty darn confident that the vendor vs exemplar comparisons are OK
 
      [thumbs-up ASCII art, same as above]
 
 

compiler comparison for u
Input, F-statistic: 2.144475172374168e-06
Input, df factor: 1
Input, df error: 2736
p-value probability = 1.0 means 100% reject null hypothesis that means are same
p-value probability = 0.001168317102382205

We are pretty darn confident that the vendor vs exemplar comparisons are OK
 
      [thumbs-up ASCII art, same as above]
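
For the curious, the reported "p-value probability" is consistent with the lower-tail probability of the F distribution, P(F <= F-statistic): a value near 0 indicates that the two sets of output are statistically indistinguishable, while a value near 1.0 indicates that the means differ. Assuming python3 with scipy is available, the theta line above can be reproduced from the printed F-statistic and degrees of freedom:

$ python3 -c "from scipy.stats import f; print(f.cdf(0.002717691300108771, 1, 2736))"

This returns the 0.041572... value reported above.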
 
 
 

Submissions

  1. The runs should be done on the Niagara cluster on 4 compute nodes.

  2. Each team will need to deliver the timing results and validation results for the 3-domain case (only) for their runs. Upload these files to your team's OneDrive folder, along with the rsl.out.0000 and namelist.output files (both small text files). Lastly, the model output (all of the ISC2021_WRF_* files for the simulation that you are reporting) needs to be retained in your directory, but SHOULD NOT BE UPLOADED!

  3. Analyze the difference between AVX2 and AVX512 compilations on the Niagara cluster. Which one gives you better results? Present this during the interview (a starting-point sketch follows this list).

  4. For the interview, present the performance differences considering aspects of OpenMP and MPI. What mix of MPI ranks and OpenMP threads worked best? What decomposition aspect ratio for the OpenMP threads and MPI ranks worked best? What novel performance ideas did you try?

  5. Generate an IPM profile of the run on the Niagara cluster, and submit it with the results to the OneDrive folder (as a PDF file).

  6. Generate a figure from your WRF output and submit it with the results to the OneDrive folder (as a PDF file).
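
As a starting point for item 3 (assuming the Intel compilers are used; FCOPTIM is the Fortran optimization line in the configure.wrf file produced by the configure step), the instruction set can be switched by editing that line and rebuilding:

FCOPTIM = -O2 -xCORE-AVX2      # rebuild and run, then repeat with -xCORE-AVX512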