GRID (ISC18)

Welcome to the ISC18 Student Cluster Competition.

The version of GRID that will be used for the Student Cluster Competition at ISC18 is 0.8.1, which can be found at https://github.com/paboyle/Grid/tree/0.8.1.

The executable that will be used during the actual competition is Test_compressed_lanczos_hot_start, located in the tests directory.  Per the directions below, it is recommended to copy this executable to the installation bin directory.  This test program requires an input file called Params.xml, which must be present in the working directory where the program is run.  There will be two test jobs: one with the saveEvecs parameter set to true in Params.xml and another with saveEvecs set to false.  The first setting causes a large binary file to be written at the end of the job; the second causes the program to do minimal I/O.
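
Switching between the two test jobs only requires changing the value of saveEvecs in Params.xml.  As a purely illustrative sketch (the element name and layout are assumptions here; the Params.xml provided for the competition defines the actual structure), the toggle could be done with sed:

    # Hypothetical: assumes Params.xml holds the setting in a <saveEvecs> element.
    sed -i 's#<saveEvecs>true</saveEvecs>#<saveEvecs>false</saveEvecs>#' Params.xml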

Note: For the test job in which Test_compressed_lanczos_hot_start writes a large data file, Grid will have to be built with the C-LIME library (http://usqcd-software.github.io/c-lime), so it is best to include it from the beginning.

General instructions for building and running Grid can be found at https://github.com/paboyle/Grid/wiki/Dirac-ITT-Benchmarks. 

To practice before the competition you can use Benchmark_ITT, which does no I/O other than to standard output and requires no input file.

GRID requires an MPI implementation that supports MPI_THREAD_MULTIPLE, and it runs best with only a few MPI ranks per node (typically two or four).  Make sure the MPI implementation you plan to use actually provides MPI_THREAD_MULTIPLE.
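
With an Open MPI-based stack such as HPC-X, one quick way to check this is to inspect the thread support reported by ompi_info (this is only a sketch; other MPI implementations have their own query tools):

    ompi_info | grep -i "thread support"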

Here is a step by step example of building Grid for use on a cluster with Intel "Skylake" processors.  Replace all absolute paths with whatever is appropriate for your system.

  1. Create a directory where you want to do your build, and change into it:

     mkdir -p /mnt/lustre/users/gerardo/scc_isc18
     cd /mnt/lustre/users/gerardo/scc_isc18
  2. Set your environment to use the appropriate compilers and MPI implementation (in this example, Intel compilers and HPC-X):

    module purge
    module load intel/ics-18.0.1 hpcx-ga-icc-mt
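
    To confirm the environment is set up as intended (the module names above are site-specific), you can list the loaded modules and check which compiler and MPI launcher are first in your PATH:

    module list
    which icc mpicc mpicxx
    mpirun --version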

     
  3. Get the C-LIME tarball from GitHub, extract its contents, build it, and install it:

    wget http://usqcd-software.github.io/downloads/c-lime/lime-1.3.2.tar.gz
    tar zxvf lime-1.3.2.tar.gz
    cd lime-1.3.2
    CC=icc ./configure --prefix=/labhome/gerardo/lime_132_i18
    make 2>&1 | tee make_i18.log
    make check 2>&1 | tee check_i18.log
    make install 2>&1 | tee install_i18.log
    cd ..
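
    If the C-LIME build and install succeeded, the prefix directory should contain the headers and the library; as a quick check (paths as in this example):

    ls /labhome/gerardo/lime_132_i18/include /labhome/gerardo/lime_132_i18/lib
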
  4. Get the Grid tarball from GitHub, extract its contents, and go down into the Grid-0.8.1 directory:

    wget https://github.com/paboyle/Grid/archive/0.8.1.tar.gz
    tar zxvf 0.8.1.tar.gz
    cd Grid-0.8.1/


  5. This next step only needs to be done once before configuring and building:

    ./bootstrap.sh

    Further configuration and building (e.g., to try out various configuration parameters) will not require redoing this step.


     
  6. Make a build directory and go down into it:

    mkdir build_i18h21; cd build_i18h21

  7. Configure the build:

    OMPI_CXX=icpc ../configure --enable-precision=single --enable-simd=AVX512 \
    --enable-comms=mpi3 --enable-mkl CXX=mpicxx LIBS=-lrt \
    --with-lime=/labhome/gerardo/lime_132_i18 --prefix=/labhome/gerardo/Grid_0.8.1_i18h21


    A few items to note about the options and arguments above:

    For the competition, be sure to enable single precision.

    The 'LIBS=-lrt' setting is needed for OpenMPI builds; it is not needed when building with Intel MPI (see the example configure line after these notes).

    The --enable-simd, --enable-comms and --enable-mkl options are described in the general instructions.
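
    For comparison, a configure line for a build with Intel MPI instead of HPC-X might look like the following (this is only a sketch; the install prefix shown here is hypothetical, and OMPI_CXX and LIBS=-lrt are dropped because they are OpenMPI-specific):

    ../configure --enable-precision=single --enable-simd=AVX512 \
    --enable-comms=mpi3 --enable-mkl CXX=mpiicpc \
    --with-lime=/labhome/gerardo/lime_132_i18 --prefix=/labhome/gerardo/Grid_0.8.1_impi
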
  8.  Make the library, run the checks, make the tests and install the library:

    make 2>&1 | tee make_i18h21.log
    make check 2>&1 | tee check_i18h21.log
    make tests 2>&1 | tee tests_i18h21.log
    make install 2>&1 | tee install_i18h21.log

    If your --enable-simd option calls for a specific instruction set (such as AVX512 in the example above), then it is best to run the make commands on a server with processors that support the instruction set.  Otherwise, some of the make commands will fail when trying to execute test programs.
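
    You can check whether the processors on the node you are building on advertise AVX-512 support with, for example:

    grep -o 'avx512[a-z]*' /proc/cpuinfo | sort -u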

  9. Copy the competition test program to the installation bin directory (Benchmark_ITT should already be there):

    cp -p tests/Test_compressed_lanczos_hot_start /labhome/gerardo/Grid_0.8.1_i18h21/bin/
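
    As an optional sanity check before running on the compute nodes, you can confirm that the executable's shared-library dependencies (MPI, MKL, etc.) resolve in your environment:

    ldd /labhome/gerardo/Grid_0.8.1_i18h21/bin/Test_compressed_lanczos_hot_start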

  10. Run the program:

    mpirun -np 16 --map-by ppr:2:node --bind-to socket -report-bindings --display-map \
    -mca coll_hcoll_enable 0 -mca mtl ^mxm -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 \
    /labhome/gerardo/Grid_0.8.1_i18h21/bin/Benchmark_ITT \
    --mpi 2.2.2.2 --shm 2048 --threads 18


    The example above was run from within a batch script that requested 8 nodes, each with two 18-core processors.  All of the options that follow '-np 16' and precede the Benchmark_ITT path are specific to OpenMPI and HPC-X; the options that follow the path are Benchmark_ITT parameters, described in the general instructions.

    In particular, be aware that the product of the four integers in the --mpi option (2.2.2.2 in the example above) must equal the total number of MPI ranks (16 in the example).
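
    For example, if the same job were run on only 4 of these nodes (still 2 ranks per node and 18 threads per rank), the invocation might change as follows (illustrative only; keep whichever OpenMPI/HPC-X options suit your system):

    mpirun -np 8 --map-by ppr:2:node --bind-to socket \
    /labhome/gerardo/Grid_0.8.1_i18h21/bin/Benchmark_ITT \
    --mpi 2.2.2.1 --shm 2048 --threads 18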

  11. For Benchmark_ITT, use grep to look for the line matching "Comparison point  *result:" in the standard output (the ' *' in the pattern allows for variable spacing).  This is the figure of merit you want to make as large as possible.

    For Test_compressed_lanczos_hot_start in the competition, look for the "Time to solution" line near the end of the output, for example:

    Grid : Message : 602642 ms : Computation time is 523.439286 seconds
    Grid : Message : 602642 ms : I/O         time is 69.615096 seconds
    Grid : Message : 602642 ms : Time to solution is 593.054382 seconds
    Grid : Message : 602642 ms : Done
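
    Assuming you captured the standard output to a file (the name run.log below is just an example), the relevant lines can be pulled out with grep:

    grep "Time to solution" run.log      # Test_compressed_lanczos_hot_start
    grep "Comparison point" run.log      # Benchmark_ITT figure of merit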