What is NEMO?

NEMO (Nucleus for European Modelling of the Ocean) is a state-of-the-art modelling framework for research activities and forecasting services in ocean and climate sciences, developed in a sustainable way by a European consortium (https://www.nemo-ocean.eu/).

These notes apply to NEMO version 4.0.

Installing NEMO

Instructions for installing and running NEMO can be found at https://forge.ipsl.jussieu.fr/nemo/chrome/site/doc/NEMO/guide/html/install.html

The first thing to do is to set up the software environment: compilers, MPI and support libraries. When building to run on the HPC-AI Advisory Council’s clusters, the requisite HDF5 and NetCDF libraries and the FCM build system are made available through environment modules.

module purge
module load intel/2019.5.281 hpcx/2.5.0
module load hdf5/1.10.4-i195h250 netcdf/4.6.2-i195h250 fcm/2017.10.0
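
As a quick sanity check, confirm that the MPI compiler wrappers are on the PATH and that the library locations are exported (NETCDF_DIR and HDF5_DIR are the variable names provided by these particular module files):

which mpicc mpif90
echo $NETCDF_DIR
echo $HDF5_DIR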

One additional library required by NEMO is XIOS; the Installation Guide recommends using version 2.5 rather than the latest available version. Instructions for building and testing XIOS can be found on the XIOS project site (forge.ipsl.jussieu.fr/ioserver). First, check out a copy of XIOS version 2.5 (the command below checks it out into a new directory called XIOS).

svn co http://forge.ipsl.jussieu.fr/ioserver/svn/XIOS/branchs/xios-2.5 XIOS
cd XIOS

Under XIOS there is an “arch” subdirectory containing the files that determine how XIOS is built on various systems; you need to create "arch/arch-YOUR_ARCH.fcm" and "arch/arch-YOUR_ARCH.path" files for your intended system, typically by copying and adapting existing arch-* files (a sketch of that step follows). We have chosen to build XIOS and NEMO for the Lab’s Helios cluster, which has Intel Xeon Gold 6138 (Skylake) processors; the contents of the two architecture-related files are shown after the sketch.
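
For example (the source file names are illustrative; start from whichever existing arch files best match your compiler and MPI stack):

cp arch/arch-X64_ADA.fcm  arch/arch-skl_hpcx.fcm    # source file name illustrative
cp arch/arch-X64_ADA.path arch/arch-skl_hpcx.path   # then edit both copies as shown below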

arch/arch-skl_hpcx.fcm

################################################################################
###################                Projet XIOS               ###################
################################################################################

%CCOMPILER      mpicc
%FCOMPILER      mpif90 
%LINKER         mpif90  -nofor-main

%BASE_CFLAGS    -diag-disable 1125 -diag-disable 279
%PROD_CFLAGS    -O3 -D BOOST_DISABLE_ASSERTS
%DEV_CFLAGS     -g -traceback
%DEBUG_CFLAGS   -DBZ_DEBUG -g -traceback -fno-inline

%BASE_FFLAGS    -D__NONE__ 
%PROD_FFLAGS    -O3 -xCORE-AVX512
%DEV_FFLAGS     -g -O2 -traceback
%DEBUG_FFLAGS   -g -traceback

%BASE_INC       -D__NONE__
%BASE_LD        -lstdc++ 

%CPP            mpicc -EP
%FPP            cpp -P
%MAKE           make

arch/arch-skl_hpcx.path

NETCDF_INCDIR="-I$NETCDF_DIR/include"
NETCDF_LIBDIR="-L$NETCDF_DIR/lib"
NETCDF_LIB="-lnetcdff -lnetcdf"

MPI_INCDIR=""
MPI_LIBDIR=""
MPI_LIB=""

HDF5_INCDIR="-I$HDF5_DIR/include"
HDF5_LIBDIR="-L$HDF5_DIR/lib"
HDF5_LIB="-lhdf5_hl -lhdf5 -lz"

OASIS_INCDIR=""
OASIS_LIBDIR=""
OASIS_LIB=""

The environment variables NETCDF_DIR and HDF5_DIR are set by the second ‘module load’ command above (specifically by the hdf5 and netcdf modules). XIOS is built using the make_xios script that comes with it:

./make_xios --job 16 --arch skl_hpcx 2>&1 | tee make_i195h250.log
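
If the build succeeds, the XIOS library and server executable should be present under the checkout (these are the locations make_xios normally uses; verify on your own system):

ls -l lib/libxios.a bin/xios_server.exe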

We are now ready to continue with the installation of NEMO.

cd ..
svn co https://forge.ipsl.jussieu.fr/nemo/svn/NEMO/releases/release-4.0
ln -s release-4.0 NEMO_4
cd NEMO_4

As with XIOS, NEMO has an arch directory with files that determine how it is built on various platforms. You have to create an "arch/arch-YOUR_ARCH.fcm" file, and again it is advisable to base it on one of the existing arch-*.fcm files. For our build targeting the Lab’s Helios cluster, here are the contents of the corresponding arch-*.fcm file.

arch-skl_hpcx.fcm

# generic ifort compiler options for linux
#
# NCDF_HOME   root directory containing lib and include subdirectories for netcdf4
# HDF5_HOME   root directory containing lib and include subdirectories for HDF5
# XIOS_HOME   root directory containing lib for XIOS
# OASIS_HOME  root directory containing lib for OASIS
#
# NCDF_INC    netcdf4 include file
# NCDF_LIB    netcdf4 library
# XIOS_INC    xios include file    (taken into account only if key_iomput is activated)
# XIOS_LIB    xios library         (taken into account only if key_iomput is activated)
# OASIS_INC   oasis include file   (taken into account only if key_oasis3 is activated)
# OASIS_LIB   oasis library        (taken into account only if key_oasis3 is activated)
#
# FC          Fortran compiler command
# FCFLAGS     Fortran compiler flags
# FFLAGS      Fortran 77 compiler flags
# LD          linker
# LDFLAGS     linker flags, e.g. -L<lib dir> if you have libraries
# FPPFLAGS    pre-processing flags
# AR          assembler
# ARFLAGS     assembler flags
# MK          make
# USER_INC    complete list of include files
# USER_LIB    complete list of libraries to pass to the linker
# CC          C compiler used to compile conv for AGRIF
# CFLAGS      compiler flags used with CC
#
# Note that:
#  - unix variables "$..." are accepted and will be evaluated before calling fcm.
#  - fcm variables are starting with a % (and not a $)
#
%NCDF_HOME           $NETCDF_DIR
%HDF5_HOME           $HDF5_DIR
%XIOS_HOME           /global/home/users/gerardo/gscratch/apac_scc/XIOS
%OASIS_HOME          /not/defined

%NCDF_INC            -I%NCDF_HOME/include 
%NCDF_LIB            -L%NCDF_HOME/lib -lnetcdff -lnetcdf -L%HDF5_HOME/lib -lhdf5_hl -lhdf5
%XIOS_INC            -I%XIOS_HOME/inc 
%XIOS_LIB            -L%XIOS_HOME/lib -lxios -lstdc++
%OASIS_INC           -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1
%OASIS_LIB           -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip

%CPP                 cpp
%FC                  mpif90 -c -cpp
%FCFLAGS             -i4 -r8 -O3 -xCORE-AVX512 -fp-model precise -fno-alias
%FFLAGS              %FCFLAGS
%LD                  mpif90
%LDFLAGS             
%FPPFLAGS            -P -C -traditional
%AR                  ar
%ARFLAGS             rs
%MK                  make
%USER_INC            %XIOS_INC %OASIS_INC %NCDF_INC
%USER_LIB            %XIOS_LIB %OASIS_LIB %NCDF_LIB

%CC                  mpicc
%CFLAGS              -O0

You will need to adjust the path on the line that starts with %XIOS_HOME to point to your own build directory for XIOS.
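
For example, the adjustment can be made with a one-line edit (the path below is a placeholder for your own XIOS checkout):

# point %XIOS_HOME at the directory where you checked out and built XIOS
sed -i 's|^%XIOS_HOME .*|%XIOS_HOME           /path/to/your/XIOS|' arch/arch-skl_hpcx.fcm
grep '^%XIOS_HOME' arch/arch-skl_hpcx.fcm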

NEMO can be built to run in several different configurations; we will use GYRE_PISCES, which is listed along with the other reference configurations in the NEMO documentation. GYRE_PISCES involves three components and does not require any additional datasets to be downloaded. Here is the command to build NEMO with the above architecture file and the GYRE_PISCES configuration; the resulting build is created in the cfgs directory, in a subdirectory whose name is given by the argument to the -n option:

./makenemo -m skl_hpcx -r GYRE_PISCES -n hpcx_gyre_pisces -j 16 2>&1 | tee make_gyre_pisces_i195h250.log

The results of the build are under cfgs/hpcx_gyre_pisces, and the place to run NEMO is the EXP00 subdirectory within it; EXP00 already contains a set of input files specific to GYRE_PISCES as well as symbolic links to other input files that are shared by various configurations.

The executable, nemo.exe, is found under cfgs/hpcx_gyre_pisces/BLD/bin, and there is also a symlink to it, called nemo, under EXP00.
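
A quick check that the build produced both of these:

ls -l cfgs/hpcx_gyre_pisces/BLD/bin/nemo.exe
ls -l cfgs/hpcx_gyre_pisces/EXP00/nemo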

Running NEMO

To run NEMO we need to be in the experiment directory:

cd cfgs/hpcx_gyre_pisces/EXP00/

There is a configuration file, namelist_cfg, that provides several parameters for the actual test or benchmark runs. As distributed, the problem size is very small and needs only a handful of processors. For our purposes, the variables to change to make the testing more interesting are nn_GYRE (which determines the size of the simulation domain) and ln_bench. Save a copy of the original file:

cp -p namelist_cfg orig_namelist_cfg

Edit namelist_cfg with your favorite editor to change nn_GYRE from 1 to 25 and ln_bench from .false. to .true.; the resulting change looks like this:

diff orig_namelist_cfg namelist_cfg
38,39c38,39
<    nn_GYRE     =     1     !  GYRE resolution [1/degrees]
<    ln_bench    = .false.   !  ! =T benchmark with gyre: the gridsize is kept constant
---
>    nn_GYRE     =    25     !  GYRE resolution [1/degrees]
>    ln_bench    = .true.    !  ! =T benchmark with gyre: the gridsize is kept constant
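
Equivalently, the edit can be scripted; a minimal sketch, assuming the default spacing shown in the diff above:

sed -i 's/nn_GYRE     =     1/nn_GYRE     =    25/'  namelist_cfg
sed -i 's/ln_bench    = .false./ln_bench    = .true. /' namelist_cfg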

The most advisable way to run NEMO on the Helios cluster is through a batch script. The scheduler in use on the Lab clusters is Slurm, so here is a Slurm batch script for running NEMO:

#!/bin/bash
#SBATCH -J gyre_p
#SBATCH -N 24
#SBATCH --tasks-per-node=40
#SBATCH -o gyre-24n960T-bmt25-%j.out
#SBATCH -t 01:00:00
#SBATCH -p helios
#SBATCH --exclusive
#SBATCH -d singleton

module purge
module load intel/2019.5.281 hpcx/2.5.0-mt hdf5/1.10.4-i195h250 netcdf/4.6.2-i195h250 fcm/2017.10.0
module list 2>&1

ulimit -s 10485760

cd $SLURM_SUBMIT_DIR

MPI=h250
TAG=${SLURM_JOB_NUM_NODES}n${SLURM_NTASKS}T-${MPI}-oob-bmt25-$SLURM_JOB_ID

echo "Running on $SLURM_JOB_NODELIST"
echo "Nnodes = $SLURM_JOB_NUM_NODES"
echo "Ntasks = $SLURM_NTASKS"
echo "Launch command: mpirun -np $SLURM_NTASKS --map-by core -report-bindings -mca io ompio -x UCX_NET_DEVICES=mlx5_0:1,mlx5_2:1 ./nemo"
/usr/bin/time -p mpirun -np $SLURM_NTASKS --map-by core -report-bindings -mca io ompio -x UCX_NET_DEVICES=mlx5_0:1,mlx5_2:1 ./nemo
mkdir $TAG
mv layout.dat output.namelist.??? communication_report.txt time.step GYRE_* ocean.output $TAG
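
Assuming the script above is saved as, say, run_gyre.slurm (the file name is arbitrary), it is submitted and monitored in the usual Slurm way:

sbatch run_gyre.slurm
squeue -u $USER
tail -f gyre-24n960T-bmt25-<jobid>.out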

One thing to note is that the domain decomposition used by NEMO does not allow it to be run with an arbitrary number of MPI ranks. When a job is launched with an unsupported number of ranks, it fails, and NEMO suggests a list of valid rank counts near the failing choice. The error message appears in the ocean.output file; an example follows:

 mpp_init:
 ~~~~~~~~ 
 
   The number of mpi processes:   640
   exceeds the maximum number of subdomains (ocean+land) =   638
   defined by the following domain decomposition: jpni =   29 jpnj =   22
    You should: 
      - either prescribe your domain decomposition with the namelist variables
        jpni and jpnj to match the number of mpi process you want to use, 
        even IF it not the best choice...
      - or keep the automatic and optimal domain decomposition by picking up one
        of the number of mpi process proposed in the list bellow
 
 
                   For your information:
  list of the best partitions around   640 mpi processes
   ---------------------------------------------------------
 
nb_cores   594 oce +     0 land (   33 x   18 ), nb_points       750 (    25 x    30 )
nb_cores   598 oce +     0 land (   26 x   23 ), nb_points       744 (    31 x    24 )
nb_cores   600 oce +     0 land (   30 x   20 ), nb_points       729 (    27 x    27 )
nb_cores   609 oce +     0 land (   29 x   21 ), nb_points       728 (    28 x    26 )
nb_cores   616 oce +     0 land (   28 x   22 ), nb_points       725 (    29 x    25 )
nb_cores   621 oce +     0 land (   27 x   23 ), nb_points       720 (    30 x    24 )
nb_cores   624 oce +     0 land (   26 x   24 ), nb_points       713 (    31 x    23 )
nb_cores   625 oce +     0 land (   25 x   25 ), nb_points       704 (    32 x    22 )
nb_cores   630 oce +     0 land (   30 x   21 ), nb_points       702 (    27 x    26 )
nb_cores   638 oce +     0 land (   29 x   22 ), nb_points       700 (    28 x    25 )
nb_cores   644 oce +     0 land (   28 x   23 ), nb_points       696 (    29 x    24 )
nb_cores   648 oce +     0 land (   36 x   18 ), nb_points       690 (    23 x    30 )
nb_cores   650 oce +     0 land (   26 x   25 ), nb_points       682 (    31 x    22 )
nb_cores   660 oce +     0 land (   33 x   20 ), nb_points       675 (    25 x    27 )
nb_cores   667 oce +     0 land (   29 x   23 ), nb_points       672 (    28 x    24 )
nb_cores   672 oce +     0 land (   28 x   24 ), nb_points       667 (    29 x    23 )
nb_cores   675 oce +     0 land (   27 x   25 ), nb_points       660 (    30 x    22 )
nb_cores   690 oce +     0 land (   30 x   23 ), nb_points       648 (    27 x    24 )
nb_cores   696 oce +     0 land (   29 x   24 ), nb_points       644 (    28 x    23 )
nb_cores   700 oce +     0 land (   28 x   25 ), nb_points       638 (    29 x    22 )
nb_cores   720 oce +     0 land (   36 x   20 ), nb_points       621 (    23 x    27 )

 ===>>> : E R R O R
         ===========
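
In this example the simplest fix is to resubmit with one of the suggested rank counts (for instance 638 instead of 640). Alternatively, as the message suggests, you can prescribe the decomposition yourself by setting jpni and jpnj in namelist_cfg so that jpni * jpnj matches the number of ranks requested. A minimal sketch for 640 ranks follows (32 x 20 is just one factorization of 640; the &nammpp group is the MPP block of the NEMO 4.0 reference namelist, so check namelist_ref on your installation):

&nammpp        !   Massively Parallel Processing
   jpni = 32   !  number of subdomains along i (illustrative: 32 x 20 = 640 ranks)
   jpnj = 20   !  number of subdomains along j
/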