Getting Started with Xcompact3d for ISC22 SCC

This page provides an overview of Xcompact3d, a configuration example and the tasks for ISC22 Student Cluster Competition teams.

Note: Changes may occur until the beginning of the competition.

 

References

 

 

Overview

Xcompact3d is a Fortran 90, MPI-based, finite-difference high-performance framework for solving the Navier-Stokes equations and associated scalar transport equations. Dedicated to Direct and Large Eddy Simulations (DNS/LES), for which the largest turbulent scales are simulated, it can combine the versatility of industrial codes with the accuracy of spectral codes. Its user-friendliness, simplicity, versatility, accuracy, scalability, portability and efficiency make it an attractive tool for the Computational Fluid Dynamics community.

Xcompact3d is currently able to solve the incompressible and low-Mach number variable density Navier-Stokes equations using sixth-order compact finite-difference schemes with a spectral-like accuracy on a monobloc Cartesian mesh. It was initially designed in France in the mid-1990s for serial processors and later converted to HPC systems. It can now be used efficiently on hundreds of thousands of CPU cores to investigate turbulence and heat transfer problems thanks to the open-source library 2DECOMP&FFT (a Fortran-based 2D pencil decomposition framework to support building large-scale parallel applications on distributed memory systems using MPI; the library has a Fast Fourier Transform module).

When dealing with incompressible flows, the fractional step method used to advance the simulation in time requires solving a Poisson equation. This equation is fully solved in spectral space using the relevant 3D Fast Fourier Transforms (FFTs), which allows any kind of boundary condition to be used for the velocity field. Using the concept of the modified wavenumber (so that operations in spectral space have the same accuracy as if they were performed in physical space), the divergence-free condition is ensured up to machine accuracy. The pressure field is staggered from the velocity field by half a mesh to avoid spurious oscillations created by the implicit finite-difference schemes.

The modelling of a fixed or moving solid body inside the computational domain is performed with a customised Immersed Boundary Method (IBM). It is based on a direct forcing term in the Navier-Stokes equations to ensure a no-slip boundary condition at the wall of the solid body, while imposing non-zero velocities inside the solid body to avoid discontinuities in the velocity field. This customised IBM, fully compatible with the 2D domain decomposition and with a possible mesh refinement at the wall, is based on a 1D expansion of the velocity field from fluid regions into solid regions using Lagrange polynomials or spline reconstructions.

In order to reach high velocities in the context of LES, it is possible to customise the coefficients of the second derivative schemes (used for the viscous term) to add extra numerical dissipation to the simulation, as a substitute for the missing dissipation from the small turbulent scales that are not resolved.

For more details about the numerical methods and parallelisation strategy used in Xcompact3d:

For the competition, Incompact3d, the incompressible flow solver of Xcompact3d, and Winc3d, the wind farm simulator of Xcompact3d, will be used.

Wind Turbine Simulations

Modern large-scale offshore wind farms consist of multiple turbines clustered together, usually in well-structured formations. Such clustering has a number of drawbacks during the operation of the wind farm, as some of the downstream turbines will inevitably have to operate within the wake of the upstream ones. A wind turbine operating within a wake field is an issue for two reasons: first, its power output is reduced because of the wind speed deceleration, and second, its fatigue loads increase because it experiences the wake-laden turbulence from upstream. Power losses due to wake effects were recently reported to be on the order of 10-25%, while fatigue-related failures were reported to be around the same levels owing to a limited understanding of offshore turbulence. To summarise, currently installed wind farms do not produce as much power as expected.

Photograph of the Horns Rev 2 offshore wind farm on 25 January 2016 at 12:45 UTC. From Hasager, C. B., Nygaard, N. G., Volker, P. J., Karagali, I., Andersen, S. J., & Badger, J. (2017). Wind farm wake: The 2016 Horns Rev photo case. Energies, 10(3), 317.

In order to optimise the power output of wind farms, there is a clear need for reliable physics-based
simulation methods that can faithfully replicate realistic scenarios during operational conditions, using wind farm simulators (WFS) such as Winc3d.

 

Introduction to XCompact3D

 

The slides:

Configuration Example

First, get the source code by cloning the git repository (you want to work with version 4.0):

git clone https://github.com/xcompact3d/Incompact3d
cd Incompact3d
git checkout v4.0

For Niagara, you can load the following modules in order to use the Intel Fortran compiler with Intel MPI:

module load intel/2020u4
module load intelmpi/2020u4

For Bridges-2, you can load the following module in order to use the GNU compiler with Open MPI:

module load openmpi/4.1.1-gcc8.3.1

On both clusters, you can also use HPC-X: https://hpcadvisorycouncil.atlassian.net/wiki/spaces/HPCWORKS/pages/2910060545

To compile the code, you need to set up the Makefile accordingly:

Set the variables below in the Makefile for the Intel compiler on Niagara (and adjust the variable FC accordingly; it is the wrapper command for the Fortran compiler)

Set the variables below in the Makefile for the GNU compiler on Bridges-2 (and adjust the variable FC accordingly; it is the wrapper command for the Fortran compiler)

Then

On successful completion you will have an executable called xcompact3d.

Running the Taylor-Green vortex case

The Taylor–Green Vortex (TGV) is a well-known benchmark in CFD, modelling the transition from an initially laminar state to a fully turbulent one. It is attractive as a test case due to its simple setup.

To check that the code runs correctly, you can run the TGV benchmark on 16 or 32 MPI processes.

In interactive mode:

You can also use the following job script on Niagara:

You can also use the following job script on Bridges-2:

and

The simulation will run in a few minutes and will produce a file called time_evol.dat. The data in this file can be compared with a reference time_evol_ref.dat file which can be downloaded here:

Using your favourite tool, you can plot the time evolution of the kinetic energy of the flow (second column) and the dissipation (third column). Time is in the first column. You should obtain the same curves as in the reference file.
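If you prefer a numerical check to a visual one, the comparison can be sketched in Python as follows. This is only an illustration: it assumes both files are plain whitespace-separated text with time, kinetic energy and dissipation in the first three columns (as described above) and that both runs were sampled at the same time steps. The function names and the tolerance are hypothetical choices and are not part of Xcompact3d.

```python
def parse_time_evol(text):
    """Parse whitespace-separated rows into (time, kinetic_energy, dissipation) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        try:
            t, ek, eps = (float(x) for x in fields[:3])
        except ValueError:
            continue  # skip lines that are not numeric (e.g. a header)
        rows.append((t, ek, eps))
    return rows


def max_relative_difference(run, ref):
    """Largest relative difference over the kinetic energy and dissipation columns."""
    if len(run) != len(ref):
        raise ValueError("run and reference have different numbers of rows")
    worst = 0.0
    for (_, ek, eps), (_, ek_ref, eps_ref) in zip(run, ref):
        for value, expected in ((ek, ek_ref), (eps, eps_ref)):
            scale = max(abs(expected), 1e-30)  # guard against division by zero
            worst = max(worst, abs(value - expected) / scale)
    return worst


def compare_files(run_path="time_evol.dat", ref_path="time_evol_ref.dat", tol=1e-6):
    """Return True when the run matches the reference within the (assumed) tolerance."""
    with open(run_path) as f:
        run = parse_time_evol(f.read())
    with open(ref_path) as f:
        ref = parse_time_evol(f.read())
    return max_relative_difference(run, ref) <= tol
```

Calling compare_files() from the directory that contains both files returns True when the two curves agree to within the chosen tolerance. Small differences between compilers and MPI libraries are to be expected, so the tolerance may need adjusting.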

Running the wind turbine case

The bulk of the tasks is based on a wind turbine case for which two generic wind turbines are aligned and subject to an incoming uniform wind. The files required for this case are provided in the Tasks section.

The setup is similar to the following experiment:

From Bartl, J., & Sætran, L. (2017). Blind test comparison of the performance and wake flow between two in-line wind turbines exposed to different turbulent inflow conditions. Wind Energy Science, 2(1), 55-76.

Tunable Parameters

For the competition, you will only have to change three parameters (in the input.i3d file): p_row and p_col, which set the domain decomposition layout (the dimensions of the pencils), and the number of iterations for the simulations.

1-Size of the 2D pencils (input.i3d file)

When p_row and p_col are set to zero, the size of the pencils will be automatically decided during the execution. The simulations are performed with [p_row x p_col] MPI_processes (with one MPI_process per core). For instance, if you want to use 128 cores/MPI_processes, you can have p_row x p_col = 2 x 64, 4 x 32, 8 x 16, 16 x 8, 32 x 4, 64 x 2 (p_row and p_col should be greater than or equal to 2). For the competition, it is advised to try different values for best performance.
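The candidate layouts for a given number of MPI processes can be listed with a few lines of Python. This is a minimal sketch: the function name pencil_layouts is hypothetical, and the code only enumerates the factor pairs that satisfy the constraint stated above (both values at least 2). It does not predict which layout will be fastest.

```python
def pencil_layouts(n_processes, minimum=2):
    """Return every (p_row, p_col) with p_row * p_col == n_processes and both >= minimum."""
    layouts = []
    for p_row in range(minimum, n_processes // minimum + 1):
        if n_processes % p_row == 0:
            layouts.append((p_row, n_processes // p_row))
    return layouts


# List the layouts to try for 128 MPI processes.
for p_row, p_col in pencil_layouts(128):
    print(f"p_row = {p_row:3d}, p_col = {p_col:3d}")
```

Each pair printed is one candidate setting for p_row and p_col in input.i3d. Timing a short run for each of them is a simple way to find the best layout on a given machine.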

2-Number of time steps (input.i3d file)

You can adjust the number of time steps (iterations) by changing this parameter.

It is not recommended to modify the other parameters in the input files and in the “turb” files, unless clearly specified in the tasks.

Tasks and Submissions

Input files for the wind turbine simulations

For the competition, create a new directory in the examples folder and unzip the input files:

Profiling

Run IPM profiles using 4 nodes on Niagara and Bridges-2 (using all the cores per node) for the wind turbine case using 2,500 iterations. Submit the 2 IPM profiles with comments on potential bottlenecks in the code and on the ratio of communication to computation. Discuss the differences between the two supercomputers.
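As a reminder of the arithmetic behind the communication/computation ratio, here is a minimal Python sketch. It assumes you have read a total wall clock time and a total time spent in MPI from a profile. The function name and the timings are hypothetical and are not taken from any real run.

```python
def communication_summary(wall_time, mpi_time):
    """Return (fraction of wall time spent in MPI, communication/computation ratio)."""
    if wall_time <= 0 or not 0 <= mpi_time < wall_time:
        raise ValueError("expected 0 <= mpi_time < wall_time")
    compute_time = wall_time - mpi_time  # everything that is not MPI
    return mpi_time / wall_time, mpi_time / compute_time


# Hypothetical timings in seconds, for illustration only.
fraction, ratio = communication_summary(wall_time=200.0, mpi_time=50.0)
print(f"communication: {fraction:.1%} of wall time, comm/comp ratio = {ratio:.2f}")
```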

Performance

Identify the best configuration to achieve the fastest wall clock time to solution with 2,500 iterations using 4 nodes on Niagara and Bridges-2.

You can change the Fortran compiler and associated options, the MPI library, the number of MPI_processes per node and the size of the 2D pencils (p_row, p_col). Compiler options can be changed in the Makefile with the variable FFLAGS. Submit the Makefile, job submission file and output file for your fastest run on each machine.

Visualizations

Generate some 3D visualizations of the flow by running the simulation for 5,000 iterations (on the machine of your choice) and using the open-source visualization software ParaView. You can change the frequency of writing 3D snapshots of the flow with the parameter ioutput in input.i3d (snapshots are saved every “ioutput” time steps).

To generate visualisations of the flow field, the best option is to use the vorticity fields (vort_*.bin files in the data directory) with an iso-contour of 75, colored by the streamwise velocity ux (ux-*.bin files). Use the XDMF reader option; the xdmf files will read the “.bin” files directly. Once your files are loaded in ParaView, you can select the “contour” function (look for it via the “Filters” tab). See example below:

You can change the color code for the visualizations with the color map editor by selecting one of the available preset maps (in the visualization above, the Rainbow Desaturated color map was used). Submit two different visualizations of the flow.

Bonus Task

Perform two strong scalability studies, one on Niagara and one on Bridges-2, using 1 to 8 nodes with 2,500 iterations (using the same compiler/MPI library and parameters as in the performance tasks). Comment on the scalability of the code, identify potential bottlenecks and suggest avenues for improvement.
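For the scalability comments, speedup and parallel efficiency can be computed from the measured wall clock times. The Python sketch below uses the usual definitions for a fixed problem size: speedup relative to the smallest node count, and efficiency as speedup divided by the increase in node count. The function name and the timings are hypothetical and do not come from real runs.

```python
def strong_scaling(timings):
    """timings maps node count -> wall clock time; returns (nodes, speedup, efficiency) rows."""
    base_nodes = min(timings)          # reference run: the smallest node count
    base_time = timings[base_nodes]
    rows = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        efficiency = speedup * base_nodes / nodes  # 1.0 means ideal scaling
        rows.append((nodes, speedup, efficiency))
    return rows


# Hypothetical wall clock times in seconds, for illustration only.
example = {1: 800.0, 2: 420.0, 4: 230.0, 8: 140.0}
for nodes, speedup, efficiency in strong_scaling(example):
    print(f"{nodes:2d} nodes: speedup = {speedup:5.2f}, efficiency = {efficiency:6.1%}")
```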