This year, we will have a coding exercise challenge!

High Level Overview

For this exercise, you will write part of the code that simulates particles moving in a predefined manner within a two-dimensional space (the bounding box). The bounding box is divided into a two-dimensional grid of cells. The particles move around in the bounding box with (x, y) double-precision floating-point coordinates and belong to one of the cells based on their coordinates. In each cell, the particles are stored as a vector. The size of each cell is 1 X 1, making the entire bounding box n X n, where n is the number of cells per dimension. Your simulation will use a size of 35 X 35, with 35 cells along each dimension. The coordinates of the particles in the simulation box will be between 0 and 35 along each dimension (0 <= x, y <= 35).

...

Note: There might be multiple particles having the same x and y coordinates. You do not need to handle this case separately; it is a valid case.

Tasks

Task 1: Compile charm++ and run simple programs

You can select one of several machine layers (networking layers) to serve as your networking backend.

  • Consider using the flag ‘--with-production’ for the best optimizations

  • Build in SMP mode to take advantage of shared-memory optimizations (./build charm++ <machine-layer-arch> smp --with-production)

  • After building successfully, try a simple program like tests/charm++/simplearrayhello to ensure that programs run successfully.
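For example, a hedged build-and-test sequence, assuming a Linux x86_64 machine with the netlrts layer (substitute your chosen machine layer):

Code Block
$ ./build charm++ netlrts-linux-x86_64 smp --with-production -j8
$ cd netlrts-linux-x86_64-smp/tests/charm++/simplearrayhello
$ make test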

Task 2: Complete the first section of the programming assignment

Untar the programming assignment tar file and read through the coding assignment. After that, write code for the Cell::updateParticles function in the src/exercise.cpp file. You will be required to iterate through the particles vector and move each particle. After moving them, you will determine which particles no longer belong in the current cell and send them to the appropriate neighbor cells using the helper methods. Run the small version of the program using ‘make test’.

Note: If you run ‘make test’ before writing the code in Cell::updateParticles, your application will hang, since it calls Cell::updateParticles (which you are required to fill in).

Task 3: Debugging, testing and profiling

For debugging, you can compile your program with -g (and optionally -O0) and debug the program using gdb. Ensure that the program runs successfully with one rank (or process) before debugging the multiple-process case. While using gdb in the multiple-process case, you can use xterm to launch multiple gdb windows.
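One possible workflow (hedged; the ++debug option requires an X display and support from your machine layer's charmrun):

Code Block
$ # Debug the single-process case first:
$ gdb --args ./particle 100 4 100 1,2,3,10 5 yes 5
$ # For the multiple-process case, charmrun's ++debug option
$ # opens one gdb xterm per process:
$ ./charmrun +p4 ./particle 100 4 100 1,2,3,10 5 yes 5 ++debug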

...

For profiling, you can use the tool called ‘projections’ to analyze the performance of the charm++ program. The usage profile view is especially useful for viewing the load on each processor.
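To generate Projections traces, you can link with tracing enabled; a sketch, assuming you link manually (your Makefile's link rule may differ):

Code Block
$ charmc -language charm++ -tracemode projections -o particle <your object files>
$ ./charmrun +p4 ./particle 100 4 100 1,2,3,10 5 yes 5
$ # The run produces particle.*.log.gz and particle.sts trace files,
$ # which you can open in the Projections client.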

Task 4: Bonus Question

For your bonus question, learn to use Charm++ custom reduction functions and add code to calculate the cell with the maximum number of particles and the cell with the minimum number of particles, along with their respective values. This reduction executes at the end of the simulation. When running the program with the bonus question, make sure to set BONUS_QUESTION to 1 in your Makefile.

Task 5: Benchmarking and Output submission

After you’ve verified the correctness of your code using the small target (make test), you can run the actual full-sized simulation using the 'make testbench' target. Optionally, you can use the `+balancer <load-balancer>` option in your run command to apply a load balancer and mitigate load imbalance.
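For example, GreedyLB and RefineLB are two of the standard Charm++ load balancers. A hedged invocation, reusing the testbench command shown later in this document:

Code Block
$ ./charmrun +p96 ./particle 10000 35 1000 1,2,30,10 5 no 5 +balancer GreedyLB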

Note: Ensure that LIVEVIZ_RUN is set to 0 during your benchmarking run to avoid performance loss because of any visualization component.

This year, since we are conducting the competition remotely, we will reproduce your results in order to validate your submission. After the completion of your run, when you are ready for submission, create a tar file (named cc_<team-name>.tar.gz) with the following files and folders:

  1. A folder called cc_src with all the particle simulation source code files that were modified by you (including src/exercise.cpp).

  2. A folder called cc_output with all the final simulation output files (should include all the sim_output_* files including sim_output_main). Also include the program output that is displayed to stdout in a file called sim_output_stdout.

  3. A file called cc_jobscript.sh. This is the job script you used to run the final ‘testbench’ target of the particle simulation.

  4. [OPTIONAL] A folder called cc_charm_src with the charm directory source code changes that were made by you. If you didn’t make any changes to the charm directory source code, you do not have to submit this directory.

  5. A file called cc_instructions.txt with the following details:

    1. Charm version: You can obtain this using the ‘VERSION’ file inside your charm directory.

    2. Command used to build charm++ (Include any other commands used to build dependent libraries)

  6. [OPTIONAL] A folder called cc_additionals with any other scripts or files that you think are relevant for reproducing your results. If you don’t have any other additional files, you do not have to submit this directory.

Task 6: Visualization and video

Install the Charm++ visualization tool called liveViz on your laptop. Build a netlrts version of Charm++ (on your laptop) and compile the particle simulation program with LIVEVIZ_RUN set to 1. Run the particle simulation with the ‘make testviz’ target and, in a separate window, connect the liveViz client to the running program to visualize the particle simulation. Record your screen to generate a video showing the movement of particles in the 2D space.

Note: You have to use the netlrts machine layer for this part of the exercise, since this functionality is not supported on other machine layers. It is highly recommended to do all of this task on a smaller machine like a laptop/desktop.
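A hedged outline of the whole workflow (the netlrts-linux-x86_64 target name assumes a Linux laptop; the exact liveViz client invocation depends on how you installed it):

Code Block
$ # On your laptop, build a netlrts version of Charm++:
$ ./build charm++ netlrts-linux-x86_64 --with-production
$ # In the exercise Makefile, set LIVEVIZ_RUN to 1, then rebuild and run:
$ make clean all
$ make testviz
$ # In a separate window, point the liveViz client at the
$ # host and port that the running program prints.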

Task 7: Let others know

Add a small team name-tag or team photo to the video (to mark it) and post it to your team’s Twitter account, if you have one, or your LinkedIn account with the hashtags #ISC20 #ISC20_SCC and a mention of @Charmplusplus. If you don’t use social media, you can submit the file using a flash drive.

Details

Charm++ Introduction

Charm++ is a C++-based parallel programming system, founded on the migratable-objects programming model, and supported by a novel and powerful adaptive runtime system. It supports both irregular and regular applications, and can be used to express task parallelism as well as data parallelism in a single application. It automates dynamic load balancing for task-parallel as well as data-parallel applications via separate suites of load-balancing strategies. Through its message-driven execution model, it supports automatic latency tolerance, modularity, and parallel composition. Charm++ also supports automatic checkpoint/restart, as well as fault tolerance based on distributed checkpoints.


Charm++ is a production-quality parallel programming system used by multiple applications in science and engineering on supercomputers as well as smaller clusters around the world. Currently, the parallel platforms supported by Charm++ are IBM BlueGene/Q and OpenPOWER systems; Cray XE, XK, and XC systems; Omni-Path and InfiniBand clusters; and single workstations and networks of workstations (including x86 running Linux, Windows, and macOS). The communication protocols and infrastructures supported by Charm++ are UDP, MPI, OFI, UCX, InfiniBand, uGNI, and PAMI. Charm++ programs run unchanged on all these platforms. Notable Charm++ applications that regularly run on leading supercomputers across the world include NAMD (molecular dynamics), ChaNGa (n-body cosmological simulations), EpiSimdemics (contagion in social networks), and OpenAtom (ab initio molecular dynamics), among many others.

Particle Simulation Introduction

The particle simulation is written using the Charm++ parallel programming model, and it simulates a set of particles moving in a 2-dimensional space within a bounding box. The bounding box is divided into a two-dimensional grid of cells. The particles moving around in the bounding box have (x, y) floating-point coordinates and belong to one of the cells based on their coordinates. In each cell, the particles are stored as a vector. The size of each cell is 1 X 1, making the bounding box n X n for a simulation with n cells in each dimension. In the code, the variable 'numCellsPerDim' represents the number of cells along each dimension of the bounding box.

...

  • Top Left Neighbor       : (4,3)

  • Top Neighbor            : (4,4)

  • Top Right Neighbor      : (4,0)

  • Left Neighbor           : (0,3)

  • Right Neighbor          : (0,0)

  • Bottom Left Neighbor    : (1,3)

  • Bottom Neighbor         : (1,4)

  • Bottom Right Neighbor   : (1,0)
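From the listed values, the example cell appears to be (0,4) on a 5 X 5 grid. The wraparound arithmetic behind these indices can be sketched as follows (an illustrative standalone program, not part of the provided skeleton):

Code Block
#include <cstdio>

// Illustrative sketch: compute the 8 wraparound neighbors of a cell
// on an n x n periodic grid.
static int wrap(int idx, int n) { return (idx + n) % n; }

int main() {
  const int n = 5;            // the example grid above appears to be 5 x 5
  const int row = 0, col = 4; // the cell whose neighbors are listed above
  const char *names[8] = {"Top Left", "Top", "Top Right", "Left",
                          "Right", "Bottom Left", "Bottom", "Bottom Right"};
  const int dr[8] = {-1, -1, -1, 0, 0, 1, 1, 1};
  const int dc[8] = {-1, 0, 1, -1, 1, -1, 0, 1};
  for (int i = 0; i < 8; i++)
    std::printf("%-12s Neighbor : (%d,%d)\n", names[i],
                wrap(row + dr[i], n), wrap(col + dc[i], n));
  return 0;
}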

...

Input Parameters

The simulation takes the following input parameters:

...

In the particle simulation, the simulation begins after the initial population of cells with particles based on the particle distribution ratios. Particles move in a predefined manner (when you call the perturb function). Based on a particle's new position, it is your responsibility to determine whether the particle is still in the same cell or whether it has to be transferred to one of the adjacent cells. If the particle no longer belongs to the same cell, you need to send it to the cell where it now belongs.

In each iteration, every particle in each cell is moved (perturbed), and then the cell sends 8 messages to its 8 neighboring cells (TL, T, TR, L, R, BL, B, BR, as explained above). After sending the 8 messages, each cell waits for 8 incoming messages from its neighbors, consisting of particles that belong to it. These new particles are added to the cell's existing particles. Note that your task is only to move the particles and send the 8 outgoing messages to the neighboring cells; the already written skeleton code takes care of waiting for the 8 incoming messages and adding the received particles to the cell. This process of moving particles, sending 8 outgoing messages, and waiting for 8 incoming messages is repeated for every cell, in every iteration of the simulation. The simulation completes when the running count of iterations reaches the total number of iterations.

Charm++ Setup

1. Download the latest Charm++ release

...

Code Block
$ make test
../../../bin/charmc   hello.ci
../../../bin/charmc  -c hello.C
../../../bin/charmc  -language charm++ -o hello hello.o
../../../bin/testrun  ./hello +p4 10
Charmrun> scalable start enabled.
Warning: No xauth data; using fake authentication data for X11 forwarding.
X11 forwarding request failed on channel 1
Charmrun> started all node programs in 1.147 seconds.
Charm++> Running in non-SMP mode: 4 processes (PEs)
Converse/Charm++ Commit ID: v6.10.1-0-gcc60a79
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.001 seconds.
Running Hello on 4 processors for 10 elements
[0] Hello 0 created
[0] Hello 1 created
[0] Hello 2 created
[0] Hi[17] from element 0
[0] Hi[18] from element 1
[0] Hi[19] from element 2
[2] Hello 6 created
[2] Hello 7 created
[3] Hello 8 created
[3] Hello 9 created
[1] Hello 3 created
[1] Hello 4 created
[1] Hello 5 created
[1] Hi[20] from element 3
[1] Hi[21] from element 4
[1] Hi[22] from element 5
[2] Hi[23] from element 6
[2] Hi[24] from element 7
[3] Hi[25] from element 8
[3] Hi[26] from element 9
All done
[Partition 0][Node 0] End of program

real 0m1.184s
user 0m0.015s
sys 0m0.016s

Coding Assignment (Particle Simulation) Setup

1. Untar the charm-exercise.tar.gz that will be provided to you

...

Code Block
$ make test
./charmrun +p4 ./particle 100 4 100 1,2,3,10 5 yes 5

Running on 4 processors:  ./particle 100 4 100 1,2,3,10 5 yes 5
./charmrun: line 106: mvapich2-start-mpd: command not found
Charm++> Running in non-SMP mode: 4 processes (PEs)
Converse/Charm++ Commit ID: v6.10.1-0-gcc60a792d
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (2 sockets x 12 cores x 1 PUs = 24-way SMP)
Charm++> cpu topology info is gathered in 0.005 seconds.
================================ Input Params ===============================
====================== Particles In A Box Simulation ========================
Grid Size                                                  = 4 X 4
Particles/Cell seed value                                  = 100
Number of Iterations                                       = 100
Green Particles (Lower Triangular Half) distribution ratio = 1
Blue Particles  (Upper Triangular Half) distribution ratio = 2
Red Particles   (Diagonal) distribution ratio              = 3
Red Particles   (Central Box) distribution ratio           = 10
Velocity Reduction Factor                                  = 5
Log Output                                                 = 1
Load Balancing Frequency                                   = 5
=============================================================================
======================= Launching Particle Simulation =======================
Iteration: 5, Outgoing Particles Sum: 543, Total Particles: 4000
Iteration: 10, Outgoing Particles Sum: 707, Total Particles: 4000
...
...
...
Iteration: 100, Outgoing Particles Sum: 870, Total Particles: 4000
======================= Particle Simulation Complete ========================
Simulation Complete, total time taken is 0.061054 seconds
=============================================================================
=============================================================================
Success! Simulation correctness verified across all cells
=============================================================================
Final summarized output has been written to: output/sim_output_15-55-59-789815-4-100-100/sim_output_main
All particle output has been written to files in directory : output/sim_output_15-55-59-789815-4-100-100
Exiting program
[Partition 0][Node 0] End of program




$ ./charmrun +p96 ./particle 10000 35 1000 1,2,30,10 5 no 5

Running on 96 processors:  ./particle 10000 35 1000 1,2,30,10 5 no 5
./charmrun: line 106: mvapich2-start-mpd: command not found
Charm++> Running in non-SMP mode: 96 processes (PEs)
Converse/Charm++ Commit ID: v6.10.1-0-gcc60a792d
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 4 hosts (2 sockets x 12 cores x 1 PUs = 24-way SMP)
Charm++> cpu topology info is gathered in 0.062 seconds.
================================ Input Params ===============================
====================== Particles In A Box Simulation ========================
Grid Size                                                  = 35 X 35
Particles/Cell seed value                                  = 10000
Number of Iterations                                       = 1000
Green Particles (Lower Triangular Half) distribution ratio = 1
Blue Particles  (Upper Triangular Half) distribution ratio = 2
Red Particles   (Diagonal) distribution ratio              = 30
Red Particles   (Central Box) distribution ratio           = 10
Velocity Reduction Factor                                  = 5
Log Output                                                 = 0
Load Balancing Frequency                                   = 5
=============================================================================
======================= Launching Particle Simulation =======================
Iteration: 5, Outgoing Particles Sum: 5399671, Total Particles: 34750000
Iteration: 10, Outgoing Particles Sum: 5436386, Total Particles: 34750000
Iteration: 15, Outgoing Particles Sum: 5696982, Total Particles: 34750000
Iteration: 20, Outgoing Particles Sum: 5727058, Total Particles: 34750000
Iteration: 25, Outgoing Particles Sum: 5602325, Total Particles: 34750000
Iteration: 30, Outgoing Particles Sum: 5579097, Total Particles: 34750000
...
...
...
Iteration: 980, Outgoing Particles Sum: 2331920, Total Particles: 34750000
Iteration: 985, Outgoing Particles Sum: 2325854, Total Particles: 34750000
Iteration: 990, Outgoing Particles Sum: 2322346, Total Particles: 34750000
Iteration: 995, Outgoing Particles Sum: 2320412, Total Particles: 34750000
Iteration: 1000, Outgoing Particles Sum: 2315199, Total Particles: 34750000
======================= Particle Simulation Complete ========================
Simulation Complete, total time taken is 226.429119 seconds
=============================================================================
=============================================================================
Success! Simulation correctness verified across all cells
=============================================================================
Final summarized output has been written to: output/sim_output_17-10-44-916566-35-10000-1000/sim_output_main
Exiting program
[Partition 0][Node 0] End of program

Coding Assignment Task

Your exercise is to add code for two functions:

  • The main part of the exercise is to add code in Cell::updateParticles.

  • The bonus part of the exercise is to add code in Cell::contributeToReduction, Main::receiveMinMaxReductionData, and the global function calculateMaxMin. For this part, you will need to understand how custom reduction functions are written in Charm++ in order to write the reduction function that calculates the min value, the min cell indices, the max value, and the max cell indices.

Main task:

The Cell::updateParticles function is called for each cell in the grid. It is called for each iteration of the

...

Your assignment is to add code to Cell::updateParticles in the file src/exercise.cpp to implement the functionality described above. It is important that sendParticles is called 8 times to send outgoing messages to the 8 adjacent cells (neighbors), since they are waiting on messages from all their neighbors. Failing to send messages to the neighbors will cause a hang.
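A minimal sketch of this control flow (hedged: perturb and sendParticles exist in the skeleton, but their exact signatures may differ, and computeNeighborDirection is a hypothetical helper standing in for your own boundary logic):

Code Block
// Illustrative sketch of Cell::updateParticles, not the reference solution.
// Assumes the skeleton's Particle type and the cell's `particles` vector.
void Cell::updateParticles() {
  std::vector<Particle> outgoing[8]; // one bucket per neighbor (TL..BR)
  std::vector<Particle> staying;
  for (Particle &p : particles) {
    perturb(&p);                           // predefined movement (signature assumed)
    int dir = computeNeighborDirection(p); // hypothetical: -1 means "still mine"
    if (dir < 0) staying.push_back(p);
    else outgoing[dir].push_back(p);       // crossed a boundary: goes to neighbor dir
  }
  particles.swap(staying); // keep only the particles that stayed
  // Always send all 8 messages, even empty ones; otherwise the neighbors
  // hang waiting for their 8 incoming messages.
  for (int dir = 0; dir < 8; dir++)
    sendParticles(dir, outgoing[dir]);     // helper call shape assumed
}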


Bonus exercise

To understand custom reductions, refer to the manual: https://charm.readthedocs.io/en/latest/charm++/manual.html#defining-a-new-reduction-type.

...

  • Add code in Cell::contributeToReduction in the file src/exercise.cpp to declare the reduction data and populate it with the different values being reduced (min value, min x index, min y index, max value, max x index, and max y index). Following the data declaration, declare a callback to Main::receiveMinMaxReductionData and call contribute, passing the size, the data, the reduction type (i.e. minMaxType), and the declared callback. Note that the custom reduction type 'minMaxType' has already been declared to use calculateMaxMin. (A sketch of all three pieces follows this list.)

  • Add code in calculateMaxMin to implement the reduction operations for the data being received.

  • Add code in Main::receiveMinMaxReductionData to initialize the variables with the final reduction values. These variables are maxParticles, maxCellX, maxCellY, minParticles, minCellX, and minCellY. Note that Main::receiveMinMaxReductionData is called after the reduction operation is complete.
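A hedged sketch of how the three pieces typically fit together, loosely following the manual's custom-reduction example (the 6-double data layout, the mainProxy readonly, and the exact skeleton signatures are assumptions):

Code Block
// Sketch only. Data layout assumed: {min, minX, minY, max, maxX, maxY}.
// minMaxType is already registered to use calculateMaxMin in the skeleton.

void Cell::contributeToReduction() {
  double data[6];
  data[0] = data[3] = (double)particles.size(); // min and max candidates
  data[1] = data[4] = (double)thisIndex.x;      // cell x index
  data[2] = data[5] = (double)thisIndex.y;      // cell y index
  CkCallback cb(CkIndex_Main::receiveMinMaxReductionData(NULL), mainProxy);
  contribute(6 * sizeof(double), data, minMaxType, cb);
}

CkReductionMsg *calculateMaxMin(int nMsg, CkReductionMsg **msgs) {
  double *acc = (double *)msgs[0]->getData();
  for (int i = 1; i < nMsg; i++) {
    double *d = (double *)msgs[i]->getData();
    if (d[0] < acc[0]) { acc[0] = d[0]; acc[1] = d[1]; acc[2] = d[2]; } // new min
    if (d[3] > acc[3]) { acc[3] = d[3]; acc[4] = d[4]; acc[5] = d[5]; } // new max
  }
  return CkReductionMsg::buildNew(6 * sizeof(double), acc);
}

void Main::receiveMinMaxReductionData(CkReductionMsg *msg) {
  double *res = (double *)msg->getData();
  minParticles = (int)res[0]; minCellX = (int)res[1]; minCellY = (int)res[2];
  maxParticles = (int)res[3]; maxCellX = (int)res[4]; maxCellY = (int)res[5];
  delete msg;
}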

Testing your program

On completing the main task (and optionally the bonus task), run the small target 'make test' and ensure that the value of the total particles in the simulation does not change across iterations. An unchanged value indicates that no particles are lost in the simulation; a changing value indicates a bug in your code.

...


This should create a folder called bench inside /scripts/compareOutput. Your program will use it during the runs to compare your benchmarking (testbench) simulation output against.


Visualizing your run

To visualize your program, you need a netlrts build (with or without smp support), since that is the only build that supports live visualization. Additionally, you also need to install the visualization client liveViz to connect to your run. On completing your exercise, in order to visualize the particle simulation, perform the following steps on your local machine (such as your desktop or laptop). For this exercise, you won’t need your cluster.

...


With this, you should be able to see the visualization of moving particles.

...

Some Caveats to Understand

  1. For the coding challenge, you will only need to add code to exercise.cpp. However, if you wish, you can change the other source code files as long as your result matches the pre-computed output.

  2. Important: While deciding to send particles to their correct homes (the 8 neighboring cells), ensure that a particle is sent over only if its coordinates have crossed the boundary. Do not send particles that are still on the boundary of the current cell.

  3. Important: Make sure that particles that go outside the bounding box are wrapped back, as explained previously.

  4. Use the totalParticles displayed in the output to validate that no particle in the simulation is being lost. The value of totalParticles displayed should not change across the periodically displayed iterations.

  5. If you’re attempting the bonus question, make sure to go into the Makefile and set the BONUS_QUESTION variable to 1. Since this is a compiler macro, you will have to run make clean all to recompile your files.

  6. When you’re using the visualization feature and want to visualize using liveViz, make sure to go into the Makefile and set the LIVEVIZ_RUN variable to 1. Since this is also a compiler macro, you will have to run make clean all to recompile your files.

  7. Among the input parameters listed in this section, for your final benchmarking task, you can only modify logOutput, load balancing frequency, and the load balancer that you pass using +balancer <load-balancer-name>.

  8. If you are running your program in SMP mode, which takes advantage of shared-memory communication between the threads of a process, note that in this mode each process dedicates one of its threads (and hence one of its cores) to a communication thread, whose exclusive job is to send and receive messages. This means that you lose that core as a compute resource. In SMP mode, you are required to pass ++ppn <ppn-val> to specify the number of worker threads per process. Additionally, you can specify CPU affinity to bind the threads to cores using +pemap <pe-map> and +commap <comm-map>. These options are described in the Charm++ manual. (A hedged example launch appears after this list.)

  9. Note from Nitin Bhat:
    It was brought to my notice by a few participants on the ISC20-SCC Slack (#isc20-scc) that the implementation and version of the math library libm could impact the final output of the particle simulation. For that reason, to ensure that your libm version doesn’t lead to different results when compared with the pre-computed test results, I have decided to provide the details of the libm version that I’ll be using for testing. I will be using the standard libm that is shipped as part of GNU libc 2.12. This is also the default library in your Linux environment on the NSCC cluster.

  10. Note that you can modify the number of processors for the testbench run. The current value is +p96, but you’re free to play around with this value.

  11. Parts of the code that cannot be modified:

    1. Do not change/remove the timer calls since that is used to evaluate the total execution time.

    2. Do not change/remove asserts in the code since they are basic sanity checks for the simulation to proceed as required.

      1. This includes the code inside checkParticleBelongsToMe and the places where checkParticleBelongsToMe is called.

      2. This also includes the call to verifyCorrectness inside recvParticlesPostSimulation.

    3. Ensure that the simulation is performed such that particles have coordinates that spatially belong to the cell. (The calls to checkParticleBelongsToMe will enforce this behavior.)

    4. Do not change/remove the calls to reduceTotalAndOutbound, since they are performed during the simulation in order to print the total particles and the total outgoing particles in the simulation.
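As referenced in caveat 8, here is a hedged example of an SMP-mode launch on the benchmark's 24-core (2 sockets x 12 cores) nodes, with 2 processes per node, each running 11 worker threads plus 1 communication thread. Launcher syntax varies across machine layers, and the exact maps depend on your node topology:

Code Block
$ # 8 processes x 11 worker threads = 88 worker PEs across 4 nodes:
$ ./charmrun +p88 ./particle 10000 35 1000 1,2,30,10 5 no 5 ++ppn 11 \
      +pemap 0-10,12-22 +commap 11,23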



Download Files

Click here to download the Coding Challenge Files


Good Luck Teams!!

References

...