HowTo Set Up Charm++ with the UCX Machine Layer

This post shows the installation and configuration steps needed to set up Charm++ to run with the UCX machine layer.


1. Get the UCX machine layer code from https://charm.cs.illinois.edu/gerrit/#/c/charm/+/4940/


# wget -O a.tgz "https://charm.cs.illinois.edu/gerrit/changes/charm~4940/revisions/3/archive?format=tgz"

...
# tar xf a.tgz

# ls 

CHANGES         LICENSE  README.ampi    README.charm4py  build       contrib   doc       package-tarball.sh  smart-build.pl  tests
CMakeLists.txt  README   README.bigsim  a.tgz            charm.spec  coverage  examples  relink.script       src
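
The download step can be sketched as a small helper, assuming the archive URL always follows the pattern used above. The function name gerrit_archive_url is hypothetical; only the URL layout comes from this post.

```shell
# Hypothetical helper: print the Gerrit archive URL for a given
# change number and revision, following the pattern used in step 1.
gerrit_archive_url() {
    change="$1"
    revision="$2"
    printf 'https://charm.cs.illinois.edu/gerrit/changes/charm~%s/revisions/%s/archive?format=tgz\n' \
        "$change" "$revision"
}

# Usage (needs network access). The URL is quoted so the shell does not
# treat '?' as a glob character, and -O names the output file:
# wget -O a.tgz "$(gerrit_archive_url 4940 3)"
```

Passing the change and revision as arguments makes it easier to fetch a later revision of the same change without retyping the whole URL.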

2. Load the Intel compiler and HPC-X modules.

# module load intel/2018.5.274
# module load hpc-x/2.4.0-pre
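
Before building, it may help to confirm that the modules actually put the expected tools on PATH. The following is a minimal sketch; the function name require_tools and the list of tools in the usage comment are assumptions, not something the modules are guaranteed to provide.

```shell
# Report any of the named commands that are missing from PATH.
# Returns 0 only when every command is found.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage after loading the modules (the tool names are an assumption
# about what the Intel and HPC-X modules provide):
# require_tools icc mpirun ucx_info || echo "check your modules"
```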

3. Compile Charm++ with the UCX machine layer.

Look for the text "charm++ built successfully." near the end of the build output.

# ./build charm++ ucx-linux-x86_64 -j16 --no-shared --with-production 2>&1 | tee ../charm-6.8.2-hpcx-2.4.0-pre-ucx-intel-2018.5.274-no-avx.build.log

...
gmake[1]: Leaving directory `/global/software/centos-7/sources/NAMD_2.13_Source/ucx/ucx-linux-x86_64/tmp/libs/ck-libs/NDMeshStreamer'
../../../../bin/charmc -optimize -production   -o ../../../../lib/libmoduleCkIO.a ckio.o fs_parameters.o
ar: creating ../../../../lib/libmoduleCkIO.a
echo "-llustreapi" > ../../../../lib/libmoduleCkIO.dep
gmake[1]: Leaving directory `/global/software/centos-7/sources/NAMD_2.13_Source/ucx/ucx-linux-x86_64/tmp/libs/ck-libs/io'
touch charm++
-------------------------------------------------
charm++ built successfully.
Next, try out a sample program like ucx-linux-x86_64/tests/charm++/simplearrayhello
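
Because the build output is piped through tee, the exit status of the pipeline is that of tee rather than of ./build, so checking the log for the success message is a reasonable safeguard. A minimal sketch, with build_succeeded as a hypothetical helper name:

```shell
# Return 0 if the given build log contains the success message
# printed at the end of a good Charm++ build.
build_succeeded() {
    # -F matches the text literally, so '+' and '.' are not regex operators.
    grep -qF 'charm++ built successfully.' "$1"
}

# Usage with the log file written in step 3:
# build_succeeded ../charm-6.8.2-hpcx-2.4.0-pre-ucx-intel-2018.5.274-no-avx.build.log \
#     && echo "build OK" || echo "build FAILED"
```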

4. Run the simple hello test.

# cd ucx-linux-x86_64/tests/charm++/simplearrayhello/

# make -j
../../../bin/charmc  -c hello.C
../../../bin/charmc  -language charm++ -o hello hello.o

# mpirun -n 2 ./hello

Charm++> Running in non-SMP mode: 2 processes (PEs)
Converse/Charm++ Commit ID: release-2-13-beta-2-27-g4095ae2
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (2 sockets x 14 cores x 2 PUs = 56-way SMP)
Charm++> cpu topology info is gathered in 0.002 seconds.
Running Hello on 2 processors for 5 elements
[0] Hello 0 created
[0] Hello 1 created
[0] Hello 2 created
[0] Hi[17] from element 0
[0] Hi[18] from element 1
[0] Hi[19] from element 2
[1] Hello 3 created
[1] Hello 4 created
[1] Hi[20] from element 3
[1] Hi[21] from element 4
All done
[Partition 0][Node 0] End of program
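
The output above can also be checked automatically. This sketch assumes the line formats shown in this run ("Hello N created" and "All done"); check_hello_output is a hypothetical helper, not part of Charm++.

```shell
# Read program output on stdin and verify that the expected number of
# "Hello N created" lines and the final "All done" line are present.
check_hello_output() {
    expected="$1"
    output=$(cat)
    # grep -c prints 0 and exits non-zero when nothing matches; '|| true'
    # keeps that from aborting a script run with 'set -e'.
    created=$(printf '%s\n' "$output" | grep -c 'Hello [0-9][0-9]* created' || true)
    if [ "$created" -ne "$expected" ]; then
        echo "expected $expected elements, saw $created" >&2
        return 1
    fi
    printf '%s\n' "$output" | grep -q '^All done' || return 1
}

# Usage: the run above reports 5 elements.
# mpirun -n 2 ./hello | check_hello_output 5 && echo "hello test OK"
```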


5. Run the pingpong test on two nodes.

# cd ../pingpong/

# make -j
../../../bin/charmc   pingpong.ci
touch cifiles
../../../bin/charmc  -I../../../src/conv-core pingpong.C
../../../bin/charmc  -language charm++ -o pgm pingpong.o

# salloc -N 2 -p thor 
# mpirun -map-by node -n 2 -x UCX_NET_DEVICES=mlx5_0:1 ./pgm
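
In the mpirun command above, -x exports UCX_NET_DEVICES to the launched processes, restricting UCX to port 1 of the mlx5_0 adapter. To find device names on another system, the output of ucx_info -d can be filtered. The sketch below assumes that output contains lines of the form "#      Device: mlx5_0:1"; list_ucx_devices is a hypothetical helper.

```shell
# Read 'ucx_info -d' style output on stdin and print each distinct
# device name once. Assumes lines such as "#      Device: mlx5_0:1".
list_ucx_devices() {
    sed -n 's/^#* *Device: *\([^ ]*\).*/\1/p' | sort -u
}

# Usage on a node with UCX installed:
# ucx_info -d | list_ucx_devices
```

Any name printed this way can then be passed as the value of UCX_NET_DEVICES.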