This post shows the installation and configuration steps needed to set up Charm++ to run with the UCX machine layer.
1. Get the UCX machine layer code from https://charm.cs.illinois.edu/gerrit/#/c/charm/+/4940/
# wget "https://charm.cs.illinois.edu/gerrit/changes/charm~4940/revisions/3/archive?format=tgz" -O a.tgz
...
# tar xf a.tgz
# ls
CHANGES LICENSE README.ampi README.charm4py build contrib doc package-tarball.sh smart-build.pl tests
CMakeLists.txt README README.bigsim a.tgz charm.spec coverage examples relink.script src
2. Load the Intel compiler and HPC-X modules
# module load intel/2018.5.274
# module load hpc-x/2.4.0-pre
3. Compile Charm++ with the UCX machine layer.
Look for the text "charm++ built successfully." near the end of the build output.
# ./build charm++ ucx-linux-x86_64 -j16 --no-shared --with-production 2>&1 | tee ../charm-6.8.2-hpcx-2.4.0-pre-ucx-intel-2018.5.274-no-avx.build.log
...
gmake[1]: Leaving directory `/global/software/centos-7/sources/NAMD_2.13_Source/ucx/ucx-linux-x86_64/tmp/libs/ck-libs/NDMeshStreamer'
../../../../bin/charmc -optimize -production -o ../../../../lib/libmoduleCkIO.a ckio.o fs_parameters.o
ar: creating ../../../../lib/libmoduleCkIO.a
echo "-llustreapi" > ../../../../lib/libmoduleCkIO.dep
gmake[1]: Leaving directory `/global/software/centos-7/sources/NAMD_2.13_Source/ucx/ucx-linux-x86_64/tmp/libs/ck-libs/io'
touch charm++
-------------------------------------------------
charm++ built successfully.
Next, try out a sample program like ucx-linux-x86_64/tests/charm++/simplearrayhello
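Because the build output is piped through tee, the shell reports tee's exit status rather than the build's, so checking the log for the success banner is a more reliable signal. The following is a minimal sketch of such a check; the function name check_charm_build is hypothetical and not part of Charm++, and the log path is whatever was passed to tee.

```shell
#!/bin/sh
# Hypothetical helper (not part of Charm++): succeeds only when the given
# build log contains the banner that ./build prints on success.
check_charm_build() {
    log_file="$1"
    if grep -q "charm++ built successfully" "$log_file"; then
        echo "build OK"
    else
        echo "build FAILED or incomplete" >&2
        return 1
    fi
}
```

It could be called as: check_charm_build ../charm-6.8.2-hpcx-2.4.0-pre-ucx-intel-2018.5.274-no-avx.build.log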
4. Run the simple hello test
# cd ucx-linux-x86_64/tests/charm++/simplearrayhello/
# make -j
../../../bin/charmc -c hello.C
../../../bin/charmc -language charm++ -o hello hello.o
# mpirun -n 2 ./hello
Charm++> Running in non-SMP mode: 2 processes (PEs)
Converse/Charm++ Commit ID: release-2-13-beta-2-27-g4095ae2
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (2 sockets x 14 cores x 2 PUs = 56-way SMP)
Charm++> cpu topology info is gathered in 0.002 seconds.
Running Hello on 2 processors for 5 elements
[0] Hello 0 created
[0] Hello 1 created
[0] Hello 2 created
[0] Hi[17] from element 0
[0] Hi[18] from element 1
[0] Hi[19] from element 2
[1] Hello 3 created
[1] Hello 4 created
[1] Hi[20] from element 3
[1] Hi[21] from element 4
All done
[Partition 0][Node 0] End of program
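A successful run creates every element and ends with "All done". As a rough sketch, the check below reads the program output on stdin and verifies both conditions; the function name check_hello_output is hypothetical, and the patterns are taken from the sample output shown in this post rather than from any documented format.

```shell
#!/bin/sh
# Hypothetical helper: reads the hello program's output on stdin and checks
# that the expected number of elements was created and the run finished.
check_hello_output() {
    expected="$1"
    output=$(cat)
    # grep -c prints the number of matching lines (0 if none).
    created=$(printf '%s\n' "$output" | grep -c 'Hello [0-9][0-9]* created' || true)
    if [ "$created" -ne "$expected" ]; then
        echo "expected $expected elements, saw $created" >&2
        return 1
    fi
    if ! printf '%s\n' "$output" | grep -q '^All done'; then
        echo "missing 'All done' line" >&2
        return 1
    fi
    echo "hello OK: $created elements"
}
```

For the run shown here, it could be used as: mpirun -n 2 ./hello | check_hello_output 5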
5. Run the pingpong test
# cd ../pingpong/
# make -j
../../../bin/charmc pingpong.ci
touch cifiles
../../../bin/charmc -I../../../src/conv-core pingpong.C
../../../bin/charmc -language charm++ -o pgm pingpong.o
# salloc -N 2 -p thor
# mpirun -map-by node -n 2 -x UCX_NET_DEVICES=mlx5_0:1 ./pgm
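The mlx5_0:1 value passed in UCX_NET_DEVICES is specific to the cluster used here. On another system, the device names can usually be read from the output of ucx_info -d. The sketch below assumes that output contains lines of the form "#   Device: <name>", which may differ between UCX versions; list_ucx_devices is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `ucx_info -d` output on stdin and prints the
# unique device names, assuming lines that look like "#   Device: mlx5_0:1".
list_ucx_devices() {
    sed -n 's/^#[[:space:]]*Device:[[:space:]]*\([^[:space:]]*\).*/\1/p' | sort -u
}
```

It could be used as: ucx_info -d | list_ucx_devices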