References

MHM2 Lecture

Download the Slides here:

Recording:

Getting Started

Build and Install upcxx

1. Download MHM2 and UPC++ from https://bitbucket.org/berkeleylab/mhm2/src/master/ and https://bitbucket.org/berkeleylab/upcxx/downloads/?tab=downloads respectively.

$ ls
mhm2-2.0.0.tar.gz  upcxx-2020.3.2.tar.gz

2. Uncompress both archives:

$ tar xf mhm2-2.0.0.tar.gz
$ tar xf upcxx-2020.3.2.tar.gz 
$ ls
mhm2-2.0.0  mhm2-2.0.0.tar.gz  upcxx-2020.3.2  upcxx-2020.3.2.tar.gz

3. Follow the guidelines here to build and install UPC++. Load the relevant compilers and environment variables, and make sure the compilers you use meet the minimum required version:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --disable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC) 

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --disable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
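
The version check above can also be scripted. The following is a minimal sketch, assuming GNU sort (for its -V version sort) is available. The helper name version_ge and the variable MIN_GCC are made up for this example, and the default of 6.4.0 is an assumption that should be confirmed against the UPC++ install guide for the release you build.

```shell
#!/bin/bash
# Return success when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# MIN_GCC is an assumed placeholder; take the real minimum from the UPC++ install guide.
MIN_GCC=${MIN_GCC:-6.4.0}

if command -v gcc >/dev/null 2>&1; then
    # -dumpfullversion reports the full version (e.g. 8.3.1) on builds where
    # -dumpversion alone prints only the major number.
    have=$(gcc -dumpfullversion -dumpversion)
    if version_ge "$have" "$MIN_GCC"; then
        echo "gcc $have is new enough (>= $MIN_GCC)"
    else
        echo "gcc $have is older than $MIN_GCC" >&2
    fi
fi
```

The same function can be reused for g++ or cmake by swapping the command that reports the version.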

4. Change into the UPC++ source directory, create the build and install directories, and configure:

$ cd upcxx-2020.3.2
$ mkdir build upcxx_install ; cd build
$ ../configure --prefix=<path-to-dir>/upcxx-2020.3.2/upcxx_install --with-cc=gcc --with-cxx=g++

5. Check the Makefile that was created in the build directory:

# This file is generated by the UPC++ configure script
# Modifications to this file may be overwritten
#
# Configure command: ../configure --prefix=/global/home/users/ophirm/hpcperf/centos-7/benchmarks/ISC21/mhm2//upcxx-2020.3.2/upcxx_install --with-cc=gcc --with-cxx=g++

ifneq ($(firstword $(sort $(MAKE_VERSION) 3.80)), 3.80)
$(error GNU Make 3.80 or newer required, but this is $(MAKE_VERSION))
endif
export prefix=/global/home/users/ophirm/hpcperf/centos-7/benchmarks/ISC21/mhm2/upcxx-2020.3.2/upcxx_install
export upcxx_src=/global/home/groups/hpcperf/centos-7/benchmarks/ISC21/mhm2/upcxx-2020.3.2
export upcxx_bld=/global/home/groups/hpcperf/centos-7/benchmarks/ISC21/mhm2/upcxx-2020.3.2/build
export UPCXX_BASH=/bin/sh
export UPCXX_PYTHON=/usr/bin/env python3
export GMAKE=/usr/bin/gmake
export GMAKE_SHORT=gmake
export CONFIG_CC=/usr/bin/gcc
export CONFIG_CXX=/usr/bin/g++
export GASNET=/global/home/users/ophirm/hpcperf/centos-7/benchmarks/ISC21/mhm2/upcxx-2020.3.2/build/bld/GASNet-2020.3.0
export GASNET_TYPE=source
export CROSS=
export UPCXX_CSUMCMD=shasum
export UPCXX_MPSC_QUEUE=UPCXX_MPSC_QUEUE_ATOMIC
export UPCXX_DBGOPT=debug opt
export GASNET_UNPACKED=/global/home/users/ophirm/hpcperf/centos-7/benchmarks/ISC21/mhm2/upcxx-2020.3.2/build/bld/GASNet-2020.3.0
export GASNET_CONFIGURE_ARGS=
export GASNET_CONDUIT=smp
export UPCXX_CUDA=
export UPCXX_CUDA_NVCC=
export UPCXX_CUDA_CPPFLAGS=
export UPCXX_CUDA_LIBFLAGS=
include $(upcxx_src)/bld/Makefile.rules
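
Rather than reading the whole generated Makefile, the few settings that matter most can be printed directly. This is a small sketch that assumes the file keeps the "export NAME=value" layout shown above; makefile_var is a helper name made up for this example.

```shell
#!/bin/bash
# Print the value of an "export NAME=value" line from the generated Makefile.
# $1 is the Makefile path, $2 is the variable name.
makefile_var() {
    sed -n "s/^export $2=//p" "$1"
}

# Usage, run from the build directory: show the install prefix, the compilers,
# and the default GASNet conduit chosen by configure.
if [ -f Makefile ]; then
    for name in prefix CONFIG_CC CONFIG_CXX GASNET_CONDUIT; do
        printf '%-16s %s\n' "$name" "$(makefile_var Makefile "$name")"
    done
fi
```

In the listing above this would report /usr/bin/gcc and /usr/bin/g++ as the compilers and smp as the default conduit.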

6. Make and check

$ make -j16 all
...
$ make check 
Building dependencies...
************
Compiling and running tests for the default network, NETWORKS='smp'.
Please, ensure you are in a proper environment for launching parallel jobs
(eg batch system session, if necessary) or the run step may fail.
************
...

PASSED compiling 15 tests
...
PASSED running 15 tests

7. Make install

$ make install 
...
$ ls bin
test-upcxx-install.sh  upcxx  upcxx-meta  upcxx-run  upcxx.sh

8. Add the UPC++ bin directory to PATH

$ export PATH="<path>/upcxx-2020.3.2/upcxx_install/bin:$PATH"

Build and Install mhm2

1. cd to mhm2 directory

$ cd <path-to-dir>/mhm2-2.0.0

2. Follow the guidelines here to build and install MHM2. Load CMake first, for example:

$ module load cmake/3.16.4

3. Optionally export the UPC++ build variables. This is not mandatory, because the “Release” option of the build script supplied with the code should cover it.

export UPCXX_CODEMODE=O3
export UPCXX_THREADMODE=par
export UPCXX_NETWORK=ibv

4. Continue with the build script supplied with the MHM2 sources:

#!/bin/bash -login
set -e

SECONDS=0

rootdir=`pwd`

echo $rootdir

INSTALL_PATH=${MHM2_INSTALL_PATH:=$rootdir/install}

rm -rf $INSTALL_PATH/bin/mhm2

if [ "$1" == "clean" ]; then
    rm -rf .build/*
    # if this isn't removed then the rebuild will not work
    rm -rf $INSTALL_PATH/cmake
    exit 0
else
    mkdir -p $rootdir/.build
    cd $rootdir/.build
    if [ "$1" == "Debug" ] || [ "$1" == "Release" ] || [ "$1" == "RelWithDebInfo" ]; then
        rm -rf *
        rm -rf $INSTALL_PATH/cmake
        cmake $rootdir -DCMAKE_BUILD_TYPE=$1 -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ $MHM2_CMAKE_EXTRAS
    fi
    make -j ${MHM2_BUILD_THREADS} install
fi

echo "Build took $((SECONDS))s"
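
Note that the script passes ${MHM2_BUILD_THREADS} straight to make -j. If the variable is unset, make receives -j without a number, and GNU make then places no limit on the number of parallel jobs. A minimal sketch of setting it before the build follows; the script name build.sh in the last comment is an assumption, so use whatever name the script has in your source tree.

```shell
#!/bin/bash
# Default the job count to the number of available cores unless it is already set.
: "${MHM2_BUILD_THREADS:=$(nproc 2>/dev/null || echo 1)}"
export MHM2_BUILD_THREADS
echo "building with $MHM2_BUILD_THREADS make jobs"

# Then, from the mhm2 source directory (script name assumed):
# ./build.sh Release
```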

Run mhm2

1. In this example, we used a Slurm script to launch the executable (using this input file):

#!/bin/bash -login 
# ----------------------------------------------------------------------------
#SBATCH -p iris
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=40
#SBATCH --threads-per-core=1
#SBATCH -J mhm2
#SBATCH --time=1:00:00
#SBATCH --exclusive
#SBATCH -d singleton
# ----------------------------------------------------------------------------
HOST=$SLURM_JOB_NUM_NODES
PPN=$SLURM_NTASKS_PER_NODE
NP=$SLURM_NPROCS
DATE=$(date +%Y%m%d-%H%M)

BASE=<path-to-dir>
ulimit -s unlimited

# OpenMP settings
export OMP_NUM_THREADS=1


export PATH=$BASE/mhm2-2.0.0/install/bin:$BASE/upcxx-2020.3.2/upcxx_install/bin:$PATH
export GASNET_IBV_PORTS="mlx5_0:1"
export GASNET_PHYSMEM_PROBE=0 
export GASNET_ODP_VERBOSE=0

input=$BASE/input/arctic_sample_0.fq
mhm2.py -r $input --checkpoint=no   # You can add -v for verbosity

Output

Example output for arctic_sample_0.fq on 4 Skylake nodes:

$ cat slurm-83452.out 
Found 40 cpus and 1 hyperthreads from lscpu
Found tasks per node from SLURM_NTASKS_PER_NODE= 40
Executing mhm2 with job 83452 (mhm2) on 4 nodes.
Executing as: /global/scratch/groups/hpcperf/ISC21/mhm2/run/../ibv/mhm2-2.0.0/install/bin/mhm2.py -r /global/scratch/groups/hpcperf/ISC21/mhm2/run/arctic_sample_0.fq
Found tasks per node from SLURM_NTASKS_PER_NODE= 40
Setting GASNET_COLL_SCRATCH_SIZE=4M 
2020-10-15 18:17:27.111090 executing:
 upcxx-run -n 160 -N 4 -shared-heap 10% -- /global/scratch/groups/hpcperf/ISC21/mhm2/run/../ibv/mhm2-2.0.0/install/bin/mhm2 -r /global/scratch/groups/hpcperf/ISC21/mhm2/run/arctic_sample_0.fq
Started executing at 2020-10-15 18:17:28.965530 with PID 5096
Set Lustre striping on the output directory
Set Lustre striping on the per_thread output directory
MHM2 version 2.0.0.1-g9f6cac9-master with upcxx-utils 0.3.5.25-g2539c79 built on 20201009_164527
Options:
  reads =                  /global/scratch/groups/hpcperf/ISC21/mhm2/run/arctic_sample_0.fq
  kmer-lens =              21 33 55 77 99
  scaff-kmer-lens =        99 33
  min-ctg-print-len =      500
  output =                 mhm2-run-arctic_sample_0-n160-N4-201015181728
  checkpoint =             true
_________________________
Starting run with 160 processes on 4 nodes at 10/15/20 18:17:28
Pinning to logical cpus: process 0 on node 0 pinned to cpu 0
Total size of 1 input file is 836.79MB
Initial free memory across all 4 nodes: 725.25GB (181.31GB avg, 181.31GB min, 181.32GB max)
Starting with 181.31GB free on node 0
Merging reads                           0.42 s

Completed initialization in 0.57 s at 10/15/20 18:17:29 (180.38GB free memory on node 0)
_________________________
Contig generation k = 21

Analyzing kmers                         1.23 s
Traversing deBruijn graph               2.26 s
Dumping contigs                         0.48 s
Aligning reads to contigs               1.28 s
Locally extending ends of contigs       1.46 s
Dumping contigs                         0.35 s
_________________________
Assembly statistics (contig lengths >= 500)
    Number of contigs:       12634
    Total assembled length:  9941407
    Average contig depth:    6.53026
    Number of Ns/100kbp:     0 (0)
    Max. contig length:      4636
    Contig lengths:
        > 1kbp:              3038276 (30.56%)
        > 5kbp:              0 (0.00%)
        > 10kbp:             0 (0.00%)
        > 25kbp:             0 (0.00%)
        > 50kbp:             0 (0.00%)

Completed contig round k = 21 in 7.09 s at 10/15/20 18:17:36 (173.35GB free memory on node 0)
_________________________
Contig generation k = 33

Analyzing kmers                         1.31 s
Traversing deBruijn graph               2.22 s
Dumping contigs                         0.28 s
Aligning reads to contigs               1.23 s
Locally extending ends of contigs       1.41 s
Dumping contigs                         0.40 s
_________________________
Assembly statistics (contig lengths >= 500)
    Number of contigs:       17463
    Total assembled length:  18092193
    Average contig depth:    4.24641
    Number of Ns/100kbp:     0 (0)
    Max. contig length:      15769
    Contig lengths:
        > 1kbp:              9714981 (53.70%)
        > 5kbp:              1196173 (6.61%)
        > 10kbp:             215136 (1.19%)
        > 25kbp:             0 (0.00%)
        > 50kbp:             0 (0.00%)

Completed contig round k = 33 in 6.87 s at 10/15/20 18:17:43 (173.09GB free memory on node 0)
_________________________
Contig generation k = 55

Analyzing kmers                         1.13 s
Traversing deBruijn graph               1.97 s
Dumping contigs                         0.61 s
Aligning reads to contigs               1.03 s
Locally extending ends of contigs       1.93 s
Dumping contigs                         0.39 s
_________________________
Assembly statistics (contig lengths >= 500)
    Number of contigs:       18025
    Total assembled length:  18902365
    Average contig depth:    3.21746
    Number of Ns/100kbp:     0 (0)
    Max. contig length:      16538
    Contig lengths:
        > 1kbp:              10355676 (54.79%)
        > 5kbp:              1246953 (6.60%)
        > 10kbp:             223051 (1.18%)
        > 25kbp:             0 (0.00%)
        > 50kbp:             0 (0.00%)

Completed contig round k = 55 in 7.09 s at 10/15/20 18:17:50 (172.88GB free memory on node 0)
_________________________
Contig generation k = 77

Analyzing kmers                         1.05 s
Traversing deBruijn graph               1.78 s
Dumping contigs                         0.21 s
Aligning reads to contigs               0.83 s
Locally extending ends of contigs       1.71 s
Dumping contigs                         0.23 s
_________________________
Assembly statistics (contig lengths >= 500)
    Number of contigs:       18149
    Total assembled length:  19025787
    Average contig depth:    2.54331
    Number of Ns/100kbp:     0 (0)
    Max. contig length:      16538
    Contig lengths:
        > 1kbp:              10435412 (54.85%)
        > 5kbp:              1238177 (6.51%)
        > 10kbp:             231531 (1.22%)
        > 25kbp:             0 (0.00%)
        > 50kbp:             0 (0.00%)

Completed contig round k = 77 in 5.84 s at 10/15/20 18:17:56 (172.45GB free memory on node 0)
_________________________
Contig generation k = 99

Analyzing kmers                         0.91 s
Traversing deBruijn graph               1.59 s
Dumping contigs                         0.24 s
Dumping contigs                         0.30 s
_________________________
Assembly statistics (contig lengths >= 500)
    Number of contigs:       17984
    Total assembled length:  18955469
    Average contig depth:    2.09242
    Number of Ns/100kbp:     0 (0)
    Max. contig length:      16536
    Contig lengths:
        > 1kbp:              10467305 (55.22%)
        > 5kbp:              1309406 (6.91%)
        > 10kbp:             245295 (1.29%)
        > 25kbp:             0 (0.00%)
        > 50kbp:             0 (0.00%)

Completed contig round k = 99 in 3.06 s at 10/15/20 18:17:59 (171.82GB free memory on node 0)
_________________________
Scaffolding k = 99

Aligning reads to contigs               0.64 s
Traversing contig graph                 0.46 s
Assembly statistics (contig lengths >= 500)
    Number of contigs:       18102
    Total assembled length:  19311716
    Average contig depth:    4.23073
    Number of Ns/100kbp:     0 (0)
    Max. contig length:      16536
    Contig lengths:
        > 1kbp:              10837393 (56.12%)
        > 5kbp:              1458915 (7.55%)
        > 10kbp:             259292 (1.34%)
        > 25kbp:             0 (0.00%)
        > 50kbp:             0 (0.00%)
Dumping contigs                         0.45 s

Completed scaffolding round k = 99 in 1.62 s at 10/15/20 18:18:00 (168.75GB free memory on node 0)
_________________________
Scaffolding k = 33

Aligning reads to contigs               1.28 s
Traversing contig graph                 0.73 s
Assembly statistics (contig lengths >= 500)
    Number of contigs:       18360
    Total assembled length:  27651381
    Average contig depth:    7.21741
    Number of Ns/100kbp:     0.0759456 (21)
    Max. contig length:      71617
    Contig lengths:
        > 1kbp:              20642175 (74.65%)
        > 5kbp:              6289130 (22.74%)
        > 10kbp:             3476308 (12.57%)
        > 25kbp:             997631 (3.61%)
        > 50kbp:             240837 (0.87%)

Completed scaffolding round k = 33 in 2.14 s at 10/15/20 18:18:03 (168.47GB free memory on node 0)
_________________________
Dumping contigs                         0.45 s
_________________________
Assembly statistics (contig lengths >= 500)
    Number of contigs:       18360
    Total assembled length:  27651381
    Average contig depth:    7.21741
    Number of Ns/100kbp:     0.0759456 (21)
    Max. contig length:      71617
    Contig lengths:
        > 1kbp:              20642175 (74.65%)
        > 5kbp:              6289130 (22.74%)
        > 10kbp:             3476308 (12.57%)
        > 25kbp:             997631 (3.61%)
        > 50kbp:             240837 (0.87%)

Completed finalization in 0.45 s at 10/15/20 18:18:03 (168.48GB free memory on node 0)
_________________________
Stage timing:
    main.cpp:Merge reads: 0.42 s
    main.cpp:Analyze kmers: 5.63 s 5 count
    main.cpp:Traverse deBruijn graph: 9.82 s 5 count
    main.cpp:Alignments: 6.30 s 6 count
      -> main.cpp:Kernel alignments: 0.40 s
    main.cpp:Local assembly: 6.51 s 4 count
    main.cpp:Traverse contig graph: 1.20 s 2 count
    FASTQ total read time: 0.0157606
    merged FASTQ write time: 0.282843
    Contigs write time: 4.39638
_________________________
Peak memory used across all 4 nodes: 51.47GB (12.87GB avg, 12.72GB min, 13.01GB max)
Finished in 35.21 s at 10/15/20 18:18:04 for 2.0.0.1-g9f6cac9
Overall time taken (including any restarts): 38.04 s

We will look at the total time taken:

Overall time taken (including any restarts): 38.04 s

We will also look at the accuracy of the results.
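
When comparing several runs, the timing line can be pulled out of each job log automatically. This is a small sketch that assumes the logs keep the "Overall time taken" line format shown above and the default slurm-<jobid>.out naming; overall_time is a helper name made up for this example.

```shell
#!/bin/bash
# Print the seconds reported on the "Overall time taken" line of an MHM2 job log.
overall_time() {
    awk '/^Overall time taken/ { print $(NF-1) }' "$1"
}

# Usage: list the overall time for every Slurm log in the current directory.
for log in slurm-*.out; do
    if [ -f "$log" ]; then
        printf '%s\t%s s\n' "$log" "$(overall_time "$log")"
    fi
done
```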

Tuning Options

You are allowed to change any GASNet or UPC++ configuration as long as the results remain correct. We will supply a tool to check that.

Input File

The input file for the competition can be downloaded here.

ISC21 SCC Tasks

  1. Build and Install UPCXX

  2. Build and Install MHM2

  3. Run MHM2 on the given input (to be provided on the day of the competition)

  4. Submit the output file and the results folder.

Note: You will need to supply results on both clusters, Niagara and Aspire-1 (4 CPU nodes only).

Note:

The output of MHM2 is probabilistic, so we don't expect identical results each time. However, the results should not differ dramatically from one run to the next. For validation, you can run the script ci_asm_qual_test.sh, found in the ci directory in the mhm2 repo. That script will download a small dataset, assemble it with MHM2, and compare the results with expected results. You can look at the contents of the script to get an idea of how the validation is done.
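
As a quick sanity check between two of your own runs, the final assembly statistics in the job logs can be compared. The sketch below is not the official validation and does not replace ci_asm_qual_test.sh. It assumes the "Assembly statistics" layout shown in the example output, and the helper names and the log file names run1.out and run2.out are hypothetical.

```shell
#!/bin/bash
# Print the last value reported in a log for a statistic such as
# "Total assembled length" (the final assembly block comes last in the log).
final_stat() {
    awk -F: -v key="$2" '
        { name = $1; sub(/^[ \t]+/, "", name) }
        name == key { val = $2; gsub(/[ \t]/, "", val); last = val }
        END { print last }' "$1"
}

# Print the absolute difference between two numbers as a percentage of the first.
rel_diff_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (a == 0) { print "nan"; exit }
        d = a - b; if (d < 0) d = -d
        printf "%.2f\n", 100 * d / a
    }'
}

# Usage with two hypothetical log files from separate runs.
if [ -f run1.out ] && [ -f run2.out ]; then
    a=$(final_stat run1.out "Total assembled length")
    b=$(final_stat run2.out "Total assembled length")
    echo "Total assembled length differs by $(rel_diff_pct "$a" "$b")%"
fi
```

What counts as an acceptable difference is not defined here; the supplied checking tool remains the reference.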

For this competition, MHM2 should run on CPUs only.