...

The following tests were done with HPC-X version 2.7.0-pre and the Intel 2019 compilers.

Table of Contents

OSU Point to Point Tests

...

  • This micro-benchmark runs on two cores only (basic latency and bandwidth).

  • Please use a core that is local to the adapter, in this case core 80 (see the snippet after this list for how to find it).

  • HPC-X MPI version 2.7.0 was used

  • 10000 iterations were used per test (-i 10000), plus 10000 warm-up iterations (-x 10000)

  • OSU 5.6.2

  • MLNX_OFED 5.0.2
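
To check which cores are local to the adapter, the kernel exposes the HCA's NUMA locality through sysfs. A minimal sketch, assuming the mlx5_2 device used throughout these examples:

Code Block
# Cores local to the mlx5_2 HCA; pick one of these for -cpu-list
cat /sys/class/infiniband/mlx5_2/device/local_cpulist

# NUMA node the HCA is attached to
cat /sys/class/infiniband/mlx5_2/device/numa_node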

Command example:

Code Block
mpirun -np 2 -map-by ppr:1:node -rank-by core -bind-to cpu-list:ordered -cpu-list 80 -mca pml ucx -x UCX_NET_DEVICES=mlx5_2:1 osu_latency -i 10000 -x 10000

Command output example on Rome (HDR):

Code Block
$ mpirun -np 2 -map-by ppr:1:node -rank-by core -bind-to cpu-list:ordered -cpu-list 80 -mca pml ucx -x UCX_NET_DEVICES=mlx5_2:1 osu_latency -i 10000 -x 10000
# OSU MPI Latency Test v5.6.2
# Size          Latency (us)
0                       1.0907
1                       1.1007
2                       1.0907
4                       1.0907
8                       1.0907
16                      1.1007
32                      1.2817
64                      1.3126
128                     1.4631
256                     1.8570
512                     2.0494
1024                    2.4127
2048                    2.4927
4096                    3.0180
8192                    3.8744
16384                   4.9658
32768                   6.2256
65536                   9.4236
131072                 15.8519
262144                 16.3748
524288                 27.1246
1048576                50.9623
2097152                95.3084
4194304               175.4050
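
As a quick sanity check on the output, the large-message latency should be consistent with the link bandwidth. For example, taking the 4MB row from the run above (a hypothetical one-liner, not part of the OSU suite):

Code Block
# 4194304 bytes in ~175.4 us one-way is ~23.9 GB/s effective bandwidth,
# in line with the osu_bw results below
echo "175.4050" | awk '{printf "%.0f MB/s\n", 4194304 / $1}'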

osu_bw

This is a point-to-point benchmark.

  • This micro-benchmark runs on two cores only.

  • Please use a core that is local to the adapter, in this case core 80.

  • HPC-X MPI version 2.7.0 was used

  • 10000 iterations were used per test (-i 10000), plus 10000 warm-up iterations (-x 10000)

  • OSU 5.6.2

  • Set NPS=1 (or 2) in the BIOS to reach line rate (more memory channels); see the check after this list.
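
The NPS (NUMA-per-socket) setting can be verified from the OS without entering the BIOS; a minimal check:

Code Block
# With NPS=1 each socket reports one NUMA node; with NPS=4 it reports four
lscpu | grep -i numa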

...

Code Block
mpirun -np 2 -map-by ppr:1:node -rank-by core -bind-to cpu-list:ordered -cpu-list 80 -mca pml ucx -x UCX_NET_DEVICES=mlx5_2:1 osu_bw -i 10000 -x 10000

...

Code Block
$ mpirun -np 2 -map-by ppr:1:node -rank-by core -bind-to cpu-list:ordered -cpu-list 80 -mca pml ucx -x UCX_NET_DEVICES=mlx5_2:1 osu_bw -i 10000 -x 10000
# OSU MPI Bandwidth Test v5.6.2
# Size      Bandwidth (MB/s)
1                       3.8390
2                       7.6480
4                      15.2553
8                      31.6120
16                     62.1634
32                    124.3911
64                    243.3701
128                   477.1291
256                   900.9270
512                  1593.8369
1024                 3103.1496
2048                 5299.1351
4096                 7513.4663
8192                10371.1166
16384               16105.8336
32768               19001.4713
65536               22253.0028
131072              23313.3310
262144              23997.8289
524288              24349.9010
1048576             24532.09
2097152             24614.44
4194304             24636.01
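
The ~24.6 GB/s peak is consistent with HDR line rate (200 Gb/s ≈ 25 GB/s). To confirm the port actually negotiated HDR, ibstat can be used (again assuming the mlx5_2 device):

Code Block
# Expect "Rate: 200" on an HDR link
ibstat mlx5_2 | grep Rate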

osu_bibw

This is a point-to-point benchmark.

  • This micro-benchmark runs on two cores only.

  • Please use a core that is local to the adapter, in this case core 80.

  • HPC-X MPI version 2.7.0 was used

  • 10000 iterations were used per test (-i 10000), plus 10000 warm-up iterations (-x 10000)

  • OSU 5.6.2

  • Set NPS=1 (or 2) in the BIOS to reach line rate (more memory channels).

...

Code Block
mpirun -np 2 -map-by ppr:1:node -rank-by core -bind-to cpu-list:ordered -cpu-list 80 -mca pml ucx -x UCX_NET_DEVICES=mlx5_2:1 osu_bibw -i 10000 -x 10000 -W 512

Command output example on Rome (HDR):

Code Block
$ mpirun -np 2 -map-by ppr:1:node -rank-by core -bind-to cpu-list:ordered -cpu-list 80 -mca pml ucx -x UCX_NET_DEVICES=mlx5_2:1 osu_bibw -i 10000 -x 10000 -W 512
# OSU MPI Bi-Directional Bandwidth Test v5.6.2
# Size      Bandwidth (MB/s)
1                       5.3782
2                      11.7763
4                      23.6427
8                      46.3262
16                     93.3807
32                    185.3889
64                    285.0426
128                   559.6738
256                  1143.7645
512                  1761.3726
1024                 3385.1442
2048                 5512.6257
4096                 9142.8215
8192                15138.0753
16384               21865.7895
32768               30857.4667
65536               39546.8248
131072              43946.8792
262144              46488.1386
524288              47851.9840
1048576             48518.9397
2097152             48831.1865
4194304             48942.0690
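
As a sanity check, the bi-directional peak should be roughly twice the uni-directional osu_bw peak measured above; a quick hypothetical calculation:

Code Block
# ~48942 MB/s bi-directional vs ~24636 MB/s uni-directional -> ratio ~1.99
awk 'BEGIN { printf "%.2f\n", 48942.0690 / 24636.01 }'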

osu_mbw_mr

The Multiple Bandwidth / Message Rate test creates multiple pairs of ranks that send traffic to each other. The two ranks of each pair are located on different nodes (otherwise it becomes a shared-memory test).
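
The Messages/s column in the output is simply the measured bandwidth divided by the message size. For example, for the 256-byte row in the output below (a hypothetical check, accurate up to rounding of the printed MB/s):

Code Block
# 13561.93 MB/s at 256 bytes -> ~53.0 million messages per second
awk 'BEGIN { printf "%.0f\n", 13561.93 * 1e6 / 256 }'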

...

Code Block
$ mpirun  -np 128  -map-by ppr:64:node -rank-by core -bind-to cpu-list:ordered -cpu-list 64-127  -mca pml ucx -x UCX_NET_DEVICES=mlx5_2:1 osu_mbw_mr   -W 512
# OSU MPI Multiple Bandwidth / Message Rate Test v5.6.2
# [ pairs: 64 ] [ window size: 512 ]
# Size                  MB/s        Messages/s
1                     194.22      194215337.24
2                     387.23      193615177.36
4                     773.95      193486570.81
8                    1545.46      193182287.83
16                   2976.90      186056309.00
32                   4297.78      134305707.67
64                   6344.98       99140297.19
128                  9758.37       76237262.16
256                 13561.93       52976275.87
512                 17913.93       34988135.77
1024                21370.83       20869955.40
2048                23158.19       11307707.78
4096                23462.98        5728265.86
8192                24260.12        2961439.84
16384               23698.72        1446455.16
32768               23653.46         721846.43
65536               23803.63         363214.64
131072              24523.31         187097.99
262144              24546.99          93639.35
524288              24557.37          46839.47
1048576             24571.09          23432.82
2097152             24579.22          11720.29
4194304             24561.79           5855.99

OSU Collectives

osu_barrier

...