This post shows a few common RDMA benchmark examples for AMD 2nd Generation EPYC™ CPU (formerly codenamed “Rome”) based servers, aimed at achieving maximum performance with ConnectX-6 HDR InfiniBand adapters. The results are based on testing over the AMD Daytona_X reference platform with 2nd Gen EPYC CPUs and ConnectX-6 HDR InfiniBand adapters.

Before starting, please follow the AMD 2nd Gen EPYC CPU Tuning Guide for InfiniBand HPC to set the cluster parameters for high performance. Use the latest firmware and driver, and pick a core close to the adapter on its local NUMA node; see HowTo Find the local NUMA node in AMD EPYC Servers.
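The NUMA node of an adapter, and the cores that belong to that node, can be read from sysfs. Below is a minimal shell sketch, assuming the standard Linux sysfs layout (`/sys/class/infiniband/<device>/device/numa_node` and `/sys/devices/system/node/node<N>/cpulist`); the function name `hca_local_cpus` and the `SYSFS_ROOT` override are hypothetical, added only for illustration and testing.

```shell
#!/usr/bin/env bash
# Sketch: print the NUMA node and CPU list that are local to an RDMA device.
# Assumes the standard Linux sysfs layout. SYSFS_ROOT is a hypothetical
# override (default /sys) that exists only to make the function testable.
hca_local_cpus() {
    local dev="$1"
    local root="${SYSFS_ROOT:-/sys}"
    local node_file="$root/class/infiniband/$dev/device/numa_node"

    if [[ ! -r "$node_file" ]]; then
        echo "error: cannot read $node_file" >&2
        return 1
    fi

    local node
    node=$(<"$node_file")
    # The kernel reports -1 when the platform exposes no NUMA affinity.
    if (( node < 0 )); then
        echo "error: no NUMA affinity reported for $dev" >&2
        return 1
    fi

    local cpu_file="$root/devices/system/node/node$node/cpulist"
    if [[ ! -r "$cpu_file" ]]; then
        echo "error: cannot read $cpu_file" >&2
        return 1
    fi

    echo "device=$dev numa_node=$node cpus=$(<"$cpu_file")"
}
```

For example, `hca_local_cpus mlx5_2` prints the node and CPU list for the adapter used in this post; a core from that list is what gets passed to `numactl --physcpubind`. `numactl -H` shows the same node-to-CPU mapping for the whole system.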

RDMA testing should be done before any application or micro-benchmark testing, as it shows the low-level capabilities of your fabric.


RDMA Write Benchmarks

RDMA Write Latency (ib_write_lat)

To check RDMA write latency, follow these notes:

  • Use the core local to the HCA; in this example, the HDR InfiniBand adapter is local to core 80.

  • More iterations make the output smoother. In this example, we are using 10000 iterations.

  • Expected RDMA write latency is around 1 usec (0.97-1.02 usec) for an EPYC 7742 (Rome, 2.25 GHz) using an HDR InfiniBand adapter over a single HDR switch.

  • NPS configuration is not critical here.
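To check a run against the expected latency window automatically, the output can be filtered with awk. This is a sketch under stated assumptions: it reads the column layout of the output example in this section and takes `t_typical` of the 2-byte row as the value to compare; which column the 0.97-1.02 usec figure refers to is our assumption, and `check_write_lat` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compare the small-message latency in ib_write_lat output against an
# expected window. Assumptions (illustrative, not a perftest feature):
#   * the output uses the layout #bytes #iterations t_min t_max t_typical ...
#   * t_typical (5th column) of the 2-byte row is the value to check.
check_write_lat() {
    local lo="${1:-0.97}" hi="${2:-1.02}"
    awk -v lo="$lo" -v hi="$hi" '
        BEGIN { lo += 0; hi += 0 }
        # Data rows start with the message size and the iteration count.
        $1 == "2" && $2 ~ /^[0-9]+$/ {
            found = 1
            typ = $5 + 0
            ok = (typ >= lo && typ <= hi)
            printf "%s: t_typical=%.2f usec (expected %s-%s)\n", (ok ? "PASS" : "FAIL"), typ, lo, hi
            exit (ok ? 0 : 1)
        }
        END { if (!found) { print "FAIL: no 2-byte row found"; exit 1 } }
    '
}
```

For example, save the output to a file and run `check_write_lat < lat.log`, or pass a different window such as `check_write_lat 0.95 1.05`.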

Command Example:

Code Block
# numactl --physcpubind=80 ib_write_lat -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 & 
ssh rome002 numactl --physcpubind=80  ib_write_lat -a -d mlx5_2 -i 1 --report_gbits -F  rome001 -n 10000
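Every test in this post uses the same two-host pattern: the server starts locally in the background, the client starts on the peer over ssh, and both are pinned to the HCA-local core. A minimal sketch that wraps it, assuming bash and password-less ssh between the hosts; `perftest_pair` and the `PEER`, `SERVER_HOST`, `HCA`, `CORE` and `DRY_RUN` variables are hypothetical, while the perftest flags are exactly the ones in the command example. The one-second delay before starting the client is also an assumption, not part of the original command.

```shell
#!/usr/bin/env bash
# Sketch: run a perftest server locally and its client on a peer, both pinned.
# PEER, SERVER_HOST, HCA, CORE and DRY_RUN are hypothetical variables.
perftest_pair() {
    local test="$1"; shift                   # e.g. ib_write_lat, ib_write_bw
    local peer="${PEER:-rome002}"            # host that runs the client
    local server="${SERVER_HOST:-rome001}"   # this host, runs the server
    local hca="${HCA:-mlx5_2}" core="${CORE:-80}"
    # Same flags as the command example; extra arguments (e.g. -b) are appended.
    local -a args=(-a -d "$hca" -i 1 --report_gbits -F -n 10000 "$@")

    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "numactl --physcpubind=$core $test ${args[*]} &"
        echo "ssh $peer numactl --physcpubind=$core $test ${args[*]} $server"
        return 0
    fi

    numactl --physcpubind="$core" "$test" "${args[@]}" &
    local server_pid=$!
    sleep 1   # assumed long enough for the server to start listening
    ssh "$peer" numactl --physcpubind="$core" "$test" "${args[@]}" "$server"
    local rc=$?
    # Report a failure from either side.
    wait "$server_pid" || rc=$?
    return "$rc"
}
```

For example, `perftest_pair ib_write_lat`, `perftest_pair ib_write_bw` and `perftest_pair ib_write_bw -b` correspond to the three tests in this post; with `DRY_RUN=1` the function only prints the commands.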

...

Code Block
$ numactl --physcpubind=80 ib_write_lat -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 & ssh rome008 numactl --physcpubind=80  ib_write_lat -a -d mlx5_2 -i 1 --report_gbits -F  rome007 -n 10000
[1] 59440

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write Latency Test                                            
 Dual-port       : OFF          Device         : mlx5_2                                
 Number of qps   : 1            Transport type : IB                                    
 Connection type : RC           Using SRQ      : OFF                                   
 Mtu             : 4096[B]                                                             
 Link type       : IB                                                                  
 Max inline data : 220[B]                                                              
 rdma_cm QPs     : OFF                                                                 
 Data ex. method : Ethernet                                                            
---------------------------------------------------------------------------------------
 local address: LID 0xba QPN 0x0134 PSN 0x597b96 RKey 0x01bb9f VAddr 0x002b1ec3800000  
 remote address: LID 0xd1 QPN 0x0135 PSN 0x18ae22 RKey 0x019a74 VAddr 0x002ab499400000 
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       10000          0.98           5.71         1.01               1.01             0.05            1.02                    1.87
 4       10000          0.98           3.35         1.01               1.01             0.05            1.02                    1.87
 8       10000          0.98           3.18         1.01               1.01             0.04            1.03                    2.04
 16      10000          0.98           2.78         1.01               1.01             0.04            1.03                    1.50
 32      10000          1.01           3.17         1.04               1.05             0.04            1.06                    2.00
 64      10000          1.02           2.95         1.05               1.05             0.05            1.06                    1.83
 128     10000          1.05           3.15         1.08               1.08             0.04            1.10                    1.60
 256     10000          1.50           3.69         1.55               1.55             0.04            1.57                    2.02
 512     10000          1.55           3.67         1.59               1.59             0.04            1.61                    2.29
 1024    10000          1.61           3.63         1.65               1.66             0.03            1.68                    2.14
 2048    10000          1.73           3.33         1.77               1.77             0.04            1.79                    2.26
 4096    10000          2.04           4.15         2.07               2.07             0.04            2.09                    2.86
 8192    10000          2.37           3.79         2.42               2.42             0.03            2.45                    2.99
 16384   10000          2.93           4.32         3.01               3.01             0.03            3.07                    3.50
 32768   10000          3.87           4.94         3.95               3.96             0.04            4.05                    4.29
 65536   10000          5.21           8.51         5.28               5.30             0.07            5.41                    6.34
 131072  10000          7.85           9.11         7.93               7.94             0.04            8.05                    8.33
 262144  10000          13.15          14.30        13.23              13.24            0.04            13.36                   13.68
 524288  10000          23.74          25.17        23.83              23.84            0.04            23.93                   24.30
 1048576 10000          44.92          46.46        45.01              45.03            0.05            45.16                   45.35
 2097152 10000          87.33          88.77        87.41              87.42            0.04            87.53                   87.88
 4194304 10000          172.10         180.25       172.21             172.27           0.48            173.21                  180.00
 8388608 10000          342.33         394.65       342.52             345.79           6.22            356.91                  380.58
---------------------------------------------------------------------------------------

RDMA Write Bandwidth (ib_write_bw)

To check RDMA write bandwidth, follow these notes:

  • Use the core local to the HCA; in this example, the HDR InfiniBand adapter is local to core 80.

  • More iterations make the output smoother. In this example, we are using 10000 iterations.

  • Expected RDMA write bandwidth reaches line rate at around 8-16K message sizes for an EPYC 7742 (Rome, 2.25 GHz) using an HDR InfiniBand adapter over a single HDR switch.

  • NPS configuration should be set to 1 (or 2) for HDR to reach maximum bandwidth.

Command Example:

Code Block
# numactl --physcpubind=80 ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 & 
ssh rome002 numactl --physcpubind=80  ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F  rome001 -n 10000
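To see at which message size a run gets close to line rate, the bandwidth table can be scanned for the first row whose average bandwidth crosses a threshold. This is a sketch, assuming the five-column layout (#bytes, #iterations, BW peak, BW average, MsgRate) of the output example in this section; `first_size_at_rate` is a hypothetical helper, and the 190 Gb/sec default (roughly 95% of the 200 Gb/sec HDR link rate) is an illustrative choice rather than an official pass criterion.

```shell
#!/usr/bin/env bash
# Sketch: print the smallest message size whose BW average reaches a threshold.
# Assumes the ib_write_bw layout: #bytes #iterations BW_peak BW_average MsgRate.
# The 190 Gb/sec default threshold is a hypothetical choice.
first_size_at_rate() {
    local threshold="${1:-190}"
    awk -v t="$threshold" '
        BEGIN { t += 0 }
        # Data rows: numeric #bytes and #iterations, BW average in column 4.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && NF >= 4 && $4 + 0 >= t {
            printf "%s bytes reach %.2f Gb/sec (threshold %s)\n", $1, $4, t
            found = 1
            exit 0
        }
        END { if (!found) { print "threshold not reached"; exit 1 } }
    '
}
```

For example, `first_size_at_rate 190 < bw.log` prints the first message size whose BW average reaches 190 Gb/sec; the function returns non-zero if no row reaches it.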

Output example, tested on Rome cluster.

Code Block
$ numactl --physcpubind=80 ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 & ssh rome008 numactl --physcpubind=80  ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F  rome007 -n 10000
[1] 59777

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_2
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0xba QPN 0x0135 PSN 0xddd5cc RKey 0x01c3be VAddr 0x002b948b000000
 remote address: LID 0xd1 QPN 0x0136 PSN 0x8fb73f RKey 0x019c76 VAddr 0x002ab363c00000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_2
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0xd1 QPN 0x0136 PSN 0x8fb73f RKey 0x019c76 VAddr 0x002ab363c00000
 remote address: LID 0xba QPN 0x0135 PSN 0xddd5cc RKey 0x01c3be VAddr 0x002b948b000000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          10000            0.066914           0.066468           4.154222
 4          10000            0.13               0.13               4.157486
 8          10000            0.27               0.27               4.164931
 16         10000            0.53               0.53               4.147037
 32         10000            1.07               1.07               4.163076
 64         10000            2.13               2.12               4.132880
 128        10000            4.27               4.26               4.157987
 256        10000            8.55               8.51               4.157105
 512        10000            17.07              17.00              4.150256
 1024       10000            33.82              33.57              4.097722
 2048       10000            67.27              66.95              4.086370
 4096       10000            133.57             133.17             4.064149
 8192       10000            186.65             186.58             2.846967
 16384      10000            192.50             192.38             1.467769
 32768      10000            197.07             197.06             0.751713
 65536      10000            196.64             196.62             0.375025
 131072     10000            197.48             197.47             0.188319
 262144     10000            197.54             197.53             0.094191
 524288     10000            197.57             197.54             0.047097
 1048576    10000            197.57             197.54             0.023549
 2097152    10000            197.58             197.56             0.011775
 4194304    10000            197.58             197.57             0.005888
 8388608    10000            197.55             197.53             0.002943
---------------------------------------------------------------------------------------

RDMA Write Bi-Directional Bandwidth (ib_write_bw -b)

To check RDMA write bidirectional bandwidth, follow these notes:

  • Use the core local to the HCA; in this example, the HDR InfiniBand adapter is local to core 80.

  • More iterations make the output smoother. In this example, we are using 10000 iterations.

  • NPS configuration should be set to 1 (or 2) for HDR to reach maximum bandwidth.

Command Example:

Code Block
# numactl --physcpubind=80 ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 -b & 
ssh rome002 numactl --physcpubind=80  ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F  rome001 -n 10000 -b
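Since HDR runs at 200 Gb/sec in each direction, the bidirectional result can be compared with the 400 Gb/sec aggregate. A sketch under stated assumptions: it takes the last data row of the three-column output (#bytes, #iterations, BW average) shown in this section as the large-message result; `bidir_utilization` and the `LINK_GBPS` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: express the last (largest) bidirectional result as a percentage of
# 2 x link rate. Assumes the layout #bytes #iterations BW_average.
# LINK_GBPS is a hypothetical override (default 200 for HDR).
bidir_utilization() {
    local link="${LINK_GBPS:-200}"
    awk -v link="$link" '
        BEGIN { link += 0 }
        # Remember the last data row seen.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && NF >= 3 { size = $1; bw = $3 + 0 }
        END {
            if (size == "") { print "no data rows found"; exit 1 }
            printf "%s bytes: %.2f Gb/sec = %.1f%% of 2 x %s Gb/sec\n", size, bw, 100 * bw / (2 * link), link
        }
    '
}
```

For example, an 8388608-byte result of 393.83 Gb/sec corresponds to about 98.5% of 2 x 200 Gb/sec.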

Output example, tested on Rome cluster.

Code Block
$ numactl --physcpubind=80 ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 -b

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write Bidirectional BW Test
 Dual-port       : OFF          Device         : mlx5_2
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0xba QPN 0x0138 PSN 0x210c6 RKey 0x01e1ef VAddr 0x002b7835800000
 remote address: LID 0xd1 QPN 0x0139 PSN 0x99d1e6 RKey 0x01dae9 VAddr 0x002abd1dc00000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW average[Gb/sec]
 2          10000           0.132682
 4          10000           0.27
 8          10000           0.53
 16         10000           1.06
 32         10000           2.12
 64         10000           4.22
 128        10000           8.50
 256        10000           16.93
 512        10000           33.83
 1024       10000           66.53
 2048       10000           131.87
 4096       10000           261.10
 8192       10000           357.88
 16384      10000           379.22
 32768      10000           391.56
 65536      10000           390.98
 131072     10000           393.42
 262144     10000           393.66
 524288     10000           393.76
 1048576    10000           393.80
 2097152    10000           393.82
 4194304    10000           393.82
 8388608    10000           393.83
---------------------------------------------------------------------------------------

Note: All trademarks are property of their respective owners. All information is provided “As-Is” without any kind of warranty. The HPC-AI Advisory Council makes no representation as to the accuracy and completeness of the information contained herein. The HPC-AI Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.

References