This post shows a few common RDMA benchmark examples for servers based on the AMD 2nd Generation EPYC™ CPU (formerly codenamed “Rome”), aimed at achieving maximum performance with ConnectX-6 HDR InfiniBand adapters. It is based on testing on the AMD Daytona_X reference platform with 2nd Gen EPYC CPUs and ConnectX-6 HDR InfiniBand adapters.

Before starting, please follow the AMD 2nd Gen EPYC CPU Tuning Guide for InfiniBand HPC to set the cluster parameters for high performance. Use the latest firmware and driver, and find a core close to the adapter on its local NUMA node; see HowTo Find the local NUMA node in AMD EPYC Servers.
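
As a quick cross-check of the NUMA lookup mentioned above, the following sketch reads the adapter's NUMA node and local CPU list from sysfs. It assumes a Linux host where the adapter is listed under /sys/class/infiniband. The function name hca_locality and its optional second argument (an alternate sysfs root, handy for dry runs) are illustrative and not part of any tool referenced in this post.

```shell
#!/usr/bin/env bash
# Print the NUMA node and the CPUs local to an RDMA device.
# Usage: hca_locality <device> [sysfs_root]
# The sysfs_root argument is a hypothetical convenience for dry runs.
hca_locality() {
    local dev="$1"
    local root="${2:-/sys/class/infiniband}"
    local pci="$root/$dev/device"

    if [ ! -r "$pci/numa_node" ]; then
        echo "cannot read $pci/numa_node" >&2
        return 1
    fi
    # numa_node holds the node ID (-1 when the platform reports none);
    # local_cpulist holds the CPUs attached to that node, e.g. "80-95".
    echo "numa_node=$(cat "$pci/numa_node") local_cpus=$(cat "$pci/local_cpulist")"
}
```

On the systems used in this post, `hca_locality mlx5_2` would be expected to report a CPU range that includes core 80; any core from the reported range is a reasonable candidate for `numactl --physcpubind`.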

RDMA testing is worth running before any application or micro-benchmark testing, as it shows the low-level capabilities of your fabric.

RDMA Write Benchmarks

RDMA Write Latency (ib_write_lat)

To check RDMA Write latency, follow these notes:

  • Use the core local to the HCA. In this example, the HDR InfiniBand adapter is local to core 80.

  • More iterations help smooth the output. In this example, we use 10000 iterations.

...

Code Block
$ numactl --physcpubind=80 ib_write_lat -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 & ssh rome008 numactl --physcpubind=80  ib_write_lat -a -d mlx5_2 -i 1 --report_gbits -F  rome007 -n 10000
[1] 59440

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write Latency Test                                            
 Dual-port       : OFF          Device         : mlx5_2                                
 Number of qps   : 1            Transport type : IB                                    
 Connection type : RC           Using SRQ      : OFF                                   
 Mtu             : 4096[B]                                                             
 Link type       : IB                                                                  
 Max inline data : 220[B]                                                              
 rdma_cm QPs     : OFF                                                                 
 Data ex. method : Ethernet                                                            
---------------------------------------------------------------------------------------
 local address: LID 0xba QPN 0x0134 PSN 0x597b96 RKey 0x01bb9f VAddr 0x002b1ec3800000  
 remote address: LID 0xd1 QPN 0x0135 PSN 0x18ae22 RKey 0x019a74 VAddr 0x002ab499400000 
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       10000          0.98           5.71         1.01               1.01             0.05            1.02                    1.87
 4       10000          0.98           3.35         1.01               1.01             0.05            1.02                    1.87
 8       10000          0.98           3.18         1.01               1.01             0.04            1.03                    2.04
 16      10000          0.98           2.78         1.01               1.01             0.04            1.03                    1.50
 32      10000          1.01           3.17         1.04               1.05             0.04            1.06                    2.00
 64      10000          1.02           2.95         1.05               1.05             0.05            1.06                    1.83
 128     10000          1.05           3.15         1.08               1.08             0.04            1.10                    1.60
 256     10000          1.50           3.69         1.55               1.55             0.04            1.57                    2.02
 512     10000          1.55           3.67         1.59               1.59             0.04            1.61                    2.29
 1024    10000          1.61           3.63         1.65               1.66             0.03            1.68                    2.14
 2048    10000          1.73           3.33         1.77               1.77             0.04            1.79                    2.26
 4096    10000          2.04           4.15         2.07               2.07             0.04            2.09                    2.86
 8192    10000          2.37           3.79         2.42               2.42             0.03            2.45                    2.99
 16384   10000          2.93           4.32         3.01               3.01             0.03            3.07                    3.50
 32768   10000          3.87           4.94         3.95               3.96             0.04            4.05                    4.29
 65536   10000          5.21           8.51         5.28               5.30             0.07            5.41                    6.34
 131072  10000          7.85           9.11         7.93               7.94             0.04            8.05                    8.33
 262144  10000          13.15          14.30        13.23              13.24            0.04            13.36                   13.68
 524288  10000          23.74          25.17        23.83              23.84            0.04            23.93                   24.30
 1048576 10000          44.92          46.46        45.01              45.03            0.05            45.16                   45.35
 2097152 10000          87.33          88.77        87.41              87.42            0.04            87.53                   87.88
 4194304 10000          172.10         180.25       172.21             172.27           0.48            173.21                  180.00
 8388608 10000          342.33         394.65       342.52             345.79           6.22            356.91                  380.58
---------------------------------------------------------------------------------------

RDMA Write Bandwidth (ib_write_bw)

To check RDMA Write bandwidth, follow these notes:

  • Use the core local to the HCA. In this example, the HDR InfiniBand adapter is local to core 80.

  • More iterations help smooth the output. In this example, we use 10000 iterations.

  • The NPS configuration should be set to 1 (or 2) to reach maximum HDR bandwidth.

Command Example:

Code Block
# numactl --physcpubind=80 ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 & 
ssh rome002 numactl --physcpubind=80  ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F  rome001 -n 10000
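
The command example above follows a fixed pattern: the server side runs locally in the background, and the client side runs over ssh with the same flags plus the server's hostname. A minimal sketch of that pattern as a helper that only prints the two command lines (it does not run them); the function name perftest_pair and its argument order are illustrative assumptions, while the flags are the ones used in this post.

```shell
# Print the server and client command lines for a perftest pair.
# Usage: perftest_pair <tool> <core> <device> <server_host> <client_host> [iterations]
perftest_pair() {
    local tool="$1"
    local core="$2"
    local dev="$3"
    local server="$4"
    local client="$5"
    local iters="${6:-10000}"
    # Flags shared by both sides: -a sweeps all message sizes, -d selects the
    # device, -i the port, --report_gbits reports Gb/sec, and -F suppresses
    # the CPU frequency warning.
    local common="numactl --physcpubind=$core $tool -a -d $dev -i 1 --report_gbits -F"
    echo "$common -n $iters"                       # server: run on <server_host>
    echo "ssh $client $common $server -n $iters"   # client: connects to the server
}
```

For example, `perftest_pair ib_write_bw 80 mlx5_2 rome001 rome002` prints the same two commands as the example above, apart from the doubled spaces.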

Output example, tested on a Rome cluster.

Code Block
$ numactl --physcpubind=80 ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 & ssh rome008 numactl --physcpubind=80  ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F  rome007 -n 10000                                                                                                                                                                                                             
[1] 59777                                                                                                                                                                                                     

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test                                                 
 Dual-port       : OFF          Device         : mlx5_2                                
 Number of qps   : 1            Transport type : IB                                    
 Connection type : RC           Using SRQ      : OFF                                   
 CQ Moderation   : 100                                                                 
 Mtu             : 4096[B]                                                             
 Link type       : IB                                                                  
 Max inline data : 0[B]                                                                
 rdma_cm QPs     : OFF                                                                 
 Data ex. method : Ethernet                                                            
---------------------------------------------------------------------------------------
 local address: LID 0xba QPN 0x0135 PSN 0xddd5cc RKey 0x01c3be VAddr 0x002b948b000000  
 remote address: LID 0xd1 QPN 0x0136 PSN 0x8fb73f RKey 0x019c76 VAddr 0x002ab363c00000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_2
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0xd1 QPN 0x0136 PSN 0x8fb73f RKey 0x019c76 VAddr 0x002ab363c00000
 remote address: LID 0xba QPN 0x0135 PSN 0xddd5cc RKey 0x01c3be VAddr 0x002b948b000000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          10000           0.066914            0.066468            4.154222
 4          10000            0.13               0.13               4.157486
 8          10000            0.27               0.27               4.164931
 16         10000            0.53               0.53               4.147037
 32         10000            1.07               1.07               4.163076
 64         10000            2.13               2.12               4.132880
 128        10000            4.27               4.26               4.157987
 256        10000            8.55               8.51               4.157105
 512        10000            17.07              17.00              4.150256
 1024       10000            33.82              33.57              4.097722
 2048       10000            67.27              66.95              4.086370
 4096       10000            133.57             133.17             4.064149
 8192       10000            186.65             186.58             2.846967
 16384      10000            192.50             192.38             1.467769
 32768      10000            197.07             197.06             0.751713
 65536      10000            196.64             196.62             0.375025
 131072     10000            197.48             197.47             0.188319
 262144     10000            197.54             197.53             0.094191
 524288     10000            197.57             197.54             0.047097
 1048576    10000            197.57             197.54             0.023549
 2097152    10000            197.58             197.56             0.011775
 4194304    10000            197.58             197.57             0.005888
 8388608    10000            197.55             197.53             0.002943 
 ---------------------------------------------------------------------------------------
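
The MsgRate column follows from the average bandwidth and the message size: messages per second equal the bandwidth in bits per second divided by 8 times the message size in bytes. A minimal sketch of that arithmetic using awk; the function name msg_rate_mpps is illustrative.

```shell
# Convert an average bandwidth in Gb/sec and a message size in bytes
# into a message rate in millions of messages per second (Mpps).
# Usage: msg_rate_mpps <bw_gbps> <message_size_bytes>
msg_rate_mpps() {
    awk -v bw="$1" -v size="$2" \
        'BEGIN { printf "%.6f\n", bw * 1e9 / (8 * size) / 1e6 }'
}
```

For the 8388608-byte row above, `msg_rate_mpps 197.53 8388608` prints 0.002943, which matches the reported value. Other rows may differ in the last digits because the table shows the bandwidth rounded to two decimals.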

RDMA Write Bi-Directional Bandwidth (ib_write_bw -b)

To check RDMA Write bi-directional bandwidth, follow these notes:

  • Use the core local to the HCA. In this example, the HDR InfiniBand adapter is local to core 80.

  • More iterations help smooth the output. In this example, we use 10000 iterations.

  • Expect RDMA Write bandwidth to reach line rate at around a 32K message size on a Rome 7742 2.25GHz with an HDR InfiniBand adapter over a single HDR switch.

  • The NPS configuration should be set to 1 (or 2) to reach maximum HDR bandwidth.

Command Example:

Code Block
# numactl --physcpubind=80 ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 -b & 
ssh rome002 numactl --physcpubind=80  ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F  rome001 -n 10000 -b

Output example, tested on a Rome cluster.

Code Block
$ numactl --physcpubind=80 ib_write_bw -a -d mlx5_2 -i 1 --report_gbits -F -n 10000 -b

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write Bidirectional BW Test
 Dual-port       : OFF          Device         : mlx5_2
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0xba QPN 0x0138 PSN 0x210c6 RKey 0x01e1ef VAddr 0x002b7835800000
 remote address: LID 0xd1 QPN 0x0139 PSN 0x99d1e6 RKey 0x01dae9 VAddr 0x002abd1dc00000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          10000           0.133457            0.132682            8.292601
 4          10000            0.27               0.27               8.312058
 8          10000            0.53               0.53               8.270148
 16         10000            1.07               1.06               8.289442
 32         10000            2.13               2.12               8.287524
 64         10000            4.24               4.22               8.251245
 128        10000            8.53               8.50               8.301570
 256        10000            16.99              16.93              8.264573
 512        10000            33.94              33.83              8.258309
 1024       10000            66.96              66.53              8.120855
 2048       10000            132.49             131.87             8.048677
 4096       10000            261.92             261.10             7.968046
 8192       10000            358.14             357.88             5.460775
 16384      10000            379.43             379.22             2.893244
 32768      10000            391.65             391.56             1.493699
 65536      10000            391.03             390.98             0.745735
 131072     10000            393.45             393.42             0.375196
 262144     10000            393.68             393.66             0.187714
 524288     10000            393.78             393.76             0.093879
 1048576    10000            393.85             393.80             0.046945
 2097152    10000            393.88             393.82             0.023473
 4194304    10000            393.87             393.82             0.011737
 8388608    10000            393.85             393.83             0.005869
---------------------------------------------------------------------------------------
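
To check at which message size a run reaches line rate, the bandwidth table can be scanned for the first row whose average bandwidth crosses a chosen fraction of the nominal rate. The sketch below assumes the five-column ib_write_bw table layout shown in this post (bytes, iterations, peak, average, message rate). The function name first_size_at_rate and the idea of a percentage threshold are assumptions made for illustration, not part of perftest.

```shell
# Read ib_write_bw output on stdin and print the smallest message size whose
# average bandwidth is at least <pct> percent of <line_rate> Gb/sec.
# Exits with status 1 if no row qualifies.
# Usage: first_size_at_rate <line_rate_gbps> <pct> < output.txt
first_size_at_rate() {
    awk -v rate="$1" -v pct="$2" '
        # Data rows have five fields and start with two integers
        # (message size and iteration count); header lines do not.
        NF == 5 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            if ($4 + 0 >= rate * pct / 100) { print $1; found = 1; exit }
        }
        END { if (!found) exit 1 }
    '
}
```

With the unidirectional results shown earlier in this post (192.38 Gb/sec at 16384 bytes and 197.06 Gb/sec at 32768 bytes), a 200 Gb/sec HDR rate and a 97 percent threshold select the 32768-byte row. For a bi-directional run, the aggregate rate of 400 Gb/sec would be the value to pass.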

Note: All trademarks are property of their respective owners. All information is provided “As-Is” without any kind of warranty. The HPC-AI Advisory Council makes no representation as to the accuracy and completeness of the information contained herein. The HPC-AI Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.

References