Getting Started with Basic QoS Test (Strict Priority)

Before you start, make sure you understand the concepts; see Understanding Basic InfiniBand QoS.

For this basic test, use two hosts connected through an InfiniBand switch, sending RDMA traffic on two different service levels (SLs).

The test uses two CPU cores (core 0 and core 1), each running ib_write_bw on a different SL; with strict priority, the high-priority traffic is expected to reach maximum performance while the low-priority traffic is starved.
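
Note that ib_write_bw runs as a pair: a responder instance started with no destination argument, and a requester instance pointing at it. A minimal sketch, assuming the responder host is reachable as server-host (a placeholder name):

# On the responder host (waits for a connection):
$ ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10
# On the requester host:
$ numactl --cpunodebind=0 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10 server-host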

Configuration

Check the SL-to-VL mapping with smpquery sl2vl (addressing the port by LID, for example):

$ sudo smpquery sl2vl -L 141
# SL2VL table: Lid 141
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
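
If you do not know the LID of the port you want to query, ibstat on each host prints it (a sketch; output trimmed to the relevant line):

$ ibstat mlx5_0 | grep "Base lid"
                Base lid: 141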

To change the mapping, edit the OpenSM configuration file (shown here as a diff against the original):

$ diff /etc/opensm/opensm.conf /etc/opensm/opensm.conf.orig
< qos TRUE
> qos FALSE

< qos_max_vls 2
< qos_high_limit 255
< qos_vlarb_high 1:192
< qos_vlarb_low 0:64
< qos_sl2vl 0,1

> qos_max_vls 0
> qos_high_limit -1
> qos_vlarb_high (null)
> qos_vlarb_low (null)
> qos_sl2vl (null)
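
The options, annotated (values as used in this example; the comments paraphrase OpenSM's documented semantics):

qos TRUE              # let the SM program SL2VL and VL arbitration on the fabric
qos_max_vls 2         # use two data VLs: VL 0 and VL 1
qos_high_limit 255    # 255 = unlimited high-priority transmission, i.e. strict priority
qos_vlarb_high 1:192  # high-priority table: a single VL:weight entry, VL 1 with weight 192
qos_vlarb_low 0:64    # low-priority table: VL 0 with weight 64
qos_sl2vl 0,1         # SL 0 -> VL 0, SL 1 -> VL 1; unlisted SLs fall back to VL 0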

Start OpenSM:

$ sudo opensm -g 0x98039b03009fcfd6 -F /etc/opensm/opensm.conf -B
-------------------------------------------------
OpenSM 5.4.0.MLNX20190422.ed81811
Config file is `/etc/opensm/opensm.conf`:
 Reading Cached Option File: /etc/opensm/opensm.conf
 Loading Cached Option:qos = TRUE
 Loading Changed QoS Cached Option:qos_max_vls = 2
 Loading Changed QoS Cached Option:qos_high_limit = 255
 Loading Changed QoS Cached Option:qos_vlarb_low = 0:64
 Loading Changed QoS Cached Option:qos_vlarb_high = 1:192
 Loading Changed QoS Cached Option:qos_sl2vl = 0,1
 Warning: Cached Option qos_sl2vl: < 16 VLs listed
Command Line Arguments:
 Guid <0x98039b03009fcfd6>
 Daemon mode
 Log File: /var/log/opensm.log
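
Once the SM is up, you can confirm it became master with sminfo (part of infiniband-diags); the reported state should be SMINFO_MASTER:

$ sudo sminfo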

Check the sl2vl mapping table; SL 1 now maps to VL 1, while SLs not listed in qos_sl2vl stay on VL 0:

$ sudo smpquery sl2vl -L 141
# SL2VL table: Lid 141
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 1| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|

Check the VL arbitration tables:

$ sudo smpquery vlarb 141
# VLArbitration tables: Lid 141 port 0 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0x40|0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
# High priority VL Arbitration Table:
VL    : |0x1 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0xC0|0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
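
The WEIGHT values are hexadecimal: 0xC0 = 192 for VL 1 in the high-priority table and 0x40 = 64 for VL 0 in the low-priority table, matching qos_vlarb_high and qos_vlarb_low. Each weight unit corresponds to 64 bytes, so an entry's weight bounds how much data its VL may send per arbitration pass. In this strict-priority setup the 192:64 ratio never comes into play, though: with qos_high_limit set to 255, the high-priority table may transmit without bound whenever it has traffic pending.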

Check the SL port counters:

$ sudo perfquery -X 141 1
# PortXmitDataSL counters: Lid 141 port 1
PortSelect:......................1
CounterSelect:...................0x0000
XmtDataSL0:......................3677098498
XmtDataSL1:......................2771713603
XmtDataSL2:......................0
XmtDataSL3:......................0
XmtDataSL4:......................0
XmtDataSL5:......................0
XmtDataSL6:......................0
XmtDataSL7:......................0
XmtDataSL8:......................0
XmtDataSL9:......................0
XmtDataSL10:.....................0
XmtDataSL11:.....................0
XmtDataSL12:.....................0
XmtDataSL13:.....................0
XmtDataSL14:.....................0
XmtDataSL15:.....................0
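
These counters accumulate since the last reset, so they include traffic from earlier runs. To measure a single test, you can zero them in between using perfquery's reset options (a sketch; -R resets without reading, -r resets after reading, and whether the reset covers the per-SL group may depend on your infiniband-diags version):

$ sudo perfquery -X -R 141 1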

Run RDMA Traffic

Low-priority traffic (core 0, SL 0):

$ numactl --cpunodebind=0 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10 

High-priority traffic (core 1, SL 1):

$ numactl --cpunodebind=1 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=1 -D 10
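
The two flows must overlap in time for arbitration to be visible. One way to launch them together on the requester host (a sketch: server-host is a placeholder for the responder, which must run two matching ib_write_bw instances, and -p selects distinct TCP ports for the two connection handshakes):

$ numactl --cpunodebind=0 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10 -p 18515 server-host &
$ numactl --cpunodebind=1 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=1 -D 10 -p 18516 server-host &
$ wait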

Make sure you get 0 Gb/s on SL 0 (no packets can be sent while the high-priority traffic is running):

$ numactl --cpunodebind=0 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10 
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x8f QPN 0xdd16 PSN 0x25f4a4 RKey 0x0e1848 VAddr 0x002b65b2130000
 remote address: LID 0x8d QPN 0x02c6 PSN 0xdb2c00 RKey 0x17d997 VAddr 0x002b8263ed0000
---------------------------------------------------------------------------------------
 #bytes     #iterations BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      0           0.000000           0.000000             0.000000 
 ---------------------------------------------------------------------------------------

Make sure you get close to 100 Gb/s on SL 1:

$ numactl --cpunodebind=1 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=1 -D 10 
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test                                                 
 Dual-port       : OFF          Device         : mlx5_0                                
 Number of qps   : 1            Transport type : IB                                    
 Connection type : RC           Using SRQ      : OFF                                   
 CQ Moderation   : 100                                                                 
 Mtu             : 4096[B]                                                             
 Link type       : IB                                                                  
 Max inline data : 0[B]                                                                
 rdma_cm QPs     : OFF                                                                 
 Data ex. method : Ethernet                                                            
---------------------------------------------------------------------------------------
 #bytes     #iterations   BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      1104900       0.00               96.55              0.184149 
---------------------------------------------------------------------------------------

Getting Started with Basic QoS Test (WRR)

Before you start, make sure you understand the concepts; see Understanding Basic InfiniBand QoS.

The Weighted Round Robin (WRR) arbiter splits the available bandwidth between high- and low-priority traffic, avoiding the starvation that can occur in the strict-priority example. With WRR, you can assign a different weight to every SL, and the arbiter performs weighted round-robin between them.

For this basic test, use two hosts connected through an InfiniBand switch, sending RDMA traffic on two different service levels.

The test uses two CPU cores (core 0 and core 1), each running ib_write_bw on a different SL; the high-priority traffic is expected to reach 3/4 of the link speed.

Configuration

Check the SL-to-VL mapping with smpquery sl2vl (addressing the port by LID, for example):

$ sudo smpquery sl2vl -L 141
# SL2VL table: Lid 141
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|

To change the mapping, edit the OpenSM configuration file (shown here as a diff against the original):

$ diff /etc/opensm/opensm.conf /etc/opensm/opensm.conf.orig
< qos TRUE
> qos FALSE

< qos_max_vls 2
< qos_vlarb_high 1:192
< qos_vlarb_low 0:64
< qos_sl2vl 0,1

> qos_max_vls 0
> qos_vlarb_high (null)
> qos_vlarb_low (null)
> qos_sl2vl (null)

Note that, unlike the strict-priority example, qos_high_limit is left unchanged here (the OpenSM log below does not list it among the changed options), so the high-priority table can no longer transmit without bound.

Start OpenSM:

$ sudo opensm -g 0x98039b03009fcfd6 -F /etc/opensm/opensm.conf -B
-------------------------------------------------
OpenSM 5.4.0.MLNX20190422.ed81811
Config file is `/etc/opensm/opensm.conf`:
 Reading Cached Option File: /etc/opensm/opensm.conf
 Loading Cached Option:qos = TRUE
 Loading Changed QoS Cached Option:qos_max_vls = 2
 Loading Changed QoS Cached Option:qos_vlarb_low = 0:64
 Loading Changed QoS Cached Option:qos_vlarb_high = 1:192
 Loading Changed QoS Cached Option:qos_sl2vl = 0,1
 Warning: Cached Option qos_sl2vl: < 16 VLs listed
Command Line Arguments:
 Guid <0x98039b03009fcfd6>
 Daemon mode
 Log File: /var/log/opensm.log

Check the sl2vl mapping table:

$ sudo smpquery sl2vl -L 141
# SL2VL table: Lid 141
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 1| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|

Check the VL arbitration tables:

$ sudo smpquery vlarb 141
# VLArbitration tables: Lid 141 port 0 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0x40|0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
# High priority VL Arbitration Table:
VL    : |0x1 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0xC0|0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
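
With the same weights but no unlimited high-priority limit, the arbiter divides bandwidth in proportion to the weights: VL 1 should receive 192 / (192 + 64) = 3/4 of the link and VL 0 the remaining 1/4. On this ~100 Gb/s link that predicts roughly 72-75 Gb/s on SL 1 and 24-25 Gb/s on SL 0, which matches the results below.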

Check the SL port counters:

$ sudo perfquery -X 141 1
# PortXmitDataSL counters: Lid 141 port 1
PortSelect:......................1
CounterSelect:...................0x0000
XmtDataSL0:......................3677098498
XmtDataSL1:......................2771713603
XmtDataSL2:......................0
XmtDataSL3:......................0
XmtDataSL4:......................0
XmtDataSL5:......................0
XmtDataSL6:......................0
XmtDataSL7:......................0
XmtDataSL8:......................0
XmtDataSL9:......................0
XmtDataSL10:.....................0
XmtDataSL11:.....................0
XmtDataSL12:.....................0
XmtDataSL13:.....................0
XmtDataSL14:.....................0
XmtDataSL15:.....................0

Run RDMA Traffic

Low-priority traffic (core 0, SL 0):

$ numactl --cpunodebind=0 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10 

High-priority traffic (core 1, SL 1):

$ numactl --cpunodebind=1 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=1 -D 10

Make sure you get ~1/4 of the link speed on SL 0:

$ numactl --cpunodebind=0 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10 
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x8f QPN 0xdd16 PSN 0x25f4a4 RKey 0x0e1848 VAddr 0x002b65b2130000
 remote address: LID 0x8d QPN 0x02c6 PSN 0xdb2c00 RKey 0x17d997 VAddr 0x002b8263ed0000
---------------------------------------------------------------------------------------
 #bytes     #iterations BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      276300           0.00               24.14              0.046050 
 ---------------------------------------------------------------------------------------

Make sure you get close to 3/4 of the link speed on SL 1:

$ numactl --cpunodebind=1 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=1 -D 10 
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test                                                 
 Dual-port       : OFF          Device         : mlx5_0                                
 Number of qps   : 1            Transport type : IB                                    
 Connection type : RC           Using SRQ      : OFF                                   
 CQ Moderation   : 100                                                                 
 Mtu             : 4096[B]                                                             
 Link type       : IB                                                                  
 Max inline data : 0[B]                                                                
 rdma_cm QPs     : OFF                                                                 
 Data ex. method : Ethernet                                                            
---------------------------------------------------------------------------------------
 #bytes     #iterations   BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      828900           0.00               72.43              0.138149
---------------------------------------------------------------------------------------

Useful commands

Restart OpenSM after changing the configuration (kill the running daemon, then start it again):

$ sudo kill $(ps -ef | grep opensm | grep root | awk '{print $2}') ; sudo opensm -g 0x98039b03009fcfd6 -F /etc/opensm/opensm.conf -B
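
The same restart can be written more compactly with pkill, if available:

$ sudo pkill opensm ; sudo opensm -g 0x98039b03009fcfd6 -F /etc/opensm/opensm.conf -B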

References

Understanding Basic InfiniBand QoS