Getting Started with InfiniBand QoS

Getting Started with Basic QoS Test (Strict Priority)

Before you start, make sure you understand the concepts; see Understanding Basic InfiniBand QoS.

 

For a basic test, connect two hosts via an InfiniBand switch and send RDMA traffic on two different service levels (SLs):

  • SL 0 - to be used for best effort traffic

  • SL 1 - to be used for high priority traffic

 

The test uses two CPU cores (core 0 and core 1). Each core runs ib_write_bw on a different SL, and the high priority traffic is expected to reach maximum performance.

Configuration

To check the SL to VL mapping, use smpquery sl2vl (for example, with the LID address):

 

$ sudo smpquery sl2vl -L 141
# SL2VL table: Lid 91
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|

 

To change the mapping, edit the OpenSM configuration file. The diff against the original file shows the changed options:

$ diff /etc/opensm/opensm.conf /etc/opensm/opensm.conf.orig
< qos TRUE
> qos FALSE
< qos_max_vls 2
< qos_high_limit 255
< qos_vlarb_high 1:192
< qos_vlarb_low 0:64
< qos_sl2vl 0,1
> qos_max_vls 0
> qos_high_limit -1
> qos_vlarb_high (null)
> qos_vlarb_low (null)
> qos_sl2vl (null)
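To confirm which QoS options are active after editing, the configuration file can be filtered for its qos lines. This is a minimal sketch: the helper name show_qos_options is hypothetical (it is not part of OpenSM), and the default path is the one used in the diff.

```shell
# Print the QoS-related options from an OpenSM configuration file.
# show_qos_options is a hypothetical helper name, not an OpenSM tool.
show_qos_options() {
    conf="${1:-/etc/opensm/opensm.conf}"   # default path as used in the diff
    # Keep only lines whose option name starts with "qos"; comments are skipped
    grep -E '^[[:space:]]*qos' "$conf"
}

# Usage (on the host that runs OpenSM):
#   show_qos_options
#   show_qos_options /etc/opensm/opensm.conf.orig
```

Comparing the output for the edited file and the .orig copy gives the same information as the diff, one option per line.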

 

Start OpenSM:

$ sudo opensm -g 0x98039b03009fcfd6 -F /etc/opensm/opensm.conf -B
-------------------------------------------------
OpenSM 5.4.0.MLNX20190422.ed81811
Config file is `/etc/opensm/opensm.conf`:
 Reading Cached Option File: /etc/opensm/opensm.conf
 Loading Cached Option:qos = TRUE
 Loading Changed QoS Cached Option:qos_max_vls = 2
 Loading Changed QoS Cached Option:qos_high_limit = 255
 Loading Changed QoS Cached Option:qos_vlarb_low = 0:64
 Loading Changed QoS Cached Option:qos_vlarb_high = 1:192
 Loading Changed QoS Cached Option:qos_sl2vl = 0,1
 Warning: Cached Option qos_sl2vl: < 16 VLs listed
Command Line Arguments:
 Guid <0x98039b03009fcfd6>
 Daemon mode
 Log File: /var/log/opensm.log

 

Check the sl2vl mapping table:
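One way to check the result is to rerun the smpquery sl2vl query from the configuration step and read off the VL for a given SL. The helper below is a sketch, not part of infiniband-diags: sl_to_vl is a hypothetical name, and it assumes the output layout shown earlier (a "ports:" row with one "|"-separated column per SL). With qos_sl2vl 0,1 applied, SL 1 should map to VL 1.

```shell
# Print the VL that SL $1 maps to, reading `smpquery sl2vl` output on stdin.
# sl_to_vl is a hypothetical helper; it only parses text.
sl_to_vl() {
    awk -F'|' -v sl="$1" '
        /^ports:/ {
            v = $(sl + 2)        # field 1 is the row label, field 2 is SL 0
            gsub(/ /, "", v)     # strip column padding
            print v
            exit                 # use the first in/out port pair only
        }'
}

# Usage (LID 141 as in the earlier query):
#   sudo smpquery sl2vl -L 141 | sl_to_vl 0
#   sudo smpquery sl2vl -L 141 | sl_to_vl 1
```

If the second command still prints 0, the new mapping has probably not been applied to that port yet.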

 

Check the VL arbiter tables:
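The arbitration tables can be read with the vlarb operation of smpquery. The wrapper below is a sketch: show_vlarb is a hypothetical name and the LID is simply the one used in the earlier sl2vl query. The output is not reproduced here because it depends on the device.

```shell
LID=141   # assumption: LID reused from the earlier sl2vl query

# show_vlarb is a hypothetical wrapper. "vlarb" is the smpquery operation
# that dumps the high and low priority VL arbitration tables.
show_vlarb() {
    sudo smpquery vlarb -L "$1"
}

# Usage:
#   show_vlarb "$LID"
```

With the strict priority configuration, the high priority table would be expected to list VL 1 with weight 192 and the low priority table VL 0 with weight 64.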

 

 

Check the SL port counters:
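Per-SL data counters can be read with perfquery, assuming the device supports these optional counters. The sketch below uses a hypothetical wrapper name (show_sl_counters), and the LID and port number are placeholders to replace with your own.

```shell
LID=141    # assumption: LID reused from the earlier sl2vl query
PORT=1     # assumption: hypothetical port number

# show_sl_counters is a hypothetical wrapper around perfquery.
#   -X (--xmtsl) requests the per-SL transmit data counters
#   -S (--rcvsl) requests the per-SL receive data counters
show_sl_counters() {
    sudo perfquery -X "$1" "$2"
    sudo perfquery -S "$1" "$2"
}

# Usage:
#   show_sl_counters "$LID" "$PORT"
```

Reading the counters before and after a traffic run shows which SL the data was actually sent on.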

Run RDMA Traffic

 

Low priority traffic (core 0, SL 0):
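A possible invocation for the low priority stream is sketched below. The device name, server address, TCP port and 10 second duration are assumptions chosen for illustration, and the function names are hypothetical. taskset pins the process to core 0 and --sl selects the service level.

```shell
DEV=mlx5_0            # assumption: hypothetical RDMA device name
SERVER=192.168.1.10   # assumption: hypothetical address of the server host
PORT_LOW=18515        # assumption: TCP port used to set up this connection

# Server side: pinned to core 0, waits for a client on SL 0.
low_prio_server() {
    taskset -c 0 ib_write_bw -d "$DEV" --sl=0 -p "$PORT_LOW" --report_gbits -D 10
}

# Client side: same options plus the server address.
low_prio_client() {
    taskset -c 0 ib_write_bw -d "$DEV" --sl=0 -p "$PORT_LOW" --report_gbits -D 10 "$SERVER"
}
```

Run low_prio_server on one host first, then low_prio_client on the other.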

 

High priority traffic (core 1, SL 1):
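A possible invocation for the high priority stream follows the same pattern, pinned to core 1 and using SL 1. As before this is only a sketch: the device name, server address, duration and function names are assumptions. A separate TCP port is used so that this pair can run at the same time as the low priority pair.

```shell
DEV=mlx5_0            # assumption: hypothetical RDMA device name
SERVER=192.168.1.10   # assumption: hypothetical address of the server host
PORT_HIGH=18516       # assumption: TCP port distinct from the low priority pair

# Server side: pinned to core 1, waits for a client on SL 1.
high_prio_server() {
    taskset -c 1 ib_write_bw -d "$DEV" --sl=1 -p "$PORT_HIGH" --report_gbits -D 10
}

# Client side: same options plus the server address.
high_prio_client() {
    taskset -c 1 ib_write_bw -d "$DEV" --sl=1 -p "$PORT_HIGH" --report_gbits -D 10 "$SERVER"
}
```

Start both streams so that they overlap in time; otherwise each one simply gets the whole link.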

 

Make sure you get 0 Gb/s on SL 0 (no packets can be sent while the high priority traffic is running).

 

Make sure you get close to 100 Gb/s on SL 1

 

Getting Started with Basic QoS Test (WRR)

Before you start, make sure you understand the concepts; see Understanding Basic InfiniBand QoS.

The Weighted Round Robin (WRR) arbiter splits the available bandwidth between high and low priority traffic without the starvation that can occur in the strict priority example. With WRR, you can assign a different weight to each SL, and the arbiter performs WRR between them.

For this basic test, you can have two hosts connected via an InfiniBand switch, sending RDMA traffic on two different service levels.

  • SL 0 - to be used for best effort traffic (1/4 of the traffic in this example)

  • SL 1 - to be used for high priority traffic (3/4 of the traffic in this example)

 

The test uses two CPU cores (core 0 and core 1). Each core runs ib_write_bw on a different SL, and the high priority traffic is expected to reach 3/4 of the link speed.
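The expected split follows directly from the ratio of the weights. The sketch below assumes weights of 192 for SL 1 and 64 for SL 0 (any 3:1 pair gives the same result) and the 100 Gb/s link from the strict priority example; wrr_share is a hypothetical helper name.

```shell
# wrr_share: integer percentage of the link that a VL with weight $1 gets
# when it competes with one other VL of weight $2 (hypothetical helper).
wrr_share() {
    echo $(( 100 * $1 / ($1 + $2) ))
}

LINK_GBPS=100   # link speed seen in the strict priority example
HIGH=192        # assumed weight for SL 1
LOW=64          # assumed weight for SL 0

echo "SL 1: $(wrr_share "$HIGH" "$LOW")% (~$(( LINK_GBPS * HIGH / (HIGH + LOW) )) Gb/s)"
echo "SL 0: $(wrr_share "$LOW" "$HIGH")% (~$(( LINK_GBPS * LOW / (HIGH + LOW) )) Gb/s)"
# prints:
#   SL 1: 75% (~75 Gb/s)
#   SL 0: 25% (~25 Gb/s)
```

Measured numbers will be somewhat lower than these ideal figures because of protocol overhead.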

Configuration

To check the SL to VL mapping, use smpquery sl2vl (for example, with the LID address):

 

 

To change the mapping, edit the OpenSM configuration file:
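One possible way to get a 1/4 to 3/4 split is to list both VLs in the same arbitration table with weights in a 1:3 ratio, so that the arbiter alternates between them instead of always preferring one. The fragment below is a sketch under that assumption. The option names are the ones used in the strict priority example, but the values are illustrative and have not been taken from a tested setup.

```conf
# /etc/opensm/opensm.conf (sketch, values are assumptions)
qos TRUE
qos_max_vls 2
# Same two entries in both tables: VL 0 weight 64, VL 1 weight 192 (1:3)
qos_vlarb_high 0:64,1:192
qos_vlarb_low 0:64,1:192
# SL 0 -> VL 0, SL 1 -> VL 1
qos_sl2vl 0,1
```

Because both tables carry the same entries, the split between the two VLs should not depend much on which table the arbiter is serving.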

 

Start OpenSM:

 

Check the sl2vl mapping table:

 

Check the VL arbiter tables:

 

 

Check the SL port counters:

Run RDMA Traffic

 

Low priority traffic (core 0, SL 0):

 

High priority traffic (core 1, SL 1):

 

Make sure you get ~1/4 of the link speed on SL 0.

 

Make sure you get close to 3/4 of the link speed on SL 1.

 

Useful commands

  • Kill OpenSM:
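A minimal sketch for stopping an OpenSM instance that was started from the command line with -B, as in the examples above. The function names are hypothetical; pkill and pgrep are the standard process tools.

```shell
# stop_opensm is a hypothetical helper name.
stop_opensm() {
    sudo pkill opensm    # sends SIGTERM to processes named "opensm"
}

# Prints the PIDs of running opensm processes; returns non-zero if none.
opensm_running() {
    pgrep -x opensm
}

# Usage:
#   stop_opensm
#   opensm_running || echo "OpenSM is stopped"
```

Restart OpenSM afterwards with the configuration file you want to test, since the fabric needs a running subnet manager.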

 

 

References