Getting Started with Basic QoS Test (Strict Priority)
Before you start, make sure you understand the concepts; see Understanding Basic InfiniBand QoS.
For a basic test, you can have two hosts connected via an InfiniBand switch, sending RDMA traffic on two different service levels:
SL 0 - to be used for best effort traffic
SL 1 - to be used for high priority traffic
In this test we use two CPU cores (core 0 and core 1), each running ib_write_bw on a different SL, expecting the high priority traffic to reach maximum performance.
Configuration
Check the SL-to-VL mapping with smpquery sl2vl (addressed by LID, for example):
$ sudo smpquery sl2vl -L 141
# SL2VL table: Lid 141
# SL:                 | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
To change the mapping configuration, use the OpenSM config file:
$ diff /etc/opensm/opensm.conf /etc/opensm/opensm.conf.orig
< qos TRUE
> qos FALSE
< qos_max_vls 2
< qos_high_limit 255
< qos_vlarb_high 1:192
< qos_vlarb_low 0:64
< qos_sl2vl 0,1
> qos_max_vls 0
> qos_high_limit -1
> qos_vlarb_high (null)
> qos_vlarb_low (null)
> qos_sl2vl (null)
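The changed options above can be applied by appending them to the config file. A minimal sketch, written against a scratch file so it is safe to run anywhere; on a real system, back up /etc/opensm/opensm.conf first and edit the real file with sudo:

```shell
# Sketch: write the strict-priority QoS options from the diff above.
# Uses a scratch file for safety; on a real system edit
# /etc/opensm/opensm.conf (with sudo) and keep a backup copy.
conf=./opensm.conf.qos
cat > "$conf" <<'EOF'
qos TRUE
qos_max_vls 2
qos_high_limit 255
qos_vlarb_high 1:192
qos_vlarb_low 0:64
qos_sl2vl 0,1
EOF
grep -c '^qos' "$conf"   # prints 6: all six QoS-related lines were written
```

With qos_high_limit 255 the high priority table can transmit essentially without limit before the low priority table is consulted, which is what produces strict-priority behavior.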
Start OpenSM:
$ sudo opensm -g 0x98039b03009fcfd6 -F /etc/opensm/opensm.conf -B
-------------------------------------------------
OpenSM 5.4.0.MLNX20190422.ed81811
Config file is `/etc/opensm/opensm.conf`:
 Reading Cached Option File: /etc/opensm/opensm.conf
 Loading Cached Option:qos = TRUE
 Loading Changed QoS Cached Option:qos_max_vls = 2
 Loading Changed QoS Cached Option:qos_high_limit = 255
 Loading Changed QoS Cached Option:qos_vlarb_low = 0:64
 Loading Changed QoS Cached Option:qos_vlarb_high = 1:192
 Loading Changed QoS Cached Option:qos_sl2vl = 0,1
 Warning: Cached Option qos_sl2vl: < 16 VLs listed
Command Line Arguments:
 Guid <0x98039b03009fcfd6>
 Daemon mode
 Log File: /var/log/opensm.log
Check sl2vl mapping table:
$ sudo smpquery sl2vl -L 141
# SL2VL table: Lid 141
# SL:                 | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 1| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
Check the VL arbitration tables:
$ sudo smpquery vlarb 141
# VLArbitration tables: Lid 141 port 0 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0x40|0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
# High priority VL Arbitration Table:
VL    : |0x1 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0xC0|0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
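The WEIGHT rows are printed in hex; converting them back to decimal confirms they match the 0:64 and 1:192 entries from opensm.conf:

```shell
# The weights smpquery reports in hex are the decimal values configured in
# qos_vlarb_low (VL 0) and qos_vlarb_high (VL 1).
printf 'VL0 (low)  weight: %d\n' 0x40   # prints 64
printf 'VL1 (high) weight: %d\n' 0xC0   # prints 192
```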
Check SL port counters:
$ sudo perfquery -X 141 1
# PortXmitDataSL counters: Lid 141 port 1
PortSelect:......................1
CounterSelect:...................0x0000
XmtDataSL0:......................3677098498
XmtDataSL1:......................2771713603
XmtDataSL2:......................0
XmtDataSL3:......................0
XmtDataSL4:......................0
XmtDataSL5:......................0
XmtDataSL6:......................0
XmtDataSL7:......................0
XmtDataSL8:......................0
XmtDataSL9:......................0
XmtDataSL10:.....................0
XmtDataSL11:.....................0
XmtDataSL12:.....................0
XmtDataSL13:.....................0
XmtDataSL14:.....................0
XmtDataSL15:.....................0
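To turn these counters into byte counts, this sketch assumes the per-SL counters use the same 4-byte (dword) units as the standard PortXmitData counter:

```shell
# Rough conversion of an XmtDataSL counter to bytes, assuming it counts in
# 4-byte dword units like the standard PortXmitData counter.
sl0_dwords=3677098498            # XmtDataSL0 from the output above
echo "SL0 bytes: $((sl0_dwords * 4))"   # prints SL0 bytes: 14708393992
```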
Run RDMA Traffic
Low priority traffic (core 0, SL 0):
$ numactl --cpunodebind=0 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10
High priority traffic (core 1, SL 1):
$ numactl --cpunodebind=1 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=1 -D 10
Make sure you get 0 Gb/s on SL 0 (no packets can be sent while the high priority traffic occupies the link):
$ numactl --cpunodebind=0 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x8f QPN 0xdd16 PSN 0x25f4a4 RKey 0x0e1848 VAddr 0x002b65b2130000
 remote address: LID 0x8d QPN 0x02c6 PSN 0xdb2c00 RKey 0x17d997 VAddr 0x002b8263ed0000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      0              0.000000           0.000000             0.000000
---------------------------------------------------------------------------------------
Make sure you get close to 100 Gb/s on SL 1:
$ numactl --cpunodebind=1 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=1 -D 10
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      1104900        0.00               96.55                0.184149
---------------------------------------------------------------------------------------
Getting Started with Basic QoS Test (WRR)
Before you start, make sure you understand the concepts; see Understanding Basic InfiniBand QoS.
A Weighted Round Robin (WRR) arbiter lets you split the available bandwidth between high and low priority traffic without the starvation that can occur in the strict priority example. With WRR, you assign a different weight to each SL, and the arbiter shares bandwidth between them in proportion to those weights.
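With the weights used in this example (64 for VL 0, 192 for VL 1), the expected split works out as each weight over the total:

```shell
# Each VL's expected bandwidth share is its arbitration weight divided by
# the sum of all weights: 64/256 and 192/256.
low=64; high=192
total=$((low + high))
echo "low  (VL0): $((100 * low  / total))%"   # prints low  (VL0): 25%
echo "high (VL1): $((100 * high / total))%"   # prints high (VL1): 75%
```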
For this basic test, you can have two hosts connected via an InfiniBand switch, sending RDMA traffic on two different service levels:
SL 0 - to be used for best effort traffic (1/4 of the traffic in this example)
SL 1 - to be used for high priority traffic (3/4 of the traffic in this example)
In this test we use two CPU cores (core 0 and core 1), each running ib_write_bw on a different SL, expecting the high priority traffic to reach 3/4 of the link speed.
Configuration
Check the SL-to-VL mapping with smpquery sl2vl (addressed by LID, for example):
$ sudo smpquery sl2vl -L 141
# SL2VL table: Lid 141
# SL:                 | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
To change the mapping configuration, use the OpenSM config file (qos_high_limit is left at its default here, so the high priority table no longer starves the low priority one):
$ diff /etc/opensm/opensm.conf /etc/opensm/opensm.conf.orig
< qos TRUE
> qos FALSE
< qos_max_vls 2
< qos_vlarb_high 1:192
< qos_vlarb_low 0:64
< qos_sl2vl 0,1
> qos_max_vls 0
> qos_vlarb_high (null)
> qos_vlarb_low (null)
> qos_sl2vl (null)
Start OpenSM:
$ sudo opensm -g 0x98039b03009fcfd6 -F /etc/opensm/opensm.conf -B
-------------------------------------------------
OpenSM 5.4.0.MLNX20190422.ed81811
Config file is `/etc/opensm/opensm.conf`:
 Reading Cached Option File: /etc/opensm/opensm.conf
 Loading Cached Option:qos = TRUE
 Loading Changed QoS Cached Option:qos_max_vls = 2
 Loading Changed QoS Cached Option:qos_vlarb_low = 0:64
 Loading Changed QoS Cached Option:qos_vlarb_high = 1:192
 Loading Changed QoS Cached Option:qos_sl2vl = 0,1
 Warning: Cached Option qos_sl2vl: < 16 VLs listed
Command Line Arguments:
 Guid <0x98039b03009fcfd6>
 Daemon mode
 Log File: /var/log/opensm.log
Check sl2vl mapping table:
$ sudo smpquery sl2vl -L 141
# SL2VL table: Lid 141
# SL:                 | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 1| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
Check the VL arbitration tables:
$ sudo smpquery vlarb 141
# VLArbitration tables: Lid 141 port 0 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0x40|0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
# High priority VL Arbitration Table:
VL    : |0x1 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
WEIGHT: |0xC0|0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
Check SL port counters:
$ sudo perfquery -X 141 1
# PortXmitDataSL counters: Lid 141 port 1
PortSelect:......................1
CounterSelect:...................0x0000
XmtDataSL0:......................3677098498
XmtDataSL1:......................2771713603
XmtDataSL2:......................0
XmtDataSL3:......................0
XmtDataSL4:......................0
XmtDataSL5:......................0
XmtDataSL6:......................0
XmtDataSL7:......................0
XmtDataSL8:......................0
XmtDataSL9:......................0
XmtDataSL10:.....................0
XmtDataSL11:.....................0
XmtDataSL12:.....................0
XmtDataSL13:.....................0
XmtDataSL14:.....................0
XmtDataSL15:.....................0
Run RDMA Traffic
Low priority traffic (core 0, SL 0):
$ numactl --cpunodebind=0 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10
High priority traffic (core 1, SL 1):
$ numactl --cpunodebind=1 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=1 -D 10
Make sure you get about 1/4 of the link speed on SL 0:
$ numactl --cpunodebind=0 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=0 -D 10
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x8f QPN 0xdd16 PSN 0x25f4a4 RKey 0x0e1848 VAddr 0x002b65b2130000
 remote address: LID 0x8d QPN 0x02c6 PSN 0xdb2c00 RKey 0x17d997 VAddr 0x002b8263ed0000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      276300         0.00               24.14                0.046050
---------------------------------------------------------------------------------------
Make sure you get close to 3/4 of the link speed on SL 1:
$ numactl --cpunodebind=1 ib_write_bw -d mlx5_0 -i 1 --report_gbits -F --sl=1 -D 10
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      828900         0.00               72.43                0.138149
---------------------------------------------------------------------------------------
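As a sanity check, the two measured averages (24.14 and 72.43 Gb/sec) can be turned into shares of the combined bandwidth to confirm the configured 1:3 weight ratio:

```shell
# Compute each flow's share of the combined measured bandwidth
# (values taken from the two ib_write_bw runs above).
awk 'BEGIN {
    low = 24.14; high = 72.43; total = low + high
    printf "low:  %.0f%%\nhigh: %.0f%%\n", 100*low/total, 100*high/total
}'
# prints:
# low:  25%
# high: 75%
```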
Useful Commands
Kill and restart OpenSM:
$ sudo kill $(ps -ef | grep opensm | grep root | awk '{print $2}') ; sudo opensm -g 0x98039b03009fcfd6 -F /etc/opensm/opensm.conf -B