InfiniBand is one of the most popular interconnection network standards in HPC. The InfiniBand standard defines a variety of routing algorithms that can be configured via the Subnet Manager (SM). The InfiniBand architecture supports deterministic routing, which prevents packets from using alternative paths when the requested output port is busy and may therefore degrade network performance.
...
This post discusses the need for Adaptive Routing (AR) in HPC and provides configuration examples and performance analysis.
Overview
What is Adaptive Routing?
Adaptive Routing (AR) is the ability of the switches to dynamically select the best route for each packet based on queue size, latency, and available bandwidth.
Configuration
Refer to the example here:
OpenSM Configuration for AR
Get the current OpenSM configuration (opensm -c <filename>), and check the following parameters.
Code Block |
---|
# Routing engine
# Multiple routing engines can be specified separated by
# commas so that specific ordering of routing algorithms will
# be tried if earlier routing engines fail.
# Supported engines: minhop, updn, dnup, file, ftree, lash,
# dor, torus-2QoS, kdor-hc, dfsssp (EXPERIMENTAL),
# sssp (EXPERIMENTAL), chain,
# pqft (EXPERIMENTAL),
# dfp, ar_updn, ar_ftree, ar_torus, ar_dor (AR)
routing_engine (null)
# AR SL mask - 16 bit bitmask indicating which SLs should be configured for AR
ar_sl_mask 0xFFFF
# Enable adaptive routing only to devices that support packet reordering.
# When enabled, state in ARLFT entries for devices which does not support packet
# reordering is set to static.
# When disabled, ARLFT entries remains as determined by routing engine.
enable_ar_by_device_cap TRUE
# Advanced routing - Adaptive routing mode
# Supported values:
# 0 - Adaptive routing disabled.
# 1 - Enable adaptive routing.
# 2 - Enable adaptive routing with notifications.
# 3 - Auto mode in which adaptive routing is determined by routing engine.
ar_mode 3
# Advanced routing - Advanced routing engine
# Supported values:
# none - advanced routing is not enabled.
# ar_lag - Ports groups are created out of "parallel" links. Links that connect the same pair of switches.
# ar_tree - All the ports with minimal hops to destination are in the same group. Must run together with UPDN routing engine.
# auto - the advanced routing engine is selected based on routing engine. Works for ar_updn, ar_ftree, ar_torus, ar_dor engines.
adv_routing_engine auto
# AR Transport mask - indicates which transport types are enabled for AR
# Bit 0 = UD, Bit 1 = RC, Bit 2 = UC, Bit 3 = DCT, Bits 4-7 are reserved.
ar_transport_mask 0x000A |
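The ar_transport_mask value is easier to read once it is expanded bit by bit. The following is a minimal Python sketch that decodes the mask using only the bit assignments given in the configuration comment above (bit 0 = UD, bit 1 = RC, bit 2 = UC, bit 3 = DCT). The function name decode_transport_mask is hypothetical and is not part of OpenSM.

```python
# Bit positions as listed in the opensm.conf comment for ar_transport_mask.
# Bits 4-7 are reserved and are ignored here.
TRANSPORT_BITS = {0: "UD", 1: "RC", 2: "UC", 3: "DCT"}


def decode_transport_mask(mask):
    """Return the transport types whose bit is set in the mask.

    Hypothetical helper for illustration only; not an OpenSM API.
    """
    return [name for bit, name in TRANSPORT_BITS.items() if mask & (1 << bit)]


# 0x000A is binary 1010, so bits 1 and 3 are set.
print(decode_transport_mask(0x000A))  # ['RC', 'DCT']
```

With the value 0x000A shown above, AR applies to the RC and DCT transports, while UD and UC are left out.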
Enable AR
One way to enable AR is to set routing_engine to one of the AR-capable routing engines (ar_updn, ar_ftree, ar_torus, or ar_dor).
For example, for a fat-tree topology, change opensm.conf as follows:
Code Block |
---|
routing_engine ar_ftree |
In addition, set the root GUID file.
For example, set the following in opensm.conf:
Code Block |
---|
root_guid_file /etc/opensm/root_guid.cfg |
Create the file and list the GUIDs of the root switches in it:
Code Block |
---|
# cat root_guid.cfg
0xb8599f0300fcca6c #thor-qm8700-S1
0xb8599f0300df8faa #thor-qm8790-S2
0xb8599f0300fcc2cc #rome-qm8700-S1 |
Enable AR per SL
By default, AR is enabled on all SLs. To enable AR only on specific SLs, set the ar_sl_mask bitmask parameter, in which bit N corresponds to SL N:
In the following example AR is disabled on SL0 and SL2.
Code Block |
---|
ar_sl_mask 0xFFFA # 0xA = 0b1010: of SL0-SL3, only SL1 and SL3 have AR enabled |
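The mask arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes only what the configuration comment states, namely that ar_sl_mask is a 16-bit bitmask with one bit per SL. The helper names are hypothetical.

```python
def ar_disabled_sls(mask):
    """Return the SLs (0-15) whose bit is cleared in a 16-bit ar_sl_mask."""
    return [sl for sl in range(16) if not mask & (1 << sl)]


def mask_from_disabled(disabled):
    """Build an ar_sl_mask with AR enabled on every SL except those listed."""
    mask = 0xFFFF
    for sl in disabled:
        mask &= ~(1 << sl)  # clear the bit of each SL that should not use AR
    return mask


print(ar_disabled_sls(0xFFFA))          # [0, 2]
print(hex(mask_from_disabled([0, 2])))  # 0xfffa
```

Going in both directions helps when the mask has to be built from a list of SLs rather than read from an existing configuration.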
UCX Support
FW 12.29.0356 added per-vport support for ooo_sl_mask, which UCX now uses.
This gives UCX a way to request AR support, using UCX_IB_AR_ENABLE={yes | no | try}. The default value is “try”.
UCX_IB_AR_ENABLE=yes strictly selects an SL with AR; if there are no SLs with AR, UCX fails.
UCX_IB_AR_ENABLE=try selects an SL with AR; if there are no SLs with AR, it selects the first available SL (or the one specified by UCX_IB_SL=<sl>).
UCX_IB_AR_ENABLE=no strictly selects an SL without AR; if there are no SLs without AR, UCX fails.
A specific SL can still be specified by the user through UCX_IB_SL=[0..15].
If UCX_IB_AR_ENABLE=yes or UCX_IB_AR_ENABLE=no is requested on older FW, UCX fails because it cannot detect ooo_sl_mask.
The feature is part of the UCX v1.10 release.
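The three modes can be summarized as a small decision function. The sketch below is a Python model of the behavior described in this section, not UCX source code. The function select_sl and its parameters are hypothetical. It assumes that "first available" means the lowest-numbered SL, and that "try" on firmware without ooo_sl_mask behaves as if no SL had AR. How an explicit UCX_IB_SL interacts with yes/no is not spelled out here, so the sketch uses the requested SL only as the fallback in "try" mode.

```python
from typing import Optional


def select_sl(ar_enable: str, ooo_sl_mask: Optional[int],
              requested_sl: Optional[int] = None) -> int:
    """Model of the SL choice described above (illustration only).

    ar_enable    -- value of UCX_IB_AR_ENABLE: "yes", "no" or "try"
    ooo_sl_mask  -- 16-bit mask of SLs with AR, or None if the FW cannot report it
    requested_sl -- value of UCX_IB_SL, if the user set one
    """
    if ar_enable not in ("yes", "no", "try"):
        raise ValueError("ar_enable must be 'yes', 'no' or 'try'")

    fallback = requested_sl if requested_sl is not None else 0

    if ooo_sl_mask is None:
        if ar_enable in ("yes", "no"):
            # Older FW: the strict modes cannot be honored.
            raise RuntimeError("ooo_sl_mask cannot be detected")
        return fallback  # assumption: "try" degrades to the fallback SL

    with_ar = [sl for sl in range(16) if ooo_sl_mask & (1 << sl)]
    without_ar = [sl for sl in range(16) if not ooo_sl_mask & (1 << sl)]

    if ar_enable == "yes":
        if not with_ar:
            raise RuntimeError("no SL with AR")
        return with_ar[0]
    if ar_enable == "no":
        if not without_ar:
            raise RuntimeError("no SL without AR")
        return without_ar[0]
    # "try": prefer an SL with AR, otherwise fall back.
    return with_ar[0] if with_ar else fallback


print(select_sl("yes", 0xFFFA))                  # 1
print(select_sl("no", 0xFFFA))                   # 0
print(select_sl("try", 0x0000, requested_sl=3))  # 3
```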
Validation
smparquery
To check that AR is enabled on a switch, use smparquery with the switch LID (supported on OFED 5.x).
Use ibswitches, for example, to get the switch LIDs in your network.
Code Block |
---|
$ sudo ibswitches -C mlx5_1
Switch : 0xb8599f0300fccaac ports 81 "MF0;thor-qm8700-2:MQM8700/U1" enhanced port 0 lid 136 lmc 0
Switch : 0xb8599f0300fcca6c ports 41 "MF0;thor-qm8700-4:MQM8700/U1" enhanced port 0 lid 118 lmc 0
Switch : 0xb8599f0300df8faa ports 41 "Quantum Mellanox Technologies" base port 0 lid 375 lmc 0
Switch : 0xb8599f0300fcca4c ports 81 "MF0;thor-qm8700-3:MQM8700/U1" enhanced port 0 lid 105 lmc 0 |
Use smparquery ARInfo to get the AR information for the LID.
In this example AR is enabled on the switch (see the AR Status field).
Code Block |
---|
$ sudo smparquery ARInfo -L 136 -C mlx5_1
op = ARInfo, dest = 136, rest =
-I- Getting ARInfo from lid=136
-I- ARInfo:
AR Status.........................Enabled
Is ARN Supported..................Yes
Is FRN Supported..................Yes
Is FR Supported...................Yes
FR Enabled........................Yes
RN Xmit Enabled...................Yes
AR Sub Groups Active..............0
AR Groups Copy Supported..........15
Direction Num Supported...........4
AR Fallback.......................Enabled
AR IS4 mode.......................No
AR Glb Group......................Yes
AR By SL Cap......................Yes
AR By Transport Cap...............Yes
AR Dynamic Cap Calc...............Yes
AR Group Capability...............1792
AR Group Top......................1
AR Group Table Capability.........1
RN String Width Capability........3
AR Sub Groups Capability..........0x3
AR Version........................2
RN Version........................0
AR By SL Mask Enable..............0 (All SLs enabled)
AR SL Mask........................N/A
AR By TransportDisable............0x5
AR Ageing Time Value..............0 |
Note that the switch SL map is always enabled for all SLs. The SL mask configuration is done on the adapter.
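When many switches have to be checked, the ARInfo report can be turned into a dictionary and tested in a script. The following Python sketch assumes the "Field.....Value" layout seen in the sample output above; other smparquery versions may format the report differently. The function name parse_arinfo is hypothetical.

```python
import re


def parse_arinfo(text):
    """Map 'Field.....Value' lines of an ARInfo report to {field: value}.

    Lines without a run of dots (such as the '-I-' log lines) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\.{2,}(.*)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


# A few lines copied from the sample output above.
sample = """\
-I- ARInfo:
AR Status.........................Enabled
AR By SL Mask Enable..............0 (All SLs enabled)
AR By TransportDisable............0x5
"""

info = parse_arinfo(sample)
print(info["AR Status"])  # Enabled
```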
Get the SL Mask Configuration on the hosts
ibdiagnet gathers the SL mask of the adapters. Look for the OOOSLMask column in the ibdiagnet2.db_csv output. In this example, the SL mask is set to 0xFFFA.
Code Block |
---|
grep -A 1 OOOSLMask /var/tmp/ibdiagnet2/ibdiagnet2.db_csv
NodeGuid,PortGuid,PortNum, ... ,OOOSLMask,CapMsk2,FECActv,RetransActv
0x248a0703009c01e6,0x248a0703009c01e6,1, ... ,0xfffa,2,3,0
--
NodeGuid,PortGuid,PortNum, ... , ,HDRFECModeEnabled,OOOSLMask
0x248a0703009c01e6,0x248a0703009c01e6,1,,0x0000,0xfffa |
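The same lookup can be scripted so that the OOOSLMask column is found by name rather than by position. The Python sketch below treats any CSV line that contains an OOOSLMask field as a header and reads the rows that follow it while they have the same number of fields. This is an assumption based on the grep output above, not a full parser for the db_csv format. The function name ooo_sl_masks is hypothetical, and the sample uses a shortened header because the full column list is elided in the output above.

```python
import csv


def ooo_sl_masks(lines):
    """Return (NodeGuid, PortNum, mask) for rows under an OOOSLMask header."""
    results = []
    header = None
    for row in csv.reader(lines):
        if "OOOSLMask" in row:
            header = row  # start of a table that carries the column
        elif header is not None and len(row) == len(header):
            record = dict(zip(header, row))
            try:
                mask = int(record["OOOSLMask"], 16)
            except ValueError:
                continue  # skip rows whose mask is not a hex number
            results.append((record.get("NodeGuid"), record.get("PortNum"), mask))
        else:
            header = None  # any other line ends the current table
    return results


# Shortened sample: only four of the real columns are kept.
sample = [
    "NodeGuid,PortGuid,PortNum,OOOSLMask",
    "0x248a0703009c01e6,0x248a0703009c01e6,1,0xfffa",
    "--",
]

for guid, port, mask in ooo_sl_masks(sample):
    print(guid, port, hex(mask))  # 0x248a0703009c01e6 1 0xfffa
```

To run it against a real report, pass an open file object for ibdiagnet2.db_csv instead of the sample list.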
ibdiagnet
Refer to the example here.