InfiniBand is one of the most popular interconnect standards in HPC. The InfiniBand standard defines a variety of routing algorithms, which are configured via the Subnet Manager (SM). The InfiniBand architecture supports deterministic routing. This can prevent packets from using alternative paths when the requested output port is busy, and may therefore degrade network performance.

Adaptive Routing (AR) algorithms dynamically select the route of each packet based on the availability of the network switches to deliver it. AR is controlled by the Subnet Manager (SM), while the switches themselves make the routing decisions. The goal is the lowest latency and the maximum aggregate bandwidth over all node pairs in the network, and therefore the highest possible network efficiency.

This post discusses the need for Adaptive Routing in HPC and provides configuration examples and performance analysis.

Overview

What is Adaptive Routing?

Adaptive Routing is the ability of the network switches to dynamically select the best route for each packet based on queue size, latency, and available bandwidth.

Configuration

Refer to the example here:

OpenSM Configuration for AR

Dump the current OpenSM configuration to a file (opensm -c <filename>) and check the following parameters.

Code Block
# Routing engine
# Multiple routing engines can be specified separated by
# commas so that specific ordering of routing algorithms will
# be tried if earlier routing engines fail.
# Supported engines: minhop, updn, dnup, file, ftree, lash,
#    dor, torus-2QoS, kdor-hc, dfsssp (EXPERIMENTAL),
#    sssp (EXPERIMENTAL), chain,
#    pqft (EXPERIMENTAL),
#    dfp, ar_updn, ar_ftree, ar_torus, ar_dor (AR)
routing_engine (null)

# AR SL mask - 16 bit bitmask indicating which SLs should be configured for AR
ar_sl_mask 0xFFFF

# Enable adaptive routing only to devices that support packet reordering.
# When enabled, state in ARLFT entries for devices which does not support packet
# reordering is set to static.
# When disabled, ARLFT entries remains as determined by routing engine.
enable_ar_by_device_cap TRUE

# Advanced routing - Adaptive routing mode
# Supported values:
# 0 - Adaptive routing disabled.
# 1 - Enable adaptive routing.
# 2 - Enable adaptive routing with notifications.
# 3 - Auto mode in which adaptive routing is determined by routing engine.
ar_mode 3

# Advanced routing - Advanced routing engine
# Supported values:
# none - advanced routing is not enabled.
# ar_lag - Ports groups are created out of "parallel" links. Links that connect the same pair of switches.
# ar_tree - All the ports with minimal hops to destination are in the same group. Must run together with UPDN routing engine.
# auto - the advanced routing engine is selected based on routing engine. Works for ar_updn, ar_ftree, ar_torus, ar_dor engines.
adv_routing_engine auto

# AR Transport mask - indicates which transport types are enabled for AR
# Bit 0 = UD, Bit 1 = RC, Bit 2 = UC, Bit 3 = DCT, Bits 4-7 are reserved.
ar_transport_mask 0x000A
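
The transport mask is easier to read once it is decoded into names. The following is a minimal sketch that uses only the bit layout given in the comment above (bit 0 = UD, bit 1 = RC, bit 2 = UC, bit 3 = DCT). decode_ar_transport_mask is a hypothetical helper written for this post, not an OpenSM tool.

```shell
# Decode an ar_transport_mask value into transport names.
# Bit layout as documented in opensm.conf:
#   bit 0 = UD, bit 1 = RC, bit 2 = UC, bit 3 = DCT (bits 4-7 reserved).
# decode_ar_transport_mask is a hypothetical helper, not part of OpenSM.
decode_ar_transport_mask() (
    mask=$(($1))          # accepts hex (0x000A) or decimal input
    out=""
    bit=0
    for name in UD RC UC DCT; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
        bit=$((bit + 1))
    done
    echo "$out"
)

decode_ar_transport_mask 0x000A
```

For the value shown above, 0x000A, this prints "RC DCT": bits 1 and 3 are set, so AR is enabled for the RC and DCT transports only.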

Enable AR

One way to enable AR is to select a routing engine that supports AR.

For example, if you have a fat-tree topology, change opensm.conf as follows:

Code Block
routing_engine ar_ftree

In addition, set the root GUID configuration file.

For example, set the following in opensm.conf:

Code Block
root_guid_file /etc/opensm/root_guid.cfg

Create the file and list the GUIDs of the root switches in it:

Code Block
# cat root_guid.cfg
0xb8599f0300fcca6c #thor-qm8700-S1
0xb8599f0300df8faa #thor-qm8790-S2
0xb8599f0300fcc2cc #rome-qm8700-S1
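
A typo in this file is easy to miss, so a quick format check before restarting OpenSM can help. The sketch below only verifies that each line starts with a 0x-prefixed, 16-hex-digit GUID, as in the example above. check_root_guid_file is a hypothetical helper, and accepting blank and comment-only lines is an assumption of this sketch rather than documented OpenSM behavior.

```shell
# check_root_guid_file is a hypothetical helper (not an OpenSM tool).
# It reports every line that does not start with a GUID of the form
# 0x + 16 hex digits. Blank and comment-only lines are tolerated,
# which is an assumption of this sketch.
check_root_guid_file() (
    pattern='^(0x[0-9a-fA-F]{16}([[:space:]]|$)|[[:space:]]*(#|$))'
    bad=$(grep -v -E "$pattern" "$1" || true)
    if [ -n "$bad" ]; then
        echo "invalid lines in $1:" >&2
        echo "$bad" >&2
        exit 1
    fi
)

# Example: check_root_guid_file /etc/opensm/root_guid.cfg
```

The function returns a non-zero status when at least one line does not match, so it can be used as a guard in a deployment script.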

Enable AR per SL

By default, AR is enabled on all SLs. To enable AR per SL, set the following bitmask parameter.

In the following example, AR is disabled on SL0 and SL2.

Code Block
ar_sl_mask 0xFFFA # A = 0b1010: SL0 and SL2 disabled, SL1 and SL3 (and SL4-SL15) enabled
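
The mask arithmetic can be sketched in a few lines of shell. This assumes that bit N of ar_sl_mask corresponds to SL N, which is consistent with the example above (0xFFFA clears bits 0 and 2). Both function names are hypothetical helpers introduced here for illustration.

```shell
# List the SLs whose bit is set in a 16-bit ar_sl_mask.
# Assumption: bit N of the mask corresponds to SL N.
ar_enabled_sls() (
    mask=$(($1))
    out=""
    sl=0
    while [ "$sl" -lt 16 ]; do
        if [ $(( (mask >> sl) & 1 )) -eq 1 ]; then
            out="${out:+$out }$sl"
        fi
        sl=$((sl + 1))
    done
    echo "$out"
)

# Build a mask with all 16 SLs enabled except those given as arguments.
ar_sl_mask_without() (
    mask=$((0xFFFF))
    for sl in "$@"; do
        mask=$(( mask & ~(1 << sl) ))
    done
    printf '0x%04X\n' "$mask"
)

ar_sl_mask_without 0 2    # disable AR on SL0 and SL2
ar_enabled_sls 0xFFFA     # list the SLs that keep AR
```

The first call prints 0xFFFA, and the second prints "1 3 4 5 6 7 8 9 10 11 12 13 14 15".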

UCX Support

FW 12.29.0356 added support for a per-vport ooo_sl_mask, which UCX now uses.

With that, UCX has a way to request AR support, using UCX_IB_AR_ENABLE={yes | no | try}. The default value is “try”.

  • UCX_IB_AR_ENABLE=yes strictly selects an SL with AR; if there are no SLs with AR, UCX fails.

  • UCX_IB_AR_ENABLE=try selects an SL with AR; if there are no SLs with AR, it selects the first available SL (or the one specified by UCX_IB_SL=<sl>).

  • UCX_IB_AR_ENABLE=no strictly selects an SL without AR; if there are no SLs without AR, UCX fails.

A specific SL can still be specified by the user through UCX_IB_SL=[0..15].

If UCX_IB_AR_ENABLE=yes or UCX_IB_AR_ENABLE=no is requested on older FW, UCX fails because it cannot detect ooo_sl_mask.

The feature is part of the UCX v1.10 release.
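
The three modes can be modeled with a short shell function. This is only a sketch of the behavior described in the list above, not UCX source code. select_sl is a hypothetical name, a set bit N in the mask is taken to mean that SL N has AR, and picking the lowest-numbered matching SL is an assumption of the sketch.

```shell
# Model of the SL choice described above; this is NOT UCX code.
#   $1 = mode: yes | try | no          (as in UCX_IB_AR_ENABLE)
#   $2 = ooo_sl_mask, bit N set means SL N has AR (assumption)
#   $3 = optional fallback SL for "try" (plays the role of UCX_IB_SL)
# Choosing the lowest-numbered matching SL is an assumption of this sketch.
select_sl() (
    mode=$1
    mask=$(($2))
    fallback=${3:-0}
    want=1                               # yes/try look for an SL with AR
    if [ "$mode" = "no" ]; then
        want=0                           # no looks for an SL without AR
    fi
    sl=0
    while [ "$sl" -lt 16 ]; do
        if [ $(( (mask >> sl) & 1 )) -eq "$want" ]; then
            echo "$sl"
            exit 0
        fi
        sl=$((sl + 1))
    done
    if [ "$mode" = "try" ]; then
        echo "$fallback"                 # no AR-capable SL: fall back
        exit 0
    fi
    echo "no suitable SL for UCX_IB_AR_ENABLE=$mode" >&2
    exit 1
)

select_sl yes 0xFFFA
select_sl no 0xFFFA
```

With the 0xFFFA mask used earlier in this post, the model picks SL1 for "yes" and SL0 for "no". With a mask of 0x0000, "yes" fails while "try" falls back to the given SL.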

Validation

smparquery

To check that AR is enabled on a switch (identified by its LID), use smparquery (supported on OFED 5.x).

Use, for example, ibswitches to get the switch LIDs in your network.

Code Block
$ sudo ibswitches -C mlx5_1
Switch	: 0xb8599f0300fccaac ports 81 "MF0;thor-qm8700-2:MQM8700/U1" enhanced port 0 lid 136 lmc 0
Switch	: 0xb8599f0300fcca6c ports 41 "MF0;thor-qm8700-4:MQM8700/U1" enhanced port 0 lid 118 lmc 0
Switch	: 0xb8599f0300df8faa ports 41 "Quantum Mellanox Technologies" base port 0 lid 375 lmc 0
Switch	: 0xb8599f0300fcca4c ports 81 "MF0;thor-qm8700-3:MQM8700/U1" enhanced port 0 lid 105 lmc 0

Use smparquery ARInfo to get the AR information for the LID.

In this example, AR is enabled on the switch (see the AR Status line).

Code Block
$ sudo smparquery ARInfo -L 136 -C mlx5_1 
op = ARInfo, dest = 136, rest = 
-I- Getting ARInfo from lid=136
-I- ARInfo:
AR Status.........................Enabled
Is ARN Supported..................Yes
Is FRN Supported..................Yes
Is FR Supported...................Yes
FR Enabled........................Yes
RN Xmit Enabled...................Yes
AR Sub Groups Active..............0
AR Groups Copy Supported..........15
Direction Num Supported...........4
AR Fallback.......................Enabled
AR IS4 mode.......................No
AR Glb Group......................Yes
AR By SL Cap......................Yes
AR By Transport Cap...............Yes
AR Dynamic Cap Calc...............Yes
AR Group Capability...............1792
AR Group Top......................1
AR Group Table Capability.........1
RN String Width Capability........3
AR Sub Groups Capability..........0x3
AR Version........................2
RN Version........................0
AR By SL Mask Enable..............0 (All SLs enabled)
AR SL Mask........................N/A
AR By TransportDisable............0x5
AR Ageing Time Value..............0

Note that the switch SL map is always enabled for all SLs. The SL mask configuration is done on the adapter.
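
To repeat the check on every switch, the LIDs can be pulled out of the ibswitches output and fed to smparquery. The sketch below assumes the output format shown above, where the LID is the number following the word "lid" on each "Switch" line. switch_lids is a hypothetical helper name.

```shell
# Print the LID of every switch listed on stdin in `ibswitches` format.
# The LID is taken as the token after the last "lid" word on each
# "Switch" line, matching the sample output shown above.
switch_lids() {
    awk '/^Switch/ {
        lid = ""
        for (i = 1; i < NF; i++)
            if ($i == "lid") lid = $(i + 1)
        if (lid != "") print lid
    }'
}

# Usage on a live fabric (same commands and options as above):
#   sudo ibswitches -C mlx5_1 | switch_lids | while read -r lid; do
#       echo "lid $lid:"
#       sudo smparquery ARInfo -L "$lid" -C mlx5_1 | grep 'AR Status'
#   done
```

The loop in the usage comment prints only the AR Status line for each switch, which is enough to spot a switch where AR is not enabled.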

Get the SL Mask Configuration on the hosts

ibdiagnet gathers the SL mask of the adapters. Look for the OOOSLMask column in the ibdiagnet2.db_csv output. In this example, the SL mask is set to 0xFFFA.

Code Block
grep -A 1 OOOSLMask /var/tmp/ibdiagnet2/ibdiagnet2.db_csv
NodeGuid,PortGuid,PortNum, ... ,OOOSLMask,CapMsk2,FECActv,RetransActv
0x248a0703009c01e6,0x248a0703009c01e6,1, ... ,0xfffa,2,3,0
--
NodeGuid,PortGuid,PortNum, ... , ,HDRFECModeEnabled,OOOSLMask
0x248a0703009c01e6,0x248a0703009c01e6,1,,0x0000,0xfffa
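
With many columns, the mask is easier to read if the column is located by name instead of by eye. The following is a minimal sketch that assumes comma-separated tables whose header row contains an OOOSLMask field, whose first column is the node GUID, and whose data rows start with 0x, as in the sample above. ooo_sl_masks is a hypothetical helper name.

```shell
# Print "<first column> <OOOSLMask>" for every data row that follows a
# header row containing an OOOSLMask column. The column is located by
# name, so its position in the header does not matter.
# Assumptions: comma-separated fields, first column is the node GUID,
# data rows start with "0x".
ooo_sl_masks() {
    awk -F, '
        /(^|,)OOOSLMask(,|$)/ {          # header row: find the column
            col = 0
            for (i = 1; i <= NF; i++)
                if ($i == "OOOSLMask") col = i
            next
        }
        col && /^0x/ { print $1, $col; next }   # data row of that table
        { col = 0 }                      # any other line ends the table
    ' "$@"
}

# Example: ooo_sl_masks /var/tmp/ibdiagnet2/ibdiagnet2.db_csv
```

Each output line pairs a node GUID with its mask, so adapters whose mask differs from the configured ar_sl_mask stand out.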

ibdiagnet

Refer to the example here.