InfiniBand is one of the most popular interconnection network standards in HPC. The InfiniBand standard defines a variety of routing algorithms that can be configured via the Subnet Manager (SM). InfiniBand architecture supports deterministic routing, which may prevent packets from using alternative paths when the requested output port is busy and can therefore degrade network performance.
Adaptive Routing (AR) algorithms dynamically select the route of each packet based on the availability of the network switches to deliver it. AR is controlled by the Subnet Manager (SM), while the switch itself makes the routing decision. The goal is the lowest latency and the maximum bandwidth accumulated over all pairs in the network, and thus the highest possible network efficiency.
This post discusses the need for Adaptive Routing in HPC and provides configuration examples and performance analysis.
Overview
What is Adaptive Routing?
Adaptive Routing is the ability of the network switches to dynamically select the best route for each packet based on queue size, latency, and available bandwidth.
Configuration
Refer to the example here:
Validation
smparquery
To check that AR is enabled on a switch, run smparquery against the switch LID.
First, find the switch LID, for example with ibswitches. In this example, the switch LID is 136.
$ sudo ibswitches
Switch : 0xb8599f0300fccaac ports 81 "MF0;thor-qm8700-2:MQM8700/U1" enhanced port 0 lid 136 lmc 0
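On fabrics with many switches, the LIDs can be pulled out of the ibswitches output with a small shell helper instead of being read by eye. This is a minimal sketch, assuming the line format shown above (a quoted node description followed later by "lid <N>"); the function name switch_lids is hypothetical, and the pattern may need adjusting for other versions of the tool.

```shell
#!/bin/sh
# Hypothetical helper: print "<lid> <node description>" for each switch
# line read from stdin. Assumes the ibswitches line format shown above:
#   Switch : <GUID> ports <N> "<description>" ... lid <LID> lmc <M>
# Usage: sudo ibswitches | switch_lids
switch_lids() {
  # Capture the quoted description and the number that follows "lid",
  # then print them as "<lid> <description>". Non-matching lines are skipped.
  sed -n 's/.*"\(.*\)".* lid \([0-9][0-9]*\) .*/\2 \1/p'
}
```

For the output above, this prints one line: 136 followed by the node description MF0;thor-qm8700-2:MQM8700/U1.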
Then run smparquery ARInfo to get the AR information for that LID.
In this example, the AR Status field shows that AR is disabled on the switch.
$ sudo smparquery ARInfo 136
op = ARInfo, dest = 136, rest =
-I- Getting ARInfo from lid=136
-I- ARInfo:
AR Status.........................Disabled
Is ARN Supported..................Yes
Is FRN Supported..................Yes
Is FR Supported...................Yes
FR Enabled........................No
RN Xmit Enabled...................No
AR Sub Groups Active..............0
AR Groups Copy Supported..........15
Direction Num Supported...........4
AR Fallback.......................Enabled
AR IS4 mode.......................No
AR Glb Group......................Yes
AR By SL Cap......................Yes
AR By Transport Cap...............Yes
AR Dynamic Cap Calc...............Yes
AR Group Capability...............1792
AR Group Top......................0
AR Group Table Capability.........1
RN String Width Capability........3
AR Sub Groups Capability..........0x3
AR Version........................2
RN Version........................0
AR By SL..........................Disabled
AR Enable By SL Mask..............0x00
AR By TransportDisable............0x0
AR Ageing Time Value..............0
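To check the result in a script, for example across several switches, the AR Status field can be extracted from the ARInfo output. This is a minimal sketch, assuming the dotted field layout shown above; ar_status and check_ar are hypothetical helper names, not part of the InfiniBand tools.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "AR Status" field from
# smparquery ARInfo output read on stdin. Assumes the layout shown above:
#   AR Status.........................Disabled
# Usage: sudo smparquery ARInfo 136 | ar_status
ar_status() {
  # Remove the "AR Status" label and its leader dots; print only that line.
  sed -n 's/^[[:space:]]*AR Status\.*[[:space:]]*//p'
}

# Hypothetical convenience loop: query each switch LID given as an argument
# and report its AR status. Usage: check_ar 136
check_ar() {
  for lid in "$@"; do
    printf 'lid %s: AR %s\n' "$lid" "$(sudo smparquery ARInfo "$lid" | ar_status)"
  done
}
```

For the output above, sudo smparquery ARInfo 136 | ar_status prints Disabled. Only the line starting with "AR Status" is matched, so fields such as "AR By SL" do not interfere.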
ibdiagnet
Refer to the example here: