InfiniBand is one of the most popular interconnect standards in HPC. The InfiniBand standard defines a variety of routing algorithms that can be configured via the Subnet Manager (SM). The InfiniBand architecture supports deterministic routing, in which packets to a given destination always follow the same path. This may prevent packets from using alternative paths when the requested output port is busy, which can degrade network performance.
...
This post discusses the need for Adaptive Routing in HPC and provides configuration examples and performance analysis.
What is Adaptive Routing?
Adaptive Routing (AR) is the ability of the network switches to dynamically select the best route for each packet based on queue size, latency, and available bandwidth.
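The difference between the two approaches can be illustrated with a toy model. The sketch below is not the switch's actual algorithm: the forwarding table, the port numbers, and the queue depths are all hypothetical values chosen for illustration, and queue depth stands in for the full set of criteria a real switch may consider.

```python
# Toy model: deterministic vs. adaptive output-port selection.
# All tables and numbers here are hypothetical, for illustration only.

def deterministic_port(forwarding_table, dest_lid):
    """Static routing: a destination LID always maps to the same output port."""
    return forwarding_table[dest_lid]


def adaptive_port(queue_depth, candidates):
    """Adaptive routing (simplified): pick the candidate port with the shortest queue."""
    return min(candidates, key=lambda port: queue_depth[port])


# Hypothetical state of one switch: packets waiting per output port.
queue_depth = {1: 0, 2: 12, 3: 5}
# Hypothetical forwarding entry: LID 136 is statically routed through port 2.
forwarding_table = {136: 2}

static_choice = deterministic_port(forwarding_table, dest_lid=136)
adaptive_choice = adaptive_port(queue_depth, candidates=[1, 2, 3])

print(f"deterministic: port {static_choice}, adaptive: port {adaptive_choice}")
```

In this model the deterministic choice stays on the congested port 2, while the adaptive choice moves the packet to the idle port 1.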
Validation
smparquery
To check whether AR is enabled on a switch, run smparquery against the switch LID.
First, obtain the switch LID, for example with ibswitches. In this example, the switch LID is 136.
Code Block |
---|
$ sudo ibswitches
Switch : 0xb8599f0300fccaac ports 81 "MF0;thor-qm8700-2:MQM8700/U1" enhanced port 0 lid 136 lmc 0 |
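When many switches are involved, the LIDs can be pulled out of the ibswitches output with a short script. The following is a minimal sketch that assumes the line format shown above; the function name `parse_ibswitches` and the regular expression are illustrative, not part of any InfiniBand tool.

```python
import re

# Matches lines such as:
#   Switch : 0x<guid> ports 81 "<name>" enhanced port 0 lid 136 lmc 0
# The pattern is an assumption based on the sample output in this post.
SWITCH_LINE = re.compile(
    r'^Switch\s*:\s*(0x[0-9a-fA-F]+)\s.*?"([^"]*)".*?\blid\s+(\d+)'
)


def parse_ibswitches(text):
    """Return a list of {guid, name, lid} dicts, one per switch line."""
    switches = []
    for line in text.splitlines():
        match = SWITCH_LINE.search(line.strip())
        if match:
            switches.append(
                {
                    "guid": match.group(1),
                    "name": match.group(2),
                    "lid": int(match.group(3)),
                }
            )
    return switches


sample = (
    'Switch : 0xb8599f0300fccaac ports 81 '
    '"MF0;thor-qm8700-2:MQM8700/U1" enhanced port 0 lid 136 lmc 0'
)
switches = parse_ibswitches(sample)
print(switches[0]["lid"])
```

The input text could come from running the command through `subprocess.run(..., capture_output=True, text=True)` on a host with the InfiniBand diagnostic tools installed.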
Use smparquery ARInfo to get the AR information for that LID.
In this example, AR is disabled on the switch.
Code Block |
---|
$ sudo smparquery ARInfo 136
op = ARInfo, dest = 136, rest =
-I- Getting ARInfo from lid=136
-I- ARInfo:
AR Status.........................Disabled
Is ARN Supported..................Yes
Is FRN Supported..................Yes
Is FR Supported...................Yes
FR Enabled........................No
RN Xmit Enabled...................No
AR Sub Groups Active..............0
AR Groups Copy Supported..........15
Direction Num Supported...........4
AR Fallback.......................Enabled
AR IS4 mode.......................No
AR Glb Group......................Yes
AR By SL Cap......................Yes
AR By Transport Cap...............Yes
AR Dynamic Cap Calc...............Yes
AR Group Capability...............1792
AR Group Top......................0
AR Group Table Capability.........1
RN String Width Capability........3
AR Sub Groups Capability..........0x3
AR Version........................2
RN Version........................0
AR By SL..........................Disabled
AR Enable By SL Mask..............0x00
AR By TransportDisable............0x0
AR Ageing Time Value..............0 |
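To check the AR state on several switches, the dotted `name....value` lines of the ARInfo report can be parsed into a dictionary. This is a sketch under two assumptions: that the report keeps the layout shown above, and that an enabled switch reports the value `Enabled` for `AR Status` (only `Disabled` appears in this example). The helper names `parse_arinfo` and `ar_enabled` are hypothetical.

```python
import re

# A field line is a name, a run of at least two dots, then the value.
FIELD_LINE = re.compile(r"^\s*(.+?)\.{2,}(.+)$")


def parse_arinfo(text):
    """Map each 'Name....Value' line of an ARInfo report to {name: value}."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def ar_enabled(text):
    """True if 'AR Status' reads 'Enabled' (assumed value for an enabled switch)."""
    return parse_arinfo(text).get("AR Status") == "Enabled"


# A few lines copied from the report above.
sample = """\
-I- Getting ARInfo from lid=136
-I- ARInfo:
AR Status.........................Disabled
Is ARN Supported..................Yes
AR Fallback.......................Enabled
AR Group Capability...............1792
"""

fields = parse_arinfo(sample)
print(fields["AR Status"], ar_enabled(sample))
```

Lines without a dotted separator, such as the `-I-` informational lines, are ignored.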
ibdiagnet
Refer to the example here: