Testing RDMA Message Rate

This post describes how to test the RDMA message rate between two nodes.

Setup example: two nodes connected with an InfiniBand HDR/HDR100 link.

You will need to create a script that runs on the full PPN (processes per node), or on 50% of the PPN, to get the best performance.

Things to check:

  • Number of cores (PPN) on the node

  • NUMA locality of the adapter (see the check commands below)
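To check both quickly (assuming the adapter shows up as mlx5_0; adjust the device name to match yours):

# Number of cores available on the node
nproc

# NUMA node local to the adapter (here mlx5_0)
cat /sys/class/infiniband/mlx5_0/device/numa_node

# Which CPUs belong to which NUMA node
lscpu | grep -i numa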


Server side:

For example, run the following script on a 32-core node:

  • Message size of 2 bytes (with WQE inline enabled)

  • Run an ib_write_bw server instance on each core, each listening on its own port via -p

for i in {0..31}
do
    numactl --physcpubind=$i ib_write_bw -s 2 -d mlx5_0 -i 1 --report_gbits -F -D 20 --inline_size=2 -c RC -p $((10000+$i)) --output=message_rate &
done
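Each server instance is pinned to its own core with numactl --physcpubind and listens on a unique port (10000+$i), so the matching client process can pair with it. -D 20 runs each measurement for 20 seconds, and --output=message_rate prints just the message-rate result so it can be summed on the client side.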

Client side:

For example, run the following script on a 32-core node:

  • Message size of 2 bytes (with WQE inline enabled)

  • Run an ib_write_bw client instance on each core, connecting to the matching server port via -p

  • Collect the message-rate results in a file and sum them

rm -f file.out
for i in {0..31}
do
    numactl --physcpubind=$i ib_write_bw -s 2 -d mlx5_0 -i 1 --report_gbits -F thor001 -D 20 --output=message_rate --inline_size=2 -c RC -p $((10000+$i)) | awk '{ print $1 }' >> file.out &
done
wait
cat file.out | awk '{ SUM += $1 } END { print "RDMA Message Rate = " SUM }'
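Here thor001 is the server's hostname. The trailing & backgrounds each client so all 32 run concurrently, wait blocks until they finish, and the final awk sums the per-core message rates collected in file.out (perftest reports the message rate in Mpps).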


Note: for HDR Socket Direct, you will need to map each network device to its local NUMA node.
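A quick way to check the mapping (a sketch, assuming the Socket Direct functions show up as mlx5_* devices):

# NUMA node of each mlx5 device
for dev in /sys/class/infiniband/mlx5_*
do
    echo "$(basename $dev): NUMA node $(cat $dev/device/numa_node)"
done

# CPUs belonging to each NUMA node
numactl --hardware | grep cpus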

This is an example from our Helios cluster.


Server side:

# Using the mlx5_0 device for the first 20 cores
for i in {0..19}
do
    numactl --physcpubind=$i ib_write_bw -s 2 -d mlx5_0 -i 1 --report_gbits -F -D 20 --inline_size=2 -c RC -p $((10000+$i)) --output=message_rate &
done

# Using the mlx5_2 device for the second 20 cores
for i in {20..39}
do
    numactl --physcpubind=$i ib_write_bw -s 2 -d mlx5_2 -i 1 --report_gbits -F -D 20 --inline_size=2 -c RC -p $((10000+$i)) --output=message_rate &
done
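Because each instance still gets a unique port (10000 through 10039 across the two loops), the client side can pair one process per port exactly as before.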


Client side:

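As a minimal sketch, the client side mirrors the two server loops and sums the results as in the 32-core example above (helios001 is a placeholder for the server's hostname; substitute your own):

rm -f file.out
# mlx5_0 clients on the first 20 cores (helios001 is a placeholder hostname)
for i in {0..19}
do
    numactl --physcpubind=$i ib_write_bw -s 2 -d mlx5_0 -i 1 --report_gbits -F helios001 -D 20 --output=message_rate --inline_size=2 -c RC -p $((10000+$i)) | awk '{ print $1 }' >> file.out &
done
# mlx5_2 clients on the second 20 cores
for i in {20..39}
do
    numactl --physcpubind=$i ib_write_bw -s 2 -d mlx5_2 -i 1 --report_gbits -F helios001 -D 20 --output=message_rate --inline_size=2 -c RC -p $((10000+$i)) | awk '{ print $1 }' >> file.out &
done
wait
cat file.out | awk '{ SUM += $1 } END { print "RDMA Message Rate = " SUM }'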