

Basic information

1. NUMA

How do you check which NUMA node the network adapter is mapped to?

Each of the EPYC 7551 CPUs used here exposes 4 NUMA nodes (8 NUMA nodes on a two-socket system).


Here is an example from our Venus cluster.


$ lscpu                                                           
Architecture:          x86_64                                                                
CPU op-mode(s):        32-bit, 64-bit                                                        
Byte Order:            Little Endian                                                         
CPU(s):                64                                                                    
On-line CPU(s) list:   0-63                                                                  
Thread(s) per core:    1                                                                     
Core(s) per socket:    32                                                                    
Socket(s):             2                                                                     
NUMA node(s):          8                                                                     
Vendor ID:             AuthenticAMD                                                          
CPU family:            23                                                                    
Model:                 1                                                                     
Model name:            AMD EPYC 7551 32-Core Processor                                       
Stepping:              2                                                                     
CPU MHz:               2000.000                                                              
CPU max MHz:           2000.0000                                                             
CPU min MHz:           1200.0000                                                             
BogoMIPS:              3999.39                                                               
Virtualization:        AMD-V                                                                 
L1d cache:             32K                                                                   
L1i cache:             64K                                                                   
L2 cache:              512K                                                                  
L3 cache:              8192K                                                                 
NUMA node0 CPU(s):     0-7                                                                   
NUMA node1 CPU(s):     8-15                                                                  
NUMA node2 CPU(s):     16-23                                                                 
NUMA node3 CPU(s):     24-31                                                                 
NUMA node4 CPU(s):     32-39                                                                 
NUMA node5 CPU(s):     40-47                                                                 
NUMA node6 CPU(s):     48-55                                                                 
NUMA node7 CPU(s):     56-63 


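Note that other OEMs may map cores to NUMA nodes differently. Here is another example, from a Dell R7415 with one socket equipped with an EPYC 7551. The CPU numbers are interleaved across the nodes instead of being assigned in contiguous blocks.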

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
...
Thread(s) per core:    1
Core(s) per socket:    32
Socket(s):             1
NUMA node(s):          4
Vendor ID:             AuthenticAMD
...
Model name:            AMD EPYC 7551 32-Core Processor
...
CPU MHz:               1996.203
...
NUMA node0 CPU(s):     0,4,8,12,16,20,24,28
NUMA node1 CPU(s):     1,5,9,13,17,21,25,29
NUMA node2 CPU(s):     2,6,10,14,18,22,26,30
NUMA node3 CPU(s):     3,7,11,15,19,23,27,31
...
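Because the CPU numbering differs between the two systems above, it is safer to read the CPU list for a node from lscpu than to assume a layout. The following is a minimal sketch of that lookup. The function name numa_node_cpus is hypothetical (it is not part of any tool), and it reads lscpu-style text on standard input so that it can also be tried on saved output.

```shell
#!/bin/sh
# Print the CPU list that lscpu reports for one NUMA node.
# numa_node_cpus is a hypothetical helper name used for illustration.
# Input: lscpu-style text on stdin. Argument: the NUMA node number.
numa_node_cpus() {
    awk -v key="NUMA node$1 CPU(s):" '
        index($0, key) == 1 {
            sub(/^[^:]*:[ \t]*/, "")   # drop the label and its padding
            sub(/[ \t]+$/, "")         # drop trailing whitespace
            print
        }'
}

# Usage on a live system:
#   lscpu | numa_node_cpus 2
```

On the Venus node shown above this would print 16-23 for node 2, while on the Dell R7415 it would print 2,6,10,14,18,22,26,30.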


2. Check the adapter you have:

$ sudo lspci | grep Mel
21:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
21:00.1 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
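If lspci is not available, the same list can be approximated from sysfs, where each PCI function has a vendor file and a device file. The sketch below assumes the Mellanox PCI vendor ID 0x15b3 (the adapter in this article is identified as 15b3:1019). The function name pci_by_vendor and the SYSFS_ROOT override are hypothetical conveniences, added only so the helper can be pointed at a test directory.

```shell
#!/bin/sh
# Print "<pci_address> <device_id>" for every PCI function whose vendor
# ID matches the first argument (for example 0x15b3 for Mellanox).
# pci_by_vendor and SYSFS_ROOT are hypothetical names for illustration.
pci_by_vendor() {
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root"/bus/pci/devices/*; do
        [ -r "$dev/vendor" ] || continue       # also skips an unmatched glob
        if [ "$(cat "$dev/vendor")" = "$1" ]; then
            printf '%s %s\n' "${dev##*/}" "$(cat "$dev/device")"
        fi
    done
}

# Usage on a live system:
#   pci_by_vendor 0x15b3
```

Unlike lspci, this prints the full address including the PCI domain (for example 0000:21:00.0) and the numeric device ID rather than a product name.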


3. Check which NUMA node is connected to the network adapter:

There are several ways to do this.

Using MST tools 


$ ibdev2netdev
mlx5_0 port 1 ==> ib0 (Up)
mlx5_1 port 1 ==> ib1 (Down)

$ sudo mst start
...

$ sudo mst status -v
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
PCI devices:
------------
DEVICE_TYPE       MST                         PCI      RDMA    NET      NUMA
ConnectX5(rev:0)  /dev/mst/mt4121_pciconf0.1  21:00.1  mlx5_1  net-ib1  2
ConnectX5(rev:0)  /dev/mst/mt4121_pciconf0    21:00.0  mlx5_0  net-ib0  2


Using the net device:

$ cat /sys/class/net/ib0/device/numa_node

2
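The same sysfs file exists for every network interface that is backed by a physical device, so the check can be repeated for all interfaces at once. The following is a minimal sketch. The function name net_numa_map and the SYSFS_ROOT override are hypothetical and exist only to make the helper easy to try against a test directory.

```shell
#!/bin/sh
# Print "<interface> <numa_node>" for every network interface that has
# a backing device in sysfs. Virtual interfaces such as lo have no
# device directory and are therefore not listed.
# net_numa_map and SYSFS_ROOT are hypothetical names for illustration.
net_numa_map() {
    root="${SYSFS_ROOT:-/sys}"
    for node_file in "$root"/class/net/*/device/numa_node; do
        [ -r "$node_file" ] || continue        # also skips an unmatched glob
        iface="${node_file#"$root"/class/net/}"
        iface="${iface%%/*}"                   # keep only the interface name
        printf '%s %s\n' "$iface" "$(cat "$node_file")"
    done
}

# Usage on a live system:
#   net_numa_map
```

A value of -1 means the kernel has no NUMA affinity information for that device.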


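Note: Knowing which NUMA node the network adapter is attached to is important for performance tuning, because processes that drive the adapter generally perform best when their CPUs and memory are on that same node.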
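One common way to act on this information is to start the application under numactl with its CPUs and memory bound to the adapter's node. The sketch below only builds and prints such a command line; it does not run anything. It assumes numactl is installed, and the function name numa_local_cmd and the SYSFS_ROOT override are hypothetical.

```shell
#!/bin/sh
# Echo a numactl command line that keeps a program's CPUs and memory on
# the NUMA node local to a network interface.
# numa_local_cmd and SYSFS_ROOT are hypothetical names for illustration.
# Arguments: <interface> <command> [args...]
numa_local_cmd() {
    root="${SYSFS_ROOT:-/sys}"
    iface="$1"
    shift
    node="$(cat "$root/class/net/$iface/device/numa_node")" || return 1
    if [ "$node" -ge 0 ]; then
        echo "numactl --cpunodebind=$node --membind=$node $*"
    else
        echo "$*"     # -1: no NUMA affinity reported, leave the command unbound
    fi
}

# Usage on a live system:
#   numa_local_cmd ib0 ./my_app
```

On a host where ib0 reports node 2, as in this article, the call prints numactl --cpunodebind=2 --membind=2 ./my_app. Here ./my_app is a placeholder for the real application.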


Another option is to use lstopo-no-graphics (part of the hwloc package).

In the output below, the two-port adapter appears under NUMA node 2 (NUMANode L#2).

$ lstopo-no-graphics 
Machine (256GB total)
  Package L#0
    NUMANode L#0 (P#0 32GB)
      L3 L#0 (8192KB)
        L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
        L2 L#1 (512KB) + L1d L#1 (32KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
        L2 L#2 (512KB) + L1d L#2 (32KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
        L2 L#3 (512KB) + L1d L#3 (32KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
      L3 L#1 (8192KB)
        L2 L#4 (512KB) + L1d L#4 (32KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
        L2 L#5 (512KB) + L1d L#5 (32KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
        L2 L#6 (512KB) + L1d L#6 (32KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
        L2 L#7 (512KB) + L1d L#7 (32KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
      HostBridge L#0
        PCIBridge
          PCIBridge
            PCI 1a03:2000
              GPU L#0 "card0"
              GPU L#1 "controlD64"
    NUMANode L#1 (P#1 32GB)
      L3 L#2 (8192KB)
        L2 L#8 (512KB) + L1d L#8 (32KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
        L2 L#9 (512KB) + L1d L#9 (32KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
        L2 L#10 (512KB) + L1d L#10 (32KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
        L2 L#11 (512KB) + L1d L#11 (32KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
      L3 L#3 (8192KB)
        L2 L#12 (512KB) + L1d L#12 (32KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
        L2 L#13 (512KB) + L1d L#13 (32KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
        L2 L#14 (512KB) + L1d L#14 (32KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
        L2 L#15 (512KB) + L1d L#15 (32KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)
      HostBridge L#3
        PCIBridge
          PCI 8086:1521
            Net L#2 "eth0"
          PCI 8086:1521
            Net L#3 "eno2"
          PCI 8086:1521
            Net L#4 "eno3"
          PCI 8086:1521
            Net L#5 "eno4"
        PCIBridge
          PCI 1022:7901
            Block(Disk) L#6 "sda"
    NUMANode L#2 (P#2 32GB)
      L3 L#4 (8192KB)
        L2 L#16 (512KB) + L1d L#16 (32KB) + L1i L#16 (64KB) + Core L#16 + PU L#16 (P#16)
        L2 L#17 (512KB) + L1d L#17 (32KB) + L1i L#17 (64KB) + Core L#17 + PU L#17 (P#17)
        L2 L#18 (512KB) + L1d L#18 (32KB) + L1i L#18 (64KB) + Core L#18 + PU L#18 (P#18)
        L2 L#19 (512KB) + L1d L#19 (32KB) + L1i L#19 (64KB) + Core L#19 + PU L#19 (P#19)
      L3 L#5 (8192KB)
        L2 L#20 (512KB) + L1d L#20 (32KB) + L1i L#20 (64KB) + Core L#20 + PU L#20 (P#20)
        L2 L#21 (512KB) + L1d L#21 (32KB) + L1i L#21 (64KB) + Core L#21 + PU L#21 (P#21)
        L2 L#22 (512KB) + L1d L#22 (32KB) + L1i L#22 (64KB) + Core L#22 + PU L#22 (P#22)
        L2 L#23 (512KB) + L1d L#23 (32KB) + L1i L#23 (64KB) + Core L#23 + PU L#23 (P#23)
      HostBridge L#6
        PCIBridge
          PCI 15b3:1019
            Net L#7 "ib0_mlx5"
            OpenFabrics L#8 "mlx5_0"
          PCI 15b3:1019
            Net L#9 "ib0"
            OpenFabrics L#10 "mlx5_1"
    NUMANode L#3 (P#3 32GB)
      L3 L#6 (8192KB)