Configure E2E Test in L3 Switching Environment

Creating E2E connectivity via L3 switches requires additional network and device configurations beyond those needed for L2 switching. This section describes the additional configuration requirements, how to obtain them, and how to configure the E2E test to utilize them.

A Layer 3 switch combines the functions of a switch and a router. It acts as a switch to connect devices within the same subnet or virtual LAN, and it also provides IP routing capabilities. This allows it to support routing protocols, inspect incoming packets, and make routing decisions based on source and destination addresses. A Layer 3 switch is commonly used for routing packets between different VLANs.

For example, Cluster07 in Intel Developer Cloud (IDC) has Arista switches configured as Layer 3 switches. This configuration requires assigning an IP address to each Gaudi port and using these addresses for communication between ports. Each Arista port is configured as its own subnet with a /30 netmask (255.255.255.252). A /30 subnet contains four addresses in total: a network address, a broadcast address, and two host addresses. The Arista port itself is assigned the highest host address within the subnet, and the device connected to the port is expected to use the other host address. For example, the network 10.210.8.120/30 includes the following:

  • Network address: 10.210.8.120

  • Broadcast address: 10.210.8.123

  • Host IP range: 10.210.8.121 - 10.210.8.122

  • Arista port: 10.210.8.122

You can use https://www.calculator.net/ip-subnet-calculator.html for performing the calculation.
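The same breakdown can also be checked locally with Python's standard ipaddress module. The following is a minimal sketch using the example network above. The function name subnet_layout and the dictionary keys are illustrative only, and the rule that the switch takes the highest host address follows the convention described in this section.

```python
import ipaddress


def subnet_layout(cidr: str) -> dict:
    """Break a /30 network into the four addresses described above."""
    net = ipaddress.ip_network(cidr, strict=True)
    hosts = list(net.hosts())  # usable host addresses, lowest first
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "gaudi_port": str(hosts[0]),    # lower host address, for the connected device
        "arista_port": str(hosts[-1]),  # highest host address, for the switch port
        "netmask": str(net.netmask),
    }


layout = subnet_layout("10.210.8.120/30")
for name, value in layout.items():
    print(f"{name}: {value}")
```

With strict=True, ip_network rejects a CIDR whose host bits are set, so passing 10.210.8.122/30 by mistake raises an error instead of silently returning the enclosing network.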

Prerequisites

If not already installed, make sure to have the latest Intel Gaudi software stack installed as detailed in the Installation Guide.

Note

If you are not using the latest Intel Gaudi software stack, make sure to install the correct version.

Configuration

Obtaining Arista Port Information

On the Arista host, load the habanalabs driver and bring up its interfaces:

  1. Load habanalabs drivers:

    1. Unload the drivers in this order - habanalabs, habanalabs_cn, habanalabs_en and habanalabs_ib:

    sudo modprobe -r <driver name>
    
    2. Load the drivers in this order - habanalabs_en and habanalabs_ib, habanalabs_cn, habanalabs:

    sudo modprobe <driver name>
    
  2. Bring up the interfaces:

    /opt/habanalabs/qual/gaudi3/bin/manage_network_ifs.sh --up
    
  3. Assuming eth5 is the interface you want to connect, run sudo lldpcli and query the neighbors of that port:

    sudo lldpcli
    
    [lldpcli] # show neighbors ports eth5
    -------------------------------------------------------------------------------
    LLDP neighbors:
    -------------------------------------------------------------------------------
    Interface:    eth5, via: LLDP, RID: 15, Time: 0 day, 00:00:06
    Chassis:
       ChassisID:    mac 94:8e:d3:c8:52:69
       SysName:      2b29u25n.idc9.habana-labs.com
       SysDescr:     Arista Networks EOS version 4.26.4M running on an Arista
    Networks DCS-7060DX4-32
       MgmtIP:       10.210.255.115
       Capability:   Bridge, on
       Capability:   Router, on
    Port:
       PortID:       ifname Ethernet22/7
       PortDescr:    no-alert 10.210.8.122/30
       TTL:          120
    
    -------------------------------------------------------------------------------
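If this lookup has to be repeated for many ports, the two fields of interest can be extracted from the text output with a small script. The sketch below is an illustration only: parse_lldp_neighbor is a hypothetical helper, the sample text is an abbreviated copy of the output above, and it assumes the switch port description carries the port address in the form shown (a site-specific convention, not an LLDP requirement).

```python
import re

# Abbreviated copy of the lldpcli output shown above.
SAMPLE = """\
Interface:    eth5, via: LLDP, RID: 15, Time: 0 day, 00:00:06
Chassis:
   ChassisID:    mac 94:8e:d3:c8:52:69
   SysName:      2b29u25n.idc9.habana-labs.com
Port:
   PortID:       ifname Ethernet22/7
   PortDescr:    no-alert 10.210.8.122/30
"""


def parse_lldp_neighbor(text: str) -> dict:
    """Extract the switch MAC and the switch port address from LLDP text output."""
    mac = re.search(r"ChassisID:\s+mac\s+([0-9a-f:]{17})", text, re.IGNORECASE)
    # Assumption: the port description contains the port address as a.b.c.d/len.
    cidr = re.search(r"PortDescr:.*?(\d+\.\d+\.\d+\.\d+/\d+)", text)
    if mac is None or cidr is None:
        raise ValueError("ChassisID or PortDescr address not found in LLDP output")
    return {"gateway_mac": mac.group(1).lower(), "switch_cidr": cidr.group(1)}


info = parse_lldp_neighbor(SAMPLE)
print(info)
```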
    

Obtaining Gaudi Port IP Information

Note

The numbers used in this section are examples only.

The Arista port is configured to show, in addition to other details, the following information:

  • MAC address: 94:8e:d3:c8:52:69

  • Port IP and netmask: 10.210.8.122/30

This information is used to determine the eth5 IP address (10.210.8.121/30, the other host address in the subnet of the Arista port) and the destination MAC address (94:8e:d3:c8:52:69). For example, to connect a device with a port named eth5 to the above port and network, use address 10.210.8.121 as follows:

sudo ip addr add 10.210.8.121/30 dev eth5
sudo ifconfig eth5 up
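The address for the Gaudi port follows mechanically from the address advertised by the switch port. The sketch below shows one way to derive it, assuming a /30 point-to-point subnet as described in this section. The helper name peer_address is hypothetical.

```python
import ipaddress


def peer_address(switch_cidr: str) -> str:
    """Return the other usable host address in the switch port's /30 subnet."""
    iface = ipaddress.ip_interface(switch_cidr)
    if iface.network.prefixlen != 30:
        raise ValueError("expected a /30 point-to-point subnet")
    others = [h for h in iface.network.hosts() if h != iface.ip]
    if len(others) != 1:
        # The input was the network or broadcast address, not a host address.
        raise ValueError(f"{switch_cidr} is not a usable host address in its subnet")
    return f"{others[0]}/{iface.network.prefixlen}"


gaudi_cidr = peer_address("10.210.8.122/30")
print(f"sudo ip addr add {gaudi_cidr} dev eth5")
```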

Connectivity with the Arista switch can be verified using ping 10.210.8.122. This also provides another way to determine the Arista MAC address, which will be used as the destination MAC address in your configuration. The address can be viewed using ARP:

arp
Address                  HWtype  HWaddress           Flags Mask            Iface
10.210.8.122             ether   94:8e:d3:c8:52:69   C                     eth5
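When collecting the gateway MAC for several ports, the ARP table can be read programmatically. This is a minimal sketch that parses text in the layout shown above. The function name gateway_mac is hypothetical, and the sketch assumes the table lists numeric addresses (as arp -n does) rather than resolved host names.

```python
# Copy of the ARP table shown above.
ARP_OUTPUT = """\
Address                  HWtype  HWaddress           Flags Mask            Iface
10.210.8.122             ether   94:8e:d3:c8:52:69   C                     eth5
"""


def gateway_mac(arp_text: str, gateway_ip: str, iface: str) -> str:
    """Return the MAC address that the ARP table lists for gateway_ip on iface."""
    for line in arp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # A complete entry has: address, hwtype, hwaddress, flags, iface.
        if len(fields) >= 5 and fields[0] == gateway_ip and fields[-1] == iface:
            return fields[2]
    raise LookupError(f"no ARP entry for {gateway_ip} on {iface}")


mac = gateway_mac(ARP_OUTPUT, "10.210.8.122", "eth5")
print(mac)
```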

To connect to a peer Gaudi port or an entire subnet, add the appropriate entry to the routing table:

sudo ip route add 10.210.0.0/16 via 10.210.8.122 dev eth5
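A route only helps for peers whose addresses fall inside the routed prefix, so it can be worth checking the intended peer addresses before testing. The following sketch does that with the standard ipaddress module. The function name covered_by_route and the second sample address are illustrative assumptions.

```python
import ipaddress

# The prefix used in the route command above.
ROUTE = ipaddress.ip_network("10.210.0.0/16")


def covered_by_route(peer_ip: str, route: ipaddress.IPv4Network = ROUTE) -> bool:
    """True when peer_ip falls inside the routed prefix."""
    return ipaddress.ip_address(peer_ip) in route


for peer in ("10.210.15.173", "10.209.0.1"):
    print(peer, covered_by_route(peer))
```

A peer in a different second octet, such as 10.209.0.1 in this sketch, is not covered by the 10.210.0.0/16 route and would need its own route entry.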

Creating IP Connectivity Between the Peer Ports

The steps below assume the following setup:

../../_images/Assumed_Setup.png

On host 0, perform the following steps:

  1. Load habanalabs drivers:

    1. Unload the drivers in this order - habanalabs, habanalabs_cn, habanalabs_en and habanalabs_ib:

    sudo modprobe -r <driver name>
    
    2. Load the drivers in this order - habanalabs_en and habanalabs_ib, habanalabs_cn, habanalabs:

    sudo modprobe <driver name>
    
  2. Bring up the interfaces:

    /opt/habanalabs/qual/gaudi3/bin/manage_network_ifs.sh --up
    
  3. Run lldpcli show neighbors ports eth5 to obtain the ChassisID: mac 94:8e:d3:c9:88:2d and PortDescr: no-alert 10.210.8.122/30.

  4. Assign the IP address (10.210.8.121/30):

    sudo ip addr add 10.210.8.121/30 dev eth5
    
  5. To check if the IP is assigned successfully, run the following command:

    ping 10.210.8.122
    
  6. Add a route to all Arista subnets. This can be done per subnet in case there is more than one port:

    sudo ip route add 10.210.0.0/16 via 10.210.8.122 dev eth5
    

On host 1, perform the following steps:

  1. Load habanalabs drivers:

    1. Unload the drivers in this order - habanalabs, habanalabs_cn, habanalabs_en and habanalabs_ib:

    sudo modprobe -r <driver name>
    
    2. Load the drivers in this order - habanalabs_en and habanalabs_ib, habanalabs_cn, habanalabs:

    sudo modprobe <driver name>
    
  2. Bring up the interfaces:

    /opt/habanalabs/qual/gaudi3/bin/manage_network_ifs.sh --up
    
  3. Run lldpcli show neighbors ports eth6 to obtain the ChassisID: mac 94:8e:d3:c8:52:69 and PortDescr: no-alert 10.210.15.174/30.

  4. Assign the IP address (10.210.15.173/30) to the eth6 interface:

    sudo ip addr add 10.210.15.173/30 dev eth6
    
  5. To check if the IP is assigned successfully, run the following command:

    ping 10.210.15.174
    
  6. Add a route to all Arista subnets. This can be done per subnet in case there is more than one port:

    sudo ip route add 10.210.0.0/16 via 10.210.15.174 dev eth6
    
  7. Ping should now work between Host0:Port5 and Host1:Port6. To verify that the assigned IP address is routed correctly to the other host, run the following commands:

    1. On Host 0, run:

      ping 10.210.15.173
      
    2. On Host 1, run:

      ping 10.210.8.121
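The per-host command sequence differs only in the interface name and the switch port address, so it can be generated rather than typed. The sketch below prints the commands for both hosts from the values used in this section. The helper names local_ip and host_commands are hypothetical, and the -c 3 ping count is a choice made here so that the ping terminates on its own.

```python
import ipaddress


def local_ip(switch_cidr: str) -> str:
    """The usable host address in the subnet that the switch port does not use."""
    switch = ipaddress.ip_interface(switch_cidr)
    return str(next(h for h in switch.network.hosts() if h != switch.ip))


def host_commands(iface, switch_cidr, peer_switch_cidr, route="10.210.0.0/16"):
    """Build the address, route, and ping commands for one host."""
    switch = ipaddress.ip_interface(switch_cidr)
    return [
        f"sudo ip addr add {local_ip(switch_cidr)}/{switch.network.prefixlen} dev {iface}",
        f"ping -c 3 {switch.ip}",                    # reach the local switch port
        f"sudo ip route add {route} via {switch.ip} dev {iface}",
        f"ping -c 3 {local_ip(peer_switch_cidr)}",   # reach the peer Gaudi port
    ]


plan = {
    "host0": host_commands("eth5", "10.210.8.122/30", "10.210.15.174/30"),
    "host1": host_commands("eth6", "10.210.15.174/30", "10.210.8.122/30"),
}
for host, commands in plan.items():
    print(f"# {host}")
    print("\n".join(commands))
```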
      

Generating a Gaudinet.json Example

This example assumes a reference network design with a three-tier leaf-spine topology. Each Gaudi 3 server is connected to all three tiers via its 6 QSFP-DD ports: ports 1 and 4 to ply0, ports 2 and 5 to ply1, and ports 3 and 6 to ply2.

../../_images/gaudinet_image.png

On the Gaudi 3 server side, the /etc/gaudinet.json file is required. This file should include the Gaudi NIC MAC address, IP address, subnet mask, and gateway MAC address for each of the 24 NICs in the following format:

{
  "NIC_NET_CONFIG": [
    {
      "NIC_MAC": "b0:fd:0b:d9:22:4d",
      "NIC_IP": "10.208.0.1",
      "SUBNET_MASK": "255.255.255.252",
      "GATEWAY_MAC": "e8:b2:65:79:b8:38"
    },
    {
      "NIC_MAC": "b0:fd:0b:d9:22:5b",
      "NIC_IP": "10.209.0.1",
      "SUBNET_MASK": "255.255.255.252",
      "GATEWAY_MAC": "ec:8a:48:43:c9:81"
    },
    {
      "NIC_MAC": "b0:fd:0b:d9:22:5c",
      "NIC_IP": "10.210.0.1",
      "SUBNET_MASK": "255.255.255.252",
      "GATEWAY_MAC": "ec:8a:48:44:3b:41"
    },
    …
  ]
}
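Because the file is written by hand or by a script, a quick structural check before using it can catch missing keys and malformed addresses. The following is a sketch of such a check, not a tool from the Intel Gaudi software stack. The function name check_gaudinet and the specific rules (four required keys, colon-separated MACs, unique NIC_MAC values) are assumptions based on the format shown above.

```python
import ipaddress
import json
import re

MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$", re.IGNORECASE)
REQUIRED = ("NIC_MAC", "NIC_IP", "SUBNET_MASK", "GATEWAY_MAC")


def check_gaudinet(text: str) -> list:
    """Return a list of problems found in gaudinet.json content (empty if none)."""
    problems = []
    entries = json.loads(text).get("NIC_NET_CONFIG", [])
    for idx, entry in enumerate(entries):
        missing = [key for key in REQUIRED if key not in entry]
        if missing:
            problems.append(f"entry {idx}: missing {', '.join(missing)}")
            continue
        for key in ("NIC_MAC", "GATEWAY_MAC"):
            if not MAC_RE.match(entry[key]):
                problems.append(f"entry {idx}: bad {key} {entry[key]!r}")
        try:
            # ip_interface accepts "address/netmask" and validates both parts.
            ipaddress.ip_interface(f"{entry['NIC_IP']}/{entry['SUBNET_MASK']}")
        except ValueError:
            problems.append(f"entry {idx}: bad NIC_IP or SUBNET_MASK")
    macs = [entry.get("NIC_MAC") for entry in entries]
    if len(set(macs)) != len(macs):
        problems.append("duplicate NIC_MAC values")
    return problems


# Two sample entries; the second one deliberately omits GATEWAY_MAC.
sample = json.dumps({"NIC_NET_CONFIG": [
    {"NIC_MAC": "b0:fd:0b:d9:22:4d", "NIC_IP": "10.208.0.1",
     "SUBNET_MASK": "255.255.255.252", "GATEWAY_MAC": "e8:b2:65:79:b8:38"},
    {"NIC_MAC": "b0:fd:0b:d9:22:5b", "NIC_IP": "10.209.0.1",
     "SUBNET_MASK": "255.255.255.252"},
]})
print(check_gaudinet(sample))
```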

To generate the gaudinet.json file, perform the following steps:

  1. From hl-smi, retrieve the mapping of Gaudi module ID to bus ID by running the following command:

    hl-smi -Q module_id,bus_id -f csv,noheader
    

    The first column in the output is the Gaudi module ID, while the second column is the bus ID as shown in the following example:

    6, 0000:9a:00.0
    2, 0000:33:00.0
    3, 0000:34:00.0
    7, 0000:9b:00.0
    4, 0000:b3:00.0
    0, 0000:4d:00.0
    1, 0000:4e:00.0
    5, 0000:b4:00.0
    
  2. Obtain three MAC addresses (one address for each ply) for each Gaudi module:

    1. Replace the bus_id in the following command with the bus_id found in Step 1:

    cat /sys/bus/pci/drivers/habanalabs/{bus_id}/net/*/address | sort

    2. To get the three MAC addresses for Gaudi module 0 in the above example, run the following:

    cat /sys/bus/pci/drivers/habanalabs/0000:4d:00.0/net/*/address | sort
    b0:fd:0b:d9:22:4d #MAC for ply0
    b0:fd:0b:d9:22:5b #MAC for ply1
    b0:fd:0b:d9:22:5c #MAC for ply2

    Repeat the steps for Gaudi modules 1 through 7 to generate a list of 24 lines in total.

  3. Assign the NIC IP addresses. Use the following to determine the IP address format:

    10.(starting_second_octet+ply_id).(leaf_switch_id).(1+port_seq_id x 4)/30
    

    The following table describes each parameter included in an IP address:

    Parameter               Description
    starting_second_octet   User’s choice
    ply_id                  0, 1, 2
    leaf_switch_id          ID of the connected leaf switch
    port_seq_id             The sequence number of the 100Gb/s interfaces across all servers connected to the same leaf switch. Each server has 8 interfaces connected to each of the 3 leaf switches, and the current server may not be the first server connected to a leaf switch.
    /30 netmask             Subnet mask is 255.255.255.252 for a point-to-point /30 network.

    Example 1: For the first server (connected to the lowest numbered switch port facing Gaudi servers) connected to the first leaf switch, the IP address assigned to the NIC in Gaudi module 0 for ply0 is 10.(208+0).(0).(1+0 x 4)/30 = 10.208.0.1/30.

    Example 2: For the second server connected to the second leaf switch, the IP address assigned to the NIC in Gaudi module 2 for ply1 is 10.(208+1).(1).(1+10 x 4)/30 = 10.209.1.41/30. Here, port_seq_id is 10 because this is the second server connected to the switch, whose first 8 Gaudi-facing interfaces are connected to the first server. This NIC is in Gaudi module 2, so 8+2 = 10.

  4. Obtain the gateway MAC address, which is the MAC address of the connected switch. It can be read either from the switch itself or from the output of lldpcli show neighbors on the server.
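The steps above can be combined into a short script. The sketch below parses the hl-smi mapping, applies the address formula, and assembles the entries for one Gaudi module. All function names are hypothetical, the sample values are taken from the examples in this section, and two assumptions are made for illustration: port_seq_id is computed as server index times 8 plus module ID (as in Example 2), and the same leaf_switch_id is used for all three plies.

```python
import json

# Abbreviated sample of the hl-smi output shown in Step 1.
HL_SMI_CSV = """\
6, 0000:9a:00.0
2, 0000:33:00.0
0, 0000:4d:00.0
"""


def parse_module_map(csv_text: str) -> dict:
    """Map Gaudi module ID -> PCI bus ID from the hl-smi CSV output."""
    mapping = {}
    for line in csv_text.strip().splitlines():
        module_id, bus_id = (field.strip() for field in line.split(","))
        mapping[int(module_id)] = bus_id
    return mapping


def nic_ip(start_octet: int, ply_id: int, leaf_switch_id: int, port_seq_id: int) -> str:
    """10.(start+ply).(leaf).(1 + port_seq_id x 4), without the /30 suffix."""
    last = 1 + port_seq_id * 4
    if port_seq_id < 0 or last > 253:
        raise ValueError("port_seq_id does not fit in the last octet")
    return f"10.{start_octet + ply_id}.{leaf_switch_id}.{last}"


def build_entries(module_id, macs, gateway_macs, server_idx, leaf_switch_id, start_octet=208):
    """One gaudinet.json entry per ply for a single Gaudi module."""
    # Assumption: 8 Gaudi-facing interfaces per server on each leaf switch.
    port_seq_id = server_idx * 8 + module_id
    return [
        {
            "NIC_MAC": mac,
            "NIC_IP": nic_ip(start_octet, ply_id, leaf_switch_id, port_seq_id),
            "SUBNET_MASK": "255.255.255.252",
            "GATEWAY_MAC": gateway_macs[ply_id],
        }
        for ply_id, mac in enumerate(macs)  # sorted MACs map to ply0, ply1, ply2
    ]


module_map = parse_module_map(HL_SMI_CSV)
macs = ["b0:fd:0b:d9:22:4d", "b0:fd:0b:d9:22:5b", "b0:fd:0b:d9:22:5c"]
gateways = ["e8:b2:65:79:b8:38", "ec:8a:48:43:c9:81", "ec:8a:48:44:3b:41"]
entries = build_entries(0, macs, gateways, server_idx=0, leaf_switch_id=0)
print(json.dumps({"NIC_NET_CONFIG": entries}, indent=2))
```

For Gaudi module 0 on the first server, this reproduces the three entries shown in the format example. A complete file would repeat build_entries for modules 0 through 7 with the MAC addresses read in Step 2.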

Priority Flow Control (PFC)

If degraded performance is observed, check the switch counters for dropped packets. If packet loss is detected, enabling PFC (Priority Flow Control) may be beneficial. The example commands and configurations in this section are for Arista EOS.

To check packet loss on the switch, use the following command:

show interface counter queue

PFC/Buffer Configuration in Switch

To configure PFC/buffer in the switch, perform the following steps:

  1. Add the following lines to the switch global configuration to adjust the buffer settings. Note that the threshold and headroom values provided are specific to the Arista 7060DX4-32 and may vary for other switch models.

    platform trident mmu queue profile PFC_Profile
    ingress threshold 1
    ingress headroom 165100
    platform trident mmu queue profile PFC_Profile apply
    
  2. For each interface, add the following lines to enable PFC in its configuration:

    qos trust dscp
    priority-flow-control on
    priority-flow-control priority 0 no-drop
    priority-flow-control priority 1 no-drop
    priority-flow-control priority 2 no-drop
    priority-flow-control priority 3 no-drop
    uc-tx-queue 2
       no priority
    uc-tx-queue 3
       no priority
    

Examples of a full interface configuration:

Example 1:

interface Ethernet1/1
mtu 9198
speed 400g-8
error-correction encoding reed-solomon
no switchport
ip address 10.208.128.1/30
qos trust dscp
priority-flow-control on
priority-flow-control priority 0 no-drop
priority-flow-control priority 1 no-drop
priority-flow-control priority 2 no-drop
priority-flow-control priority 3 no-drop
uc-tx-queue 2
   no priority
uc-tx-queue 3
   no priority

Example 2:

interface Ethernet2/1
mtu 9198
speed 100g-2
error-correction encoding reed-solomon
no switchport
ip address 10.208.0.2/30
qos trust dscp
priority-flow-control on
priority-flow-control priority 0 no-drop
priority-flow-control priority 1 no-drop
priority-flow-control priority 2 no-drop
priority-flow-control priority 3 no-drop
uc-tx-queue 2
   no priority
uc-tx-queue 3
   no priority
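Since the Gaudi-facing interface stanzas differ only in the interface name, speed, and address, they can be rendered from a template. The sketch below generates text in the shape of Example 2. The helper interface_stanza is hypothetical, and the switch address is derived on the assumption that the switch port uses the host address in the /30 subnet that the Gaudi NIC does not use. The generated text should be reviewed before it is applied to a switch.

```python
import ipaddress

# PFC lines common to every interface, as listed in the steps above.
PFC_LINES = [
    "qos trust dscp",
    "priority-flow-control on",
    *[f"priority-flow-control priority {p} no-drop" for p in range(4)],
    "uc-tx-queue 2",
    "   no priority",
    "uc-tx-queue 3",
    "   no priority",
]


def interface_stanza(name: str, speed: str, nic_cidr: str) -> str:
    """Render one interface configuration from the Gaudi NIC address."""
    nic = ipaddress.ip_interface(nic_cidr)
    # Assumption: the switch port takes the other usable address in the /30.
    switch_ip = next(h for h in nic.network.hosts() if h != nic.ip)
    lines = [
        f"interface {name}",
        "mtu 9198",
        f"speed {speed}",
        "error-correction encoding reed-solomon",
        "no switchport",
        f"ip address {switch_ip}/{nic.network.prefixlen}",
        *PFC_LINES,
    ]
    return "\n".join(lines)


stanza = interface_stanza("Ethernet2/1", "100g-2", "10.208.0.1/30")
print(stanza)
```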

Enable PFC in Gaudi Server

PFC should also be enabled in servers for the flow control mechanism to function effectively.

  1. To enable PFC, run the following command:

    /opt/habanalabs/qual/gaudi3/bin/manage_network_ifs.sh --set-pfc
    
  2. To verify that the PFC is enabled, run the following command:

    /opt/habanalabs/qual/gaudi3/bin/manage_network_ifs.sh --check-pfc
    

You should receive output similar to the following, indicating enabled=15.

check_pfc 'enp0n0'

enabled=15

check_pfc 'enp0n1'

enabled=15
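With many interfaces, scanning this output by eye is error-prone. The sketch below parses text in the format shown above and lists any interface whose value is not 15. The function name pfc_status is hypothetical. The value 15 is 0b1111, which is consistent with a bitmask covering priorities 0 through 3, though that interpretation is an assumption here.

```python
import re

# Copy of the sample output shown above.
CHECK_OUTPUT = """\
check_pfc 'enp0n0'

enabled=15

check_pfc 'enp0n1'

enabled=15
"""


def pfc_status(text: str) -> dict:
    """Map interface name -> 'enabled' value from the check output."""
    status = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        name = re.match(r"check_pfc '(.+)'", line)
        value = re.match(r"enabled=(\d+)", line)
        if name:
            current = name.group(1)
        elif value and current is not None:
            status[current] = int(value.group(1))
    return status


status = pfc_status(CHECK_OUTPUT)
not_enabled = [iface for iface, value in status.items() if value != 15]
print(status)
print("interfaces without PFC fully enabled:", not_enabled)
```

The same parser can be applied after disabling PFC, in which case every interface is expected to report 0.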

Disable PFC in Gaudi Server

To disable PFC, run the following command:

/opt/habanalabs/qual/gaudi3/bin/manage_network_ifs.sh --unset-pfc

You should receive output similar to the following, indicating enabled=0.

check_pfc 'enp0n0'

enabled=0

check_pfc 'enp0n1'

enabled=0