Expected Switch Configuration

Gaudi 3 uses the following two priorities. The driver sets them automatically, without user or application intervention:

  • Priority 2 for data traffic (RDMA WRITE packets), which is mapped to DSCP 24

  • Priority 3 for ACK traffic, which is mapped to DSCP 16
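
When checking this mapping in a packet capture, it can help to know the byte value to look for. The sketch below is illustrative and is not part of the driver. It relies only on the standard layout in which DSCP occupies the upper six bits of the IPv4 ToS / IPv6 Traffic Class byte, and it assumes the ECN bits are zero. The names `PRIORITY_TO_DSCP` and `dscp_to_tos_byte` are hypothetical.

```python
# Priority-to-DSCP mapping as stated above for Gaudi 3.
PRIORITY_TO_DSCP = {
    2: 24,  # data traffic (RDMA WRITE packets)
    3: 16,  # ACK traffic
}


def dscp_to_tos_byte(dscp: int) -> int:
    """Return the ToS / Traffic Class byte for a DSCP value, assuming ECN bits are 0."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    # DSCP sits in the upper six bits; the lower two bits carry ECN.
    return dscp << 2


for priority, dscp in PRIORITY_TO_DSCP.items():
    print(f"priority {priority}: DSCP {dscp} -> ToS byte 0x{dscp_to_tos_byte(dscp):02x}")
```

With ECN bits at zero, DSCP 24 appears as ToS byte 0x60 and DSCP 16 as 0x40.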

The following configuration is required:

Feature

Description

Settings

PFC with priority

Enables priority flow control (PFC) on the switch interfaces and marks priorities 0 through 3 as no-drop.

  • interface Ethernet<>

  • priority-flow-control on

  • priority-flow-control priority 0 no-drop

  • priority-flow-control priority 1 no-drop

  • priority-flow-control priority 2 no-drop

  • priority-flow-control priority 3 no-drop
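
Because the same PFC lines are repeated on every interface, they can be generated rather than typed by hand. The following is a minimal sketch, assuming the command text listed above is applied unchanged per interface. The function name `render_pfc_config` and the interface names in the usage line are hypothetical placeholders, and the indentation of nested lines is a formatting choice only.

```python
# No-drop priorities taken from the settings listed above.
NO_DROP_PRIORITIES = (0, 1, 2, 3)


def render_pfc_config(interfaces):
    """Render the PFC settings above once per interface name."""
    lines = []
    for name in interfaces:
        lines.append(f"interface {name}")
        lines.append("   priority-flow-control on")
        for prio in NO_DROP_PRIORITIES:
            lines.append(f"   priority-flow-control priority {prio} no-drop")
    return "\n".join(lines)


# Hypothetical interface names; substitute the ports actually in use.
print(render_pfc_config(["Ethernet1/1", "Ethernet2/1"]))
```

Review the generated text against the switch documentation before applying it.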

Headroom settings

Headroom pool buffers are used exclusively for in-flight messages. Increasing the headroom size lets the switch hold packets longer instead of dropping them when the pool buffers run out. Each cell in an Arista switch is 254 bytes, and we typically allocate 250 cells per interface, which comes to 63,500 bytes (approximately 64,000 bytes).

  • platform trident mmu queue profile PFC_Profile

  • ingress headroom 64000
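
The headroom arithmetic can be checked with a short calculation. This sketch uses only the cell size and cell count given in the description. The helper names `headroom_bytes` and `cells_needed` are hypothetical.

```python
CELL_SIZE_BYTES = 254      # cell size stated in the description
CELLS_PER_INTERFACE = 250  # typical allocation stated in the description


def headroom_bytes(cells=CELLS_PER_INTERFACE, cell_size=CELL_SIZE_BYTES):
    """Bytes of headroom provided by a given number of cells."""
    return cells * cell_size


def cells_needed(target_bytes, cell_size=CELL_SIZE_BYTES):
    """Smallest whole number of cells that covers target_bytes (ceiling division)."""
    return -(-target_bytes // cell_size)


print(headroom_bytes())     # 63500
print(cells_needed(64000))  # 252
```

So 250 cells give 63,500 bytes, slightly under the configured 64000, and covering 64,000 bytes exactly would take 252 cells.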

Alpha value

This value corresponds to the “ingress threshold <>” value mentioned in the switch spec. Increasing the threshold tells the switch to use more of the shared pool buffer for the congested queue. Allocating a large headroom leaves less space for the shared pool.

  • platform trident mmu queue profile PFC_Profile

  • ingress threshold 1
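
To illustrate the trade-off described for the alpha value, the sketch below uses the commonly described dynamic-threshold model, in which a queue may grow up to alpha times the shared buffer that is currently free. This is an assumption for illustration only: the source does not state that the switch implements exactly this formula, nor how the configured threshold value maps to alpha. All buffer sizes are made-up numbers.

```python
def dynamic_limit(alpha, shared_pool_bytes, shared_used_bytes):
    """Assumed model: a queue may use up to alpha * (free shared buffer)."""
    free = max(shared_pool_bytes - shared_used_bytes, 0)
    return alpha * free


def steady_state_share(alpha, shared_pool_bytes, congested_queues=1):
    """Per-queue occupancy where q = alpha * (pool - n * q), solved for q."""
    return alpha * shared_pool_bytes / (1 + alpha * congested_queues)


TOTAL_BUFFER = 1_000_000  # hypothetical total buffer, in bytes
for headroom in (0, 200_000):
    shared = TOTAL_BUFFER - headroom  # more headroom leaves less shared pool
    for alpha in (0.5, 1, 2):
        share = steady_state_share(alpha, shared)
        print(f"headroom={headroom} alpha={alpha} -> congested queue gets {share:.0f} bytes")
```

Under this assumed model, a larger alpha lets one congested queue take a larger share of the shared pool, and a larger headroom shrinks the pool that alpha applies to.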

ECMP and recommended hash function

Because a leaf switch is connected to several spine switches, a well-distributed hash function is needed to spread packets evenly across all spine switches.

Use the default ECMP hash function, which hashes on five fields: source IP, destination IP, source port, destination port, and ingress port.
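
The idea behind hash-based ECMP can be sketched as follows. This is not the switch's actual hash function. It is a minimal illustration that uses CRC32 as a stand-in and assumes the five fields listed above are combined into a single key. The function name `pick_spine`, the addresses, and the port numbers are hypothetical.

```python
import zlib
from collections import Counter


def pick_spine(src_ip, dst_ip, src_port, dst_port, ingress_port, num_spines):
    """Map a flow to a spine index by hashing the five fields (CRC32 as a stand-in)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{ingress_port}".encode()
    return zlib.crc32(key) % num_spines


# Hypothetical flows that differ only in source port, spread over 4 spines.
counts = Counter(
    pick_spine("10.0.0.1", "10.0.1.1", sport, 4791, "Ethernet1/1", 4)
    for sport in range(49152, 50152)
)
print(sorted(counts.items()))
```

All packets of one flow hash to the same spine, which keeps them in order, while different flows are spread across the available spines.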