Monitoring Switch and Gaudi 3 Accelerator
The following are the steps required to monitor the switch:
Clear counters before each experiment
Ingress and egress port packet drops/discards
Tx/Rx PFC pauses, including the different priority lanes (0-7)
Queue length
PFC on/off
Headroom settings
Hash function
Enable only priority 0, since it is the only priority in use
Check for the following settings on the switch:
platform trident mmu queue profile PFC_Profile
   ingress threshold 1
   ingress headroom 64000
interface Ethernet1/1-32/7
   priority-flow-control on
   priority-flow-control priority 0 no-drop
   priority-flow-control priority 1 no-drop
   priority-flow-control priority 2 no-drop
   priority-flow-control priority 3 no-drop
   uc-tx-queue 2 no priority
   uc-tx-queue 3 no priority
platform trident mmu queue profile PFC_Profile apply
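One way to carry out this check is to save the switch's running configuration to a text file and confirm that each expected line is present. The sketch below assumes that workflow. The function name check_switch_config is hypothetical, and how the configuration is exported depends on your switch. The interface line is left out because a running configuration may list interfaces individually rather than as a range, and the check only confirms that a line appears somewhere in the file, not that it appears under every interface.

```shell
#!/usr/bin/env bash
# Sketch: report expected switch settings that are missing from a saved
# configuration file. check_switch_config is a hypothetical helper name.

check_switch_config() {
    local config_file="$1"
    local missing=0
    local line
    while IFS= read -r line; do
        # Strip leading/trailing whitespace from the saved config, then
        # require an exact whole-line match (-x) on a fixed string (-F).
        if ! sed 's/^[[:space:]]*//; s/[[:space:]]*$//' "$config_file" \
                | grep -Fxq -- "$line"; then
            echo "MISSING: $line"
            missing=1
        fi
    done <<'EOF'
platform trident mmu queue profile PFC_Profile
ingress threshold 1
ingress headroom 64000
priority-flow-control on
priority-flow-control priority 0 no-drop
priority-flow-control priority 1 no-drop
priority-flow-control priority 2 no-drop
priority-flow-control priority 3 no-drop
uc-tx-queue 2 no priority
uc-tx-queue 3 no priority
platform trident mmu queue profile PFC_Profile apply
EOF
    # Exit status 0 when every line was found, 1 otherwise.
    return "$missing"
}
```

Calling check_switch_config with the path of the saved file prints one MISSING line per absent setting and returns a non-zero status if any are absent. If your switch prints a setting differently (for example, across two lines), adjust the expected lines accordingly.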
The following are the steps required to monitor each Gaudi:
Clear counters as shown below:
for i in {0..7} ; do echo 1 | sudo tee ${HL_DBG_PATH}/"${HL_DBG_NAME}${i}"/nic_reset_cnt ; done
Monitor the Psn_out_of_range and psn_out_of_sequence counters on all Ethernet interfaces.
Check dmesg for port toggling/reset events.
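The last two checks can be scripted as simple text filters. The sketch below is illustrative only: it assumes the PSN counters are exposed through ethtool -S in the usual "name: value" layout, and the dmesg pattern is a guess at typical link and reset messages rather than the driver's documented log format. The function names filter_psn_counters and filter_port_events are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: text filters for the Gaudi-side checks. Both read from stdin.

# Print PSN counters whose value is non-zero, as name=value.
# Input is assumed to look like "ethtool -S" output ("name: value").
# Matching is case-insensitive because counter spelling may vary.
filter_psn_counters() {
    awk -F: 'tolower($1) ~ /psn_out_of_(range|sequence)/ && $2 + 0 != 0 {
        name = $1; value = $2
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        gsub(/[ \t]+/, "", value)
        print name "=" value
    }'
}

# Print kernel log lines that look like link toggles or port resets.
# The pattern is an assumption; adapt it to the messages you observe.
filter_port_events() {
    grep -Ei 'link (is )?(up|down)|port.*reset|reset.*port'
}

# Example usage (interface names are placeholders supplied by the caller):
#   for iface in "$@"; do ethtool -S "$iface" | filter_psn_counters; done
#   sudo dmesg | filter_port_events
```

Empty output from both filters after an experiment suggests no out-of-range or out-of-sequence packets and no port toggling were recorded, provided the counters were cleared beforehand.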