Log Analysis

The log analysis is used to examine hl_qual reports and other log files (such as dmesg, UART, samples, etc.) to detect known hardware issues, recommend troubleshooting steps, and provide insights from the test reports. The analysis process involves iterating through the test report directories and applying analyzer logic to detect anomalies and issues.

Switches and Usage

The diag_tool.py script analyzes the test reports generated. The following is a run command example. This command generates a diagnostic report in the terminal and saves it to the specified CSV output file. To generate and execute a test plan, see instructions in Test Plan Automation.

python diag_tool.py -i <test_plan_result_file> -o <output_file>

Options:

Option

Description

-i <path_to_file>

Path to the test plan run logs. Includes data required for the analysis. This path should be identical to the output path set together with the --exec run_test_plan switch when running the diag_tool_automation.py script. For more details, see Switches and Usage.

-o <path_to_file>

Path to the output file. This file holds the tool report in CSV format.

After running the script, a textual report will appear in your terminal.

Example textual report:

===================================== Functional_Extreme_Serdes Report =====================================
========= OAM 1 Analysis  =========
- The device encountered a double error during the test.
        This could indicate a memory-related issue.
        Further troubleshooting is required.
===================================== Functional_High_Serdes Report =====================================
========= OAM 0 Analysis  =========

If the -o option is specified, the tool generates a CSV output file that contains the complete tool analysis, with each test represented as a row and each device as a column, making it easier to read the full report in a structured format:

Test/Device

OAM 0

OAM1

OAM2

OAM3

OAM4

OAM5

OAM6

OAM7

Functional_Extreme_Serdes

The device encountered a double error …

Functional_High_Serdes

Events Detection

The log analysis is an event-driven diagnostic system that detects test failures caused by system hardware events or device hardware events. Each pre-defined event is linked to a specific analyzer, which is responsible for detecting the following events in the system:

Event

Description

Thermal Event TempAnalyzer

A failure caused by overheating issues, which may result from system-related issues issues such as airflow obstructions, or device-specific failures such as a damaged heatsink.

Power Event PowerAnalyzer

A failure that occurs when a device fails to reach its maximum power capacity.

RMA Event RMAAnalyzer

A set of failures indicating a hardware malfunction. Devices flagged under this event should be treated as potential RMAs and undergo further investigation.

NIC Issue Event PortsAnalyzer

A failure caused by malfunctioning NIC ports.