Log Analysis
Log Analysis¶
Log analysis examines hl_qual reports and other log files (such as dmesg, UART, and samples) to detect known hardware issues, recommend troubleshooting steps, and provide insights from the test reports. The analysis iterates through the test report directories and applies analyzer logic to detect anomalies and issues.
Switches and Usage¶
The diag_tool.py script analyzes the generated test reports. The following is an example run command. It prints a diagnostic report to the terminal and saves it to the specified CSV output file.
To generate and execute a test plan, see instructions in Test Plan Automation.
python diag_tool.py -i <test_plan_result_file> -o <output_file>
Options:
Option | Description |
---|---|
-i | Path to the test plan run logs. Includes the data required for the analysis. This path should be identical to the output path set when running the test plan. |
-o | Path to the output file. This file holds the tool report in CSV format. |
After running the script, a textual report will appear in your terminal.
Example textual report:
===================================== Functional_Extreme_Serdes Report =====================================
========= OAM 1 Analysis =========
- The device encountered a double error during the test.
This could indicate a memory-related issue.
Further troubleshooting is required.
===================================== Functional_High_Serdes Report =====================================
========= OAM 0 Analysis =========
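For post-processing, the textual report can be grouped by test and device. The following is a minimal sketch, assuming the report always uses the header shapes shown in the sample above (a `<test name> Report` banner followed by `OAM <n> Analysis` banners, with findings starting with `- `). The function name `parse_report` and both regular expressions are illustrative assumptions, not part of diag_tool.py.

```python
import re

# Header patterns inferred from the sample report only (an assumption,
# not a documented format).
REPORT_RE = re.compile(r"^=+\s*(?P<test>\S+) Report\s*=+$")
DEVICE_RE = re.compile(r"^=+\s*(?P<device>OAM \d+) Analysis\s*=+$")


def parse_report(text):
    """Group findings as {test name: {device: [finding, ...]}}."""
    findings = {}
    test = device = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = REPORT_RE.match(line)
        if match:
            # A new test banner resets the current device.
            test, device = match.group("test"), None
            findings.setdefault(test, {})
            continue
        match = DEVICE_RE.match(line)
        if match and test is not None:
            device = match.group("device")
            findings[test].setdefault(device, [])
            continue
        if test is None or device is None:
            continue  # text outside any device section is ignored
        if line.startswith("- "):
            findings[test][device].append(line[2:])
        elif findings[test][device]:
            # Continuation line: append it to the most recent finding.
            findings[test][device][-1] += " " + line
    return findings
```

A device section with no findings, such as OAM 0 in the sample, maps to an empty list.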
If the -o option is specified, the tool generates a CSV output file that contains the complete tool analysis, with each test represented as a row and each device as a column, making it easier to read the full report in a structured format:
Test/Device | OAM 0 | OAM 1 | OAM 2 | OAM 3 | OAM 4 | OAM 5 | OAM 6 | OAM 7 |
---|---|---|---|---|---|---|---|---|
Functional_Extreme_Serdes | The device encountered a double error … | | | | | | | |
Functional_High_Serdes | … |
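Because the CSV stores one test per row and one device per column, it can be flattened into a list of findings for filtering or further reporting. The following is a minimal sketch, assuming the file has a header row whose first column labels the test and whose remaining columns name the devices, and that an empty cell means no finding. The function name `load_findings` is hypothetical, and the exact header text in a real output file may differ from the rendered table above.

```python
import csv


def load_findings(csv_path):
    """Return (test, device, finding) tuples for every non-empty cell."""
    results = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            return results  # empty file
        devices = header[1:]  # assumption: first column holds the test name
        for row in reader:
            if not row:
                continue
            test, cells = row[0], row[1:]
            for device, cell in zip(devices, cells):
                if cell.strip():
                    results.append((test, device, cell.strip()))
    return results
```

Reading positionally with `csv.reader` avoids depending on the exact header names, which are not specified here.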
Events Detection¶
The log analysis is an event-driven diagnostic system that detects test failures caused by system hardware events or device hardware events. Each predefined event is linked to a specific analyzer that is responsible for detecting it. The following events are detected:
Event | Description |
---|---|
Thermal Event | A failure caused by overheating, which may result from system-related issues such as airflow obstructions, or from device-specific failures such as a damaged heatsink. |
Power Event | A failure that occurs when a device fails to reach its maximum power capacity. |
RMA Event | A set of failures indicating a hardware malfunction. Devices flagged under this event should be treated as potential RMAs and undergo further investigation. |
NIC Issue Event | A failure caused by malfunctioning NIC ports. |
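The link between events and analyzers can be pictured as a registry that maps each event to a detection function. The following is a conceptual sketch only and does not reflect the internals of diag_tool.py. The names `Event`, `detect_events`, and `EXAMPLE_ANALYZERS` are hypothetical, and the matched strings are placeholders rather than real log messages.

```python
from enum import Enum
from typing import Callable, Dict, List


class Event(Enum):
    THERMAL = "Thermal Event"
    POWER = "Power Event"
    RMA = "RMA Event"
    NIC_ISSUE = "NIC Issue Event"


# An analyzer takes the text of one log and reports whether its event occurred.
Analyzer = Callable[[str], bool]


def detect_events(log_text: str, analyzers: Dict[Event, Analyzer]) -> List[Event]:
    """Run every registered analyzer and return the events that fired."""
    return [event for event, analyzer in analyzers.items() if analyzer(log_text)]


# Hypothetical analyzers: the matched strings are placeholders for
# illustration, not real hl_qual or dmesg messages.
EXAMPLE_ANALYZERS: Dict[Event, Analyzer] = {
    Event.THERMAL: lambda text: "example-thermal-marker" in text,
    Event.NIC_ISSUE: lambda text: "example-nic-marker" in text,
}
```

Keeping one analyzer per event means a new event type can be supported by registering another function, without changing the detection loop.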