Kernel Module Diagnostics
This section provides diagnostic information for the kernel module that manages Intel® Gaudi® AI accelerator hardware. It helps you interpret the error messages and warnings that appear in system logs and understand their root causes.
This guide is organized into three main sections:
Dmesg Error Causes: Software-level error messages from kernel module functions
Scalable Ethernet Interface (SEI) error causes: Hardware-level errors from the SEI components
Queue Pair (QP) error causes: RDMA-specific errors related to QP operations
Understanding these diagnostics enables faster root cause analysis and more effective troubleshooting of networking and communication issues in Gaudi systems.