Dmesg Error Causes

The following table lists error and warning messages that may be generated by kernel module functions. Each entry includes:

  • Function name: The kernel function that generates the message.

  • Dmesg message: The exact message that appears in dmesg. Each message has one of four types: ERR, WARN, DBG, or INFO. ERR and WARN messages are generated by default, while DBG and INFO messages require setting the kernel module logging level.

    • ERR: An error message indicating a critical condition.

    • WARN: A warning message indicating a potential issue.

    • DBG: A debug message containing additional diagnostic information that can be safely ignored during normal operation.

    • INFO: An informative message that can be safely ignored during normal operation.

  • Possible reasons: Common causes for this error condition that may help you find the root cause.

  • Error code: Standard Linux error codes returned by the function.
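The numeric values in the Error code column can be translated to their symbolic names and descriptions with the Python standard library. The following is a minimal sketch, not part of the driver tooling: the helper name `describe_errno` is hypothetical, and accepting negated values is an assumption based on the general Linux kernel convention of returning error codes as negative numbers.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME(number): description' for an errno value.

    Accepts the positive value shown in the table or its negated form
    (assumption: callers may see the code as a negative return value).
    """
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}({number}): {os.strerror(number)}"


if __name__ == "__main__":
    # Values taken from the Error code column of the table.
    for code in (12, 22, 16):
        print(describe_errno(code))
```

The description text comes from the platform C library, so the wording may differ between systems. Codes such as ETIMEDOUT(110) and ENODATA(61) are the Linux values and may map to different numbers on other operating systems.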

Function

Dmesg message

Possible reasons

Error code

gaudi3_cn_alloc_rings_resources

[ERR] Failed to allocate: fifo ring, WQ ring, EQ ring, or CQ ring

Kernel memory allocation failure.

ENOMEM (12)

gaudi3_db_fifo_allocate

[DBG] Invalid DB fifo mode: %d. Allocation failed

Invalid FIFO mode supplied by user.

EINVAL(22)

gaudi3_cn_hw_mac_ch_reset

[ERR] Timeout while MAC channel %d reset

MAC failed to reset, possibly due to a PHY issue.

ETIMEDOUT(110)

gaudi3_cn_port_hw_init

[ERR] Failed to register port %d CQ %d with cn_eq_sw

Ethernet CQ registration with event queue dispatcher failed due to incorrect CQ number or ethernet context not found.

EINVAL(22) or ENODATA(61)

[ERR] Failed to allocate Raw DB FIFO, port: %d

Failed to allocate kernel memory using gen_pool_alloc.

ENOMEM(12)

gaudi3_cn_qpc_write_masked

[DBG] QPC write during hard reset, port: %u, qpn: %u

Writing QP context during hard reset is not allowed.

No error, only early return.

[ERR] Cannot write to port %d QP %d %s QPC, GW is busy

Writing QP context when gateway busy bit is set.

EBUSY(16), or no error in the simulator if teardown is already in progress.

[ERR] %s QPC GW write timeout, port: %d, qpn: %u

Requester/responder QPC write timed out.

ETIMEDOUT(110)

gaudi3_handle_qp_error_retry

[ERR] Adaptive retry, port %d, QP: %d is null

During a retry after a QP timeout error, the QP is NULL.

No error, as it indicates that the QP may already have been destroyed.

[ERR] Failed to write QPC port %d, %d, err %d

QPC gateway is either busy or timed out (waiting to clear).

EBUSY(16) or ETIMEDOUT(110)

[ERR] Failed to read QPC port %d, %d, err %d

[ERR] Failed to clear QPC error port %d, %d

Could not clear QP context to adapt retry timeout value.

No error. The QP timeout error is reported to the user.

[DBG] Dropping Port-%d QP error qp %d

QP timeout error is ignored, since a newer (adaptive) timeout value will be used.

No error

gaudi3_cn_qpc_read

[DBG] QPC read during hard reset, port: %u, qpn: %u

Reading QP context during hard reset is not allowed.

No error, only early return.

[ERR] Cannot read from port %d QP %d %s QPC, GW is busy

Reading QP context when gateway busy bit is set.

EBUSY(16), or no error in the simulator if teardown is already in progress.

gaudi3_cn_qpc_query

[ERR] %s QPC GW read timeout, port: %d, qpn: %u

Requester/responder QPC read timed out.

ETIMEDOUT(110)

gaudi3_cn_hw_macro_config

[DBG] macro:%u %s

Additional sleep on Palladium platforms.

No error

__qpc_get_req_rem_ci

[ERR] Requester port %d QPC %d read failed

QPC gateway is either busy or timed out (waiting to clear).

EBUSY(16) or ETIMEDOUT(110)

__qpc_periodic_maintenance

[ERR] Responder port %d QPC %d read failed

QPC gateway is either busy or timed out (waiting to clear).

No error returned since this is a kernel worker.

[ERR] Port %d: responder peer QP is not the same as QP (%d vs %d)!

Peer QP does not match requester QP.

[ERR] Port %d QPC %d got stuck (CIs: rem_ci %u vs peer_rem_ci %u)!

Remote CI does not match peer remote CI.

[ERR] Port %d QPC %d failed dispatching EQ event %d

QP error EQ event could not be dispatched.

[ERR] Responder port %d QPC %d write failed

QP write failed because QPC gateway is either busy or timed out (waiting to clear).

gaudi3_qps_maintenance_work_init

[ERR] Failed to create QP maintenance workqueue

Kernel failed to create ordered work-queue for QP maintenance.

ENOMEM (12)

gaudi3_cn_core_init

[ERR] DRAM allocation for CN (%lluMB) shouldn’t exceed %lluMB

Kernel memory allocation for the driver exceeds the limit.

ENOMEM (12)

gaudi3_encap_set

[ERR] Encap header size(%d) must be a multiple of %ld

Encapsulation header size must be a multiple of 4 bytes.

EINVAL (22)

[DBG] Decap mask 0x%x is not valid

Decapsulation mask read from the receiver buffer (RXB) is invalid.

gaudi3_set_ip_addr_encap

[DBG] Failed to get interface IP, using 0

Source IP address not specified for encapsulation; use 0.

No error

gaudi3_cn_set_req_qp_ctx

[ERR] CQ not provided for port %u, qp: %u

CQ number not provided for setting the requester.

EINVAL(22)

[DBG] MTU of %u is not supported

MTU provided for the requester is not supported.

[DBG] WQ size, denoted by wq_size (%u), is not power of two

WQ size provided by user is not a power of 2.

[DBG] Port %u: LAG idx is invalid. Idx: %d, size %d

LAG ID exceeds the collective LAG size on this NIC macro.

[DBG] Unsupported priority value %u, port %d

Input priority value is not supported.

[DBG] Encapsulation ID %d can’t be set when encapsulation disable, port %d

Encapsulation ID supplied when encapsulation is disabled.

[DBG] With SACK, congestion window size(%u) can’t be > max allowed size(%u), port %d

With the Selective ACK (SACK) feature enabled, the supplied congestion control window exceeds the maximum allowed size.

[DBG] Adaptive timeout is not currently supported on coll QPs

Adaptive timeout feature is not supported on collective QPs.

[DBG] CQ %d is invalid, port %d

CQ corresponding to supplied CQ number is invalid.

gaudi3_cn_set_res_qp_ctx

[ERR] CQ not provided for port %u, qp: %u

CQ number not provided for setting the responder.

EINVAL(22)

[DBG] Unsupported priority value %u, port %d

Input priority value is not supported.

[DBG] wq_peer_size cannot be 0 for an rdv QP

For rendezvous QP, the WQ peer size supplied is 0.

[DBG] wq_peer_size (0x%x) cannot be bigger than collective WQ entries number (0x%x)

Supplied WQ peer size (for rendezvous, collective) is greater than the total number of collective WQ entries.

[DBG] wq_peer_size (0x%x) cannot be bigger than WQ entries number (0x%x)

Supplied WQ peer size (for rendezvous) is greater than the total number of WQ entries.

[DBG] Encapsulation ID %d can’t be set when encapsulation disable, port %d

Encapsulation ID was supplied when encapsulation is disabled.

[DBG] CQ %d is invalid, port %d

CQ associated with the supplied CQ number is invalid.

[DBG] conn_peer %d is invalid, port %d

Input peer QP (for rendezvous) is invalid.

gaudi3_user_wq_arr_set

[DBG] User WQ array address shouldn’t be set: 0x%llx

A user WQ array address was supplied, but it must not be set.

EINVAL(22)

[ERR] MMU not enabled. For allocations greater than %llx, MMU needs to be enabled, wq_arr_size : 0x%llx, port: %d

Memory required for WQ array is greater than the maximum contiguous DMA coherent memory allocatable by the kernel, and host MMU is disabled.

[ERR] Failed to allocate WQ: %d

Failed to allocate virtual memory for WQ array.

EINVAL(22) or ENOMEM (12)

[DBG] Port %d: WQ-> type:%u addr=0x%llx log_size:%llu wqe_asid:%u mmu_bp:%u

Debug print with WQ array allocation details.

No error

gaudi3_user_cq_set

[ERR] User CQ %d buffer allocation failed, rc %d, port %d

Kernel memory allocation for CQ buffer failed.

EINVAL(22) or ENOMEM (12)

[ERR] User CQ %d PI allocation failed, rc %d, port %d

Kernel memory allocation for the CQ Producer Index (PI) failed.

[ERR] Failed to get user CQ %d UMR handle, rc %d, port %d

Failed to get handle for CQ CI User Mapped Region (UMR).

No error

[ERR] Failed to register CQ %d, rc %d, port %d

Failed to register CQ to get user EQ events.

EINVAL(22) or ENODATA (61)

gaudi3_user_set_app_params

[DBG] App params were already set, port %d

Application parameters for this port were already set.

EPERM (1)

[DBG] Port %u: plain RDMA mode is not supported with Rx drop ECO fix

Plain RDMA is not supported when H9-5384 ECO fix is enabled (default).

No error

[DBG] Port %u: advanced features are set but advanced flag is disabled

Features not supported in non-advanced mode.

EINVAL(22)

[DBG] Port %u: Working in Plain RDMA mode

Debug message.

No error

[DBG] Port %u: too many bp offsets requested. Requested - %d, available %d

Number of back pressure offsets requested exceeds the available number.

EINVAL(22)

[DBG] Port %u: bp %u invalid BP offset 0x%x

The input back pressure offset is invalid.

gaudi3_cn_read_mac_fec_stats_odd_port

[DBG] Clearing MAC stats timeout, port %d

Timed out waiting for the MAC stats registers to clear.

No error

[DBG] Capture MAC stats timeout, port %d

Timed out waiting for the signal that MAC stats were captured.

gaudi3_cn_port_sw_init

[ERR] Failed to alloc rings, port: %d, %d

Failed to allocate ring buffers for FIFO, WQ, EQ, or CQ.

ENOMEM (12)

[ERR] Failed to allocate port %d host memory to emulate DRAM

Failed to allocate host memory to emulate HBM when HBM is not present.

gaudi3_cn_macro_sw_init

[ERR] Failed to allocate port %d host memory to emulate DRAM

Failed to allocate host memory to emulate HBM when HBM is not present.

ENOMEM (12)

[ERR] Failed to create gen_pool to manage db fifo

Gen pool creation for door-bell FIFO failed.

[ERR] Failed adding memory to db fifo gen pool

Failed to add memory base for gen pool for door-bell FIFO.

gaudi3_cn_ring_tx_doorbell

[DBG] Port %d DB fifo full. PI %d, CI %d

No space left in door-bell FIFO.

EBUSY(16)

gaudi3_cn_en_db_fifo_reset

[ERR] Port %d user doorbell %d fifo reset timed out, %d

Timed out waiting for door-bell FIFO to reset.

ETIMEDOUT(110)

gaudi3_cn_db_fifo_reset

[ERR] Failed to retrieve port %d db fifo CI memory

No buffer corresponding to this handle.

None, buffer is NULL.

[ERR] Port %d user doorbell %d fifo CI %d reset timed out, %d

Timed out waiting for door-bell FIFO to reset.

ETIMEDOUT(110)

gaudi3_handle_cn_port_reset_locked

[ERR] Port %d, going to reset

Informational message for reset operation.

No error

gaudi3_cn_handle_spi_event

[ERR] QPC EQ error on port %d

EQ error interrupt received.

No error

[DBG] RXE SPI error on macro %d cause: %s.

RXE SPI error interrupt received.

[DBG] RXB CORE SPI error on macro %d cause: %s. error count:%u

RXB core error interrupt received.

gaudi3_eq_poll

[INFO] Got EQE invalid entry while expecting a valid one

The valid bit of EQE is not set.

ENODATA (61)

[DBG] dropping Port-%d event %d report to user

Completion event is dropped since it cannot be disabled.

[INFO] Dropping Port-%d event %d report to user

Unknown EQ event was seen and reported to user.

validate_dir_dup_mask

[DBG] Invalid dir_dup_ports_mask: 0x%x for 400G mode, port: %d

Incorrect direct patcher DUP port mask enabled for 400G mode.

EINVAL (22)

[DBG] Invalid dir_dup_ports_mask: 0x%x for 200G mode, port: %d

Incorrect direct patcher DUP port mask for 200G mode.

[DBG] Invalid dir_dup_port_mask: 0x%x, Other Port: %d of the macro is not enabled, port: %d

In 200G mode, direct patcher DUP mask is invalid since the other port of the macro is not enabled.

gaudi3_db_fifo_set

[DBG] Failed to set DB FIFO, SOB offset 0x%llx is out of range, port %d

Failed to set door-bell FIFO since the input SOB offset is out of range.

EINVAL (22)

[WARN] Truncating number of SOBs %d -> %d

Truncating the number of SOBs to a lower value that is a power of 2.

No error

gaudi3_db_fifo_unset

[DBG] Port %d user DB fifo %d SOB reset timed out, %d

Timed out waiting for door-bell FIFO to reset.

ETIMEDOUT(110)

gaudi3_cn_get_cnts_values

[ERR] Failed to get SPMU counters, port %d

Failed to read SPMU counters.

EINVAL(22)

gaudi3_ignore_coll_qp_hw_bug_event

[DBG] Failed to find matching collective QP %u, port %u

Could not find a QP corresponding to the input collective QP ID.

No error, as this function determines whether a hardware issue has occurred and whether the event should be ignored.

[DBG] No QP for port %u under collective QP %u

QP is not from a port that is part of the communications group.

[DBG] Got event on unused Collective QP %u

Hardware bug case where QP is from a port that is part of the communications group, but QP is in RESET state.

gaudi3_cn_qp_post_destroy

[WARN] Failed to invalidate port %u TXWQC, val: %u

Timed out waiting for TX WQ cache to be invalidated.

ETIMEDOUT(110)

gaudi3_cn_adaptive_tmr_reset

[ERR] Retry count is %lld, but current gran is already reset

For the adaptive retries feature, although the retry counter is non-zero, the timeout granularity is already reset.

No error

gaudi3_cn_send_cpucp_packet

[ERR] Failed to send cpucp packet, port %d packet id %d, val %d, error %d

Failed to send NIC status packet to the embedded firmware.

ENOMEM(12), EAGAIN(11), ETIMEDOUT(110), or EIO(5)

gaudi3_cn_is_encap_supported

[DBG] Encap is not supported

Due to a hardware bug fix, encapsulation is not supported.

No error

gaudi3_cn_set_dram_properties

[ERR] NIC DRAM memory allocation overflow (reserved %lu, allocated %lu)

DRAM memory allocation exceeds the size reserved for the driver.

ENOMEM (12)

gaudi3_cn_get_hw_block_addr

[ERR] Failed to get hw block address for register 0x%x

Unable to get hardware block address since the register offset exceeds the PCI BAR size.

EINVAL(22)

gaudi3_cn_read_mem_data

[ERR] Failed to retrieve WQ memory for handle 0x%llx

Memory handle is invalid or unallocated.

EINVAL(22)

gaudi3_cn_write_mem_data

[ERR] Failed to retrieve WQ memory for handle 0x%llx

Memory handle is invalid or unallocated.

EINVAL(22)

gaudi3_cn_migrate_qp_prepare_req

[DBG] CQ %d is invalid, port %d

No CQ allocated for the input CQ number.

EINVAL(22)

[ERR] CQ not provided for port %u

No input CQ number provided.

[DBG] Failed to read old requester QPC

QPC gateway is either busy or timed out (waiting to clear).

EBUSY(16) or ETIMEDOUT(110)

[DBG] Failed to invalidate old requester QPC

[DBG] Failed to set EI in new QPC

[DBG] Failed to read new responder QPC

[ERR] Failed to write new requester QPC

[ERR] Failed to read sWQE[%u], port: %u, qp: %u

Failed to get memory from the send WQE handle.

EINVAL(22)

[ERR] Failed to write sWQE[%u], port: %u, qp: %u

gaudi3_cn_migrate_qp_prepare_res

[DBG] CQ %d is invalid, port %d

No CQ allocated for the input CQ number.

EINVAL(22)

[ERR] CQ not provided for port %u

No input CQ number provided.

[DBG] Failed to read old responder QPC

QPC gateway is either busy or timed out (waiting to clear).

EBUSY(16) or ETIMEDOUT(110)

[DBG] conn_peer %d is invalid, port %d

Peer QP (for rendezvous) on the new port is invalid.

EINVAL(22)

gaudi3_cn_migrate_qp_work

[ERR] Failed to read old requester QPC

QPC gateway is either busy or timed out (waiting to clear).

EBUSY(16) or ETIMEDOUT(110)

[ERR] Failed to read new requester QPC

gaudi3_cn_trigger_link_shutdown

[DBG] Port %d: error bcast link shutdown event

Unable to enqueue the broadcast link shutdown event.

ENOSPC (28)
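The messages in the table use printf-style placeholders such as %d, %u, %s, and 0x%llx, so a literal text search does not match the lines that actually appear in dmesg. One way to locate a table entry in captured log output is to convert its template into a regular expression. The following is a minimal sketch under stated assumptions: the helper name `template_to_regex` and the sample log line are hypothetical, only the conversions that appear in the table are handled, and the template is passed without its type tag (for example [ERR]), since the tag may not appear literally in the log.

```python
import re

# printf conversions that appear in the table, mapped to regex fragments.
_CONVERSIONS = {
    "d": r"-?\d+",          # signed decimal
    "u": r"\d+",            # unsigned decimal
    "x": r"[0-9a-fA-F]+",   # hexadecimal digits (any "0x" prefix is literal text)
    "s": r".+?",            # string, matched lazily
}

# Matches %d, %u, %s, %x and the length-modified forms %ld, %lu, %lld, %llu, %llx.
_PLACEHOLDER = re.compile(r"%(?:ll|l)?([duxs])")


def template_to_regex(template: str) -> re.Pattern:
    """Compile a message template from the table into a regex.

    Literal text is escaped, and each placeholder becomes a capture group.
    """
    parts = []
    pos = 0
    for match in _PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:match.start()]))
        parts.append("(" + _CONVERSIONS[match.group(1)] + ")")
        pos = match.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


if __name__ == "__main__":
    pattern = template_to_regex("Timeout while MAC channel %d reset")
    # Hypothetical log line, for illustration only.
    line = "[  123.456789] Timeout while MAC channel 3 reset"
    found = pattern.search(line)
    if found:
        print("MAC channel:", found.group(1))
```

Use `search` rather than `match`, because dmesg lines typically start with a timestamp and other prefix text. For templates that begin with %s, the lazy string group can absorb that prefix text as well, so treat the first captured value with care.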