Per-device APIs

hlmlDeviceGetHandleByPCIBusId(pci_addr: str) -> hlml_t.HLML_DEVICE.TYPE

Operation:

Acquires the handle for a device based on its PCI address.

Parameters:

Parameter

Description

pci_addr

The bus ID of the target AIP (the tuple domain:bus:device.function).

Return Value:

Device handle

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if pci_addr is invalid or device is NULL.

  • HLMLError_NotFound if the PCI address does not exist.

  • HLMLError_Unknown on any unexpected error.
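Because an invalid pci_addr raises HLMLError_InvalidArgument, it can help to check the string before calling the library. The following is a minimal sketch, assuming the usual Linux notation of hexadecimal fields (for example 0000:19:00.0); parse_pci_addr is a hypothetical helper and is not part of the library, and the exact set of strings HLML accepts is not stated here.

```python
import re

# Matches "domain:bus:device.function" with hexadecimal fields,
# e.g. "0000:19:00.0". The field widths are an assumption based on
# the common Linux notation, not on the HLML documentation.
_PCI_ADDR_RE = re.compile(
    r"(?P<domain>[0-9a-fA-F]{4,8}):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\.(?P<function>[0-7])"
)


def parse_pci_addr(pci_addr: str) -> tuple:
    """Split a PCI address into (domain, bus, device, function) integers.

    Raises ValueError if the string is not of the form
    domain:bus:device.function.
    """
    match = _PCI_ADDR_RE.fullmatch(pci_addr.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {pci_addr!r}")
    return tuple(
        int(match.group(name), 16)
        for name in ("domain", "bus", "device", "function")
    )
```

A string that passes this check can still raise HLMLError_NotFound if no device exists at that address.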

hlmlDeviceGetHandleByIndex(index: int) -> hlml_t.HLML_DEVICE.TYPE

Operation:

Acquires the handle for a device based on its index.

Parameters:

Parameter

Description

index

A valid integer {x} for which the entry /dev/hl{x} exists in the file system (the index of a device that was successfully initialized by the driver).

Return Value:

Device handle

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if index is invalid or device is NULL.

  • HLMLError_Unknown on any unexpected error.
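Since a valid index corresponds to an existing /dev/hl{x} entry, the candidate indices can be derived from a directory listing. This is a small sketch under that assumption; device_indices is a hypothetical helper, not a library function.

```python
import re

# Only names of the exact form hl{x} count; other entries are ignored.
_DEV_NAME_RE = re.compile(r"hl(\d+)")


def device_indices(dev_entries):
    """Return the sorted indices {x} for entry names of the form hl{x}."""
    indices = []
    for name in dev_entries:
        match = _DEV_NAME_RE.fullmatch(name)
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)
```

On a real system the input would typically be os.listdir("/dev"), and each returned index could then be passed to hlmlDeviceGetHandleByIndex.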

hlmlDeviceGetHandleByUUID(uuid: str) -> hlml_t.HLML_DEVICE.TYPE

Operation:

Acquires the handle for a device based on its UUID.

Parameters:

Parameter

Description

uuid

The UUID of the target AIP.

Return Value:

Device handle

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if UUID is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetName(device: hlml_t.HLML_DEVICE.TYPE) -> str

Operation:

Retrieves the name of the target AIP.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Product name

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid, or name is NULL.

  • HLMLError_InsufficientSize if length is too small.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetPCIInfo(device: hlml_t.HLML_DEVICE.TYPE) -> pyhlml.hlml_types.c_hlml_pci_info

Operation:

Retrieves the PCI attributes of the target AIP.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

PCI info:

class HLML_DEFINE:
    # ...
    PCI_ADDR_LEN                            = ( PCI_DOMAIN_LEN + 10 )
    # ...

class c_hlml_pci_info(_PrintS):
    """
    /*
        * bus - The bus on which the device resides, 0 to 0xf
        * bus_id - The tuple domain:bus:device.function
        * device - The device's id on the bus, 0 to 31
        * domain - The PCI domain on which the device's bus resides
        * pci_device_id - The combined 16b deviceId and 16b vendor id
    */
    """
    _fields_ = [("bus", ctypes.c_uint),
                ("bus_id", ctypes.c_char * HLML_DEFINE.PCI_ADDR_LEN),
                ("device", ctypes.c_uint),
                ("domain", ctypes.c_uint),
                ("pci_device_id", ctypes.c_uint),
                ("caps", c_hlml_pci_cap)
               ]

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or pci is NULL.

  • HLMLError_AipIsLost if PCI data is missing.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetClockInfo(device: hlml_t.HLML_DEVICE.TYPE, clock_type=0) -> int

Operation:

Retrieves the current clock speeds of the device.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

clock_type

Identify which clock to query.

Return Value:

Clock speed in MHz:

class HLML_CLOCK_TYPE:
    TYPE                                    = ctypes.c_uint()
    HLML_CLOCK_SOC                          = 0
    HLML_CLOCK_IC                           = 1
    HLML_CLOCK_MME                          = 2
    HLML_CLOCK_TPC                          = 3
    HLML_CLOCK_COUNT                        = 4

Note

HLML_CLOCK_SOC is supported only for Gaudi. HLML_CLOCK_IC, HLML_CLOCK_MME and HLML_CLOCK_TPC are not supported.

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device or type is invalid.

  • HLMLError_NotSupported if clock is NULL.

  • HLMLError_AipIsLost if the target AIP has fallen off the bus or is otherwise inaccessible.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetMaxClockInfo(device: hlml_t.HLML_DEVICE.TYPE, clock_type=0) -> int

Operation:

Retrieves the maximum clock speeds of the device.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

clock_type

Identify which clock to query.

Return Value:

Max clock speed in MHz:

class HLML_CLOCK_TYPE:
    TYPE                                    = ctypes.c_uint()
    HLML_CLOCK_SOC                          = 0
    HLML_CLOCK_IC                           = 1
    HLML_CLOCK_MME                          = 2
    HLML_CLOCK_TPC                          = 3
    HLML_CLOCK_COUNT                        = 4

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device or type is invalid or clock is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetUtilizationRates(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Returns the percentage of time over the past second during which one or more kernels were running on the AIP.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Utilization information

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or utilization is NULL.

  • HLMLError_Unknown on any unexpected error.

  • HLMLError_NotSupported if the device does not support this feature.

hlmlDeviceGetMemoryInfo(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.c_hlml_memory

Operation:

Retrieves the total, used and free memory.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Memory information:

class c_hlml_memory(_PrintS):
    _fields_ = [("free", ctypes.c_ulonglong),
                ("total", ctypes.c_ulonglong),
                ("used", ctypes.c_ulonglong)
                ]

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or memory is NULL.

  • HLMLError_Unknown on any unexpected error.
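The three fields of the returned structure are usually reported together. Below is a minimal sketch of such a summary, assuming the fields are byte counts (the unit is not stated above). Memory is a stand-in namedtuple with the same field names as c_hlml_memory so that the example runs without the library, and memory_summary is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in with the same field names as c_hlml_memory, used only so the
# example runs without the library or a device.
Memory = namedtuple("Memory", ["free", "total", "used"])

GIB = 1024 ** 3


def memory_summary(mem):
    """Format used/total memory, assuming the fields are byte counts."""
    if mem.total == 0:
        return "0.0 / 0.0 GiB (n/a)"
    pct = 100.0 * mem.used / mem.total
    return f"{mem.used / GIB:.1f} / {mem.total / GIB:.1f} GiB ({pct:.1f}%)"
```

The same function works on any object exposing free, total and used attributes, which should include the structure returned by hlmlDeviceGetMemoryInfo.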

hlmlDeviceGetTemperature(device: hlml_t.HLML_DEVICE.TYPE, sensor_type: hlml_t.HLML_TEMP_SENS.TYPE) -> int

Operation:

Retrieves the current temperature of the specified sensor_type, in degrees C.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

sensor_type

Flag that indicates if the sensor is on the AIP or on the board.

Return Value:

Temperature reading:

class HLML_TEMP_SENS:
    TYPE                                    = ctypes.c_uint()
    HLML_TEMPERATURE_ON_AIP                 = 0
    HLML_TEMPERATURE_ON_BOARD               = 1
    HLML_TEMPERATURE_OTHER                  = 2

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid, sensor_type is invalid or temp is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetTemperatureThreshold(device: hlml_t.HLML_DEVICE.TYPE, threshold_type: int) -> int

Operation:

Retrieves the known temperature threshold for the AIP with the specified threshold type in degrees C. Currently, this is a hard-coded value for all the types.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

threshold_type

The type of threshold value queried.

Return Value:

Temperature threshold:

class HLML_TEMP_THRESH:
    TYPE                                    = ctypes.c_uint()
    HLML_TEMPERATURE_THRESHOLD_SHUTDOWN     = 0
    HLML_TEMPERATURE_THRESHOLD_SLOWDOWN     = 1
    HLML_TEMPERATURE_THRESHOLD_MEM_MAX      = 2
    HLML_TEMPERATURE_THRESHOLD_GPU_MAX      = 3
    HLML_TEMPERATURE_THRESHOLD_COUNT        = 4

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid, threshold_type is invalid or temp is NULL.
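A temperature reading is typically interpreted against the slowdown and shutdown thresholds. The sketch below shows one way to combine the two values; thermal_state is a hypothetical helper, and treating a reading equal to a threshold as already exceeding it is an assumption of this example, not documented behavior.

```python
def thermal_state(temp_c, slowdown_c, shutdown_c):
    """Classify a temperature against two thresholds, all in degrees C.

    Assumption: a reading equal to a threshold counts as reaching it.
    """
    if temp_c >= shutdown_c:
        return "shutdown"
    if temp_c >= slowdown_c:
        return "slowdown"
    return "ok"
```

In practice temp_c would come from hlmlDeviceGetTemperature, and the two thresholds from hlmlDeviceGetTemperatureThreshold with HLML_TEMPERATURE_THRESHOLD_SLOWDOWN and HLML_TEMPERATURE_THRESHOLD_SHUTDOWN.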

hlmlDeviceGetPersistenceMode(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.HLML_ENABLE_STATE

Operation:

API is not supported.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Current driver persistence mode:

class HLML_ENABLE_STATE:
    TYPE                                    = ctypes.c_uint()
    HLML_FEATURE_DISABLED                   = 0
    HLML_FEATURE_ENABLED                    = 1

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or mode is NULL.

  • HLMLError_NotSupported if the device does not support this feature.

hlmlDeviceGetPerformanceState(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.HLML_P_STATES

Operation:

API is not supported.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Performance state reading:

class HLML_P_STATES:
    TYPE                                    = ctypes.c_uint()
    HLML_PSTATE_0                           = 0
    HLML_PSTATE_UNKNOWN                     = 32

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or p_state is NULL.

  • HLMLError_NotSupported if the device does not support this feature.

hlmlDeviceGetPowerUsage(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Retrieves the power usage, in milliwatts, of the target AIP and its associated circuitry (e.g. memory).

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Power usage information

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or power is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetPowerManagementDefaultLimit(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Retrieves the default power management limit of the device, in milliwatts. The default power management limit is the limit that the device boots with.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Default power management limit in milliwatts

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or default_limit is NULL.

  • HLMLError_Unknown on any unexpected error.
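Both the power usage and the default limit are reported in milliwatts, so they can be compared directly. A minimal sketch of that arithmetic follows; power_report is a hypothetical helper, not part of the library.

```python
def power_report(usage_mw, default_limit_mw):
    """Return (usage in W, limit in W, usage as a percentage of the limit).

    Both inputs are in milliwatts. The percentage is NaN when the limit
    is zero, to avoid a division by zero.
    """
    usage_w = usage_mw / 1000.0
    limit_w = default_limit_mw / 1000.0
    if default_limit_mw:
        pct = 100.0 * usage_mw / default_limit_mw
    else:
        pct = float("nan")
    return usage_w, limit_w, pct
```

The inputs would come from hlmlDeviceGetPowerUsage and hlmlDeviceGetPowerManagementDefaultLimit for the same device.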

hlmlDeviceGetECCMode(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.hlml_ecc_mode

Operation:

Retrieves the current and pending ECC modes of the device.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Current and pending ECC modes:

class hlml_ecc_mode(_PrintS):
    _fields_ = [("current", ctypes.c_uint),
                ("pending", ctypes.c_uint)
               ]

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or current or pending are NULL.

hlmlDeviceGetTotalECCErrors(device: hlml_t.HLML_DEVICE.TYPE, error_type: hlml_t.HLML_MEMORY_ERROR.TYPE, counter_type: hlml_t.HLML_ECC_COUNTER) -> int

Operation:

Returns the number of ECC errors for a specific device, since the last device reset, or since the driver was installed. Only the number of uncorrected errors is supported.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

error_type

Flag that specifies the type of the errors.

counter_type

Flag that specifies the counter type of the errors.

Return Value:

Specified ECC errors:

class HLML_MEMORY_ERROR:
    TYPE                                    = ctypes.c_uint()
    HLML_MEMORY_ERROR_TYPE_CORRECTED        = 0 # NOT SUPPORTED BY HLML
    HLML_MEMORY_ERROR_TYPE_UNCORRECTED      = 1
    HLML_MEMORY_ERROR_TYPE_COUNT            = 2

class HLML_ECC_COUNTER:
    TYPE                                    = ctypes.c_uint()
    HLML_VOLATILE_ECC                       = 0
    HLML_AGGREGATE_ECC                      = 1
    HLML_ECC_COUNTER_TYPE_COUNT             = 2

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device, error type or counter type is invalid, or ecc counts is NULL.

  • HLMLError_NotSupported if the device does not support this feature.

  • HLMLError_Unknown if an error occurred during ECC error retrieval.

hlmlDeviceGetMemoryErrorCounter(device: hlml_t.HLML_DEVICE.TYPE, error_type: hlml_t.HLML_MEMORY_ERROR.TYPE, counter_type: hlml_t.HLML_ECC_COUNTER.TYPE, location: hlml_t.HLML_MEMORY_LOCATION.TYPE) -> int

Operation:

Returns the number of ECC errors for a specific device and location, since the last device reset, or since the driver was installed.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

error_type

Flag that specifies the type of errors.

counter_type

Flag that specifies the counter type of the errors.

location

Flag that specifies the location of the errors.

Return Value:

Specified ECC errors:

class HLML_MEMORY_ERROR:
    TYPE                                    = ctypes.c_uint()
    HLML_MEMORY_ERROR_TYPE_CORRECTED        = 0 # NOT SUPPORTED BY HLML
    HLML_MEMORY_ERROR_TYPE_UNCORRECTED      = 1
    HLML_MEMORY_ERROR_TYPE_COUNT            = 2

class HLML_ECC_COUNTER:
    TYPE                                    = ctypes.c_uint()
    HLML_VOLATILE_ECC                       = 0
    HLML_AGGREGATE_ECC                      = 1
    HLML_ECC_COUNTER_TYPE_COUNT             = 2

class HLML_MEMORY_LOCATION:
    TYPE                                    = ctypes.c_uint()
    HLML_MEMORY_LOCATION_SRAM               = 0
    HLML_MEMORY_LOCATION_DRAM               = 1
    HLML_MEMORY_LOCATION_COUNT              = 2

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device, error type or counter type is invalid, or ecc counts is NULL.

  • HLMLError_NotSupported if the device does not support this feature.

  • HLMLError_Unknown if an error occurred during ECC error retrieval.

hlmlDeviceGetUUID(device: hlml_t.HLML_DEVICE.TYPE) -> str

Operation:

Returns the UUID for the device as a string.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

The UUID

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or UUID is NULL.

  • HLMLError_InsufficientSize if the UUID string length is longer than the size allocated by the user.

hlmlDeviceGetMinorNumber(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Retrieves the minor number of the device. The device node file for each Gaudi device has the form /sys/class/habanalabs/hl[minor number].

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Minor number of the device

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or minor_number is NULL.

  • HLMLError_NoData if unable to retrieve the minor number for any reason.

hlmlEventSetCreate() -> hlml_t.HLML_EVENT_SET.TYPE

Operation:

Creates an empty set of events. The event set should be freed with hlmlEventSetFree.

Return Value:

Event handle:

class HLML_EVENT_SET:
    TYPE                                    = ctypes.c_void_p()

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if set is NULL.

  • HLMLError_Memory if failed to allocate a set.

hlmlEventSetFree(st: hlml_t.HLML_EVENT_SET.TYPE) -> None

Operation:

Releases a set of events.

Parameters:

Parameter

Description

st

Reference to events to be released.

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if set is invalid.

hlmlDeviceRegisterEvents(device: hlml_t.HLML_DEVICE.TYPE, event_types: int, st: hlml_t.HLML_EVENT_SET.TYPE) -> None

Operation:

Starts recording events on the specified device and adds them to the specified event set (hlml_event_set_t). Events that occurred before this call are not recorded.

Supported events:

  • ECC single/double bit errors – BIT(0)

  • Critical errors that occurred – BIT(1)

  • Clock rate changes – BIT(2)

Parameters:

Parameter

Description

device

The identifier of the target AIP.

event_types

Bitmask of event types to record.

st

Event set to which the new event types are added.

class HLML_EVENT_SET:
    TYPE                                    = ctypes.c_void_p()

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device or set is invalid or event_types is 0.

  • HLMLError_Unknown if it failed to retrieve information regarding events.
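The event_types argument is a bitmask built from the three supported bits listed above. The sketch below shows how such a mask can be composed and decoded. The constant names are hypothetical and chosen for this example only; the library may expose its own constants under different names.

```python
# Hypothetical names for the three documented bits.
EVENT_ECC_ERROR = 1 << 0       # ECC single/double bit errors, BIT(0)
EVENT_CRITICAL_ERROR = 1 << 1  # Critical errors, BIT(1)
EVENT_CLOCK_RATE = 1 << 2      # Clock rate changes, BIT(2)

_EVENT_NAMES = {
    EVENT_ECC_ERROR: "ecc_error",
    EVENT_CRITICAL_ERROR: "critical_error",
    EVENT_CLOCK_RATE: "clock_rate",
}


def event_names(mask):
    """List the names of the documented event bits that are set in mask."""
    return [name for bit, name in _EVENT_NAMES.items() if mask & bit]


# Record ECC errors and clock rate changes, but not critical errors.
mask = EVENT_ECC_ERROR | EVENT_CLOCK_RATE
```

A mask built this way is never 0 as long as at least one bit is selected, which avoids the HLMLError_InvalidArgument case for an empty event_types.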

hlmlEventSetWait(st: hlml_t.HLML_EVENT_SET.TYPE, timeout: int) -> hlml_t.c_hlml_event_data

Operation:

Waits for events and delivers them. If events are ready to be delivered at the time of the call, the function returns immediately. Otherwise, the function sleeps until an event arrives, but no longer than the specified timeout.

Parameters:

Parameter

Description

st

Reference to set of events to wait on.

timeout

Maximum time, in milliseconds, to wait for a registered event.

Return Value:

Event data:

class HLML_EVENT_SET:
    TYPE                                    = ctypes.c_void_p()

class c_hlml_event_data(_PrintS):
    _fields_ = [("device", ctypes.c_void_p),
                ("event_type", ctypes.c_ulonglong)
               ]

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if set is invalid or data is NULL.

  • HLMLError_Unknown if it failed to retrieve information regarding events.

  • HLMLError_Timeout if no events arrived within timeout milliseconds.
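Because a timeout is reported as an error rather than as an empty result, a polling loop has to treat it as the normal end of the stream. The following sketch illustrates that pattern with injected callables so that it runs without the library; collect_events and its parameters are hypothetical, and how the binding exposes the HLMLError_Timeout condition as a Python exception class is not stated here.

```python
def collect_events(wait, timeout_error, max_events=10):
    """Call wait() until it times out or max_events have been collected.

    wait          -- zero-argument callable returning one event record
    timeout_error -- exception class that wait() raises on timeout
    max_events    -- upper bound on the number of records returned
    """
    events = []
    while len(events) < max_events:
        try:
            events.append(wait())
        except timeout_error:
            # No event arrived within the timeout: stop polling.
            break
    return events
```

With the library, wait would be something like lambda: pyhlml.hlmlEventSetWait(st, 1000), and timeout_error would be whichever exception class the binding raises for HLMLError_Timeout. Any other exception propagates to the caller unchanged.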

hlmlDeviceGetMACInfo(device: hlml_t.HLML_DEVICE.TYPE, count=20, start=1) -> hlml_t.c_hlml_mac_info

Operation:

Gets the MAC addresses of the device.

class COMMON_DEFINE:
    # ...
    ETHER_ADDR_LEN = 6
    # ...

class c_hlml_mac_info(_PrintS):
    _fields_ = [("addr", ctypes.c_ubyte * COMMON_DEFINE.ETHER_ADDR_LEN), # unsigned char
                ("id", ctypes.c_int)
               ]

Parameters:

Parameter

Description

device

The identifier of the target AIP.

count

Number of requested elements.

start

MAC ID to start from. A number in the range [1…20].

Return Value:

Array of size <count> of MAC addresses

Raises:

  • HLMLError_InvalidArgument if device, mac_info or actual_mac_info_count is invalid, or if start_mac_id is <1 or >20.

  • HLMLError_NoData if requested start MAC address is bigger than the MAC count for the device.
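The addr field of c_hlml_mac_info holds ETHER_ADDR_LEN (6) raw bytes rather than a printable string. A minimal sketch of rendering those bytes in the usual colon-separated form is shown below; format_mac is a hypothetical helper, and the sample array only mimics the shape of the addr field.

```python
import ctypes


def format_mac(addr):
    """Render six address bytes as colon-separated lowercase hex."""
    octets = list(addr)
    if len(octets) != 6:
        raise ValueError(f"expected 6 bytes, got {len(octets)}")
    return ":".join(f"{octet:02x}" for octet in octets)


# A ctypes array shaped like the addr field (c_ubyte * ETHER_ADDR_LEN).
sample = (ctypes.c_ubyte * 6)(0x00, 0x1B, 0x21, 0xAB, 0xCD, 0xEF)
```

The helper accepts any iterable of six integers, so it works on a ctypes c_ubyte array as well as on a bytes object.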

hlmlDeviceGetHLRevision(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Gets the HL revision.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

HL Revision number

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid, or hl_revision is NULL.

  • HLMLError_NotFound if failed to retrieve hl_revision.

hlmlDeviceGetPCBInfo(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.c_hlml_pcb_info

Operation:

Gets the PCB info.

class HLML_DEFINE:
    # ...
    HL_FIELD_MAX_SIZE = 32
    # ...

class c_hlml_pcb_info(_PrintS):
    _fields_ = [("pcb_ver", ctypes.c_char * HLML_DEFINE.HL_FIELD_MAX_SIZE),
                ("pcb_assembly_ver", ctypes.c_char * HLML_DEFINE.HL_FIELD_MAX_SIZE)
               ]

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

PCB info

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid, or pcb is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetSerial(device: hlml_t.HLML_DEVICE.TYPE) -> str

Operation:

Retrieves the globally unique board serial number associated with the device’s board.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Board/module serial number

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid, or serial is NULL.

  • HLMLError_InsufficientSize if length is too small.

hlmlDeviceGetModuleID(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Retrieves the module id configured on the device.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

The module id configured on the device.

Raises:

  • HLMLError_InvalidArgument if device is invalid or board_id is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetBoardID(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Retrieves the board ID of the device, a value in the range 0-7.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Device’s board ID

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or board_id is NULL.

  • HLMLError_NotFound if no AIP matching device was found.

hlmlDeviceGetPCIEThroughput(device: hlml_t.HLML_DEVICE.TYPE, counter_type: int) -> int

Operation:

Retrieves PCIe utilization information. The throughput is calculated over a 10 ms interval.

class HLML_PCIE_UTIL_COUNTER:
    TYPE                                    = ctypes.c_uint()
    HLML_PCIE_UTIL_TX_BYTES                 = 0
    HLML_PCIE_UTIL_RX_BYTES                 = 1
    HLML_PCIE_UTIL_COUNT                    = 2

Parameters:

Parameter

Description

device

The identifier of the target AIP.

counter_type

The specific counter that should be queried.

Return Value:

Throughput in KB/s

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device or counter is invalid, or value is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetPCIEReplayCounter(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Retrieves the PCIe replay counter.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Counter’s value

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid, or value is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetCurrPCIELinkGeneration(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Retrieves the current PCIe link generation.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Current PCIe link generation

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid, or curr_link_gen is NULL.

  • HLMLError_NotSupported if PCIe link information is not available.

hlmlDeviceGetCurrPCIELinkWidth(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Retrieves the current PCIe link width.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Current PCIe link width

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device or counter is invalid, or curr_link_width is NULL.

  • HLMLError_NotSupported if PCIe link information is not available.
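The link generation and link width together bound the bandwidth the link can carry. The sketch below computes that theoretical one-direction figure, assuming the generation is returned as an integer from 1 to 5 (the value encoding is not stated above). The per-lane rates and encoding efficiencies are the standard PCIe figures, not values taken from HLML, and pcie_bandwidth_gb_per_s is a hypothetical helper.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency for PCIe 1.0-5.0.
# Generations 1 and 2 use 8b/10b encoding; 3 to 5 use 128b/130b.
_PCIE_LANE = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_bandwidth_gb_per_s(link_gen, link_width):
    """Theoretical one-direction bandwidth in GB/s (10**9 bytes per second).

    Raises KeyError for a generation outside 1-5.
    """
    rate_gt, efficiency = _PCIE_LANE[link_gen]
    # GT/s times efficiency gives usable Gbit/s per lane; divide by 8 for bytes.
    return rate_gt * efficiency * link_width / 8
```

Measured throughput is always lower than this figure because of protocol overhead, so it serves only as an upper bound for sanity checks.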

hlmlDeviceGetCurrentClocksThrottleReasons(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Retrieves current clock throttling reasons.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Bitmask of active clocks throttle reasons

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid, or clocks_throttle_reasons is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetTotalEnergyConsumption(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Retrieves total energy consumption in millijoules (mJ) since the driver was last reloaded.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Energy consumption

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid, or clocks_throttle_reasons is NULL.

  • HLMLError_Unknown on any unexpected error.
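Since the counter is cumulative, the average power over an interval is the difference between two readings divided by the elapsed time. A minimal sketch of that arithmetic follows; average_power_watts is a hypothetical helper, and treating a decreasing counter as an error (for example after a driver reload) is a design choice of this example.

```python
def average_power_watts(energy_start_mj, energy_end_mj, elapsed_s):
    """Average power in watts between two energy readings in millijoules."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta_mj = energy_end_mj - energy_start_mj
    if delta_mj < 0:
        # The counter restarts when the driver is reloaded.
        raise ValueError("energy counter went backwards")
    # mJ / 1000 = J, and J / s = W.
    return delta_mj / 1000.0 / elapsed_s
```

The two readings would come from successive hlmlDeviceGetTotalEnergyConsumption calls, with elapsed_s measured by the caller, for example with time.monotonic().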

hlmlDeviceGetMacAddrInfo(device: hlml_t.HLML_DEVICE.TYPE) -> Tuple[Tuple[int, int], Tuple[int, int]]

Operation:

Retrieves the masks for supported ports and external ports.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

(mask, ext_mask):

  • mask is a tuple of two ints containing the bitmask for supported ports.

  • ext_mask is a tuple of two ints containing the bitmask for external ports within the supported ports.

Raises:

  • HLMLError_InvalidArgument if device is invalid, or mask is NULL or ext_mask is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceNicGetStatistics(device: hlml_t.HLML_DEVICE.TYPE, port: int, num_of_counts: int = None) -> hlml_t.c_hlml_nic_stats_info

Operation:

Retrieves the NIC statistics for the requested internal ports.

class c_hlml_nic_stats_info(_PrintS):
    _fields_ = [("port", ctypes.c_uint32),
                ("str_buf", ctypes.POINTER(ctypes.c_char)),
                ("val_buf", ctypes.POINTER(ctypes.c_uint64)),
                ("num_of_counters_out", ctypes.POINTER(ctypes.c_uint32))
                ]

    def __init__(self, port: int, num_of_counters: int = None):
        num_of_counters = num_of_counters or COMMON_DEFINE.HABANA_LINK_CNT_MAX_NUM
        self.port = port

        str_buf_size = num_of_counters * 32
        self.str_buf = ctypes.cast(ctypes.create_string_buffer(str_buf_size), ctypes.POINTER(ctypes.c_char))

        val_buf_size = num_of_counters * ctypes.sizeof(ctypes.c_uint64)
        self.val_buf = (ctypes.c_uint64 * val_buf_size)()

        self.num_of_counters_out = (ctypes.c_uint32 * 1)()

Parameters:

Parameter

Description

device

The identifier of the target AIP.

port

Port for which the statistics are requested.

num_of_counts

[Optional] Number of counters to allocate.

Return Value:

NIC statistics

Raises:

  • HLMLError_InvalidArgument if device is invalid, stats_info.str_buf is NULL, stats_info.val_buf is NULL or stats_info.num_of_counters_out is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceClearCpuAffinity(device: hlml_t.HLML_DEVICE.TYPE) -> None

Operation:

Clears all affinity bindings for the calling process.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Raises:

  • HLMLError_InvalidArgument if device is invalid.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetCpuAffinity(device: hlml_t.HLML_DEVICE.TYPE, cpu_set_size: int) -> ctypes.Array

Operation:

Retrieves an array of unsigned longs (sized to cpu_set_size) of bitmasks with the ideal CPU affinity for the device. For example, on a 64-bit machine, if processors 0, 1, 64, and 65 are ideal for the device and cpu_set_size == 2, then result[0] = 0x3 and result[1] = 0x3. This is equivalent to calling hlmlDeviceGetCpuAffinityWithinScope with HLML_AFFINITY_SCOPE_NODE.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

cpu_set_size

The size of the cpu_set array that is safe to access.

Return Value:

ctypes array with bitmask of CPUs, 64 CPUs per unsigned long on 64-bit machines, 32 on 32-bit machines

Raises:

  • HLMLError_InvalidArgument if device is invalid, or cpu_set is NULL.

  • HLMLError_Unknown on any unexpected error.
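The returned bitmask words can be expanded into a plain list of CPU indices. The sketch below reproduces the example from the description: with 64 bits per word, the words 0x3 and 0x3 correspond to CPUs 0, 1, 64 and 65. cpus_from_mask is a hypothetical helper, not a library function.

```python
def cpus_from_mask(words, bits_per_word=64):
    """Expand bitmask words into a sorted list of CPU indices.

    words         -- iterable of integers, lowest-numbered CPUs first
    bits_per_word -- 64 on 64-bit machines, 32 on 32-bit machines
    """
    cpus = []
    for word_index, word in enumerate(words):
        for bit in range(bits_per_word):
            if (word >> bit) & 1:
                cpus.append(word_index * bits_per_word + bit)
    return cpus
```

Iterating over a ctypes array of unsigned longs yields Python integers, so the array returned by hlmlDeviceGetCpuAffinity should be usable as the words argument directly.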

hlmlDeviceGetCpuAffinityWithinScope(device: hlml_t.HLML_DEVICE.TYPE, cpu_set_size: int, scope: hlml_t.HLML_AFFINITY_SCOPE.TYPE) -> ctypes.Array

Operation:

Retrieves an array of unsigned longs (sized to cpu_set_size) of bitmasks with the ideal CPU affinity within the node or socket for the device. For example, on a 64-bit machine, if processors 0, 1, 64, and 65 are ideal for the device and cpu_set_size == 2, then result[0] = 0x3 and result[1] = 0x3.

class HLML_AFFINITY_SCOPE:
    TYPE                                    = ctypes.c_uint()
    HLML_AFFINITY_SCOPE_NODE                = 0
    HLML_AFFINITY_SCOPE_SOCKET              = 1

Parameters:

Parameter

Description

device

The identifier of the target AIP.

cpu_set_size

The size of the cpu_set array that is safe to access.

scope

Scope that changes the default behavior.

Return Value:

ctypes array with bitmask of CPUs, 64 CPUs per unsigned long on 64-bit machines, 32 on 32-bit machines

Raises:

  • HLMLError_InvalidArgument if device is invalid, or cpu_set is NULL.

  • HLMLError_NotSupported if scope is not supported.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetMemoryAffinity(device: hlml_t.HLML_DEVICE.TYPE, node_set_size: int, scope: hlml_t.HLML_AFFINITY_SCOPE.TYPE) -> ctypes.Array

Operation:

Retrieves an array of unsigned longs (sized to node_set_size) of bitmasks with the ideal memory affinity within the node or socket for the device. For example, if NUMA nodes 0 and 1 are ideal within the socket for the device and node_set_size == 1, then result[0] = 0x3.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

node_set_size

The size of the node_set array that is safe to access.

scope

Scope that changes the default behavior.

Return Value:

ctypes array with bitmask of NUMA nodes, 64 nodes per unsigned long on 64-bit machines, 32 on 32-bit machines

Raises:

  • HLMLError_InvalidArgument if device is invalid, or node_set is NULL.

  • HLMLError_NotSupported if scope is not supported.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceSetCpuAffinity(device: hlml_t.HLML_DEVICE.TYPE) -> None

Operation:

Sets the ideal affinity for the calling thread and device using the guidelines given in hlmlDeviceGetCpuAffinity().

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Raises:

  • HLMLError_InvalidArgument if device is invalid.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetViolationStatus(device: hlml_t.HLML_DEVICE.TYPE, perf_policy: hlml_t.HLML_PERF_POLICY.TYPE) -> hlml_t.c_hlml_violation_time

Operation:

Gets the duration of time during which the device was throttled (lower than requested clocks) due to power or thermal constraints.

This method is useful for users who want to know whether their AIPs throttle at any point while their applications run. If the event is currently in progress, the duration is measured from the start of the event up to the time of the call.

class HLML_PERF_POLICY:
    TYPE                                    = ctypes.c_uint()
    HLML_PERF_POLICY_POWER                  = 0
    HLML_PERF_POLICY_THERMAL                = 1
    HLML_PERF_POLICY_COUNT                  = 2

class c_hlml_violation_time(_PrintS):
    _fields_ = [("reference_time", ctypes.c_ulonglong),
                ("violation_time", ctypes.c_ulonglong)
                ]

  • reference_time: CPU timestamp, in microseconds, of the start of the event (unique for each event).

  • violation_time: Duration of the event, in nanoseconds.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

perf_policy

Represents performance policy which can trigger AIP throttling.

Return Value:

Violation time related information

Raises:

  • HLMLError_InvalidArgument if device is invalid, or viol_time is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetReplacedRows(device: hlml_t.HLML_DEVICE.TYPE, cause: hlml_t.HLML_ROW_REPLACEMENT_CAUSE.TYPE, row_count: int) -> ctypes.Array

Operation:

Returns the list of replaced rows including rows that are pending replacement. The address information provided from this API is the full address of the row that was retired (see struct below).

class HLML_ROW_REPLACEMENT_CAUSE:
    TYPE                                                      = ctypes.c_uint()
    HLML_ROW_REPLACEMENT_CAUSE_MULTIPLE_SINGLE_BIT_ECC_ERRORS = 0
    HLML_ROW_REPLACEMENT_CAUSE_DOUBLE_BIT_ECC_ERROR           = 1
    HLML_ROW_REPLACEMENT_CAUSE_COUNT                          = 2

Row address info struct (type in returned array):

class c_hlml_row_address(_PrintS):
    _fields_ = [("hbm_idx", ctypes.c_uint8),
                ("pc", ctypes.c_uint8),
                ("sid", ctypes.c_uint8),
                ("bank_idx", ctypes.c_uint8),
                ("row_addr", ctypes.c_uint16)
                ]

Parameters:

Parameter

Description

device

The identifier of the target AIP.

cause

Filter replaced rows by cause of retirement.

row_count

Reference in which to provide the addresses buffer size. Set to 0 to query the size without allocating an addresses buffer.

Return Value:

Row addresses info

Raises:

  • HLMLError_InvalidArgument if device is invalid, row_count is NULL or row_count is not 0 while addresses is NULL.

  • HLMLError_InsufficientSize if row_count indicates that the addresses buffer is not large enough to store all the matching replaced rows. row_count will be set to the required size.

  • HLMLError_Unknown on any unexpected error.

hlmlDeviceGetReplacedRowsPendingStatus(device: hlml_t.HLML_DEVICE.TYPE) -> int

Operation:

Checks if any rows that are pending replacement require a reboot to be replaced.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

Pending status

Raises:

  • HLMLError_InvalidArgument if device is invalid or is_pending is NULL.

  • HLMLError_Unknown on any unexpected error.

hlmlGetHLMLVersion() -> str

Operation:

Returns the version of HLML.

Return Value:

HLML version

Raises:

  • HLMLError_InvalidArgument if version is NULL.

  • HLMLError_InsufficientSize if length is too small.

hlmlDeviceGetOperationStatus(device: hlml_t.HLML_DEVICE.TYPE) -> str

Operation:

Retrieves the AIP’s status.

Parameters:

Parameter

Description

device

The identifier of the target AIP.

Return Value:

A descriptive status of the target AIP. The available status descriptions (case insensitive) are:

  • operational

  • in reset

  • disabled

  • need reset

  • in device creation

  • in reset after device release

Raises:

  • HLMLError_Uninitialized if the library has not been successfully initialized.

  • HLMLError_InvalidArgument if device is invalid or status is NULL.

  • HLMLError_Unknown on any unexpected error.

  • HLMLError_NotSupported if the device does not support this feature.
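Because the status descriptions are case insensitive, comparisons should normalize the returned string first. The sketch below shows one way to do that against the list of statuses given above; the helper names are hypothetical, and collapsing repeated whitespace is an extra precaution of this example rather than documented behavior.

```python
# The status descriptions listed in the Return Value section, lowercased.
KNOWN_STATUSES = frozenset({
    "operational",
    "in reset",
    "disabled",
    "need reset",
    "in device creation",
    "in reset after device release",
})


def normalize_status(status: str) -> str:
    """Lowercase the status and collapse runs of whitespace."""
    return " ".join(status.split()).lower()


def is_operational(status: str) -> bool:
    """True if the status string means the device is operational."""
    return normalize_status(status) == "operational"


def is_known_status(status: str) -> bool:
    """True if the status matches one of the documented descriptions."""
    return normalize_status(status) in KNOWN_STATUSES
```

A monitoring script could call is_operational on the value returned by hlmlDeviceGetOperationStatus and treat every other status as a reason to skip further queries on that device.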