Per Device APIs
hlmlDeviceGetHandleByPCIBusId(pci_addr: str) -> hlml_t.HLML_DEVICE.TYPE¶
Operation:
Acquires the handle for a device based on its PCI address.
Parameters:
Parameter | Description |
---|---|
pci_addr | The bus ID of the target AIP (the tuple domain:bus:device.function). |
Return Value:
Device handle.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if pci_addr is invalid or device is NULL.
HLMLError_NotFound if the PCI address does not exist.
HLMLError_Unknown on any unexpected error.
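The pci_addr argument is the domain:bus:device.function tuple described above. As a minimal sketch, assuming the conventional Linux textual form (four hex digits for the domain, two each for bus and device, one for the function), the hypothetical helpers below build and check such a string before it is handed to hlmlDeviceGetHandleByPCIBusId. The helper names and the exact layout are assumptions, not part of pyhlml.

```python
import re

# Conventional Linux PCI address layout: dddd:bb:dd.f (hex digits).
# This layout is an assumption; check what your pyhlml version accepts.
_PCI_ADDR_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def format_pci_addr(domain: int, bus: int, device: int, function: int) -> str:
    """Build a domain:bus:device.function string from its numeric parts."""
    if not (0 <= domain <= 0xFFFF and 0 <= bus <= 0xFF
            and 0 <= device <= 31 and 0 <= function <= 7):
        raise ValueError("PCI address component out of range")
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


def is_valid_pci_addr(pci_addr: str) -> bool:
    """Return True if pci_addr matches the dddd:bb:dd.f layout."""
    return _PCI_ADDR_RE.match(pci_addr) is not None


addr = format_pci_addr(0, 0x19, 0, 0)
print(addr, is_valid_pci_addr(addr))
```

The resulting string would then be passed as pci_addr, for example pyhlml.hlmlDeviceGetHandleByPCIBusId(addr).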
hlmlDeviceGetHandleByIndex(index: int) -> hlml_t.HLML_DEVICE.TYPE¶
Operation:
Acquires the handle for a device based on its index.
Parameters:
Parameter | Description |
---|---|
index | A valid integer {x} for which the entry /dev/hl{x} exists in the file system (the index of a device that was successfully initialized by the driver). |
Return Value:
Device handle.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if index is invalid or device is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetHandleByUUID(uuid: str) -> hlml_t.HLML_DEVICE.TYPE¶
Operation:
Acquires the handle for a device based on its UUID.
Parameters:
Parameter | Description |
---|---|
uuid | The UUID of the target AIP. |
Return Value:
Device handle.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if UUID is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetName(device: hlml_t.HLML_DEVICE.TYPE) -> str¶
Operation:
Retrieves the name of the target AIP.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Product name.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid, or name is NULL.
HLMLError_InsufficientSize if length is too small.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetPCIInfo(device: hlml_t.HLML_DEVICE.TYPE) -> pyhlml.hlml_types.c_hlml_pci_info¶
Operation:
Retrieves the PCI attributes of the target AIP.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
PCI info:
class HLML_DEFINE:
    # ...
    PCI_ADDR_LEN = ( PCI_DOMAIN_LEN + 10 )
    # ...
class c_hlml_pci_info(_PrintS):
    """
    /*
     * bus - The bus on which the device resides, 0 to 0xff
     * bus_id - The tuple domain:bus:device.function
     * device - The device's id on the bus, 0 to 31
     * domain - The PCI domain on which the device's bus resides
     * pci_device_id - The combined 16-bit device ID and 16-bit vendor ID
     */
    """
    _fields_ = [("bus", ctypes.c_uint),
                ("bus_id", ctypes.c_char * HLML_DEFINE.PCI_ADDR_LEN),
                ("device", ctypes.c_uint),
                ("domain", ctypes.c_uint),
                ("pci_device_id", ctypes.c_uint),
                ("caps", c_hlml_pci_cap)
               ]
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or pci is NULL.
HLMLError_AipIsLost if PCI data is missing.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetClockInfo(device: hlml_t.HLML_DEVICE.TYPE, clock_type=0) -> int¶
Operation:
Retrieves the current clock speeds of the device.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
clock_type | Identify which clock to query. |
Return Value:
Clock speed in MHz:
class HLML_CLOCK_TYPE:
    TYPE = ctypes.c_uint()
    HLML_CLOCK_SOC = 0
    HLML_CLOCK_IC = 1
    HLML_CLOCK_MME = 2 # Not supported
    HLML_CLOCK_TPC = 3 # Not supported
    HLML_CLOCK_COUNT = 4
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device or type is invalid.
HLMLError_NotSupported if clock is NULL.
HLMLError_AipIsLost if the target AIP has fallen off the bus or is otherwise inaccessible.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetMaxClockInfo(device: hlml_t.HLML_DEVICE.TYPE, clock_type=0) -> int¶
Operation:
Retrieves the maximum clock speeds of the device.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
clock_type | Identify which clock to query. |
Return Value:
Max clock speed in MHz:
class HLML_CLOCK_TYPE:
    TYPE = ctypes.c_uint()
    HLML_CLOCK_SOC = 0
    HLML_CLOCK_IC = 1
    HLML_CLOCK_MME = 2
    HLML_CLOCK_TPC = 3
    HLML_CLOCK_COUNT = 4
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device or type is invalid or clock is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetClockLimitInfo(device: hlml_t.HLML_DEVICE.TYPE, clock_type=0) -> int¶
Operation:
Retrieves the clock frequency limit of the device.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
clock_type | Identify which clock to query. |
Return Value:
Frequency limit of the selected clock in MHz:
class HLML_CLOCK_TYPE:
    TYPE = ctypes.c_uint()
    HLML_CLOCK_SOC = 0
    HLML_CLOCK_IC = 1
    HLML_CLOCK_MME = 2
    HLML_CLOCK_TPC = 3
    HLML_CLOCK_COUNT = 4
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device or type is invalid or clock is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetUtilizationRates(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Returns the percentage of time over the past second during which one or more kernels were running on the AIP.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Utilization information.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or utilization is NULL.
HLMLError_Unknown on any unexpected error.
HLMLError_NotSupported if the device does not support this feature.
hlmlDeviceGetMemoryInfo(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.c_hlml_memory¶
Operation:
Retrieves the total, used and free memory.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Memory information:
class c_hlml_memory(_PrintS):
    _fields_ = [("free", ctypes.c_ulonglong),
                ("total", ctypes.c_ulonglong),
                ("used", ctypes.c_ulonglong)
               ]
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or memory is NULL.
HLMLError_Unknown on any unexpected error.
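The returned structure exposes free, total and used as plain integers. The sketch below shows one way to turn them into a readable summary. MemoryInfo is a stand-in with the same field layout, defined here only so the example runs without hardware, and treating the fields as byte counts is an assumption that this page does not state.

```python
import ctypes


class MemoryInfo(ctypes.Structure):
    """Stand-in with the same fields as c_hlml_memory, for illustration only."""
    _fields_ = [("free", ctypes.c_ulonglong),
                ("total", ctypes.c_ulonglong),
                ("used", ctypes.c_ulonglong)]


def memory_summary(mem) -> dict:
    """Summarize a memory structure as GiB figures and a used percentage."""
    gib = 1024 ** 3  # assumes the fields are byte counts
    used_pct = 100.0 * mem.used / mem.total if mem.total else 0.0
    return {
        "total_gib": mem.total / gib,
        "used_gib": mem.used / gib,
        "free_gib": mem.free / gib,
        "used_pct": used_pct,
    }


# Sample values; a real structure would come from hlmlDeviceGetMemoryInfo(device).
sample = MemoryInfo(free=24 * 1024 ** 3, total=32 * 1024 ** 3, used=8 * 1024 ** 3)
print(memory_summary(sample))
```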
hlmlDeviceGetTemperature(device: hlml_t.HLML_DEVICE.TYPE, sensor_type: hlml_t.HLML_TEMP_SENS.TYPE) -> int¶
Operation:
Retrieves the current temperature, in degrees C, of the sensor selected by sensor_type.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
sensor_type | Flag that indicates if the sensor is on the AIP or on the board. |
Return Value:
Temperature reading:
class HLML_TEMP_SENS:
    TYPE = ctypes.c_uint()
    HLML_TEMPERATURE_ON_AIP = 0
    HLML_TEMPERATURE_ON_BOARD = 1
    HLML_TEMPERATURE_OTHER = 2
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid, sensor_type is invalid or temp is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetTemperatureThreshold(device: hlml_t.HLML_DEVICE.TYPE, threshold_type: int) -> int¶
Operation:
Retrieves the known temperature threshold for the AIP with the specified threshold type in degrees C. Currently, this is a hard-coded value for all the types.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
threshold_type | The type of threshold value queried. |
Return Value:
Temperature threshold in degrees C:
class HLML_TEMP_THRESH:
    TYPE = ctypes.c_uint()
    HLML_TEMPERATURE_THRESHOLD_SHUTDOWN = 0
    HLML_TEMPERATURE_THRESHOLD_SLOWDOWN = 1
    HLML_TEMPERATURE_THRESHOLD_MEM_MAX = 2
    HLML_TEMPERATURE_THRESHOLD_GPU_MAX = 3
    HLML_TEMPERATURE_THRESHOLD_COUNT = 4
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid, threshold_type is invalid or temp is NULL.
hlmlDeviceGetPersistenceMode(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.HLML_ENABLE_STATE¶
Operation:
API is not supported.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Current driver persistence mode:
class HLML_ENABLE_STATE:
    TYPE = ctypes.c_uint()
    HLML_FEATURE_DISABLED = 0
    HLML_FEATURE_ENABLED = 1
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or mode is NULL.
HLMLError_NotSupported if the device does not support this feature.
hlmlDeviceGetPerformanceState(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.HLML_P_STATES¶
Operation:
API is not supported.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Performance state reading:
class HLML_P_STATES:
    TYPE = ctypes.c_uint()
    HLML_PSTATE_0 = 0
    HLML_PSTATE_UNKNOWN = 32
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or p_state is NULL.
HLMLError_NotSupported if the device does not support this feature.
hlmlDeviceGetPowerUsage(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves the power usage, in milliwatts, of the target AIP and its associated circuitry (e.g. memory).
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Power usage in milliwatts.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or power is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetPowerManagementDefaultLimit(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves the default power management limit of this device, in milliwatts. This is the power management limit that the device boots with.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Default power management limit in milliwatts.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or default_limit is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetPowerManagementLimit(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves the power management limit on this device, in milliwatts. The power management limit defines the upper boundary for the card’s power draw. If the card’s total power draw reaches this limit, the power management algorithm is triggered.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Power management limit in milliwatts.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or limit is NULL.
HLMLError_Unknown on any unexpected error.
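Because both the power usage and the power management limit are reported in milliwatts, the two readings can be compared directly. The following sketch uses a hypothetical helper, power_headroom, to convert them to watts and report how far the card is from its limit. The sample numbers are invented for illustration.

```python
def power_headroom(usage_mw: int, limit_mw: int) -> dict:
    """Convert milliwatt readings to watts and report remaining headroom."""
    if limit_mw <= 0:
        raise ValueError("limit_mw must be positive")
    return {
        "usage_w": usage_mw / 1000.0,
        "limit_w": limit_mw / 1000.0,
        "headroom_w": (limit_mw - usage_mw) / 1000.0,
        "at_limit": usage_mw >= limit_mw,
    }


# Sample readings; real values would come from hlmlDeviceGetPowerUsage(device)
# and hlmlDeviceGetPowerManagementLimit(device).
print(power_headroom(usage_mw=412_500, limit_mw=600_000))
```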
hlmlDeviceGetPowerManagementMode(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.HLML_ENABLE_STATE¶
Operation:
Retrieves the power management mode associated with this device.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Power management mode:
class HLML_ENABLE_STATE:
    TYPE = ctypes.c_uint()
    HLML_FEATURE_DISABLED = 0
    HLML_FEATURE_ENABLED = 1
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or mode is NULL.
HLMLError_NotSupported if the device does not support this feature.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetECCMode(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.hlml_ecc_mode¶
Operation:
Retrieves the current and pending ECC modes of the device.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Current and pending ECC modes:
class hlml_ecc_mode(_PrintS):
    _fields_ = [("current", ctypes.c_uint),
                ("pending", ctypes.c_uint)
               ]
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or current or pending are NULL.
hlmlDeviceGetTotalECCErrors(device: hlml_t.HLML_DEVICE.TYPE, error_type: hlml_t.HLML_MEMORY_ERROR.TYPE, counter_type: hlml_t.HLML_ECC_COUNTER) -> int¶
Operation:
Returns the number of ECC errors for a specific device, since the last device reset, or since the driver was installed. Only the number of uncorrected errors is supported.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
error_type | Flag that specifies the type of the errors. |
counter_type | Flag that specifies the counter type of the errors. |
Return Value:
Specified ECC errors:
class HLML_MEMORY_ERROR:
    TYPE = ctypes.c_uint()
    HLML_MEMORY_ERROR_TYPE_CORRECTED = 0 # NOT SUPPORTED BY HLML
    HLML_MEMORY_ERROR_TYPE_UNCORRECTED = 1
    HLML_MEMORY_ERROR_TYPE_COUNT = 2
class HLML_ECC_COUNTER:
    TYPE = ctypes.c_uint()
    HLML_VOLATILE_ECC = 0
    HLML_AGGREGATE_ECC = 1
    HLML_ECC_COUNTER_TYPE_COUNT = 2
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device, error type or counter type is invalid, or ecc counts is NULL.
HLMLError_NotSupported if the device does not support this feature.
HLMLError_Unknown if error occurred during ECC error retrieval.
hlmlDeviceGetMemoryErrorCounter(device: hlml_t.HLML_DEVICE.TYPE, error_type: hlml_t.HLML_MEMORY_ERROR.TYPE, counter_type: hlml_t.HLML_ECC_COUNTER.TYPE, location: hlml_t.HLML_MEMORY_LOCATION.TYPE) -> int¶
Operation:
Returns the number of ECC errors for a specific device and location, since the last device reset, or since the driver was installed.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
error_type | Flag that specifies the type of errors. |
counter_type | Flag that specifies the counter type of the errors. |
location | Flag that specifies the location of the errors. |
Return Value:
Specified ECC errors:
class HLML_MEMORY_ERROR:
    TYPE = ctypes.c_uint()
    HLML_MEMORY_ERROR_TYPE_CORRECTED = 0 # NOT SUPPORTED BY HLML
    HLML_MEMORY_ERROR_TYPE_UNCORRECTED = 1
    HLML_MEMORY_ERROR_TYPE_COUNT = 2
class HLML_ECC_COUNTER:
    TYPE = ctypes.c_uint()
    HLML_VOLATILE_ECC = 0
    HLML_AGGREGATE_ECC = 1
    HLML_ECC_COUNTER_TYPE_COUNT = 2
class HLML_MEMORY_LOCATION:
    TYPE = ctypes.c_uint()
    HLML_MEMORY_LOCATION_SRAM = 0
    HLML_MEMORY_LOCATION_DRAM = 1
    HLML_MEMORY_LOCATION_COUNT = 2
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device, error type or counter type is invalid, or ecc counts is NULL.
HLMLError_NotSupported if the device does not support this feature.
HLMLError_Unknown if error occurred during ECC error retrieval.
hlmlDeviceGetUUID(device: hlml_t.HLML_DEVICE.TYPE) -> str¶
Operation:
Returns the UUID of the device as a string.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
The UUID.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or UUID is NULL.
HLMLError_InsufficientSize if the UUID string length is longer than the size allocated by the user.
hlmlDeviceGetMinorNumber(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves the minor number of the device. The Gaudi device node file for each device has the form /sys/class/habanalabs/hl[minor number].
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Minor number of the device.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or minor_number is NULL.
HLMLError_NoData if unable to retrieve the minor number for any reason.
hlmlEventSetCreate() -> hlml_t.HLML_EVENT_SET.TYPE¶
Operation:
Creates an empty set of events. The event set should be freed with hlmlEventSetFree.
Return Value:
Event handle:
class HLML_EVENT_SET:
    TYPE = ctypes.c_void_p()
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if set is NULL.
HLMLError_Memory if failed to allocate a set.
hlmlEventSetFree(st: hlml_t.HLML_EVENT_SET.TYPE) -> None¶
Operation:
Releases a set of events.
Parameters:
Parameter | Description |
---|---|
st | Reference to the set of events to be released. |
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if set is invalid.
hlmlDeviceRegisterEvents(device: hlml_t.HLML_DEVICE.TYPE, event_types: int, st: hlml_t.HLML_EVENT_SET.TYPE) -> None¶
Operation:
Starts recording events on the specified device and adds them to the specified event set. Events that occurred before this call are not recorded.
Supported events:
ECC single/double bit errors – BIT(0)
Critical errors that occurred – BIT(1)
Clock rate changes – BIT(2)
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
event_types | Bitmask of event types to record. |
st | Event set to which the new event types are added. |
class HLML_EVENT_SET:
    TYPE = ctypes.c_void_p()
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device or set is invalid or event_types is 0.
HLMLError_Unknown if it failed to retrieve information regarding events.
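The event_types argument is a bitmask built from the supported event bits listed above. A minimal sketch of composing and decoding such a mask follows. The EventType names are hypothetical labels for BIT(0) through BIT(2), not constants taken from pyhlml.

```python
from enum import IntFlag


class EventType(IntFlag):
    """Hypothetical names for the documented event bits."""
    ECC_ERROR = 1 << 0       # ECC single/double bit errors, BIT(0)
    CRITICAL_ERROR = 1 << 1  # critical errors, BIT(1)
    CLOCK_CHANGE = 1 << 2    # clock rate changes, BIT(2)


def build_event_mask(*events: EventType) -> int:
    """OR the requested event bits into a plain int bitmask."""
    mask = 0
    for event in events:
        mask |= int(event)
    if mask == 0:
        raise ValueError("event_types must not be 0")
    return mask


def decode_event_mask(mask: int) -> list:
    """List the names of the known event bits set in mask."""
    return [event.name for event in EventType if mask & event]


mask = build_event_mask(EventType.ECC_ERROR, EventType.CLOCK_CHANGE)
print(mask, decode_event_mask(mask))
```

The integer produced by build_event_mask is what would be passed as event_types. The same decoding could be applied to the event_type field of delivered event data, assuming it uses the same bit positions.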
hlmlEventSetWait(st: hlml_t.HLML_EVENT_SET.TYPE, timeout: int) -> hlml_t.c_hlml_event_data¶
Operation:
Waits for events and delivers them. If events are ready to be delivered at the time of the call, the function returns immediately. Otherwise, it sleeps until an event arrives, but no longer than the specified timeout.
Parameters:
Parameter | Description |
---|---|
st | Reference to the set of events to wait on. |
timeout | Maximum time, in milliseconds, to wait for a registered event. |
Return Value:
Event data:
class HLML_EVENT_SET:
    TYPE = ctypes.c_void_p()
class c_hlml_event_data(_PrintS):
    _fields_ = [("device", ctypes.c_void_p),
                ("event_type", ctypes.c_ulonglong)
               ]
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if set is invalid or data is NULL.
HLMLError_Unknown if it failed to retrieve information regarding events.
HLMLError_Timeout if no events arrived within timeout milliseconds.
hlmlDeviceGetMACInfo(device: hlml_t.HLML_DEVICE.TYPE, count=20, start=1) -> hlml_t.c_hlml_mac_info¶
Operation:
Gets MAC addresses of device.
class COMMON_DEFINE:
    # ...
    ETHER_ADDR_LEN = 6
    # ...
class c_hlml_mac_info(_PrintS):
    _fields_ = [("addr", ctypes.c_ubyte * COMMON_DEFINE.ETHER_ADDR_LEN), # unsigned char
                ("id", ctypes.c_int)
               ]
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
count | Number of requested elements. |
start | MAC id to start from. Number in the range of [1…20]. |
Return Value:
Array of size <count> of MAC addresses.
Raises:
HLMLError_InvalidArgument if device, mac_info, or actual_mac_info_count is invalid, or if start_mac_id is less than 1 or greater than 20.
HLMLError_NoData if requested start MAC address is bigger than the MAC count for the device.
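Each returned element carries the address as a six-byte array in addr. As a sketch, the hypothetical helper below renders such an array in the usual colon-separated notation. MacInfo is a stand-in with the same fields as c_hlml_mac_info, the sample bytes are invented, and the assumption is that addr holds the bytes in display order.

```python
import ctypes

ETHER_ADDR_LEN = 6  # same value as COMMON_DEFINE.ETHER_ADDR_LEN


class MacInfo(ctypes.Structure):
    """Stand-in with the same fields as c_hlml_mac_info, for illustration only."""
    _fields_ = [("addr", ctypes.c_ubyte * ETHER_ADDR_LEN),
                ("id", ctypes.c_int)]


def format_mac(mac) -> str:
    """Render the addr byte array as colon-separated lowercase hex."""
    return ":".join(f"{byte:02x}" for byte in mac.addr)


# Invented sample address; real entries would come from hlmlDeviceGetMACInfo(device).
sample = MacInfo((ctypes.c_ubyte * ETHER_ADDR_LEN)(0xB0, 0xFD, 0x0B, 0x00, 0x12, 0x34), 1)
print(sample.id, format_mac(sample))
```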
hlmlDeviceGetHLRevision(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Gets the HL Revision.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
HL Revision number.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid, or hl_revision is NULL.
HLMLError_NotFound if failed to retrieve hl_revision.
hlmlDeviceGetPCBInfo(device: hlml_t.HLML_DEVICE.TYPE) -> hlml_t.c_hlml_pcb_info¶
Operation:
Gets the PCB info.
class HLML_DEFINE:
    # ...
    HL_FIELD_MAX_SIZE = 32
    # ...
class c_hlml_pcb_info(_PrintS):
    _fields_ = [("pcb_ver", ctypes.c_char * HLML_DEFINE.HL_FIELD_MAX_SIZE),
                ("pcb_assembly_ver", ctypes.c_char * HLML_DEFINE.HL_FIELD_MAX_SIZE)
               ]
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
PCB info.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid, or PCB is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetSerial(device: hlml_t.HLML_DEVICE.TYPE) -> str¶
Operation:
Retrieves the globally unique board serial number associated with the device’s board.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Board/module serial number.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid, or serial is NULL.
HLMLError_InsufficientSize if length is too small.
hlmlDeviceGetModuleID(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves the module id configured on the device.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
The module id configured on the device.
Raises:
HLMLError_InvalidArgument if device is invalid or board_id is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetBoardID(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves the board ID of the device, a value from 0 to 7.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Device’s board ID.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or board_id is NULL.
HLMLError_NotFound if no AIP matching device was found.
hlmlDeviceGetPCIEThroughput(device: hlml_t.HLML_DEVICE.TYPE, counter_type: int) -> int¶
Operation:
Retrieves PCIe utilization information. The throughput is calculated over a 10 ms interval.
class HLML_PCIE_UTIL_COUNTER:
    TYPE = ctypes.c_uint()
    HLML_PCIE_UTIL_TX_BYTES = 0
    HLML_PCIE_UTIL_RX_BYTES = 1
    HLML_PCIE_UTIL_COUNT = 2
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
counter_type | The specific counter that should be queried. |
Return Value:
Throughput in KB/s.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device or counter is invalid, or value is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetPCIEReplayCounter(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves the PCIe replay counter.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Counter’s value.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid, or value is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetCurrPCIELinkGeneration(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves the current PCIe link generation.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Current PCIe link generation.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid, or curr_link_gen is NULL.
HLMLError_NotSupported if PCIe link information is not available.
hlmlDeviceGetCurrPCIELinkWidth(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves the current PCIe link width.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Current PCIe link width.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device or counter is invalid, or curr_link_width is NULL.
HLMLError_NotSupported if PCIe link information is not available.
hlmlDeviceGetCurrentClocksThrottleReasons(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves current clock throttling reasons.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Bitmask of active clocks throttle reasons.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid, or clocks_throttle_reasons is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetTotalEnergyConsumption(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Retrieves total energy consumption in millijoules (mJ) since the driver was last reloaded.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Energy consumption.
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid, or clocks_throttle_reasons is NULL.
HLMLError_Unknown on any unexpected error.
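Because the counter is cumulative, the average power over an interval follows from two readings: the difference in millijoules divided by 1000 gives joules, and joules divided by elapsed seconds gives watts. A minimal sketch, with a hypothetical helper name and invented sample readings:

```python
def average_power_watts(energy_start_mj: int, energy_end_mj: int,
                        elapsed_s: float) -> float:
    """Average power in watts between two millijoule readings."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    if energy_end_mj < energy_start_mj:
        # The counter restarts when the driver is reloaded.
        raise ValueError("energy counter went backwards")
    return (energy_end_mj - energy_start_mj) / 1000.0 / elapsed_s


# Two sample readings taken 2 seconds apart; real values would come from
# successive calls to hlmlDeviceGetTotalEnergyConsumption(device).
print(average_power_watts(1_000_000, 1_900_000, elapsed_s=2.0))
```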
hlmlDeviceGetMacAddrInfo(device: hlml_t.HLML_DEVICE.TYPE) -> Tuple[Tuple[int, int], Tuple[int, int]]¶
Operation:
Retrieves the masks for supported ports and external ports.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
(mask, ext_mask):
mask is a tuple of two ints containing the bitmask of supported ports.
ext_mask is a tuple of two ints containing the bitmask of external ports within the supported ports.
Raises:
HLMLError_InvalidArgument if device is invalid, or mask is NULL or ext_mask is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceNicGetLink(device: hlml_t.HLML_DEVICE.TYPE, port: int) -> bool¶
Operation:
Retrieves the NIC link status (up/down) for the requested internal port.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
port | Port for which the status is requested. |
Return Value:
Link status of the port (True if the link is up).
Raises:
HLMLError_InvalidArgument if device is invalid, or up is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceNicGetStatistics(device: hlml_t.HLML_DEVICE.TYPE, port: int, num_of_counts: int = None) -> hlml_t.c_hlml_nic_stats_info¶
Operation:
Retrieves the NIC statistics for the requested internal port.
class c_hlml_nic_stats_info(_PrintS):
    _fields_ = [("port", ctypes.c_uint32),
                ("str_buf", ctypes.POINTER(ctypes.c_char)),
                ("val_buf", ctypes.POINTER(ctypes.c_uint64)),
                ("num_of_counters_out", ctypes.POINTER(ctypes.c_uint32))
               ]
    def __init__(self, port: int, num_of_counters: int = None):
        num_of_counters = num_of_counters or COMMON_DEFINE.HABANA_LINK_CNT_MAX_NUM
        self.port = port
        str_buf_size = num_of_counters * 32
        self.str_buf = ctypes.cast(ctypes.create_string_buffer(str_buf_size), ctypes.POINTER(ctypes.c_char))
        val_buf_size = num_of_counters * ctypes.sizeof(ctypes.c_uint64)
        self.val_buf = (ctypes.c_uint64 * val_buf_size)()
        self.num_of_counters_out = (ctypes.c_uint32 * 1)()
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
port | Port for which the statistics are requested. |
num_of_counts | Optional. Number of counters to allocate. |
Return Value:
NIC statistics.
Raises:
HLMLError_InvalidArgument if device is invalid, stats_info.str_buf is NULL, stats_info.val_buf is NULL or stats_info.num_of_counters_out is NULL.
HLMLError_Unknown on any unexpected error.
hlmlDeviceClearCpuAffinity(device: hlml_t.HLML_DEVICE.TYPE) -> None¶
Operation:
Clears all affinity bindings for the calling process.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Raises:
HLMLError_InvalidArgument if device is invalid.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetCpuAffinity(device: hlml_t.HLML_DEVICE.TYPE, cpu_set_size: int) -> ctypes.Array¶
Operation:
Retrieves an array of unsigned longs (sized to cpu_set_size) of bitmasks with the ideal CPU affinity for the device. For example, on a 64-bit machine, if processors 0, 1, 64, and 65 are ideal for the device and cpu_set_size == 2, then result[0] = 0x3 and result[1] = 0x3. This is equivalent to calling hlmlDeviceGetCpuAffinityWithinScope with HLML_AFFINITY_SCOPE_NODE.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
cpu_set_size | The size of the cpu_set array that is safe to access. |
Return Value:
ctypes array with bitmask of CPUs, 64 CPUs per unsigned long on 64-bit machines, 32 on 32-bit machines.
Raises:
HLMLError_InvalidArgument if device is invalid, or cpu_set is NULL.
HLMLError_Unknown on any unexpected error.
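The returned array packs one CPU per bit, so word i covers CPUs i * 64 through i * 64 + 63 on a 64-bit machine. The sketch below expands such an array into a list of CPU indices, using the example from the description (CPUs 0, 1, 64 and 65). The helper name cpus_from_affinity is hypothetical, and the sample array is built by hand rather than read from a device.

```python
import ctypes


def cpus_from_affinity(cpu_set, bits_per_word: int = 64) -> list:
    """Expand an array of bitmask words into a sorted list of CPU indices."""
    cpus = []
    for word_index, word in enumerate(cpu_set):
        for bit in range(bits_per_word):
            if word & (1 << bit):
                cpus.append(word_index * bits_per_word + bit)
    return cpus


# The example from the text: result[0] = 0x3, result[1] = 0x3 with cpu_set_size == 2.
sample = (ctypes.c_ulong * 2)(0x3, 0x3)
print(cpus_from_affinity(sample))
```

On Linux, a list produced this way could be handed to os.sched_setaffinity to pin the calling process by hand. Pass bits_per_word=32 on 32-bit machines.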
hlmlDeviceGetCpuAffinityWithinScope(device: hlml_t.HLML_DEVICE.TYPE, cpu_set_size: int, scope: hlml_t.HLML_AFFINITY_SCOPE.TYPE) -> ctypes.Array¶
Operation:
Retrieves an array of unsigned longs (sized to cpu_set_size) of bitmasks with the ideal CPU affinity within node or socket for the device. For example, on a 64-bit machine, if processors 0, 1, 64, and 65 are ideal for the device and cpu_set_size == 2, then result[0] = 0x3 and result[1] = 0x3.
class HLML_AFFINITY_SCOPE:
    TYPE = ctypes.c_uint()
    HLML_AFFINITY_SCOPE_NODE = 0
    HLML_AFFINITY_SCOPE_SOCKET = 1
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
cpu_set_size | The size of the cpu_set array that is safe to access. |
scope | Scope that changes the default behavior. |
Return Value:
ctypes array with bitmask of CPUs, 64 CPUs per unsigned long on 64-bit machines, 32 on 32-bit machines.
Raises:
HLMLError_InvalidArgument if device is invalid, or cpu_set is NULL.
HLMLError_NotSupported if scope is not supported.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetMemoryAffinity(device: hlml_t.HLML_DEVICE.TYPE, node_set_size: int, scope: hlml_t.HLML_AFFINITY_SCOPE.TYPE) -> ctypes.Array¶
Operation:
Retrieves an array of unsigned longs (sized to node_set_size) of bitmasks with the ideal memory affinity within node or socket for the device. For example, if NUMA nodes 0 and 1 are ideal within the socket for the device and node_set_size == 1, then result[0] = 0x3.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
node_set_size | The size of the node_set array that is safe to access. |
scope | Scope that changes the default behavior. |
Return Value:
ctypes array with bitmask of NUMA nodes, 64 nodes per unsigned long on 64-bit machines, 32 on 32-bit machines.
Raises:
HLMLError_InvalidArgument if device is invalid, or node_set is NULL.
HLMLError_NotSupported if scope is not supported.
HLMLError_Unknown on any unexpected error.
hlmlDeviceSetCpuAffinity(device: hlml_t.HLML_DEVICE.TYPE) -> None¶
Operation:
Sets the ideal affinity for the calling thread and device using the guidelines given in hlmlDeviceGetCpuAffinity.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Raises:
HLMLError_InvalidArgument if device is invalid.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetViolationStatus(device: hlml_t.HLML_DEVICE.TYPE, perf_policy: hlml_t.HLML_PERF_POLICY.TYPE) -> hlml_t.c_hlml_violation_time¶
Operation:
Gets the duration of time during which the device was throttled (lower than requested clocks) due to power or thermal constraints.
This method is useful for users who want to know whether their AIPs throttle at any point while their applications run. If the event is currently in progress, the duration is measured from its start up to the current point in time.
class HLML_PERF_POLICY:
    TYPE = ctypes.c_uint()
    HLML_PERF_POLICY_POWER = 0
    HLML_PERF_POLICY_THERMAL = 1
    HLML_PERF_POLICY_COUNT = 2
class c_hlml_violation_time(_PrintS):
    _fields_ = [("reference_time", ctypes.c_ulonglong),
                ("violation_time", ctypes.c_ulonglong)
               ]
reference_time is the CPU timestamp, in microseconds, of the start of the event (unique for each event).
violation_time indicates the duration of the event in nanoseconds.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
perf_policy | Represents performance policy which can trigger AIP throttling. |
Return Value:
Violation time related information.
Raises:
HLMLError_InvalidArgument if device is invalid, or viol_time is NULL.
HLMLError_Unknown on any unexpected error.
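The two fields use different units: reference_time is in microseconds and violation_time is in nanoseconds. The sketch below converts both into friendlier values. It assumes that reference_time counts microseconds since the Unix epoch, which this page does not state, and the helper name and sample numbers are illustrative only.

```python
from datetime import datetime, timezone


def describe_violation(reference_time_us: int, violation_time_ns: int) -> dict:
    """Convert the raw fields into a UTC start time and a duration in seconds."""
    # Assumption: reference_time is microseconds since the Unix epoch.
    started = datetime.fromtimestamp(reference_time_us / 1_000_000, tz=timezone.utc)
    return {
        "started_utc": started.isoformat(),
        "throttled_s": violation_time_ns / 1_000_000_000,
    }


# Invented sample values; real ones would come from the reference_time and
# violation_time fields returned by hlmlDeviceGetViolationStatus.
print(describe_violation(reference_time_us=1_700_000_000_000_000,
                         violation_time_ns=2_500_000_000))
```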
hlmlDeviceGetReplacedRows(device: hlml_t.HLML_DEVICE.TYPE, cause: hlml_t.HLML_ROW_REPLACEMENT_CAUSE.TYPE, row_count: int) -> ctypes.Array¶
Operation:
Returns the list of replaced rows including rows that are pending replacement. The address information provided from this API is the full address of the row that was retired (see struct below).
class HLML_ROW_REPLACEMENT_CAUSE:
    TYPE = ctypes.c_uint()
    HLML_ROW_REPLACEMENT_CAUSE_MULTIPLE_SINGLE_BIT_ECC_ERRORS = 0
    HLML_ROW_REPLACEMENT_CAUSE_DOUBLE_BIT_ECC_ERROR = 1
    HLML_ROW_REPLACEMENT_CAUSE_COUNT = 2
Row address info struct (type in returned array):
class c_hlml_row_address(_PrintS):
    _fields_ = [("hbm_idx", ctypes.c_uint8),
                ("pc", ctypes.c_uint8),
                ("sid", ctypes.c_uint8),
                ("bank_idx", ctypes.c_uint8),
                ("row_addr", ctypes.c_uint16)
               ]
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
cause | Filter replaced rows by cause of retirement. |
row_count | Reference in which to provide the addresses buffer size. Set to 0 to query the size without allocating an addresses buffer. |
Return Value:
Row addresses info.
Raises:
HLMLError_InvalidArgument if device is invalid, row_count is NULL or row_count is not 0 while addresses is NULL.
HLMLError_InsufficientSize if row_count indicates that the addresses buffer is not large enough to store all the matching replaced rows. row_count will be set to the required size.
HLMLError_Unknown on any unexpected error.
hlmlDeviceGetReplacedRowsPendingStatus(device: hlml_t.HLML_DEVICE.TYPE) -> int¶
Operation:
Checks if any rows that are pending replacement require a reboot to be replaced.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
Pending status.
Raises:
HLMLError_InvalidArgument if device is invalid or is_pending is NULL.
HLMLError_Unknown on any unexpected error.
hlmlGetHLMLVersion() -> str¶
Operation:
Returns the version of HLML.
Return Value:
HLML version.
Raises:
HLMLError_InvalidArgument if version is NULL.
HLMLError_InsufficientSize if length is too small.
hlmlDeviceGetOperationStatus(device: hlml_t.HLML_DEVICE.TYPE) -> str¶
Operation:
Retrieves the status of the AIP.
Parameters:
Parameter | Description |
---|---|
device | The identifier of the target AIP. |
Return Value:
A descriptive status of the target AIP. Possible status descriptions (case insensitive):
operational
in reset
disabled
need reset
in device creation
in reset after device release
Raises:
HLMLError_Uninitialized if the library has not been successfully initialized.
HLMLError_InvalidArgument if device is invalid or status is NULL.
HLMLError_Unknown on any unexpected error.
HLMLError_NotSupported if the device does not support this feature.
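Since the status descriptions are case insensitive, callers should normalize the returned string before comparing it. A minimal sketch, using hypothetical helper names and the six descriptions listed above:

```python
KNOWN_STATUSES = {
    "operational",
    "in reset",
    "disabled",
    "need reset",
    "in device creation",
    "in reset after device release",
}


def normalize_status(status: str) -> str:
    """Lower-case and trim a status string; reject values not listed above."""
    normalized = status.strip().lower()
    if normalized not in KNOWN_STATUSES:
        raise ValueError(f"unrecognized status: {status!r}")
    return normalized


def is_operational(status: str) -> bool:
    """True only when the device reports the 'operational' status."""
    return normalize_status(status) == "operational"


# Sample strings; a real one would come from hlmlDeviceGetOperationStatus(device).
print(is_operational("Operational"), is_operational("In Reset"))
```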