# Debugging Possible Model Errors

## Generate Logs

If you encounter problems while training a model on Gaudi, it is frequently useful to generate and inspect your log files. By inspecting log files, you can pinpoint where a model failure is occurring, and alter your model or training script to resolve or work around defects.

Environment variables control whether logging information is generated and where it is written. For example, if you set the following environment variables before training your model, a large amount of information will be written to ~/.habana_logs/:

```shell
$ export HABANA_LOGS=~/.habana_logs
$ export LOG_LEVEL_ALL=0
$ # Train your model as usual
```

The following sections describe these environment variables and the values they accept.

### Location of Log Files

Setting ENABLE_CONSOLE=true sends the logs to the console. If ENABLE_CONSOLE is unset or is set to any value other than true, logs are written to the directory specified by HABANA_LOGS.

For example, if you set the following environment variables, all SynapseAI errors will be logged to the console:

```shell
$ export ENABLE_CONSOLE=true
$ export LOG_LEVEL_ALL=4
$ # Train your model as usual
```


### Log Levels

| Value | Level | Description |
|-------|----------|-------------|
| 0 | Trace | Log everything, including traces of progress |
| 1 | Debug | Log all errors, warnings and all information useful for debugging |
| 2 | Info | Log errors, warnings and some informative messages |
| 3 | Warning | Log all errors and warnings |
| 4 | Error | Log all errors |
| 5 | Critical | Log only critical errors |
| 6 | Off | Log nothing |
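If you switch log levels often, a small shell helper that accepts the level names listed above and exports the matching number saves looking up the values each time. This is a minimal sketch: the function name `habana_log_level` is hypothetical and not part of SynapseAI; it only sets LOG_LEVEL_ALL, as described in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SynapseAI): map a log level name
# (trace, debug, info, warning, error, critical, off) to its numeric
# value and export it as LOG_LEVEL_ALL.
habana_log_level() {
    local name level
    # Accept names in any case, e.g. "Debug" or "DEBUG".
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        trace)    level=0 ;;
        debug)    level=1 ;;
        info)     level=2 ;;
        warning)  level=3 ;;
        error)    level=4 ;;
        critical) level=5 ;;
        off)      level=6 ;;
        *)
            echo "unknown log level: $1" >&2
            return 1
            ;;
    esac
    export LOG_LEVEL_ALL="$level"
}

# Example: log only errors and warnings from every component.
habana_log_level warning
echo "LOG_LEVEL_ALL=$LOG_LEVEL_ALL"
```

An unknown name leaves LOG_LEVEL_ALL unchanged and returns a non-zero status, so a typo does not silently switch logging off.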

### Component-Level Logs

The value of LOG_LEVEL_ALL=[log level] sets the logging level for all components. However, it is sometimes useful to view detailed information for a single component.

To specify the log level for a particular component, append the name of the component to LOG_LEVEL_.

For example, if you set the following environment variables, all components will log only critical errors (set with LOG_LEVEL_ALL=5) except the Synapse API (set with LOG_LEVEL_SYN_API=3), which will log all errors and warnings:

```shell
$ export HABANA_LOGS=~/.habana_logs
$ export LOG_LEVEL_ALL=5
$ export LOG_LEVEL_SYN_API=3
$ # Train your model as usual
```


### Names of Components that Produce Logs

| Component | Log names |
|-----------|-----------|
| The Synapse API | SYN_API |
| The profiling subsystem | SYN_PROF, PROF_hl[0-7] and HLPROF |
| The graph compiler | PARSER, GC and GRAPH_DATA |
| The Habana performance library | PERF_LIB |
| The Habana Communication Library | HCL and HCL_SUBMISSIONS |
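Because a component-level variable is formed by appending the component name to LOG_LEVEL_, a whole subsystem can be raised in one loop. The sketch below assumes that rule applies to each of the graph compiler names listed above (PARSER, GC and GRAPH_DATA); it keeps every other component at Critical (5) and sets the graph compiler to Debug (1).

```shell
#!/usr/bin/env bash
# Sketch: quiet everything except the graph compiler components.
export HABANA_LOGS=~/.habana_logs
export LOG_LEVEL_ALL=5            # Critical for all components

# Assumption: each listed name maps to a LOG_LEVEL_<name> variable.
for component in PARSER GC GRAPH_DATA; do
    export "LOG_LEVEL_${component}=1"   # Debug for this component
done

# Show the logging variables that the training process will inherit.
env | grep '^LOG_LEVEL_' | sort
```

Narrowing detailed logging to one subsystem keeps the log directory small enough to search while still capturing the information you need about that component.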

## Generate PyTorch Logs

You can set the following environment variables to obtain PyTorch Habana Bridge level logs:

```shell
$ export PT_HPU_LOG_MOD_MASK=xxxx
$ export PT_HPU_LOG_TYPE_MASK=yyyy
```


Please refer to the Runtime Flags section for a description of the above environment variables.

## Error Codes

When making calls directly to the SynapseAI API, it is useful to check the return codes against the following symbolic or integer values to understand the outcome of the operation.

| Symbolic value | Integer value | Description |
|----------------|---------------|-------------|
| synSuccess | 0 | The operation succeeded |
| synInvalidArgument | 1 | An argument was invalid |
| synCbFull | 2 | The command buffer is full |
| synOutOfHostMemory | 3 | Out of host memory |
| synOutOfDeviceMemory | 4 | Out of device memory |
| synObjectAlreadyInitialized | 5 | The object being initialized is already initialized |
| synObjectNotInitialized | 6 | The object must be initialized before the operation can be performed |
| synCommandSubmissionFailure | 7 | The command buffer could not be submitted |
| synNoDeviceFound | 8 | No Habana device was found |
| synDeviceTypeMismatch | 9 | The operation is for the wrong device type |
| synFailedToInitializeCb | 10 | The command buffer failed to initialize |
| synFailedToFreeCb | 11 | The command buffer could not be freed |
| synFailedToMapCb | 12 | The command buffer could not be mapped |
| synFailedToUnmapCb | 13 | The command buffer could not be unmapped |
| synFailedToAllocateDeviceMemory | 14 | Device memory could not be allocated |
| synFailedToFreeDeviceMemory | 15 | Device memory could not be freed |
| synFailedNotEnoughDevicesFound | 16 | A free device could not be found |
| synDeviceReset | 17 | The operation failed because the device is being reset |
| synUnsupported | 18 | The requested operation is not supported |
| synWrongParamsFile | 19 | While loading a recipe, the binary parameters file failed to load |
| synDeviceAlreadyAcquired | 20 | The referenced device is already occupied |
| synNameIsAlreadyUsed | 21 | A tensor with the same name has already been created |
| synBusy | 22 | The operation failed to complete within the timeout period |
| synAllResourcesTaken | 23 | The event could not be created due to lack of resources |
| synUnavailable | 24 | The time an event finished could not be retrieved |
| synInvalidTensorDimensions | 25 | A high-rank tensor is attached to a node that does not support it |
| synFail | 26 | The operation failed |
| synOutOfResources | 27 | The operation failed due to lack of SynapseAI memory |
| synUninitialized | 28 | The SynapseAI library was not initialized before it was accessed |
| synAlreadyInitialized | 29 | SynapseAI initialization failed because it was already initialized |
| synFailedSectionValidation | 30 | The Launch operation failed because of a section mismatch |
| synSynapseTerminated | 31 | SynapseAI cannot process the operation because it is about to be terminated |
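When a tool or your own debug output reports only the integer status, a lookup helper saves scanning the table. The following Bash sketch is hypothetical: the names SYN_STATUS_NAMES and syn_status_name are not part of SynapseAI, and the helper relies only on the integer values in the table above running contiguously from 0 to 31.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SynapseAI): translate an integer
# status returned by a SynapseAI API call into its symbolic name.
# Array index == integer value from the error code table.
SYN_STATUS_NAMES=(
    synSuccess                        # 0
    synInvalidArgument                # 1
    synCbFull                         # 2
    synOutOfHostMemory                # 3
    synOutOfDeviceMemory              # 4
    synObjectAlreadyInitialized       # 5
    synObjectNotInitialized           # 6
    synCommandSubmissionFailure       # 7
    synNoDeviceFound                  # 8
    synDeviceTypeMismatch             # 9
    synFailedToInitializeCb           # 10
    synFailedToFreeCb                 # 11
    synFailedToMapCb                  # 12
    synFailedToUnmapCb                # 13
    synFailedToAllocateDeviceMemory   # 14
    synFailedToFreeDeviceMemory       # 15
    synFailedNotEnoughDevicesFound    # 16
    synDeviceReset                    # 17
    synUnsupported                    # 18
    synWrongParamsFile                # 19
    synDeviceAlreadyAcquired          # 20
    synNameIsAlreadyUsed              # 21
    synBusy                           # 22
    synAllResourcesTaken              # 23
    synUnavailable                    # 24
    synInvalidTensorDimensions        # 25
    synFail                           # 26
    synOutOfResources                 # 27
    synUninitialized                  # 28
    synAlreadyInitialized             # 29
    synFailedSectionValidation        # 30
    synSynapseTerminated              # 31
)

syn_status_name() {
    local code="$1"
    # Accept only non-negative integers inside the table; 10# forces
    # base-10 so inputs such as "08" are not parsed as octal.
    if [[ "$code" =~ ^[0-9]+$ ]] && (( 10#$code < ${#SYN_STATUS_NAMES[@]} )); then
        echo "${SYN_STATUS_NAMES[10#$code]}"
    else
        echo "unknown status: $code" >&2
        return 1
    fi
}

syn_status_name 4   # prints synOutOfDeviceMemory
```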