Profiling Tips and Tricks
This section describes the most common options for creating a configuration file for data collection in various modes.
Configure data collection per enqueue:
hl-prof-config --gaudi --buffer-size 256 --trace-analyzer on --trace-analyzer-csv on --phase enq -e off --invoc json --merged hltv
This command generates a configuration file for profiling in per-enqueue mode. It enables dumps to several files: one JSON file per enqueue (each enqueue is numbered), and one large hltv file that contains all enqueues (not numbered). In this mode, the trace is collected after every enqueue, and the enqueues are executed serially: each enqueue starts only once the previous one ends. Since the trace buffer is limited in size, for initial trace collection it is recommended to use this mode to check the desired enqueues rather than collecting specific enqueues in multi-enq mode. (See the note below.)
Configure data collection in a set of enqueues:
hl-prof-config --gaudi --buffer-size 256 --trace-analyzer on --trace-analyzer-csv on --phase multi-enq --invocations-range 1-100 -e off --merged hltv
This command generates a configuration file for profiling in multi-enqueue mode. It also enables dumps to CSV and JSON files for the specified enqueues. The enqueues to collect are set with --invocations-range: the command above uses 1-100; to collect enqueues 10 to 13, for example, set the range to 10-13. (See the note below.)
Note
The above two commands generate trace analyzer data to provide further understanding of each executed node.
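The two commands above differ only in their phase-specific options, which can be easier to see when the command lines are assembled programmatically. The following is a minimal Python sketch, not part of the Habana tooling: the helper names (per_enqueue_cmd, multi_enqueue_cmd) and the range validation are assumptions made for illustration, and every flag is copied verbatim from the commands above. The sketch only builds the command strings; it does not run hl-prof-config.

```python
import shlex

# Options shared by both commands shown above (copied verbatim).
COMMON = [
    "hl-prof-config", "--gaudi",
    "--buffer-size", "256",
    "--trace-analyzer", "on",
    "--trace-analyzer-csv", "on",
]


def per_enqueue_cmd():
    """Return the argument list for the per-enqueue command shown above."""
    return COMMON + ["--phase", "enq", "-e", "off",
                     "--invoc", "json", "--merged", "hltv"]


def multi_enqueue_cmd(first, last):
    """Return the argument list for the multi-enqueue command shown above.

    The range check is an assumption of this sketch, not documented
    behavior of hl-prof-config.
    """
    if first < 0 or last < first:
        raise ValueError("expected 0 <= first <= last")
    return COMMON + ["--phase", "multi-enq",
                     "--invocations-range", f"{first}-{last}",
                     "-e", "off", "--merged", "hltv"]
```

For example, print(shlex.join(multi_enqueue_cmd(1, 100))) prints the multi-enqueue command exactly as written above, and multi_enqueue_cmd(10, 13) produces the same command with the range set to 10-13.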
Data collection per step (iteration) for TensorFlow:
python3 tensorflowApp.py --profile <#,#>
where the first and second numbers define the range of steps to profile.
For example: python3 tensorflowApp.py --profile 4,6
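For readers adding a similar option to their own training script, the sketch below shows one way a --profile <#,#> argument could be parsed with argparse. It is an illustration only: this section does not describe how tensorflowApp.py implements the option, so the helper names (step_range, build_parser, in_profile_range) are hypothetical, and treating both ends of the range as inclusive is an assumption.

```python
import argparse


def step_range(text):
    """Parse 'first,last' (for example '4,6') into a tuple of two ints."""
    parts = text.split(",")
    if len(parts) != 2:
        raise argparse.ArgumentTypeError(
            "expected two comma-separated step numbers, e.g. 4,6")
    try:
        first, last = int(parts[0]), int(parts[1])
    except ValueError:
        raise argparse.ArgumentTypeError("step numbers must be integers")
    if first < 0 or last < first:
        raise argparse.ArgumentTypeError("expected 0 <= first <= last")
    return first, last


def build_parser():
    """Build a parser with a --profile option; other options are omitted."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--profile", type=step_range, default=None,
                        metavar="FIRST,LAST",
                        help="range of steps to profile")
    return parser


def in_profile_range(step, profile):
    """Return True if 'step' falls in the requested range.

    Assumption of this sketch: both ends of the range are inclusive.
    """
    return profile is not None and profile[0] <= step <= profile[1]
```

With this sketch, build_parser().parse_args(["--profile", "4,6"]).profile evaluates to (4, 6), and a training loop could call in_profile_range(step, args.profile) to decide whether a given step should be profiled.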
Note
By default, TensorFlow data collection does not include NIC profiling. To add NIC profiling, create a configuration file based on the profile_api_light template, save the new configuration file to a location on your disk, and use:
HABANA_PROF_CONFIG=<path to new configuration file> python3 tensorflowApp.py --profile 4,6
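When the application is launched from a Python driver script rather than a shell, the same effect can be obtained by setting HABANA_PROF_CONFIG in the child process environment. The following is a minimal sketch under stated assumptions: the helper names (profiling_env, launch) are hypothetical, and the script name tensorflowApp.py is the placeholder used in this section. Only the HABANA_PROF_CONFIG variable and the --profile argument come from the text above.

```python
import os
import subprocess
import sys


def profiling_env(config_path, base_env=None):
    """Return a copy of the environment with HABANA_PROF_CONFIG set.

    The caller's environment is copied, not modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HABANA_PROF_CONFIG"] = str(config_path)
    return env


def launch(config_path, first, last, script="tensorflowApp.py"):
    """Run the application with the custom profiler configuration.

    'script' is the placeholder application name used in this section.
    """
    cmd = [sys.executable, script, "--profile", f"{first},{last}"]
    return subprocess.run(cmd, env=profiling_env(config_path), check=True)
```

Calling launch("<path to new configuration file>", 4, 6) corresponds to the shell command above. Note that sys.executable runs the same interpreter as the driver script, which may differ from the python3 found on the PATH.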