Profiling Tips and Tricks

This section describes the most common options for creating a profiler configuration file in different data-collection modes.

  • Configure data collection per enqueue:

hl-prof-config --gaudi --buffer-size 256 --trace-analyzer on --trace-analyzer-csv on --phase enq -e off --invoc json --merged hltv

This command generates a configuration file for per-enqueue profiling. It enables dumps to several files: one JSON file per enqueue (each enqueue is numbered) and one large hltv file that contains all enqueues (not numbered). In this mode, the trace is collected after every enqueue, and the enqueues are executed serially: each enqueue starts only after the previous one ends. Since the trace buffer is limited in size, for initial trace collection it is recommended to check the desired enqueues in this mode rather than collecting specific enqueues in multi-enq mode. (See the note below.)

  • Configure data collection in a set of enqueues:

hl-prof-config --gaudi --buffer-size 256 --trace-analyzer on --trace-analyzer-csv on --phase multi-enq --invocations-range 1-100 -e off --merged hltv

This command generates a configuration file for profiling in multi-enqueue mode. It also enables CSV and JSON dumps for the specified enqueues. The enqueues are selected with --invocations-range: the command above collects enqueues 1 to 100, and --invocations-range 10-13 would collect enqueues 10 to 13. (See the note below.)

Note

Both commands above generate trace analyzer data, which gives further insight into each executed node.
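If you switch between the two modes often, a small wrapper can keep the flags consistent. The following is a minimal Python sketch, not part of the Habana tooling: the helper name build_prof_config_cmd is hypothetical, and it reproduces only the flags used in the two commands above. It checks the --invocations-range value, assuming the first-last form shown in the example, and runs hl-prof-config only if the tool is found on PATH.

```python
import shlex
import shutil
import subprocess

# Flags shared by both hl-prof-config commands shown in this section.
COMMON_FLAGS = [
    "--gaudi",
    "--buffer-size", "256",
    "--trace-analyzer", "on",
    "--trace-analyzer-csv", "on",
]


def build_prof_config_cmd(mode, invocations_range=None):
    """Return the hl-prof-config argument list for 'enq' or 'multi-enq' mode.

    Hypothetical helper: it only reproduces the two command lines from this
    section and does not cover every hl-prof-config option.
    """
    cmd = ["hl-prof-config", *COMMON_FLAGS]
    if mode == "enq":
        # Per-enqueue mode: one JSON file per enqueue plus one merged hltv file.
        cmd += ["--phase", "enq", "-e", "off", "--invoc", "json", "--merged", "hltv"]
    elif mode == "multi-enq":
        if invocations_range is None:
            raise ValueError("multi-enq mode needs an invocations range, e.g. '1-100'")
        # Assumption: the range is written as "<first>-<last>" with first <= last.
        first, last = (int(part) for part in invocations_range.split("-"))
        if first > last:
            raise ValueError(f"invalid invocations range: {invocations_range!r}")
        cmd += ["--phase", "multi-enq", "--invocations-range", f"{first}-{last}",
                "-e", "off", "--merged", "hltv"]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cmd


if __name__ == "__main__":
    cmd = build_prof_config_cmd("multi-enq", "1-100")
    print(shlex.join(cmd))
    # Only run the tool if it is installed on this machine.
    if shutil.which(cmd[0]):
        subprocess.run(cmd, check=True)
```

For example, build_prof_config_cmd("multi-enq", "10-13") produces the multi-enqueue command restricted to enqueues 10 to 13, while a reversed range such as "13-10" is rejected before the tool is ever called.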

  • Data collection per step (iteration) for TensorFlow:

    • python3 tensorflowApp.py --profile <#,#>, where the first and second numbers define the range of steps to profile.

For example: python3 tensorflowApp.py --profile 4,6
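The --profile option is handled by the application script, so its exact semantics depend on that script. As a hypothetical sketch (the names parse_step_range and should_profile are illustrative, and treating both bounds as inclusive is an assumption), a training script could parse the range and decide per step whether to profile:

```python
import argparse


def parse_step_range(value):
    """Parse '<first>,<last>' into a (first, last) tuple of ints.

    Assumption: both bounds are inclusive and first <= last.
    """
    try:
        first, last = (int(part) for part in value.split(","))
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected '<first>,<last>', got {value!r}") from None
    if first > last:
        raise argparse.ArgumentTypeError(f"first step {first} is after last step {last}")
    return first, last


def should_profile(step, step_range):
    """Return True if this training step falls inside the (inclusive) range."""
    first, last = step_range
    return first <= step <= last


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Hypothetical training script")
    parser.add_argument("--profile", type=parse_step_range, default=None,
                        help="range of steps to profile, e.g. 4,6")
    # Demo with a fixed argument list instead of the real command line.
    args = parser.parse_args(["--profile", "4,6"])
    for step in range(10):
        if args.profile and should_profile(step, args.profile):
            print(f"step {step}: profiling")
```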

Note

By default, TensorFlow data collection does not include NIC profiling. To add NIC profiling, create a configuration file based on the profile_api_light template, save it to a location on your disk, and run: HABANA_PROF_CONFIG=<path to new configuration file> python3 tensorflowApp.py --profile 4,6
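When the application is launched from another Python process, the same HABANA_PROF_CONFIG variable can be passed through the child's environment. A minimal sketch; the helper build_launch and the config path my_prof_config are hypothetical placeholders for your own script and configuration file:

```python
import os
import subprocess


def build_launch(config_path, first_step, last_step, script="tensorflowApp.py"):
    """Return (argv, env) for running the app with a custom profiler config.

    Hypothetical helper: config_path stands for your copy of the
    profile_api_light-based configuration file.
    """
    # Copy the current environment so os.environ itself is not modified.
    env = dict(os.environ)
    env["HABANA_PROF_CONFIG"] = os.path.abspath(config_path)
    argv = ["python3", script, "--profile", f"{first_step},{last_step}"]
    return argv, env


if __name__ == "__main__":
    argv, env = build_launch("my_prof_config", 4, 6)  # hypothetical path
    # Launch only if both the script and the config file actually exist.
    if os.path.exists(argv[1]) and os.path.exists(env["HABANA_PROF_CONFIG"]):
        subprocess.run(argv, env=env, check=True)
```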