Profiling Architecture

This section describes the control flow of profiling. The communication diagram below introduces the main components and illustrates the process of data collection, post-processing, and publishing. The diagram is organized top-down, from high-level to low-level abstraction. The numbered labels indicate the sequence of actions and activities.

  1. Offline, before training starts, a configuration file for focused profiling can be created. The default configuration typically covers most use cases.

  2. The TensorBoard server can also be started prior to training. It is a simple web server that formats and visualizes the data in the log files generated by TensorFlow.

  3. Once the user initiates the training execution, TensorFlow (TF) runs standard initial steps such as generating the computational graphs for forward and backward propagation. Then, TF starts the training, step by step (batch by batch). Prior to every batch execution, TF checks whether it is the first batch to profile. If it is, TF calls an API on SynapseAI to start recording Gaudi profile data.

  4. SynapseAI reads the configuration file and sets up the specific profiling instructions to be executed on the Gaudi accelerator. Note that a similar profiling data collection process is executed on the Host/CPU.

  5. SynapseAI collects profiling data from Gaudi.

  6. Before each subsequent batch, TF checks whether to keep profiling. Once the last batch to profile completes, TF sends a request to SynapseAI to stop profiling. SynapseAI then pushes the profile data collected during the entire sequence of steps back to TF.

  7. TF collects profiling data from the Host/CPU.

  8. TF runs analytics on the collected data and writes formatted log files.

  9. The TensorBoard server polls the log files and generates HTML pages for the browser.

  10. The browser displays the profiling data.
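The per-batch control flow in steps 3, 5, and 6 can be sketched as follows. This is a minimal illustration only: `synapse_start_profiling` and `synapse_stop_profiling` are hypothetical stand-ins for the SynapseAI API calls, not the real function names.

```python
def synapse_start_profiling(events):
    # Stand-in for the real SynapseAI call that starts recording
    # Gaudi profile data according to the configuration file (step 3).
    events.append("start")

def synapse_stop_profiling(events):
    # Stand-in for the real SynapseAI call that stops recording and
    # pushes the collected profile data back to the framework (step 6).
    events.append("stop")

def run_training(num_batches, profile_first, profile_last):
    """Run num_batches steps, profiling batches profile_first..profile_last."""
    events = []
    for batch in range(num_batches):
        # Prior to every batch, check whether this is the first batch to profile.
        if batch == profile_first:
            synapse_start_profiling(events)
        events.append(f"train:{batch}")  # profile data is collected while this runs (step 5)
        # After the last batch to profile, request that profiling stop.
        if batch == profile_last:
            synapse_stop_profiling(events)
    return events

print(run_training(num_batches=4, profile_first=1, profile_last=2))
# -> ['train:0', 'start', 'train:1', 'train:2', 'stop', 'train:3']
```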
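For steps 8 and 9, one concrete example of a "formatted log" is the Chrome trace-event format, which TensorBoard's trace viewer can display. The sketch below builds a minimal trace by hand purely for illustration; the operation names and timings are made up, and in practice these files are produced by TF's profiler, not written manually.

```python
import json

# A minimal log in the Chrome trace-event format. "ph": "X" marks a
# complete event; "ts" and "dur" are timestamps/durations in microseconds.
trace = {
    "traceEvents": [
        {"name": "MatMul",   "ph": "X", "ts": 100, "dur": 40, "pid": 0, "tid": 0},
        {"name": "ReluGrad", "ph": "X", "ts": 145, "dur": 20, "pid": 0, "tid": 0},
    ]
}

# Serialize to JSON, as the trace viewer consumes JSON files.
print(json.dumps(trace, indent=2))
```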