Using Gaudi Trained Checkpoints on Xeon¶

It is common to train deep learning models on GPUs and Intel® Xeon® Scalable processors, as well as to run pre-trained or fine-tuned models for inference on either GPUs or CPUs. Intel AI Engines provide a performance boost for the entire AI-pipeline with Intel AMX. This document provides guidance on using Intel® Gaudi® AI accelerator platform architecture trained checkpoints on Xeon with PyTorch using the following examples:

Simple example of MNIST
BERT Large from the Intel Gaudi Model References GitHub repository
Hugging Face sourced Image Classification
OpenVINO example

Note

A previous article addresses the writing of training scripts capable of running on Gaudi, GPU, or CPU.

Using Gaudi Trained Checkpoints on CPU¶

You can directly load Gaudi-trained (HPU) checkpoints on Xeon and other CPU systems which do not have HPUs or the Intel Gaudi software stack installed.

MNIST Example Using HPU Checkpoints¶

The MNIST example is available here. This example demonstrates how to save the model on Gaudi and then load the model on Xeon (CPU):

Save a checkpoint of the model running on one Gaudi device. This generates a mnist_cnn.pt file:
PT_HPU_LAZY_MODE=1 python mnist.py --hpu --save-model --epochs 1
Prepare mnist-cpu.py script that runs the MNIST example on CPU device by running the following code:
sed 's/^import habana_frameworks/#import habana_frameworks/g; s/\(.*\)htcore\./\1pass;#htcore./g' mnist.py > mnist-cpu.py
This script enables loading of the saved HPU-trained model file on a Xeon CPU system which does not have the Intel Gaudi software stack installed. Some key changes applied to mnist.py to generate mnist-cpu.py are as follows:
Commented out lines that import habana_frameworks.

Replaced calls to htcore.mark_step() with a pass command.
Load the HPU-trained checkpoint on CPU and run the model using mnist-cpu.py :
python mnist-cpu.py --checkpoint mnist_cnn.pt --epochs 2

BERT Large Model from Model References¶

The BERT Large pre-trained model is generated using the instructions described in BERT for PyTorch. The checkpoint file name depends on the number of steps and the --max_steps setting used in phase 1 and phase 2 of the training.

Running Inference on Xeon CPU¶

To run inference with the generated model, remove the --use_habana option from the command line. Other options of the run_squad.py call to run inference using the SQUAD dataset can be left as is. For example, on a system that has the Intel Gaudi software stack installed, set variables path_to_checkpoint, path_to_vocab, and path_to_eval_script and run:

python run_squad.py \
    --bert_model=bert-large-uncased \
    --autocast  \
    --config_file=./bert_config.json \
    --do_lower_case \
    --output_dir=/output/checkpoints/bert/inference \
    --json-summary=/tmp/log_directory/dllogger.json  \     --predict_batch_size=24  \
    --init_checkpoint=$path_to_checkpoint \
    --vocab_file=$path_to_vocab  \
    --do_predict  \
    --predict_file=/output/bert/v1.1/dev-v1.1.json \
    --do_eval \
    --eval_script=$path_to_eval_script

To run this model on a CPU system, refer to the guidelines provided in the previous section.

Hugging Face Sourced Image Classification Model¶

To run training on Gaudi and a system configured with the Intel Gaudi software stack, use the below command. This creates the checkpoint file under /tmp/outputs/:

python run_image_classification.py \
    --model_name_or_path google/vit-base-patch16-224-in21k \
    --dataset_name cifar10 \
    --output_dir /tmp/outputs/ \
    --remove_unused_columns False \
    --do_train \
    --do_eval \
    --learning_rate 3e-5 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 64 \
    --evaluation_strategy epoch \
    --save_strategy epoch \
    --load_best_model_at_end True \
    --save_total_limit 3 \
    --seed 1337 \
    --use_habana \
    --use_lazy_mode \
    --use_hpu_graphs_for_inference \
    --gaudi_config_name Habana/vit \
    --throughput_warmup_steps 3 \
    --dataloader_num_workers 1

To run inference on Xeon, use the following command:

python3 run_image_classification.py  \
    --model_name_or_path /tmp/outputs/   \
    --dataset_name cifar10  \
    --output_dir /tmp/outputs/  \
    --remove_unused_columns False  \
    --do_eval \
    --per_device_eval_batch_size 64 \
    --dataloader_num_workers 1

Use with OpenVINO¶

To run inference on a Gaudi PyTorch model using OpenVINO runtime, refer to the OpenVINO tutorial document - Convert a PyTorch Model to OpenVINO Intermediate Representation (IR).

Load the Gaudi PyTorch model:
1. Create an instance of a model class.
2. Load checkpoint state dictionary, which contains pre-trained model weights.
3. Turn the model to evaluation for switching some operations to inference mode.
Convert the PyTorch model to OpenVINO intermediate representation (IR):
1. Calls openvino.convert_model to convert a PyTorch model object to openvino.Model instance.
2. Calls openvino.Core.compile_model to load on a device.
3. Calls openvino.save_model to save on disk for next usage.

Gaudi Documentation 1.20.1 documentation

Using Gaudi Trained Checkpoints on Xeon

On this Page

Using Gaudi Trained Checkpoints on Xeon¶

Using Gaudi Trained Checkpoints on CPU¶

MNIST Example Using HPU Checkpoints¶

BERT Large Model from Model References¶

Running Inference on Xeon CPU¶

Hugging Face Sourced Image Classification Model¶

Use with OpenVINO¶