Virtualization
You can enable Intel® Gaudi® PCI passthrough to a virtual machine running on a Linux host server. This document describes how to allocate Intel Gaudi AI accelerators to KVM guests on Ubuntu 22.04 LTS.
PCI passthrough is the only virtualization mechanism supported by Gaudi accelerators. There is no support for SR-IOV or MIG. The smallest unit that can be assigned to a guest is a single HPU.
Configuring Gaudi in a VM Host Server
Verify that the VM host server supports VT-d/IOMMU and SR-IOV technologies, and make sure they are enabled in the BIOS.
Enable IOMMU. Verify that it is included in the kernel boot parameters by running the following command:
The expected output is shown below:
If it is not included, add one of the following lines to /etc/default/grub:
For Intel CPUs, add GRUB_CMDLINE_LINUX="intel_iommu=on"
For AMD CPUs, add GRUB_CMDLINE_LINUX="amd_iommu=on"
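A minimal sketch of the IOMMU check in this step, assuming the active kernel parameters are read from /proc/cmdline. The helper name has_iommu_flag is hypothetical and is not part of any Intel Gaudi tooling.

```shell
# Return success if intel_iommu=on or amd_iommu=on appears in the kernel
# command line. The file path is a parameter so the check can also be run
# against a saved copy; it defaults to /proc/cmdline.
has_iommu_flag() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -q -E '(^| )(intel|amd)_iommu=on( |$)' "$cmdline_file"
}

# Possible use on the host:
#   if has_iommu_flag; then echo "IOMMU flag present"; else echo "IOMMU flag missing"; fi
```

If the flag is reported missing, the GRUB_CMDLINE_LINUX line described above still has to be added and the GRUB configuration regenerated before it takes effect.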
Isolate the Intel Gaudi PCI device for VFIO pass-through:
Get the Intel Gaudi PCI devices' [vendor-ID:device-ID] by running the following command:
Example output:
4d:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
4e:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
50:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
51:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
b3:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
b4:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
b5:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
b6:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
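As a sketch of how the [vendor-ID:device-ID] pairs could be collected from output like the example above, the helper below reads lspci -nn style lines on standard input and prints the unique IDs as a comma-separated list. The name extract_vfio_ids is hypothetical, and the vendor ID 1da3 used in the usage comment is taken from the example output rather than from a verified device list.

```shell
# Read `lspci -nn` style lines on stdin and print the unique
# vendor:device pairs, joined with commas.
extract_vfio_ids() {
    # The greedy .* makes sed capture the last [xxxx:xxxx] pair on each line,
    # which is where lspci -nn prints the vendor and device IDs.
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)].*/\1/p' | sort -u | paste -sd, -
}

# Possible use on the host (assumes the pciutils package is installed):
#   lspci -nn -d 1da3: | extract_vfio_ids
```

The resulting list has the comma-separated form that the vfio-pci.ids kernel parameter accepts.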
Update GRUB_CMDLINE_LINUX_DEFAULT, adding the PCI device IDs with the vfio-pci.ids parameter, by running the following command:
Generate a new GRUB 2 configuration file:
Reboot the system:
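The three steps above can be sketched as follows, assuming a stock Ubuntu /etc/default/grub in which GRUB_CMDLINE_LINUX_DEFAULT is a single double-quoted line. The helper append_vfio_ids is hypothetical; it only transforms text, so the result can be reviewed before anything on the host is changed.

```shell
# Read a GRUB defaults file on stdin and print it with vfio-pci.ids=<ids>
# appended inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT.
append_vfio_ids() {
    ids="$1"    # comma-separated vendor:device list, e.g. 1da3:1000
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 vfio-pci.ids=${ids}\"|"
}

# Possible use on the host (review the new file before copying it back):
#   append_vfio_ids 1da3:1000 < /etc/default/grub > /tmp/grub.new
#   sudo cp /tmp/grub.new /etc/default/grub
#   sudo update-grub    # regenerates the GRUB 2 configuration on Ubuntu
#   sudo reboot
```

The helper does not check whether a vfio-pci.ids entry already exists, so running it twice would add the parameter twice.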
Tip
After the reboot, verify that IOMMU is loaded by running the following command:
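One common way to do this, offered as an assumption rather than as the exact command intended here, is to filter the kernel log for DMAR and IOMMU messages. The helper name iommu_log_lines is hypothetical.

```shell
# Print only the kernel log lines that mention DMAR or IOMMU (case-insensitive).
# Exits non-zero when no such line is found.
iommu_log_lines() {
    grep -i -e 'DMAR' -e 'IOMMU'
}

# Possible use on the host:
#   sudo dmesg | iommu_log_lines
```

The exact messages vary by CPU vendor and kernel version, so an empty result is a prompt to recheck the BIOS and GRUB settings rather than proof of a fault.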
To ensure the groups are valid, run the script below to see how your PCI devices are mapped to IOMMU groups. If you do not receive any output, either IOMMU support has not been set up and enabled properly, or the KVM in use does not support IOMMU.
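A minimal sketch of such a script, assuming the standard sysfs layout under /sys/kernel/iommu_groups. The function name list_iommu_groups is hypothetical, and lspci is used only when it is available; otherwise the bare PCI address is printed.

```shell
# Print each IOMMU group followed by the PCI devices it contains.
# The sysfs root is a parameter so the function can be pointed at a copy.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for group in "$root"/*; do
        # An unmatched glob stays literal and is skipped here, so a host
        # without IOMMU groups produces no output at all.
        [ -d "$group/devices" ] || continue
        echo "IOMMU Group ${group##*/}:"
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            addr="${dev##*/}"
            # Describe the device with lspci when possible, else show the address.
            desc=$(lspci -nns "$addr" 2>/dev/null) || desc=""
            [ -n "$desc" ] || desc="$addr"
            printf '\t%s\n' "$desc"
        done
    done
}

# Possible use on the host:
#   list_iommu_groups
```

Groups are listed in shell glob order, which is lexical rather than numeric. For passthrough, each Gaudi device would ideally appear in a group that contains no unrelated devices.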
Example output:
Assigning Gaudi Device to a VM Guest
Create a libvirt-based VM with UEFI support:
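As an illustration only, a UEFI guest could be created with virt-install along the following lines. The VM name, memory size, vCPU count, disk size, and ISO path are all placeholder assumptions, and the wrapper create_gaudi_vm with its DRY_RUN switch is hypothetical.

```shell
# Create a UEFI-booted guest with virt-install.
# With DRY_RUN set to a non-empty value, the command is printed instead of run.
create_gaudi_vm() {
    name="$1"    # libvirt domain name (placeholder)
    iso="$2"     # path to an Ubuntu 22.04 installer ISO (placeholder)
    ${DRY_RUN:+echo} virt-install \
        --name "$name" \
        --memory 16384 \
        --vcpus 8 \
        --disk size=100 \
        --cdrom "$iso" \
        --os-variant ubuntu22.04 \
        --boot uefi
}

# Possible use:
#   DRY_RUN=1 create_gaudi_vm gaudi-vm /path/to/ubuntu-22.04.iso   # preview
#   create_gaudi_vm gaudi-vm /path/to/ubuntu-22.04.iso             # create
```

The --boot uefi option requires UEFI firmware for guests (the ovmf package on Ubuntu) to be installed on the host.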
Add the Habana Labs (HL) PCI devices that were isolated above with the vfio-pci.ids parameter to the <devices> section of the VM's libvirt XML.
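A sketch of what such an entry might look like, assuming libvirt's standard PCI hostdev element is used. The helper hostdev_xml is hypothetical; it turns a PCI address as printed by lspci into the corresponding XML fragment.

```shell
# Print a libvirt <hostdev> element for one PCI address.
# Accepts "bus:slot.function" (e.g. 4d:00.0) or "domain:bus:slot.function".
hostdev_xml() {
    addr="$1"
    case "$addr" in
        *:*:*) domain="${addr%%:*}"; rest="${addr#*:}" ;;
        *)     domain="0000";        rest="$addr" ;;
    esac
    bus="${rest%%:*}"
    slotfn="${rest#*:}"
    slot="${slotfn%%.*}"
    fn="${slotfn#*.}"
    # managed='yes' asks libvirt to detach the device from the host and
    # reattach it automatically around the guest's lifetime.
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x${domain}' bus='0x${bus}' slot='0x${slot}' function='0x${fn}'/>
  </source>
</hostdev>
EOF
}
```

For example, the output of `hostdev_xml 4d:00.0` could be pasted inside <devices> while running `virsh edit` on the guest, with one element per Gaudi device assigned to it.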
Tip
By default, the emulated CPU in QEMU may support only 40 physical address bits, which is enough to address only 1 TiB of memory. In that case, libvirt/QEMU must be configured to pass through the value supported by the host.
For a host machine with 2 TB of memory, this can be done with the following QEMU command-line arguments:
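To make the arithmetic concrete: 2^40 bytes is 1 TiB, and each additional address bit doubles the addressable range. The hypothetical helper below computes the smallest bit count that covers a given amount of memory. It is a lower bound only, since passed-through device memory regions need address space as well. The -cpu argument in the comment uses QEMU's host-phys-bits CPU property and is an assumption about what fits here, not a confirmed copy of the intended arguments.

```shell
# Print the smallest number of physical address bits able to address
# the given amount of memory, expressed in whole TiB.
min_phys_bits() {
    mem_tib="$1"
    bits=40    # 2^40 bytes = 1 TiB
    span=1     # TiB addressable with $bits bits
    while [ "$span" -lt "$mem_tib" ]; do
        span=$((span * 2))
        bits=$((bits + 1))
    done
    echo "$bits"
}

# Passing the host's own width through to the guest (assumed QEMU arguments):
#   -cpu host,host-phys-bits=on
```

With this helper, `min_phys_bits 2` prints 41, which is why a 40-bit default is not sufficient for a 2 TiB machine.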
Start the VM.
Configuring Gaudi Device in VM Guest
To install the Intel Gaudi driver packages in the VM guest, see driver_fw_install_bare.