Virtualization
You can now enable Intel® Gaudi® PCI passthrough to a guest OS on a Linux host server. This document describes how to allocate Intel Gaudi AI accelerators to KVM guests on Ubuntu 22.04 LTS.
PCI passthrough is the only virtualization mechanism supported by Gaudi accelerators; SR-IOV and MIG are not supported. The smallest allocation granularity is a single HPU.
Configuring Gaudi in a VM Host Server
Verify that the VM host server supports VT-d/IOMMU and SR-IOV technologies, and make sure they are enabled in the BIOS.
Enable IOMMU. Verify that it is included in the boot parameters by running the following command:

```shell
cat /proc/cmdline
```

Expected output:

```
BOOT_IMAGE=/boot/vmlinuz-default [...] intel_iommu=on [...]
```
If not included, add the appropriate line to `/etc/default/grub`:

For Intel CPUs, add:

```
GRUB_CMDLINE_LINUX="intel_iommu=on"
```

For AMD CPUs, add:

```
GRUB_CMDLINE_LINUX="amd_iommu=on"
```
Isolate the Intel Gaudi PCI device for VFIO pass-through:
Get the Intel Gaudi PCI devices (`[vendor-ID:Device-ID]`) by running the following command:

```shell
lspci -nn -d 1da3:
```

Example output:

```
4d:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
4e:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
50:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
51:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
b3:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
b4:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
b5:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
b6:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)
```
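When scripting the next step, the `[vendor-ID:Device-ID]` pair can be extracted from the `lspci -nn` output. A minimal sketch, using a sample line copied from the example output above (on a live host you would pipe `lspci -nn -d 1da3:` directly):

```shell
# Sample lspci -nn line (copied from the example output above)
line='4d:00.0 Processing accelerators [1200]: Habana Labs Ltd. HL-2000 AI Training Accelerator [Gaudi] [1da3:1000] (rev 01)'

# Extract the [vendor:device] token and strip the brackets;
# the device-class token [1200] is skipped because it has no colon
id=$(echo "$line" | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]')
echo "$id"   # 1da3:1000
```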
Update `GRUB_CMDLINE_LINUX_DEFAULT` and add the PCI device IDs with the `vfio-pci.ids` parameter:

```shell
sudo vi /etc/default/grub
```

```
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on vfio-pci.ids=[vendor-ID:Device-ID]"
```

For example, with the devices listed above: `vfio-pci.ids=1da3:1000`.
Regenerate the GRUB configuration file (on Ubuntu):

```shell
sudo update-grub
```
Reboot the system:

```shell
sudo systemctl reboot
```
Tip
After reboot, it is recommended to verify that IOMMU is loaded by running the following command:

```shell
sudo dmesg | grep -e IOMMU
```
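It is also worth confirming that the Gaudi devices are bound to the `vfio-pci` driver after reboot. On a live host this would be `lspci -nnk -d 1da3:`; the sketch below parses a sample `Kernel driver in use` line, which is an assumption standing in for real output:

```shell
# On a live host: lspci -nnk -d 1da3: | grep 'Kernel driver in use'
sample='Kernel driver in use: vfio-pci'   # sample line, assumption for illustration
driver=$(echo "$sample" | awk -F': ' '{print $2}')
echo "$driver"   # vfio-pci
```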
To ensure the groups are valid, run the following script to see how your PCI devices are mapped to IOMMU groups. If the script produces no output, either IOMMU support has not been set up and enabled properly, or the platform does not support IOMMU.
```bash
#!/bin/bash
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done
done | grep -B 1 accelerators
```
Assigning Gaudi Device to a VM Guest
Create a libvirt-based VM with UEFI support, then open its configuration for editing:

```shell
virsh edit VM-NAME
```
Add the HL PCI devices (isolated for `vfio-pci` above) to the `<devices>` section:
```xml
<devices>
  ...
  <hostdev mode='subsystem' type='pci' managed='yes' model='vfio-pci'>
    <source>
      <address domain='0x0000' bus='0xb3' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
  </hostdev>
  ...
</devices>
```
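The `<source>` address mirrors the host BDF reported by `lspci` (`b3:00.0` in the example output earlier). A small sketch of how the BDF string maps onto the attribute values used in that element:

```shell
bdf='b3:00.0'   # host BDF taken from the lspci example output above

bus=${bdf%%:*}     # b3
rest=${bdf#*:}     # 00.0
slot=${rest%%.*}   # 00
func=${rest#*.}    # 0

addr="domain='0x0000' bus='0x${bus}' slot='0x${slot}' function='0x${func}'"
echo "$addr"   # domain='0x0000' bus='0xb3' slot='0x00' function='0x0'
```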
Tip
If the emulated CPU in QEMU only supports 40 physical address bits by default (enough to address only 1 TiB of memory), libvirt/QEMU must be configured to pass through the physical address width supported by the host.
For a 2 TB host machine, this can be done with the following QEMU command line arguments:
```
-cpu host,host-phys-bits=on -global q35-pcihost.pci-hole64-size=2048G
```
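The 1 TiB figure follows from the address width: 2^40 bytes is 1 TiB, so a guest that must address 2 TiB of memory needs at least 41 physical address bits. A quick arithmetic check:

```shell
# 2^40 bytes expressed in TiB
tib=$(( (1 << 40) / (1024 * 1024 * 1024 * 1024) ))
echo "$tib"    # 1

# 41 address bits cover 2048 GiB (2 TiB)
covered_gib=$(( (1 << 41) / (1024 * 1024 * 1024) ))
echo "$covered_gib"    # 2048
```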
Start the VM (e.g. `virsh start VM-NAME`).
Configuring Gaudi Device in VM Guest
To install Intel Gaudi driver packages, see Installation Guide and On-Premise System Update.