Virtualization
You can enable Intel® Gaudi® PCI passthrough to a guest OS on a Linux host server. This document describes how to allocate Intel Gaudi AI accelerators to KVM guests on Ubuntu 22.04 LTS.
PCI passthrough is the only virtualization mechanism supported by Gaudi accelerators; SR-IOV and MIG are not supported. The smallest unit that can be assigned to a VM is a single HPU.
Configuring Gaudi in a VM Host Server¶
Verify that the VM host server supports VT-d/IOMMU and SR-IOV technologies, and make sure they are enabled in the BIOS.
Enable IOMMU. Verify that it is included in the kernel boot parameters by running the following command:
cat /proc/cmdline
The expected output includes the IOMMU flag, for example:
BOOT_IMAGE=/boot/vmlinuz-default [...] intel_iommu=on [...]
If it is not included, add the following line to /etc/default/grub:
For Intel CPUs:
GRUB_CMDLINE_LINUX="intel_iommu=on"
For AMD CPUs:
GRUB_CMDLINE_LINUX="amd_iommu=on"
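The /proc/cmdline check above can also be scripted. The following is a minimal sketch, assuming a Bash shell; the function name iommu_flag_on_cmdline is hypothetical and not part of any Intel Gaudi tooling. It looks for intel_iommu=on or amd_iommu=on as an exact, whitespace-separated word on a kernel command line.

```shell
#!/bin/bash
# Sketch: report whether an IOMMU enable flag appears on a kernel command line.
# iommu_flag_on_cmdline is a hypothetical helper, not part of any Gaudi tooling.
# With no argument it reads the running kernel's /proc/cmdline.
iommu_flag_on_cmdline() {
    local cmdline="${1:-$(cat /proc/cmdline)}"
    local -a words
    local word
    # Split on whitespace without glob expansion
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        case "$word" in
            intel_iommu=on|amd_iommu=on)
                echo "$word"
                return 0
                ;;
        esac
    done
    echo "no intel_iommu=on or amd_iommu=on found" >&2
    return 1
}
```

Run iommu_flag_on_cmdline with no argument to check the running kernel, or pass a string to check a candidate set of parameters before editing /etc/default/grub.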
Isolate the Intel Gaudi PCI devices for VFIO passthrough:
Get the Intel Gaudi PCI device IDs [vendor-ID:device-ID] by running the following command:
lspci -nn -d 1da3:
Gaudi 3 example output:
3d:00.0 Processing accelerators: Habana Labs Ltd. Device 1060 (rev 01)
3e:00.0 Processing accelerators: Habana Labs Ltd. Device 1060 (rev 01)
4e:00.0 Processing accelerators: Habana Labs Ltd. Device 1060 (rev 01)
4f:00.0 Processing accelerators: Habana Labs Ltd. Device 1060 (rev 01)
97:00.0 Processing accelerators: Habana Labs Ltd. Device 1060 (rev 01)
98:00.0 Processing accelerators: Habana Labs Ltd. Device 1060 (rev 01)
cb:00.0 Processing accelerators: Habana Labs Ltd. Device 1060 (rev 01)
cc:00.0 Processing accelerators: Habana Labs Ltd. Device 1060 (rev 01)
Update GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and add the PCI device IDs with the vfio-pci.ids parameter. Open the file:
sudo vi /etc/default/grub
For Gaudi 3 (device ID 1060, as in the example output above):
GRUB_CMDLINE_LINUX_DEFAULT="iommu=1 intel_iommu=on iommu=pt vfio-pci.ids=1da3:1060 systemd.unified_cgroup_hierarchy=0 kvm.ignore_msrs=1"
For Gaudi 2 (device ID 1020):
GRUB_CMDLINE_LINUX_DEFAULT="iommu=1 intel_iommu=on iommu=pt vfio-pci.ids=1da3:1020 systemd.unified_cgroup_hierarchy=0 kvm.ignore_msrs=1"
On AMD hosts, use amd_iommu=on instead of intel_iommu=on.
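The vfio-pci.ids value does not have to be typed by hand. The sketch below, assuming Bash and the lspci -nn output format (which adds [vendor:device] to each line, as in the IOMMU group listing in this section), extracts the unique Intel Gaudi IDs and joins them with commas, which is how vfio-pci.ids accepts more than one ID. The function name vfio_ids_from_lspci is hypothetical.

```shell
#!/bin/bash
# Sketch: build a vfio-pci.ids value from `lspci -nn -d 1da3:` output on stdin.
# Assumes the -nn format, in which each line carries "[1da3:DDDD]".
# vfio_ids_from_lspci is a hypothetical helper name.
vfio_ids_from_lspci() {
    # Keep only vendor:device pairs, de-duplicate, and join with commas
    grep -o '1da3:[0-9a-f]\{4\}' | sort -u | paste -sd, -
}
```

For example, lspci -nn -d 1da3: | vfio_ids_from_lspci prints a value such as 1da3:1060 on a Gaudi 3 host, which is then used as vfio-pci.ids=1da3:1060.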
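Generate a new GRUB 2 configuration file. On Ubuntu, run:
sudo update-grub
On distributions that provide grub2-mkconfig (for example, RHEL), run:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg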
Reboot the system:
sudo systemctl reboot
Tip
After the reboot, verify that IOMMU is enabled by running the following command:
sudo dmesg | grep -e IOMMU
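Beyond the IOMMU messages in dmesg, it can help to confirm that each Gaudi device is now bound to vfio-pci rather than to the habanalabs driver. lspci -nnk -d 1da3: reports this as "Kernel driver in use". The sketch below reads the same information from sysfs, assuming Bash; the function show_gaudi_drivers and the SYSFS_PCI override (used only to point it at a test directory) are hypothetical.

```shell
#!/bin/bash
# Sketch: list each Habana (vendor 0x1da3) PCI function and the driver bound
# to it. After the reboot, Gaudi devices are expected to show vfio-pci.
# show_gaudi_drivers and the SYSFS_PCI override are hypothetical; SYSFS_PCI
# only exists so the function can be pointed at a test directory.
show_gaudi_drivers() {
    local root="${SYSFS_PCI:-/sys/bus/pci/devices}"
    local dev drv
    for dev in "$root"/*; do
        # Skip entries without a readable vendor ID, and non-Habana devices
        [ -r "$dev/vendor" ] || continue
        [ "$(cat "$dev/vendor")" = "0x1da3" ] || continue
        # The driver link is absent when no driver is bound
        if [ -e "$dev/driver" ]; then
            drv=$(basename "$(readlink -f "$dev/driver")")
        else
            drv="(none)"
        fi
        echo "${dev##*/} $drv"
    done
}
```

On a correctly configured host, every line printed by show_gaudi_drivers is expected to end in vfio-pci.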
To verify that the IOMMU groups are valid, run the following script (saved as show_iommu_mapping.sh) to see how your PCI devices are mapped to IOMMU groups. The script prints only the groups that contain accelerators. If it produces no output, either IOMMU support has not been set up and enabled properly, or the KVM host in use does not support IOMMU.
#!/bin/bash
# show_iommu_mapping.sh: print the IOMMU group of each PCI accelerator
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done | grep -B 1 accelerators
Example output:
./show_iommu_mapping.sh
IOMMU Group 62:
3d:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
IOMMU Group 63:
3e:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
--
IOMMU Group 78:
4e:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
IOMMU Group 79:
4f:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
--
IOMMU Group 137:
97:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
IOMMU Group 138:
98:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
--
IOMMU Group 161:
cb:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
IOMMU Group 162:
cc:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
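In the example above, each Gaudi device sits in its own IOMMU group. This matters because VFIO assigns devices to a VM at IOMMU-group granularity. The sketch below checks this per device, assuming Bash and the standard sysfs layout in which /sys/bus/pci/devices/ADDRESS/iommu_group links to the group directory; the function check_gaudi_iommu_groups and the SYSFS_PCI test override are hypothetical.

```shell
#!/bin/bash
# Sketch: for each Habana (vendor 0x1da3) PCI function, print its IOMMU group
# and how many devices share that group; 1 is the expected count here.
# check_gaudi_iommu_groups and the SYSFS_PCI test override are hypothetical.
check_gaudi_iommu_groups() {
    local root="${SYSFS_PCI:-/sys/bus/pci/devices}"
    local dev group members
    for dev in "$root"/*; do
        [ -r "$dev/vendor" ] || continue
        [ "$(cat "$dev/vendor")" = "0x1da3" ] || continue
        if [ ! -e "$dev/iommu_group" ]; then
            echo "${dev##*/} no-iommu-group"
            continue
        fi
        # iommu_group is a symlink to /sys/kernel/iommu_groups/<N>
        group=$(readlink -f "$dev/iommu_group")
        members=$(ls -1 "$group/devices" | wc -l | tr -d ' ')
        echo "${dev##*/} group=${group##*/} members=$members"
    done
}
```

A members value greater than 1 means other devices share the group with that accelerator, and they would have to be passed through together with it.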
Mapping Multi-Card Setup¶
When you pass multiple devices to a VM, map each physical (host) PCI address to a distinct virtual (guest) PCI address. Assigning guest bus numbers deliberately avoids address conflicts and ensures that the guest recognizes every device.
For example, with eight Gaudi accelerators, give each device a unique guest bus number. If the first device is assigned bus 0x07, the remaining devices can be assigned consecutive bus numbers:
3d:00.0 - Bus 0x07 (Slot 0x00, Function 0x00)
3e:00.0 - Bus 0x08 (Slot 0x00, Function 0x00)
4e:00.0 - Bus 0x09 (Slot 0x00, Function 0x00)
4f:00.0 - Bus 0x0A (Slot 0x00, Function 0x00)
97:00.0 - Bus 0x0B (Slot 0x00, Function 0x00)
98:00.0 - Bus 0x0C (Slot 0x00, Function 0x00)
cb:00.0 - Bus 0x0D (Slot 0x00, Function 0x00)
cc:00.0 - Bus 0x0E (Slot 0x00, Function 0x00)
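Writing one <hostdev> element per device by hand is error-prone with eight accelerators. The sketch below, assuming Bash, prints the elements for a list of host addresses and gives the devices consecutive guest bus numbers starting from a chosen value, following the mapping above. The function hostdev_xml is hypothetical, it assumes PCI domain 0000 on the host, and it does not check that the guest bus numbers are valid for the VM's PCI topology.

```shell
#!/bin/bash
# Sketch: print one libvirt <hostdev> element per host PCI address (BB:SS.F),
# assigning consecutive guest bus numbers starting at $1 (for example 0x07).
# hostdev_xml is a hypothetical helper; host domain 0000 is assumed, and the
# guest bus numbers are not validated against the VM's PCI controllers.
hostdev_xml() {
    local guest_bus=$(( $1 ))
    shift
    local addr bus slot func
    for addr in "$@"; do
        # Split "3d:00.0" into bus=3d, slot=00, function=0
        bus=${addr%%:*}
        slot=${addr#*:}; slot=${slot%%.*}
        func=${addr##*.}
        printf "<hostdev mode='subsystem' type='pci' managed='yes'>\n"
        printf "  <driver name='vfio'/>\n"
        printf "  <source>\n"
        printf "    <address domain='0x0000' bus='0x%s' slot='0x%s' function='0x%s'/>\n" "$bus" "$slot" "$func"
        printf "  </source>\n"
        printf "  <address type='pci' domain='0x0000' bus='0x%02x' slot='0x00' function='0x0'/>\n" "$guest_bus"
        printf "</hostdev>\n"
        guest_bus=$(( guest_bus + 1 ))
    done
}
```

For example, hostdev_xml 0x07 3d:00.0 3e:00.0 4e:00.0 4f:00.0 97:00.0 98:00.0 cb:00.0 cc:00.0 prints eight elements whose guest buses run from 0x07 to 0x0e, ready to be pasted into the <devices> section with virsh edit.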
Assigning Gaudi Device to a VM Guest¶
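Make sure libvirt is installed before running the steps below; see the Libvirt Ubuntu Server docs.
Create a libvirt-based VM with UEFI support, then open its domain XML for editing:
virsh edit VM-NAME
In the <devices> section, add one <hostdev> entry for each Intel Gaudi PCI device bound to vfio-pci through the vfio-pci.ids parameter configured above. The <source> address is the host bus address reported by lspci, and the second <address> element is the guest address from the mapping above.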
<devices>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x3d' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x3e' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
  </hostdev>
  <!-- Repeat for additional devices -->
</devices>
Tip
By default, the CPU emulated by QEMU may support only 40 physical address bits, which is enough to address only 1 TiB of memory. In that case, configure libvirt/QEMU to pass through the physical address width supported by the host.
For a host with 2 TB of memory, this can be done with the following QEMU command-line arguments:
-cpu host,host-phys-bits=on -global q35-pcihost.pci-hole64-size=2048G
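Before enabling host-phys-bits, you can check the host's physical address width: on x86, /proc/cpuinfo reports it in an "address sizes" line. The sketch below, assuming Bash, reads that value and converts a bit width to addressable memory using 2^40 bytes = 1 TiB; both function names are hypothetical.

```shell
#!/bin/bash
# Sketch: read the physical address width from an "address sizes" line such as
# "address sizes : 46 bits physical, 48 bits virtual" and convert a width to TiB.
# phys_addr_bits and addressable_tib are hypothetical helper names.
phys_addr_bits() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Print the first number after the colon on the first matching line
    awk -F: '/^address sizes/ { split($2, a, " "); print a[1]; exit }' "$cpuinfo"
}

addressable_tib() {
    # 2^bits bytes expressed in TiB (2^40 bytes); valid for bits >= 40
    echo $(( 1 << ($1 - 40) ))
}
```

For example, addressable_tib 40 prints 1 and addressable_tib 41 prints 2, so a 2 TB host needs at least 41 physical address bits. Running phys_addr_bits on the host and again inside the guest shows whether the guest sees the host's width.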
Start the VM.
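A minimal sketch for starting the guest and confirming that its live definition contains the expected number of PCI host devices, assuming Bash and virsh. virsh start and virsh dumpxml are standard virsh subcommands; the function names are hypothetical, and VM-NAME is this guide's placeholder.

```shell
#!/bin/bash
# Sketch: start a guest and count the PCI <hostdev> elements in its live XML.
# count_pci_hostdevs and start_and_count are hypothetical helper names.
count_pci_hostdevs() {
    # Reads domain XML on stdin and counts lines that open a PCI <hostdev>
    grep -c "<hostdev[^>]*type='pci'"
}

start_and_count() {
    local vm="$1"
    virsh start "$vm" && virsh dumpxml "$vm" | count_pci_hostdevs
}
```

For the eight-device example, start_and_count VM-NAME is expected to print 8.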
Verifying Virtualization¶
To verify that virtualization is enabled on the host, perform the following checks:
Using virt-host-validate:
Run virt-host-validate. It checks the host's virtualization capabilities and reports the result of each check. All checks should report PASS. The following warning can be ignored:
QEMU: Checking for secure guest support : WARN
QEMU: Checking for hardware virtualization : PASS
QEMU: Checking if device /dev/kvm exists : PASS
QEMU: Checking if device /dev/kvm is accessible : PASS
QEMU: Checking if device /dev/vhost-net exists : PASS
QEMU: Checking if device /dev/net/tun exists : PASS
QEMU: Checking for cgroup 'cpu' controller support : PASS
QEMU: Checking for cgroup 'cpuacct' controller support : PASS
QEMU: Checking for cgroup 'cpuset' controller support : PASS
QEMU: Checking for cgroup 'memory' controller support : PASS
QEMU: Checking for cgroup 'devices' controller support : PASS
QEMU: Checking for cgroup 'blkio' controller support : PASS
QEMU: Checking for device assignment IOMMU : PASS
QEMU: Checking for secure guest support : WARN (Unknown if this platform has Secure Guest support)
LXC: Checking for Linux >= 2.6. : PASS
LXC: Checking for namespace ipc : PASS
LXC: Checking for namespace mnt : PASS
LXC: Checking for namespace pid : PASS
LXC: Checking for namespace uts : PASS
LXC: Checking for namespace net : PASS
LXC: Checking for namespace user : PASS
LXC: Checking for cgroup 'cpu' controller support : PASS
LXC: Checking for cgroup 'cpuacct' controller support : PASS
LXC: Checking for cgroup 'cpuset' controller support : PASS
LXC: Checking for cgroup 'memory' controller support : PASS
LXC: Checking for cgroup 'devices' controller support : PASS
LXC: Checking for cgroup 'freezer' controller support : PASS
LXC: Checking for cgroup 'blkio' controller support : PASS
LXC: Checking if device /sys/fs/fuse/connections exists : PASS
Using kvm-ok:
Install the cpu-checker package by running:
sudo apt install cpu-checker
Run:
kvm-ok
The output confirms whether the processor supports KVM virtualization and whether it is enabled. The expected output is:
INFO: /dev/kvm exists
KVM acceleration can be used
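The virt-host-validate output is long, so it can be easier to print only the checks that did not pass. The sketch below, assuming Bash, reads that output on stdin, skips the secure guest support warning that can be ignored, and returns a non-zero status if anything else is reported; the function name unexpected_validate_results is hypothetical.

```shell
#!/bin/bash
# Sketch: filter virt-host-validate output (on stdin) down to unexpected results.
# The secure guest support WARN is skipped; any other non-PASS line is printed
# and makes the function return 1. The function name is hypothetical.
unexpected_validate_results() {
    local line status=0
    while IFS= read -r line; do
        case "$line" in
            *": PASS"*) ;;
            *"Checking for secure guest support"*": WARN"*) ;;
            "") ;;
            *) echo "$line"; status=1 ;;
        esac
    done
    return "$status"
}
```

Usage: virt-host-validate 2>&1 | unexpected_validate_results. No output and a zero exit status mean every remaining check passed.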
Configuring Gaudi Device in VM Guest¶
To install the Intel Gaudi driver packages in the VM guest, see the Installation Guide and On-Premise System Update.
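Before installing the driver, it may be worth confirming that all assigned accelerators are visible inside the guest. The sketch below, assuming Bash and lspci in the guest, counts the lines printed by lspci -nn -d 1da3: and compares them with the number of devices assigned in the domain XML; the function name check_gaudi_count and the expected count you pass in are hypothetical.

```shell
#!/bin/bash
# Sketch: compare the number of Habana PCI functions seen in the guest with an
# expected count. Reads `lspci -nn -d 1da3:` output on stdin (one line per
# device). check_gaudi_count is a hypothetical helper; $1 is the expected count.
check_gaudi_count() {
    local expected="$1" found
    # grep -c prints 0 (and exits 1) when there is no input; keep the count
    found=$(grep -c .)
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found Gaudi device(s) visible"
    else
        echo "MISMATCH: expected $expected, found $found" >&2
        return 1
    fi
}
```

For the eight-device example, run lspci -nn -d 1da3: | check_gaudi_count 8 inside the guest.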