Virtualization

You can enable Intel® Gaudi® PCI passthrough to a guest OS using a Linux host server. This document describes how to allocate Intel Gaudi AI accelerators to KVM guests on Ubuntu 22.04 LTS.

PCI passthrough is the only virtualization mechanism supported by Gaudi accelerators; SR-IOV and MIG are not supported. The smallest possible granularity is a single HPU.

Configuring Gaudi in a VM Host Server

  1. Verify that the VM host server supports VT-d/IOMMU and SR-IOV technologies and that both are enabled in the BIOS.

  2. Enable IOMMU. Verify that it is included in the boot by running the following command:

    cat /proc/cmdline
    

    The following shows the expected output:

    BOOT_IMAGE=/boot/vmlinuz-default [...] intel_iommu=on [...]
    
  3. If it is not included, add one of the following lines to /etc/default/grub:

    • For Intel CPUs, add GRUB_CMDLINE_LINUX="intel_iommu=on"

    • For AMD CPUs, add GRUB_CMDLINE_LINUX="amd_iommu=on"
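
    After updating GRUB and rebooting (steps 5-6 below), a minimal sketch like the following confirms whether an IOMMU flag is active on the running kernel:

    # Check the running kernel command line for an IOMMU flag.
    if grep -qE 'intel_iommu=on|amd_iommu=on' /proc/cmdline; then
        echo "IOMMU flag is present"
    else
        echo "IOMMU flag is missing; update /etc/default/grub and regenerate its config" >&2
    fi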

  4. Isolate the Intel Gaudi PCI device for VFIO pass-through:

    1. List the Intel Gaudi PCI devices and their [vendor-ID:device-ID] pairs by running the following command:

      lspci -nn -d 1da3:
      

      Gaudi 3 example output:

      3d:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
      3e:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
      4e:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
      4f:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
      97:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
      98:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
      cb:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
      cc:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
      
    2. Update GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, adding the device IDs with the vfio-pci.ids parameter:

      sudo vi /etc/default/grub

      For Gaudi 3 (device ID 1060):

      GRUB_CMDLINE_LINUX_DEFAULT="iommu=1 intel_iommu=on iommu=pt vfio-pci.ids=1da3:1060 systemd.unified_cgroup_hierarchy=0 kvm.ignore_msrs=1"

      For Gaudi 2 (device ID 1020):

      GRUB_CMDLINE_LINUX_DEFAULT="iommu=1 intel_iommu=on iommu=pt vfio-pci.ids=1da3:1020 systemd.unified_cgroup_hierarchy=0 kvm.ignore_msrs=1"
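
    If you prefer to derive the vfio-pci.ids value from the lspci output in sub-step 1 instead of typing it, the following sketch (assuming GNU grep and coreutils) prints the unique vendor:device pairs in the comma-separated form the parameter expects:

      # Extract the unique vendor:device pairs from the lspci -nn output
      # and join them with commas for the vfio-pci.ids kernel parameter.
      lspci -nn -d 1da3: | grep -oE '1da3:[0-9a-f]{4}' | sort -u | paste -sd,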
      
  5. Regenerate the GRUB configuration file (on Ubuntu, update-grub wraps grub-mkconfig):

    sudo update-grub
    
  6. Reboot the system:

    sudo systemctl reboot
    

Tip

After reboot, verify that IOMMU is loaded by running the following command:

sudo dmesg | grep -e IOMMU

To ensure the groups are valid, run the script below to see how your PCI devices are mapped to IOMMU groups. If the script produces no output, either IOMMU support has not been enabled properly or the platform does not support IOMMU.

#!/bin/bash
# Print each IOMMU group, keeping only groups that contain a processing accelerator.
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done | grep -B 1 accelerators

Example:

./show_iommu_mapping.sh

IOMMU Group 62:
    3d:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
IOMMU Group 63:
    3e:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
--
IOMMU Group 78:
    4e:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
IOMMU Group 79:
    4f:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
--
IOMMU Group 137:
    97:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
IOMMU Group 138:
    98:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
--
IOMMU Group 161:
    cb:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
IOMMU Group 162:
    cc:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1060] (rev 01)
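
In addition to checking the groups, you can confirm after reboot that each accelerator is bound to the vfio-pci driver. The following is a minimal sketch; lspci -D prints the full domain:bus:device.function notation used under /sys:

#!/bin/bash
# Print each Gaudi device together with the kernel driver currently bound to it.
for dev in $(lspci -D -d 1da3: | awk '{print $1}'); do
    driver=$(basename "$(readlink "/sys/bus/pci/devices/$dev/driver" 2>/dev/null)")
    printf '%s -> %s\n' "$dev" "${driver:-no driver bound}"
done

Each line should report vfio-pci once the host configuration steps above are in effect.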

Mapping Multi-Card Setup

When configuring multiple devices in a virtualized environment, each device must be mapped accurately between the physical layer and the virtual routed layer. Properly mapping the bus numbers enables seamless communication between the physical devices and their virtual counterparts and ensures correct device recognition within the virtual machine.

For example, with eight Gaudi accelerators, each device needs a unique bus assignment to prevent conflicts. If the first device is assigned bus 0x07, the subsequent devices can be assigned as follows:

3d:00.0 - Bus 0x07 (Slot 0x00, Function 0x00)
3e:00.0 - Bus 0x08 (Slot 0x00, Function 0x00)
4e:00.0 - Bus 0x09 (Slot 0x00, Function 0x00)
4f:00.0 - Bus 0x0A (Slot 0x00, Function 0x00)
97:00.0 - Bus 0x0B (Slot 0x00, Function 0x00)
98:00.0 - Bus 0x0C (Slot 0x00, Function 0x00)
cb:00.0 - Bus 0x0D (Slot 0x00, Function 0x00)
cc:00.0 - Bus 0x0E (Slot 0x00, Function 0x00)

Assigning Gaudi Device to a VM Guest

Make sure to install libvirt before running the steps below; see the Libvirt Ubuntu Server docs.

  1. Create a libvirt-based VM with UEFI support, then open its configuration for editing:

    virsh edit VM-NAME
    
  2. Add a <hostdev> entry for each Intel Gaudi PCI device (isolated above with the vfio-pci.ids parameter) to the <devices> section; a sketch that generates these stanzas follows the example.

     <devices>
    
        <hostdev mode='subsystem' type='pci' managed='yes' model='vfio-pci'>
            <source>
               <address domain='0x0000' bus='0x3d' slot='0x00' function='0x0'/>
            </source>
            <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
        </hostdev>
    
        <hostdev mode='subsystem' type='pci' managed='yes' model='vfio-pci'>
            <source>
               <address domain='0x0000' bus='0x3e' slot='0x00' function='0x0'/>
            </source>
            <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
        </hostdev>
    
        <!-- Repeat for additional devices -->
    
    </devices>
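
     Rather than writing each <hostdev> stanza by hand in a multi-card setup, a helper along the following lines (a sketch, not official tooling) generates one stanza per detected device, assigning unique guest buses starting at 0x07 as in the mapping table above:

     #!/bin/bash
     # Emit one <hostdev> stanza per Gaudi accelerator found on the host,
     # giving each a unique guest PCI bus starting at 0x07.
     guest_bus=7
     for bdf in $(lspci -d 1da3: | awk '{print $1}'); do
         bus=${bdf%%:*}; rest=${bdf#*:}; slot=${rest%%.*}; fn=${bdf##*.}
         printf "<hostdev mode='subsystem' type='pci' managed='yes' model='vfio-pci'>\n"
         printf "    <source>\n"
         printf "       <address domain='0x0000' bus='0x%s' slot='0x%s' function='0x%s'/>\n" "$bus" "$slot" "$fn"
         printf "    </source>\n"
         printf "    <address type='pci' domain='0x0000' bus='0x%02x' slot='0x00' function='0x0'/>\n" "$guest_bus"
         printf "</hostdev>\n"
         guest_bus=$((guest_bus + 1))
     done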
    

Tip

If the emulated CPU in QEMU supports only 40 physical address bits by default (enough to address only 1 TiB of memory), configure libvirt/QEMU to pass through the value supported by the host.

For a 2 TB host machine, this can be done with the following QEMU command-line arguments:

-cpu host,host-phys-bits=on -global q35-pcihost.pci-hole64-size=2048G
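
To check how many physical address bits the host CPU supports before sizing the 64-bit PCI hole, inspect /proc/cpuinfo:

# Show the host CPU's physical and virtual address widths.
grep -m1 'address sizes' /proc/cpuinfo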

  3. Start the VM (for example, virsh start VM-NAME).

Verifying Virtualization

To ensure that virtualization is enabled on your system, you can perform the following checks:

  • Using virt-host-validate:

    • Run the command virt-host-validate. This will check various virtualization aspects of your host system and provide output indicating any issues or confirmations regarding virtualization capabilities.

      • All should indicate “PASS”.

      • “WARN on QEMU: Checking for secure guest support” can be ignored.

    QEMU: Checking for hardware virtualization                 : PASS
    QEMU: Checking if device /dev/kvm exists                   : PASS
    QEMU: Checking if device /dev/kvm is accessible            : PASS
    QEMU: Checking if device /dev/vhost-net exists             : PASS
    QEMU: Checking if device /dev/net/tun exists               : PASS
    QEMU: Checking for cgroup 'cpu' controller support         : PASS
    QEMU: Checking for cgroup 'cpuacct' controller support     : PASS
    QEMU: Checking for cgroup 'cpuset' controller support      : PASS
    QEMU: Checking for cgroup 'memory' controller support      : PASS
    QEMU: Checking for cgroup 'devices' controller support     : PASS
    QEMU: Checking for cgroup 'blkio' controller support       : PASS
    QEMU: Checking for device assignment IOMMU support         : PASS
    QEMU: Checking for secure guest support                    : WARN (Unknown if this platform has Secure Guest support)
    LXC: Checking for Linux >= 2.6.26                          : PASS
    LXC: Checking for namespace ipc                            : PASS
    LXC: Checking for namespace mnt                            : PASS
    LXC: Checking for namespace pid                            : PASS
    LXC: Checking for namespace uts                            : PASS
    LXC: Checking for namespace net                            : PASS
    LXC: Checking for namespace user                           : PASS
    LXC: Checking for cgroup 'cpu' controller support          : PASS
    LXC: Checking for cgroup 'cpuacct' controller support      : PASS
    LXC: Checking for cgroup 'cpuset' controller support       : PASS
    LXC: Checking for cgroup 'memory' controller support       : PASS
    LXC: Checking for cgroup 'devices' controller support      : PASS
    LXC: Checking for cgroup 'freezer' controller support      : PASS
    LXC: Checking for cgroup 'blkio' controller support        : PASS
    LXC: Checking if device /sys/fs/fuse/connections exists    : PASS
    
  • Using kvm-ok:

    • Install the cpu-checker package by running sudo apt install cpu-checker.

    • Run kvm-ok.

    The output confirms whether your processor supports KVM virtualization and whether it is enabled.

    INFO: /dev/kvm exists
    KVM acceleration can be used
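
To run both checks in one step, a small wrapper such as the following sketch can help (it treats the secure guest WARN as expected and flags any FAIL):

#!/bin/bash
# Run kvm-ok and virt-host-validate, reporting an overall verdict.
command -v kvm-ok >/dev/null || sudo apt install -y cpu-checker
sudo kvm-ok || exit 1
if virt-host-validate qemu | grep -w FAIL; then
    echo "One or more virtualization checks failed" >&2
    exit 1
fi
echo "Virtualization checks passed"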
    

Configuring Gaudi Device in VM Guest

To install the Intel Gaudi driver packages, see the Installation Guide and On-Premise System Update.