Hardware and Network Requirements

The following sections describe how to prepare your physical system for installing the operating system, driver, and software.

System Unboxing

Prepare your system by ensuring the following. The requirements are listed separately for Gaudi 3 and Gaudi 2 systems.

Gaudi 3 systems:

  • Power supply and thermal requirements are met according to the system vendor's recommendations. For Gaudi 3 system guidelines, refer to the following documents:

    • “Power” section of the Gaudi 3 OAM Specification and the HLB-325 Specification, available in the Intel Gaudi vault.

    • “Temperature Management” section of the Gaudi 3 OAM Specification, available in the Intel Gaudi vault.

  • The server is racked and stacked. Refer to your system vendor's instructions.

  • Network cables are connected as follows:

    • One QSFP port to give the OS access to the network fabric.

    • BMC ports to enable the BMC connection. Refer to the BMC Access section below.

    • (Optional) Six OSFP ports to give access to the accelerator fabric, required for a multi-node environment.

  • The BMC is configured using the BIOS/UEFI setup screens and can be accessed remotely over the network. Refer to your system vendor's BMC instructions.

Gaudi 2 systems:

  • Power supply and thermal requirements are met according to the system vendor's recommendations. For Gaudi 2 system guidelines, refer to the following documents:

    • “Power” section of the Gaudi 2 OAM Specification and the HLB/HLBA-225 Specification, available in the Intel Gaudi vault.

    • “Temperature Management” section of the Gaudi 2 OAM Specification, available in the Intel Gaudi vault.

  • The server is racked and stacked. Refer to your system vendor's instructions.

  • Network cables are connected as follows:

    • One QSFP port to give the OS access to the network fabric.

    • BMC ports to enable the BMC connection. Refer to the BMC Access section below.

    • (Optional) Six QSFP-DD ports to give access to the accelerator fabric, required for a multi-node environment.

  • The BMC is configured using the BIOS/UEFI setup screens and can be accessed remotely over the network. Refer to your system vendor's BMC instructions.

BMC Access

Follow your system vendor's instructions to assign the BMC IP address, either statically or dynamically:

  • HLS-2 and HLS-3-based systems:

    • These systems have two independent BMCs: one for the CPU server and one for the Gaudi 2 or Gaudi 3 expansion server.

    • Each BMC is reached through its own physical RJ45 Ethernet connection and needs its own IP address: one for the CPU server BMC and one for the Gaudi 2 or Gaudi 3 expansion server BMC.

  • Gaudi 2 or Gaudi 3 single integrated server solutions:

    • Dell Gaudi 3 servers and Supermicro Gaudi 2 and Gaudi 3 servers have a single BMC with one physical RJ45 port and one IP address. This BMC manages the whole integrated server, which combines the CPU tray and the Gaudi 2 or Gaudi 3 UBB tray.

Once BMC connectivity is set up, verify it by opening http://bmc_ip/ in a web browser, replacing bmc_ip with the address you assigned. On systems with two BMCs, verify each address.
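If you prefer to verify BMC access from a terminal rather than a browser, a minimal sketch like the following may help. It assumes curl is installed; the function name bmc_reachable is hypothetical, and treating any HTTP status below 500 (including a 401 login challenge) as "reachable" is an illustrative choice, not a vendor rule.

```shell
#!/usr/bin/env bash
# Sketch: check that a BMC web interface answers over the network.
# Assumes curl is installed; HOST is the BMC IP address you assigned.

# bmc_reachable HOST
# Succeeds (exit 0) if http://HOST/ returns any HTTP status below 500,
# including 401 login challenges; fails if no HTTP response arrives.
bmc_reachable() {
  local host="$1" code
  # -s: quiet, -k: accept the self-signed certificates BMCs commonly use,
  # -L: follow an HTTP-to-HTTPS redirect, --max-time: give up after 10 s.
  code=$(curl -sk -L --max-time 10 -o /dev/null -w '%{http_code}' "http://${host}/") || return 1
  # curl reports 000 when it received no HTTP response at all.
  [ "$code" != "000" ] && [ "$code" -lt 500 ]
}
```

For example, bmc_reachable 192.0.2.10 && echo "BMC reachable" (substitute your own BMC address). On HLS-2 and HLS-3 systems, run it once per BMC.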

Operating System Readiness

  1. Install the OS on the bare-metal server using a USB stick or a network boot method. Before installing, review the supported operating systems and versions listed in the Support Matrix.

  2. Ensure that an Internet connection is configured on the server. If your network requires a proxy, set the proxy variables (such as http_proxy and https_proxy) globally.
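The two steps above can be partly scripted. The sketch below assumes a Linux distribution that provides /etc/os-release and reads /etc/environment at login (as Ubuntu does through pam_env); the function names os_info and set_proxy_env are hypothetical, and writing the proxy to /etc/environment is one common way to set it globally, not the only one.

```shell
#!/usr/bin/env bash
# Sketch: OS readiness helpers. File locations and function names are assumptions.

# os_info [FILE]: prints "<ID> <VERSION_ID>" from os-release (default /etc/os-release)
# so the installed OS can be compared against the Support Matrix.
os_info() {
  local file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak into this shell.
  ( . "$file" && echo "${ID} ${VERSION_ID}" )
}

# set_proxy_env URL [FILE]: persists proxy variables system-wide by writing them to
# FILE (default /etc/environment, which requires root), replacing earlier entries.
set_proxy_env() {
  local url="$1" file="${2:-/etc/environment}" var
  touch "$file" || return 1
  for var in http_proxy https_proxy HTTP_PROXY HTTPS_PROXY; do
    # Remove any existing line for this variable (GNU sed -i), then append the new value.
    sed -i "/^${var}=/d" "$file"
    echo "${var}=\"${url}\"" >> "$file"
  done
}
```

On an Ubuntu 22.04 host, os_info would print ubuntu 22.04. When set_proxy_env targets /etc/environment it must run as root, and the new values take effect in new login sessions rather than the current shell.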

Note

The Gaudi LEDs remain orange after OS installation. They turn green once the driver is installed.

Check Gaudi on the Platform

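Verify that all eight Gaudi cards are visible on the system by running the following lspci command. The -d 1da3: option lists only devices with the Habana Labs PCI vendor ID (1da3), and -nn prints numeric IDs alongside device names. The sample output below is from a Gaudi 2 system (device ID 1020):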

$ lspci -d 1da3: -nn
19:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1020] (rev 01)
1a:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1020] (rev 01)
43:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1020] (rev 01)
44:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1020] (rev 01)
b3:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1020] (rev 01)
b4:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1020] (rev 01)
cc:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1020] (rev 01)
cd:00.0 Processing accelerators [1200]: Habana Labs Ltd. Device [1da3:1020] (rev 01)
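To automate this check, for example in a provisioning script, a minimal sketch like the following counts the Habana Labs entries and compares the result with an expected value. The function names count_gaudi and check_gaudi and the default of eight devices are illustrative assumptions, not part of the Intel Gaudi tooling.

```shell
#!/usr/bin/env bash
# Sketch: confirm the expected number of Gaudi devices is enumerated on PCIe.
# Assumes an 8-accelerator system, matching the example output above.

# count_gaudi: reads `lspci -d 1da3: -nn` output on stdin and prints how many
# lines carry the Habana Labs vendor ID (1da3).
count_gaudi() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the count, drop the status.
  grep -c '\[1da3:' || true
}

# check_gaudi [EXPECTED]: runs lspci and compares the count with EXPECTED (default 8).
check_gaudi() {
  local expected="${1:-8}" found
  found=$(lspci -d 1da3: -nn | count_gaudi)
  if [ "$found" -eq "$expected" ]; then
    echo "OK: ${found} Gaudi devices visible"
  else
    echo "WARNING: expected ${expected} Gaudi devices, found ${found}" >&2
    return 1
  fi
}
```

Running check_gaudi on a healthy eight-card system prints an OK line and returns 0; a missing card yields a warning on stderr and a non-zero status, which a provisioning script can act on.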