Gaudi Documentation 1.20.1 documentation

Legal Notice and Disclaimer

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Habana Labs disclaims all warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage of trade.

All information provided here is subject to change without notice. Habana Labs may make changes to its test conditions and internal reliability goals at any time. Contact your Habana Labs representative to obtain the latest Habana Labs product specifications and roadmaps. Your costs and results may vary.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Software and workloads used in performance tests may have been optimized for performance only on Habana Labs hardware. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

No product or component can be absolutely secure.

Habana Labs, Gaudi and SynapseAI are trademarks of Habana Labs in the U.S. and/or other countries.

Other names and brands may be claimed as the property of others.

© 2025 Habana Labs