Welcome to Intel® Gaudi® v1.18 Documentation
Getting Started
Gaudi Architecture and Software Overview
Gaudi Architecture
Intel Gaudi Software Suite
Support Matrix
Release Notes
Installation
Hardware and Network Requirements
Driver and Software Installation
Installation Methods
Bare Metal Installation
Docker Installation
Kubernetes Installation
OpenShift Installation
Setting up Gaudi for OpenShift
Deploying Intel Gaudi Base Operator
Firmware Upgrade
System Verifications and Final Tests
AWS DL1 Quick Start
Intel Developer Cloud Quick Start
PyTorch
Training
Getting Started with Training on Intel Gaudi
PyTorch Model Porting
GPU Migration Toolkit
Importing PyTorch Models Manually
Mixed Precision Training with PyTorch Autocast
Intel Gaudi Media Loader
FP8 Training with Intel Gaudi Transformer Engine
Distributed Training with PyTorch
Scale-out Topology
Distributed Backend Initialization
Gaudi-to-process Assignment
DDP-based Scaling of Gaudi on PyTorch
Theory of Distributed Training
Using Fully Sharded Data Parallel (FSDP) with Intel Gaudi
Using DistributedTensor with Intel Gaudi
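A minimal sketch of the distributed setup covered by the training pages above ("Distributed Backend Initialization", "Mixed Precision Training with PyTorch Autocast"), assuming a Gaudi device and the habana_frameworks package; launcher-provided environment variables and exact defaults should be confirmed on those pages.

    # Hedged sketch: HCCL backend initialization plus bf16 autocast on Gaudi.
    # Assumes an HPU device and that mpirun/torchrun supplies MASTER_ADDR,
    # MASTER_PORT, WORLD_SIZE and RANK; the defaults below only cover a
    # single-process dry run.
    import os
    import torch
    import torch.distributed as dist
    import habana_frameworks.torch.core as htcore
    import habana_frameworks.torch.distributed.hccl  # noqa: F401  registers the "hccl" backend

    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "12355")
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    rank = int(os.environ.get("RANK", "0"))
    dist.init_process_group(backend="hccl", world_size=world_size, rank=rank)

    device = torch.device("hpu")
    model = torch.nn.Linear(1024, 1024).to(device)
    model = torch.nn.parallel.DistributedDataParallel(model)

    x = torch.randn(8, 1024, device=device)
    with torch.autocast(device_type="hpu", dtype=torch.bfloat16):  # bf16 mixed precision
        loss = model(x).sum()
    loss.backward()
    htcore.mark_step()  # flush the lazy-mode graph for this step
    dist.destroy_process_group()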
Inference
Getting Started with Inference on Intel Gaudi
AI Model Serving with Intel Gaudi
Run Inference Using HPU Graphs
Run Inference Using FP8
Run Inference Using UINT4
Optimize Inference on PyTorch
Using Gaudi Trained Checkpoints on Xeon
vLLM Inference Server with Gaudi
Triton Inference Server with Gaudi
TorchServe Inference Server with Gaudi
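A short, hedged sketch of the HPU Graphs inference flow from "Run Inference Using HPU Graphs", assuming habana_frameworks is installed; the wrap_in_hpu_graph helper comes from that guide.

    # Hedged sketch: wrapping an eval-mode model in an HPU Graph so repeated
    # calls with identically shaped inputs replay the recorded graph.
    # Assumes a Gaudi device and the habana_frameworks package.
    import torch
    import habana_frameworks.torch as ht

    device = torch.device("hpu")
    model = torch.nn.Sequential(
        torch.nn.Linear(768, 768),
        torch.nn.ReLU(),
        torch.nn.Linear(768, 10),
    ).to(device).eval()

    model = ht.hpu.wrap_in_hpu_graph(model)  # capture the forward pass

    with torch.no_grad():
        for _ in range(4):
            x = torch.randn(16, 768, device=device)  # keep shapes static across calls
            logits = model(x)
    print(logits.to("cpu").shape)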
DeepSpeed
Getting Started with DeepSpeed
DeepSpeed Training
Optimizing Large Language Models
Inference Using DeepSpeed
Optimization
Model Optimization Checklist
Optimizations of PyTorch Models
Inference Optimizations
Handling Dynamic Shapes
Fused Optimizers and Custom Ops for Intel Gaudi
HPU Graphs for Training
Optimizing Training Platform
Reference
Debugging and Troubleshooting
Debugging with Intel Gaudi Logs
Debugging Model Divergence
Debugging Slow Convergence
Troubleshooting PyTorch Model
Runtime Environment Variables
Intel Gaudi PyTorch Python API (habana_frameworks.torch)
PyTorch Operators
PyTorch CustomOp API
PyTorch Support Matrix
PyTorch Gaudi Theory of Operations
Hugging Face Optimum for Intel Gaudi
PyTorch Lightning
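A tiny sketch of the device-query calls documented under "Intel Gaudi PyTorch Python API (habana_frameworks.torch)"; it is an illustration only, so names should be confirmed against that reference.

    # Hedged sketch: basic Gaudi device queries and a trivial tensor op.
    # Assumes the Intel Gaudi PyTorch package (habana_frameworks) is installed.
    import torch
    import habana_frameworks.torch.hpu as hthpu

    if hthpu.is_available():
        print("Gaudi devices visible:", hthpu.device_count())
        x = torch.ones(4, device="hpu")
        print((x * 2).to("cpu"))
    else:
        print("no HPU device found")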
Guides
MediaPipe
Creating and Executing Media Pipeline
MediaPipe for PyTorch ResNet
MediaPipe for PyTorch ResNet3d
Operators
fn.Add
fn.BasicCrop
fn.BitwiseAnd
fn.BitwiseOr
fn.BitwiseXor
fn.Brightness
fn.Cast
fn.Clamp
fn.CocoReader
fn.CoinFlip
fn.ColorSpaceConversion
fn.Concat
fn.Constant
fn.Contrast
fn.Crop
fn.CropMirrorNorm
fn.ExtCpuOp
fn.ExtHpuOp
fn.Flip
fn.GatherND
fn.GaussianBlur
fn.Hue
fn.ImageDecoder
fn.MediaConst
fn.MediaExtReaderOp
fn.MediaFunc
fn.MemCpy
fn.Mult
fn.Neg
fn.Normalize
fn.Pad
fn.RandomBiasedCrop
fn.RandomFlip
fn.RandomNormal
fn.RandomUniform
fn.ReadImageDatasetFromDir
fn.ReadNumpyDatasetFromDir
fn.ReadVideoDatasetFromDir
fn.ReduceMax
fn.ReduceMin
fn.Reshape
fn.Resize
fn.Saturation
fn.Slice
fn.Split
fn.SSDBBoxFlip
fn.SSDCropWindowGen
fn.SSDEncode
fn.SSDMetadata
fn.Sub
fn.Transpose
fn.VideoDecoder
fn.Where
fn.Zoom
Profiling
Profiling Workflow
Profiling Real-World Examples
Profiling with PyTorch
Profiling with Intel Gaudi Software
Getting Started with Intel Gaudi Profiler
Configuration
Analysis
Remote Trace Viewer Tool
Offline Trace Parser Tool
Tips and Tricks to Accelerate Training
Management and Monitoring
Qualification Tool Library Guide (hl_qual)
hl_qual Common Plugin Switches and Parameters
hl_qual Report Structure
hl_qual Expected Output and Failure Debug
Memory Stress Test Plugins Design, Switches and Parameters
Power Stress and EDP Tests Plugins Design, Switches and Parameters
Connectivity Serdes Test Plugins Design, Switches and Parameters
Functional Test Plugins Design, Switches and Parameters
Bandwidth Test Plugins Design, Switches and Parameters
hl_qual Monitor Textual UI
Package Content
hl_qual Design
Embedded System Tools User Guide
Firmware Update Tool
System Management Interface Tool (hl-smi)
Gaudi Secure Boot
Disable/Enable NICs
Habana Labs Management Library (HLML) C API Reference
C APIs
Common APIs
Per Device APIs
Linkage HLML
Habana Labs Management Library (PYHLML) Python API Reference
Python APIs
Common APIs
Per Device APIs
Orchestration
Running Kubernetes Workloads with Gaudi
VMware Tanzu User Guide
Enabling Multiple Tenants on PyTorch
Multiple Workloads on a Single Docker
Multiple Dockers Each with a Single Workload
KubeVirt Installation for On-Premise Platforms
BMC Exporter User Guide
Prometheus Metric Exporter
Using Slurm Workload Manager with Intel Gaudi
Amazon ECS with Gaudi User Guide
Setting up EFA-Enabled Security Group
Creating a Multi-Node Parallel (MNP) Compatible Docker Image
Create AWS Batch Compute Environment
Create and Submit AWS Batch Job
Advanced Model Training Batch Example: ResNet50
Amazon EKS with Gaudi User Guide
Creating Cluster and Node Group
Enabling Plugins
Running a Job on the Cluster
MNIST Model Training Example: Run MPIJob on Multi-node Cluster
Advanced Model Training Example: Run ResNet Multi-node Cluster
Virtualization
AWS User Guides
Habana Deep Learning Base AMI Installation
AWS Base OS AMI Installation
Distributed Training across Multiple AWS DL1 Instances
Amazon ECS with Gaudi User Guide
Setting up EFA-Enabled Security Group
Creating a Multi-Node Parallel (MNP) Compatible Docker Image
Create AWS Batch Compute Environment
Create and Submit AWS Batch Job
Advanced Model Training Batch Example: ResNet50
Amazon EKS with Gaudi User Guide
Creating Cluster and Node Group
Enabling Plugins
Running a Job on the Cluster
MNIST Model Training Example: Run MPIJob on Multi-node Cluster
Advanced Model Training Example: Run ResNet Multi-node Cluster
APIs
Habana Collective Communications Library (HCCL) API Reference
Supported Collective Primitives
Using HCCL
Scale-out via Host NIC
C APIs
Testing and Benchmarking
Habana Labs Management Library (HLML) C API Reference
C APIs
Common APIs
Per Device APIs
Linkage HLML
Habana Labs Management Library (PYHLML) Python API Reference
Python APIs
Common APIs
Per Device APIs
Intel Gaudi PyTorch Python API
TPC Programming
TPC Getting Started Guide
TPC Tools Installation Guide
TPC User Guide
TPC Programming Language
Processor Architectural Overview
TPC Programming Model
TPC-C Language
Built-in Functions
Implementing and Integrating New Lib
TPC Coherency
Multiple Kernel Libraries
Abbreviations
TPC Tools Debugger
Installation
Starting a Debug Session
TPC-C Source or Disassembly Level Debugging
Debug Session Views and Operations
TPC-C Language Specification
Supported Data Types
Conversions and Type Casting
Operators
Vector Operations
Address Space Qualifiers
Storage-Class Specifiers
Exceptions to C99 Standard
Exceptions to C++11 Standard
Preprocessor Directives and Macros
Functions
Built-in Special Functions
TPC Intrinsics Guide
Arithmetic
Bitwise
Cache
Convert
IRF
LUT
Load
Logical
Move
Pack/Unpack
Select
Store
Miscellaneous
TPC I64 Built-ins Guide
Arithmetic
Load
Move
Select
Store
Support
Support and Legal Notice
Index
B
  built-in function
    clip_norm()
C
  clip_norm() (built-in function)
F
  FusedAdagrad (class in habana_frameworks.torch.hpex.optimizers)
  FusedAdamW (class in habana_frameworks.torch.hpex.optimizers)
  FusedAdamW (class in habana_frameworks.torch.hpex.optimizers.distributed)
  FusedClipNorm (class in habana_frameworks.torch.hpex.normalization)
  FusedLamb (class in habana_frameworks.torch.hpex.optimizers)
  FusedLars (class in habana_frameworks.torch.hpex.optimizers)
  FusedSGD (class in habana_frameworks.torch.hpex.optimizers)
M
  mixture_of_experts (built-in class)
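The fused classes indexed above can stand in for their stock torch.optim counterparts; the sketch below is an assumption-laden illustration (FusedAdamW keyword arguments and the FusedClipNorm constructor are modeled on torch.optim.AdamW and torch.nn.utils.clip_grad_norm_, and should be verified against the Intel Gaudi PyTorch Python API reference).

    # Hedged sketch: the fused optimizer and gradient-clipping classes from the
    # index above in a single training step. Argument names are assumptions
    # modeled on the stock PyTorch equivalents.
    import torch
    import habana_frameworks.torch.core as htcore
    from habana_frameworks.torch.hpex.optimizers import FusedAdamW
    from habana_frameworks.torch.hpex.normalization import FusedClipNorm

    device = torch.device("hpu")
    model = torch.nn.Linear(512, 512).to(device)

    optimizer = FusedAdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
    clip = FusedClipNorm(model.parameters(), 1.0)  # assumed (parameters, max_norm) signature

    x = torch.randn(32, 512, device=device)
    loss = model(x).pow(2).mean()
    loss.backward()
    clip.clip_norm(model.parameters())  # clip_norm() as listed in the index
    optimizer.step()
    optimizer.zero_grad()
    htcore.mark_step()  # flush the lazy-mode graph on Gaudi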