Gaudi Documentation 1.15.0 documentation
Welcome to Intel® Gaudi® v1.15 Documentation
Page not found
Unfortunately we couldn't find the content you were looking for.