Welcome to Habana® Gaudi® v1.11 Documentation
Habana Driver Unattended Upgrade
Ubuntu 22.04
Ubuntu 20.04
Amazon Linux 2
RHEL8