• Welcome to Habana® Gaudi® v1.8 Documentation

Getting Started

  • Gaudi Architecture and Software Overview
    • Gaudi Architecture
    • SynapseAI® Software Suite
    • Best Practices for Model Training with Gaudi
  • Support Matrix
  • Release Notes
  • Installation
    • Habana Deep Learning Base AMI Installation
    • AWS Deep Learning AMI (DLAMI) Installation
    • AWS Base OS AMI Installation
    • Bare Metal Fresh OS Installation
  • AWS DL1 Quick Start

Frameworks

  • PyTorch
    • Getting Started with PyTorch and Gaudi
    • PyTorch Model Porting
      • Porting PyTorch Models to Gaudi
      • Placement of Ops on HPU
      • Weight Sharing
      • HPU Graphs for Training
    • PyTorch Mixed Precision Training on Gaudi
    • Distributed Training with PyTorch
      • Scale-Out Topology
      • Distributed Backend Initialization
      • Gaudi-to-process Assignment
      • DDP-based Scaling of Gaudi on PyTorch
      • Theory of Distributed Training
    • Habana Media Loader
    • Large Models on PyTorch Using DeepSpeed
      • Getting Started with DeepSpeed
      • DeepSpeed Training
      • DeepSpeed Inference
    • Inference on PyTorch
      • Run Inference Using Native PyTorch
      • Run Inference Using HPU Graphs
      • Optimize Inference on PyTorch
      • Run Inference Using DeepSpeed
    • Model Performance Optimization
      • Optimizing PyTorch Models
      • Handling Dynamic Shapes
      • Handling Custom Habana Ops for PyTorch
      • Optimizing Training Using PyTorch Lightning
      • Optimizing Training Platform
    • Debugging and Troubleshooting
      • Debugging Possible Model Errors
      • Debugging Model Divergence
      • Debugging Slow Convergence
      • Troubleshooting your Model
    • Runtime Environment Variables
    • Habana PyTorch Python API (habana_frameworks.torch)
    • PyTorch Operators
    • PyTorch CustomOp API
  • Hugging Face Optimum-Habana
  • PyTorch Lightning
  • TensorFlow
    • Migration Guide
    • TensorFlow User Guide
      • TensorFlow Gaudi Integration Architecture
      • Host and Device Ops Placement
      • TensorFlow Keras
      • Runtime Environment Variables
      • Habana TensorFlow Python API (habana_frameworks.tensorflow)
    • TensorFlow Mixed Precision Training on Gaudi
    • Distributed Training with TensorFlow
      • Overview
      • Scale-Out Topology
      • Gaudi-to-process Assignment
      • Horovod-based Scaling of Gaudi on TensorFlow
      • TensorFlow Distributed based Scaling of Gaudi
    • Habana Media Loader
    • Enabling Multiple Tenants
      • Multiple Workloads on a Single Docker
      • Multiple Dockers Each with a Single Workload
      • Running Multiple Workloads on a Single Node K8s Cluster
    • Model Performance Optimization
      • Optimization in TensorFlow Models
      • Optimizing Training Platform
    • Debugging Guide
      • Debugging Possible Model Errors
      • Debugging Model Divergence
      • Debugging Slow Convergence
      • Troubleshooting your Model
    • TensorFlow Operators
    • TensorFlow CustomOp API

Guides

  • Media Pipeline
    • Creating and Executing Media Pipeline
    • Media Pipe for PyTorch ResNet
    • Media Pipe for TensorFlow ResNet
    • Operators
      • fn.Add
      • fn.BasicCrop
      • fn.BitwiseAnd
      • fn.BitwiseOr
      • fn.BitwiseXor
      • fn.Brightness
      • fn.Cast
      • fn.Clamp
      • fn.CocoReader
      • fn.ColorSpaceConversion
      • fn.Concat
      • fn.Constant
      • fn.Contrast
      • fn.Crop
      • fn.CropMirrorNorm
      • fn.ExtCpuOp
      • fn.ExtHpuOp
      • fn.Flip
      • fn.GatherND
      • fn.GaussianBlur
      • fn.Hue
      • fn.ImageDecoder
      • fn.MediaConst
      • fn.MediaExtReaderOp
      • fn.MediaFunc
      • fn.MemCpy
      • fn.Mult
      • fn.Neg
      • fn.Normalize
      • fn.Pad
      • fn.RandomBernoulli
      • fn.RandomBiasedCrop
      • fn.RandomFlip
      • fn.RandomNormal
      • fn.ReadImageDatasetFromDir
      • fn.ReadNumpyDatasetFromDir
      • fn.ReduceMax
      • fn.ReduceMin
      • fn.Reshape
      • fn.Resize
      • fn.Saturation
      • fn.Slice
      • fn.Split
      • fn.SSDMetadata
      • fn.Sub
      • fn.Transpose
  • Profiling
    • Profiling with TensorFlow
    • Profiling with PyTorch
    • Profiling with SynapseAI
      • Configuration
      • Runtime
      • Analysis
    • Profiling Architecture
    • Tips and Tricks to Accelerate the Training
  • Management and Monitoring
    • Qualification Library Guide (hl_qual Tool)
      • hl_qual Common Plugin Switches and Parameters
      • hl_qual Report Structure
      • hl_qual Expected Output and Failure Debug
      • Memory Stress Test Plugins Design, Switches and Parameters
      • Power Stress and EDP Tests Plugins Design, Switches and Parameters
      • Connectivity Serdes Test Plugins Design, Switches and Parameters
      • Functional Test Plugins Design, Switches and Parameters
      • Bandwidth Test Plugins Design, Switches and Parameters
      • hl_qual Monitor Textual UI
      • Package Content
      • hl_qual Design
    • System Management Interface Tool User Guide (hl-smi Tool)
    • Habana Labs Management Library (HLML) C API Reference
      • C API
      • Common APIs
      • Per device APIs
      • Linkage HLML
    • Habana Labs Management Library (PYHLML) Python API Reference
      • Python APIs
      • Common APIs
      • Per device APIs
  • Orchestration
    • Kubernetes User Guide
      • Habana Device Plugin for Kubernetes
      • MPI Operator for Kubernetes
      • Prometheus Metric Exporter for Kubernetes
    • OpenShift (OCP) User Guide
      • Preparation For Running Docker Image on OCP-based Host
      • Build & Run Docker Container
      • Load habanalabs Driver Inside Running Docker Container
      • Habana Device Plugin for Kubernetes
      • Usage Examples
    • VMware Tanzu Guide
    • Enabling Multiple Tenants
      • Multiple Workloads on a Single Docker
      • Multiple Dockers Each with a Single Workload
      • Running Multiple Workloads on a Single Node K8s Cluster
    • Amazon ECS with Habana User Guide
      • Setting Up EFA Enabled Security Group
      • Creating a Multi-Node Parallel (MNP) Compatible Docker Image
      • Create AWS Batch Compute Environment
      • Create and Submit AWS Batch Job
      • Advanced Model Training Batch Example: ResNet50 Keras
    • Amazon EKS with Habana User Guide
      • Creating Cluster and Node Group
      • Enabling Plugins
      • Running a Job on the Cluster
      • MNIST Model Training Example: Run MPIJob on Multi-node Cluster
      • Advanced Model Training Example: Run ResNet Keras Multi-node Cluster
  • Virtualization
  • AWS User Guides
    • Create Elastic Container Registry (ECR) and Upload Images
    • Distributed Training across Multiple AWS DL1 Instances User Guide
    • Amazon ECS with Habana User Guide
      • Setting Up EFA Enabled Security Group
      • Creating a Multi-Node Parallel (MNP) Compatible Docker Image
      • Create AWS Batch Compute Environment
      • Create and Submit AWS Batch Job
      • Advanced Model Training Batch Example: ResNet50 Keras
    • Amazon EKS with Habana User Guide
      • Creating Cluster and Node Group
      • Enabling Plugins
      • Running a Job on the Cluster
      • MNIST Model Training Example: Run MPIJob on Multi-node Cluster
      • Advanced Model Training Example: Run ResNet Keras Multi-node Cluster
  • APIs
    • Habana Collective Communications Library (HCCL) API Reference
      • Overview
      • Using HCCL
      • Scale-Out via Host-NIC
      • C API
      • Testing and Benchmarking
    • Habana Labs Management Library (HLML) C API Reference
      • C API
      • Common APIs
      • Per device APIs
      • Linkage HLML
    • Habana Labs Management Library (PYHLML) Python API Reference
      • Python APIs
      • Common APIs
      • Per device APIs
    • Habana TensorFlow Python API
    • Habana PyTorch Python API
  • TPC Programming
    • TPC Getting Started Guide
    • TPC Tools Installation Guide
    • TPC User Guide
      • TPC Programming Language
      • Processor Architectural Overview
      • TPC Programming Model
      • TPC-C Language
      • Built-in Functions
      • Implementing and Integrating New lib
      • TPC Coherency
      • Multiple Kernel Libraries
      • Abbreviations
    • TPC Tools Debugger
      • Installation
      • Starting a Debug Session
      • TPC-C Source or Disassembly Level Debugging
      • Debug Session Views and Operations
    • TPC-C Language Specification
      • Supported Data Types
      • Conversions and Type Casting
      • Operators
      • Vector Operations
      • Address Space Qualifiers
      • Storage-Class Specifiers
      • Exceptions to C99 Standard
      • Exceptions to C++ 11 Standard
      • Preprocessor Directives and Macros
      • Functions
      • Built-in Special Functions
    • TPC Intrinsics Guide
      • Arithmetic
      • Bitwise
      • Cache
      • Convert
      • IRF
      • LUT
      • Load
      • Logical
      • Move
      • Pack/Unpack
      • Select
      • Store
      • Miscellaneous

Support

  • Support and Legal Notice

Support and Legal Notice

  • Legal Notice and Disclaimer

  • Habana Outbound Software License Agreement

  • Send Feedback

By Habana Labs
© Copyright 2023, Habana Labs.