PyTorch Support Matrix
The following tables, grouped by category, show the functionality supported by the Intel® Gaudi® PyTorch integration. For more details on Intel Gaudi's PyTorch integration and the supported execution modes, see PyTorch Gaudi Theory of Operations.
Note
Support for Eager mode and for Eager mode with torch.compile (PT_HPU_LAZY_MODE=0) is in early-stage development. Lazy mode (PT_HPU_LAZY_MODE=1) is the default mode.
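Before the matrix itself, a minimal sketch of selecting the execution mode via the PT_HPU_LAZY_MODE variable referenced above, assuming the Intel Gaudi PyTorch bridge (habana_frameworks.torch) is installed; verify the import path and `mark_step()` call against your release:

```python
import os

# Select the execution mode before importing the Gaudi bridge:
# "1" = Lazy mode (the default), "0" = Eager / torch.compile.
os.environ.setdefault("PT_HPU_LAZY_MODE", "1")

import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

x = torch.randn(8, 8).to("hpu")
y = x @ x
htcore.mark_step()  # Lazy mode only: flush and execute the accumulated graph
print(y.cpu())
```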
**Workload Type**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| Training | Yes | Yes | |
| Inference | Yes | Yes | |
**Device Name in PyTorch**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| HPU | Yes | Yes | Native Gaudi device name |
| CPU | Yes | Yes | |
| CUDA | No | No | Automatically converted to HPU using the GPU Migration Toolkit |
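As a rough sketch of the CUDA-to-HPU conversion noted above, the GPU Migration Toolkit is typically enabled with a single import; treat the exact module path below as an assumption to check against your installed release:

```python
# Enabling the GPU Migration Toolkit redirects "cuda" calls to "hpu",
# so unmodified CUDA scripts can run on Gaudi.
import habana_frameworks.torch.gpu_migration  # noqa: F401
import torch

t = torch.ones(4, device="cuda")  # transparently placed on the HPU device
print(t.device)
```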
**Programming Language**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| C++ | Yes* | Yes | *Limited: no support for graphs |
| Python | Yes | Yes | |

**Modes**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| Eager | Yes | No | |
| Graph | Yes | Yes | |
**Graph Solutions**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| torch.compile(backend="eager") | Yes | No | |
| torch.compile(backend="inductor") | No | No | Automatically converted using the GPU Migration Toolkit |
| torch.compile(backend="hpu_backend") | Yes | No | |
| Lazy graph | No | Yes | |
| FX tracing | Yes | Yes | |
| HPU Graphs | No | Yes | |
| torch.jit / TorchScript | No | No | |
| ONNX | No | No | |
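A short, hedged example of the supported `hpu_backend` path for `torch.compile`; run it with PT_HPU_LAZY_MODE=0, since the matrix marks torch.compile as unsupported in Lazy mode, and assume the Gaudi bridge import registers the backend:

```python
import torch
import habana_frameworks.torch.core  # noqa: F401  # registers "hpu_backend"

def fn(a, b):
    return torch.nn.functional.relu(a + b)

# "hpu_backend" is the Gaudi compile backend listed in the matrix;
# "inductor" is not supported on HPU.
compiled = torch.compile(fn, backend="hpu_backend")
out = compiled(torch.randn(16, device="hpu"), torch.randn(16, device="hpu"))
```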
**Dynamic Shapes**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| Eager | Yes | No | |
| torch.compile | Yes | No | |
| Lazy | No | Yes | |

**Export**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| torch.export | Yes | No | |
| torch.jit / TorchScript | No | No | |
| ONNX | No | No | |
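For the `torch.export` row, a minimal sketch, again under PT_HPU_LAZY_MODE=0 since export is only supported with Eager/torch.compile:

```python
import torch
from torch.export import export
import habana_frameworks.torch.core  # noqa: F401

class Net(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x + 1)

# torch.export captures a standalone graph of the module; TorchScript and
# ONNX export are not supported, per the rows above.
ep = export(Net().to("hpu"), (torch.randn(8, device="hpu"),))
print(ep)
```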
**Data Types**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| Int8 | Yes | Yes | Limited ops support as listed in PyTorch Operators |
| Int16 | Yes | Yes | Limited ops support as listed in PyTorch Operators |
| Int32 | Yes | Yes | Limited ops support as listed in PyTorch Operators |
| Int64 | Yes | Yes | |
| Float8 | Yes | Yes | Supported on Gaudi 2 only |
| Float16 | Yes | Yes | Limited ops support as listed in PyTorch Operators |
| Float32 | Yes | Yes | |
| BFloat16 | Yes | Yes | |
| Boolean | Yes | Yes | Limited ops support as listed in PyTorch Operators |
| Float64 | No | No | |
| Complex32 | No | No | |
| Complex64 | No | No | |
| QInt8 | No | No | |
| QInt16 | No | No | |
| QInt32 | No | No | |
| QInt64 | No | No | |
**Tensor Types**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| Native PyTorch tensor support | Yes | No | |
| Dense tensors | Yes | Yes | |
| Views | Yes | Yes | |
| Channels last | Yes | Yes | |
| Strided tensors | Yes | Yes | Output strides can differ from CUDA/CPU |
| User tensor subclass | Yes | Yes | |
| Sparse tensors | No | No | |
| Masked tensors | No | No | |
| Nested tensors | No | No | |
**Mixed Precision**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| torch.autocast | Yes | Yes | |
| Intel Gaudi Transformer Engine (FP8) | Yes | Yes | |
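A brief sketch of `torch.autocast` on the HPU device using BFloat16, which is natively supported per the data-type rows above; FP8 goes through the Intel Gaudi Transformer Engine instead:

```python
import torch
import habana_frameworks.torch.core  # noqa: F401

model = torch.nn.Linear(32, 32).to("hpu")
x = torch.randn(4, 32, device="hpu")

# Ops inside the autocast region run in BFloat16 where it is safe to do so.
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    out = model(x)
```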
**Distributed**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| DeepSpeed | Yes* | Yes | See our DeepSpeed documentation for more details. *DeepSpeed support with torch.compile is still a work in progress. |
| PyTorch DDP | Yes | Yes | |
| PyTorch FSDP | Yes | No | See Using Fully Sharded Data Parallel (FSDP) with Intel Gaudi |
| PyTorch DTensor | Yes | No | |
| PyTorch Tensor Parallel | Yes | No | |
| PyTorch Pipeline Parallel | No | No | |
| PyTorch Distributed Elastic | No | No | |
**Distributed Backend**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| HCCL | Yes | Yes | Gaudi's equivalent of NCCL |
| MPI | Yes | Yes | |
| Gloo | No | No | |
| NCCL | No | No | NCCL calls are automatically converted to HCCL using the GPU Migration Toolkit |
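A minimal HCCL initialization sketch, assuming a torchrun-style launcher provides the rendezvous environment variables; the `habana_frameworks.torch.distributed.hccl` import follows the Gaudi documentation, but confirm it against your release:

```python
import torch
import torch.distributed as dist

# Registers the "hccl" process-group backend (Gaudi's NCCL equivalent).
import habana_frameworks.torch.distributed.hccl  # noqa: F401

dist.init_process_group(backend="hccl")

t = torch.ones(1, device="hpu") * dist.get_rank()
dist.all_reduce(t)  # sums the tensor across all ranks over HCCL
```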
**Device Management**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| Single device in a single process | Yes | Yes | |
| Multiple devices in a single process | No | No | |
| Sharing one device between multiple processes | No | No | |

**Custom Ops**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| Writing custom ops in TPC-C | Yes | Yes | |
| Writing custom ops in CUDA | No | No | |
| Writing custom ops in Triton | No | No | |

**Quantization**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| FP8 quantization | Yes | Yes | |
| Int8/16/32 quantization | No | No | |
| Int4 quantization | No | No | |

**Data Loader**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| Native PyTorch | Yes | Yes | |
| Gaudi Media Loader | Yes | Yes | Exposes Gaudi's hardware acceleration |

**Serving Solution**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| TGI | Yes | Yes | |
| Triton | Yes | Yes | |
| TorchServe | Yes | No | |
| vLLM | Yes | Yes | |

**Operators**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| ATen ops | Yes | Yes | |
| Fused operators | Yes | Yes | |
**Other**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| torch.profiler | Yes | Yes | |
| TensorBoard | Yes | Yes | |
| Checkpoints | Yes | Yes | |
| Weights Sharing | Yes | Yes* | *Limited support in Lazy mode. See Weight Sharing. |
| HPU stream and event support | Yes | Yes | |
| Native PyTorch SDPA | Yes | Yes | Limited support. See Using Fused Scaled Dot Product Attention (FusedSDPA). |
| Gaudi Optimized Flash Attention | Yes | Yes | Flash attention algorithm plus additional Intel Gaudi optimizations. See Using Fused Scaled Dot Product Attention (FusedSDPA). |
| FFT | No | No | |
| torch.cond | Yes | No | |
| torch.signal | No | No | |
| torch.special | No | No | |
| torch.func | No | No | |
| torch.hub | No | No | |
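Finally, a hedged `torch.profiler` sketch; `ProfilerActivity.HPU` is assumed to be available in the Gaudi-enabled PyTorch build, and the resulting trace can be viewed in TensorBoard (also listed as supported above):

```python
import torch
from torch.profiler import profile, ProfilerActivity
import habana_frameworks.torch.core as htcore

x = torch.randn(64, 64, device="hpu")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.HPU]) as prof:
    y = x @ x
    htcore.mark_step()  # Lazy mode: ensure the captured work actually executes

print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```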
**PyTorch Libraries**

| Item | Eager + torch.compile | Lazy Mode | Comments |
|---|---|---|---|
| TorchVision | Yes | Yes | |
| TorchAudio | Yes | Yes | |
| TorchText | Yes | Yes | |
| TorchData | Yes | Yes | |
| TorchRec | No | No | |
| TorchArrow | No | No | |
| TorchX | No | No | |
| ExecuTorch | No | No | |