TensorFlow Operators

Overview

This document summarizes the TensorFlow operators supported by SynapseAI for GAUDI.

TensorFlow Operators Support Summary

TF OP

Constraints

Notes / Limitations

Abs

T={bfloat16,float32,int32}

Acos

T={float32}

Acosh

T={float32}

Add

T={bfloat16,float32,int16,int32,int64}

AddN

T={bfloat16,float32,int16,int32,int64,int8,uint8}

AddV2

T={bfloat16,float32,int16,int32,int64}

AdjustContrastv2

T={float32}

All

Tidx={int32,int64}

AnonymousIterator

AnonymousIteratorV2

AnonymousIteratorV3

Any

Tidx={int32,int64}

ApplyAdaMax

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable (a usage sketch follows this entry):

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true
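
For illustration only, a minimal Python sketch of applying the override process-wide before TensorFlow initializes; the script around the documented variable name is hypothetical, and the same note applies to the other Apply*, Assign*, and Variable entries below:

    import os

    # Hypothetical launcher snippet: opt in to legacy variables before
    # TensorFlow and the Habana module initialize. Judging by the variable
    # name, legacy variables are then handled on CPU rather than HPU.
    os.environ["TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU"] = "true"

    import tensorflow as tf  # import only after the override is set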

ApplyAdadelta

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyAdagrad

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyAdagradV2

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyAdam

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyAddSign

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyCenteredRMSProp

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyFtrl

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyFtrlV2

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyGradientDescent

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyMomentum

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyPowerSign

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyRMSProp

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApproximateEqual

T={float32}

ArgMax

T={bfloat16,float32,int32,int8} Tidx={int32,int64} output_type={int32,int64}

ArgMin

T={bfloat16,float32} Tidx={int32,int64} output_type={int32,int64}

Asin

T={float32}

Asinh

T={float32}

Assign

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

AssignAdd

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

AssignAddVariableOp

dtype={bfloat16,float32,int32}

AssignSub

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

AssignSubVariableOp

dtype={bfloat16,float32,int32}

AssignVariableOp

dtype={bfloat16,float32,int32}

  • The validate_shape attribute is ignored.

Atan

T={float32}

Atan2

T={float32}

Atanh

T={float32}

AvgPool

T={bfloat16,float32}

AvgPool3D

T={bfloat16,float32}

AvgPool3DGrad

T={bfloat16,float32}

AvgPoolGrad

T={bfloat16,float32}

BatchMatMul

T={bfloat16,float32}

BatchMatMulV2

T={bfloat16,float32}

BatchMatMulV3

Ta={bfloat16,float32} Tb={bfloat16,float32} Tout={bfloat16,float32}

BatchToSpace

T={bfloat16,float32} Tidx={int32}

BatchToSpaceND

T={bfloat16,float32} Tblock_shape={int32} Tcrops={int32}

BesselI0

T={float32}

BiasAdd

T={bfloat16,float32}

BiasAddGrad

T={bfloat16,float32}

Bincount

T={float32,int32}

Bitcast

T={bfloat16,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

BitwiseAnd

T={int16,int32,int8,uint16,uint32,uint8}

BitwiseOr

T={int16,int32,int8,uint16,uint32,uint8}

BitwiseXor

T={int16,int32,int8,uint16,uint32,uint8}

BlockLSTM

T={float32}

BlockLSTMGrad

T={float32}

BlockLSTMGradV2

T={float32}

BlockLSTMV2

T={float32}

BroadcastArgs

T={int32,int64}

BroadcastGradientArgs

T={int32,int64}

BroadcastTo

T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} Tidx={int32,int64}

Cast

SrcT={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} DstT={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

Ceil

T={bfloat16,float32}

ClipByValue

T={bfloat16,float32}

CollectiveBcastRecv

T={float32,int32,int64}

CollectiveBcastRecvV2

T={float32,int32,int64}

CollectiveBcastSend

T={float32,int32,int64}

CollectiveBcastSendV2

T={float32,int32,int64}

CollectiveGather

T={float32,int32,int64}

CollectiveGatherV2

T={float32,int32,int64}

CollectiveInitializeCommunicator

CollectiveReduce

T={bfloat16,float32}

CollectiveReduceV2

T={bfloat16,float32}

CollectiveReduceV3

T={bfloat16,float32}

CombinedNonMaxSuppression

Concat

T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8}

ConcatOffset

ConcatV2

T={bfloat16,bool,float32,int32,int8,uint8} Tidx={int32,int64}

Const

dtype={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint8}

Conv2D

T={bfloat16,float32}

Conv2DBackpropFilter

T={bfloat16,float32}

Conv2DBackpropInput

T={bfloat16,float32}

Conv3D

T={bfloat16,float32}

Conv3DBackpropFilterV2

T={bfloat16,float32}

Conv3DBackpropInputV2

T={bfloat16,float32} Tshape={int32,int64}

Cos

T={bfloat16,float32}

Cosh

T={bfloat16,float32}

CropAndResize

T={float32}

CropAndResizeGradBoxes

T={float32,int16,int32,int8,uint16,uint8}

CropAndResizeGradImage

T={float32}

Cross

T={bfloat16,float32,int32}

Cumprod

T={bfloat16,float32,int32} Tidx={int32,int64}

Cumsum

T={bfloat16,float32,int32} Tidx={int32,int64}

DataFormatDimMap

T={int32,int64}

DataFormatVecPermute

T={int32}

DebugIdentityV2

DeepCopy

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8}

DeleteIterator

DenseBincount

Tidx={int32} T={float32,int32}

DepthToSpace

T={bfloat16,float32}

DepthwiseConv2dNative

T={bfloat16,float32}

DepthwiseConv2dNativeBackpropFilter

T={bfloat16,float32}

  • data_format should be NHWC or NCHW

DepthwiseConv2dNativeBackpropInput

T={bfloat16,float32}

  • data_format should be NHWC or NCHW

DestroyResourceOp

Diag

T={bfloat16,float32,int32}

DiagPart

T={bfloat16,float32,int32}

Digamma

T={float32}

Div

T={bfloat16,float32}

DivNoNan

T={bfloat16,float32}

DynamicPartition

T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8}

DynamicStitch

T={bfloat16,bool,float32,int32,int8,uint8}

Einsum

T={bfloat16,float32}

Elu

T={bfloat16,float32}

EluGrad

T={bfloat16,float32}

Empty

dtype={bfloat16,bool,float32,int32,int64,uint8}

EmptyTensorList

EnsureShape

T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8}

Equal

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8}

Erf

T={bfloat16,float32}

Erfinv

T={float32}

EuclideanNorm

T={bfloat16,float32} Tidx={int32,int64}

Exp

T={bfloat16,float32}

ExpandDims

T={bfloat16,bool,float32,int32,int8,uint8} Tdim={int32,int64}

ExperimentalMapDataset

ExperimentalSleepDataset

Expm1

T={bfloat16,float32}

Fill

T={bfloat16,float32,int32} index_type={int32,int64}

FinalizeDataset

Floor

T={bfloat16,float32}

FloorDiv

T={bfloat16,float32,int32}

FloorMod

T={int16,int32,int8,uint16,uint32,uint8}

FusedBatchNorm

T={float32}

FusedBatchNormGrad

T={float32}

FusedBatchNormGradV2

T={bfloat16,float32} U={float32}

FusedBatchNormGradV3

T={bfloat16,float32} U={float32}

FusedBatchNormV2

T={bfloat16,float32} U={float32}

FusedBatchNormV3

T={bfloat16,float32} U={float32}

GRUBlockCell

T={float32}

GRUBlockCellGrad

T={float32}

Gather

Tparams={bfloat16,float32,int32,int8} Tindices={int32,int64}

GatherNd

Tparams={bfloat16,float32,int32} Tindices={int32,int64}

GatherV2

Tparams={bfloat16,bool,float32,int32,int64,int8} Tindices={int32,int64} Taxis={int32,int64}

GeneratorDataset

GetOptions

Greater

T={bfloat16,float32,int16,int32,int8,uint16,uint32,uint8}

GreaterEqual

T={bfloat16,float32,int32,int8,uint8}

Identity

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8}

IdentityN

ImageProjectiveTransformV2

dtype={bfloat16,float32}

ImageProjectiveTransformV3

dtype={bfloat16,float32}

InTopK

T={int32,int64}

InTopKV2

T={int32,int64}

Inv

T={bfloat16,float32}

InvGrad

T={bfloat16,float32}

Invert

T={int16,int32,int8,uint16,uint32,uint8}

InvertPermutation

T={int32,int64}

IsFinite

T={bfloat16,float32}

IsInf

T={bfloat16,float32}

IsNan

T={bfloat16,float32}

IteratorFromStringHandleV2

IteratorGetNext

IteratorGetNextAsOptional

IteratorGetNextSync

IteratorToStringHandle

IteratorV2

L2Loss

T={bfloat16,float32}

  • data_format should be NHWC

LRN

T={bfloat16,float32}

LRNGrad

T={bfloat16,float32}

LSTMBlockCell

T={float32}

LSTMBlockCellGrad

T={float32}

LeakyRelu

T={bfloat16,float32}

LeakyReluGrad

T={bfloat16,float32}

LeftShift

T={int16,int32,int8,uint16,uint32,uint8}

Less

T={bfloat16,float32,int16,int32,int8,uint16,uint32,uint8}

LessEqual

T={bfloat16,float32,int16,int32,int8,uint16,uint32,uint8}

Lgamma

T={float32}

LinSpace

T={bfloat16,float32} Tidx={int32,int64}

Log

T={bfloat16,float32}

Log1p

T={bfloat16,float32}

LogSoftmax

T={bfloat16,float32}

LogicalAnd

LogicalNot

LogicalOr

Lu

T={float32} output_idx_type={int32}

MakeIterator

MapClear

MapIncompleteSize

MapPeek

MapSize

MapStage

MapUnstage

MapUnstageNoKey

MatMul

T={bfloat16,float32}

MatrixBandPart

T={bfloat16,float32} Tindex={int32,int64}

MatrixDiag

T={bfloat16,float32}

MatrixDiagPart

T={bfloat16,float32}

MatrixDiagPartV2

T={bfloat16,float32}

MatrixDiagPartV3

T={bfloat16,float32}

MatrixDiagV2

T={bfloat16,float32}

MatrixDiagV3

T={bfloat16,float32}

MatrixSetDiag

T={bfloat16,float32}

MatrixSetDiagV2

T={bfloat16,float32}

MatrixSetDiagV3

T={bfloat16,float32}

MatrixTriangularSolve

T={bfloat16,float32}

Max

T={bfloat16,float32,int32,int64,int8,uint8} Tidx={int32,int64}

MaxPool

T={bfloat16,float32}

MaxPool3D

T={bfloat16,float32}

MaxPool3DGrad

T={bfloat16,float32} TInput={bfloat16,float32}

MaxPoolGrad

T={bfloat16,float32}

MaxPoolGradV2

T={bfloat16,float32}

MaxPoolV2

T={bfloat16,float32}

Maximum

T={bfloat16,float32,int32,int64}

Mean

T={bfloat16,float32} Tidx={int32,int64}

Min

T={bfloat16,float32,int32} Tidx={int32,int64}

Minimum

T={bfloat16,float32,int32}

MirrorPad

T={bfloat16,float32,int32} Tpaddings={int32,int64}

MirrorPadGrad

T={bfloat16,float32,int16,int32,uint16,uint32} Tpaddings={int32,int64}

Mod

T={bfloat16,float32,int32}

Mul

T={bfloat16,float32,int16,int32,int64,int8,uint16,uint32,uint8}

MulNoNan

T={float32}

Multinomial

T={bfloat16,float32} output_dtype={int32,int64}

Ndtri

T={float32}

Neg

T={bfloat16,float32,int32}

NextAfter

T={bfloat16,float32}

NoOp

NonMaxSuppressionV2

T={float32} T_threshold={float32}

NonMaxSuppressionV3

T={float32} T_threshold={float32}

NonMaxSuppressionV4

T={float32} T_threshold={float32}

NonMaxSuppressionV5

T={float32}

NotEqual

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

OneHot

T={bfloat16,float32} TI={int32}

OnesLike

T={bfloat16,float32,int32,int8}

OptionalFromValue

OptionalGetValue

OptionalHasValue

OptionalNone

OptionsDataset

Pack

T={bfloat16,bool,float32,int32}

Pad

T={bfloat16,float32,int32} Tpaddings={int32,int64}

PadV2

T={bfloat16,float32,int32} Tpaddings={int32,int64}

PartitionedCall

PlaceholderWithDefault

dtype={bfloat16,bool,float32,int32,int8}

Pow

T={float32}

PrefetchDataset

PreventGradient

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8}

Prod

T={bfloat16,float32,int32} Tidx={int32,int64}

PyramidRoiAlign

T={bfloat16,float32}

PyramidRoiAlignGradImages

T={bfloat16,float32}

Qr

T={float32}

RaggedTensorToTensor

T={float32} Tindex={int32,int64} Tshape={int32,int64}

RandomShuffle

T={bfloat16,float32,int32,int8}

RandomStandardNormal

dtype={bfloat16,float32} T={int32,int64}

RandomUniform

dtype={bfloat16,float32} T={int32,int64}

RandomUniformInt

Tout={int32} T={int32}

Range

Tidx={bfloat16,float32,int32}

Rank

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

ReadVariableOp

RealDiv

T={bfloat16,float32}

Reciprocal

T={bfloat16,float32}

ReciprocalGrad

T={bfloat16,float32}

Relu

T={bfloat16,float32}

Relu6

T={bfloat16,float32}

Relu6Grad

T={bfloat16,float32}

ReluGrad

T={bfloat16,float32}

Reshape

T={bfloat16,bool,float32,int32} Tshape={int32,int64}

ResizeArea

T={bfloat16,float32}

ResizeBicubic

T={bfloat16,float32}

ResizeBicubicGrad

T={float32}

  • ResizeBicubicGradOp supports inputs smaller than 16 KB.

ResizeBilinear

T={bfloat16,float32}

ResizeBilinearGrad

T={bfloat16,float32}

ResizeNearestNeighbor

T={bfloat16,float32}

ResizeNearestNeighborGrad

T={bfloat16,float32}

ResourceApplyAdaMax

T={bfloat16,float32}

ResourceApplyAdadelta

T={bfloat16,float32}

ResourceApplyAdagrad

T={bfloat16,float32}

ResourceApplyAdagradV2

T={bfloat16,float32}

ResourceApplyAdam

T={bfloat16,float32}

ResourceApplyAdamWithAmsgrad

T={bfloat16,float32}

ResourceApplyAddSign

T={bfloat16,float32}

ResourceApplyCenteredRMSProp

T={bfloat16,float32}

ResourceApplyFtrl

T={bfloat16,float32}

ResourceApplyFtrlV2

T={bfloat16,float32}

ResourceApplyGradientDescent

T={bfloat16,float32}

ResourceApplyKerasMomentum

T={bfloat16,float32}

ResourceApplyMomentum

T={float32}

ResourceApplyPowerSign

T={bfloat16,float32}

ResourceApplyProximalAdagrad

T={bfloat16,float32}

ResourceApplyRMSProp

T={bfloat16,float32}

ResourceGather

dtype={bfloat16,float32} Tindices={int32,int64}

ResourceGatherNd

dtype={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterAdd

dtype={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterDiv

dtype={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterMax

dtype={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterMin

dtype={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterMul

dtype={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterNdAdd

T={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterNdMax

T={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterNdMin

T={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterNdSub

T={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterNdUpdate

T={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterSub

dtype={bfloat16,float32,int32} Tindices={int32,int64}

ResourceScatterUpdate

dtype={bfloat16,float32,int32} Tindices={int32,int64}

ResourceSparseApplyAdadelta

T={bfloat16,float32}

ResourceSparseApplyAdagrad

T={bfloat16,float32}

ResourceSparseApplyAdagradV2

T={bfloat16,float32}

ResourceSparseApplyCenteredRMSProp

T={bfloat16,float32}

ResourceSparseApplyFtrl

T={bfloat16,float32}

ResourceSparseApplyFtrlV2

T={bfloat16,float32}

ResourceSparseApplyKerasMomentum

T={bfloat16,float32}

ResourceSparseApplyProximalAdagrad

T={bfloat16,float32}

ResourceSparseApplyRMSProp

T={bfloat16,float32}

Reverse

T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8}

ReverseSequence

T={float32}

ReverseV2

Tidx={int32} T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8}

RightShift

T={int16,int32,int8,uint16,uint32,uint8}

Rint

T={bfloat16,float32}

Roll

T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} Tshift={int32,int64} Taxis={int32,int64}

Round

T={bfloat16,float32}

Rsqrt

T={bfloat16,float32}

RsqrtGrad

T={bfloat16,float32}

ScatterNd

T={bfloat16,float32,int32} Tindices={int32,int64}

Select

T={bfloat16,float32,int32}

SelectV2

T={bfloat16,bool,float32,int32,int64}

Selu

T={bfloat16,float32}

SeluGrad

T={bfloat16,float32}

Shape

T={bfloat16,bool,float32,int32,int8} out_type={int32,int64}

ShapeN

T={bfloat16,float32,int32,int8} out_type={int32,int64}

Sigmoid

T={bfloat16,float32}

SigmoidGrad

T={bfloat16,float32}

Sign

T={bfloat16,float32}

Sin

T={bfloat16,float32}

Sinh

T={float32}

Size

T={bfloat16,float32,int32,int8} out_type={int32,int64}

SleepDataset

Slice

T={bfloat16,float32,int32,int8} Index={int32}

Snapshot

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint8}

Softmax

T={bfloat16,float32}

SoftmaxCrossEntropyWithLogits

T={bfloat16,float32}

Softplus

T={bfloat16,float32,int32,int8}

SoftplusGrad

T={bfloat16,float32}

Softsign

T={bfloat16,float32}

SoftsignGrad

T={bfloat16,float32}

SpaceToBatch

T={bfloat16,float32} Tpaddings={int32}

SpaceToBatchND

T={bfloat16,float32} Tblock_shape={int32} Tpaddings={int32}

SpaceToDepth

T={bfloat16,float32,uint8}

SparseMatMul

Ta={bfloat16,float32} Tb={bfloat16,float32}

SparseSegmentMean

T={bfloat16,float32} Tidx={int32} Tsegmentids={int32}

SparseSegmentMeanGrad

T={bfloat16,float32} Tidx={int32,int64} Tsegmentids={int32,int64}

SparseSegmentMeanWithNumSegments

T={bfloat16,float32} Tidx={int32} Tsegmentids={int32}

SparseSegmentSqrtN

T={bfloat16,float32} Tidx={int32} Tsegmentids={int32,int64}

SparseSegmentSqrtNGrad

T={bfloat16,float32} Tidx={int32,int64} Tsegmentids={int32,int64}

SparseSegmentSqrtNWithNumSegments

T={bfloat16,float32} Tidx={int32} Tsegmentids={int32}

SparseSegmentSum

T={bfloat16,float32} Tidx={int32} Tsegmentids={int32}

SparseSegmentSumGrad

T={bfloat16,float32} Tidx={int32,int64} Tsegmentids={int32,int64}

SparseSegmentSumWithNumSegments

T={bfloat16,float32} Tidx={int32} Tsegmentids={int32}

SparseSoftmaxCrossEntropyWithLogits

T={bfloat16,float32} Tlabels={int32,int64}

SparseTensorDenseAdd

T={bfloat16,float32} Tindices={int32}

SparseToDense

T={bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} Tindices={int32,int64}

Split

T={bfloat16,float32}

SplitV

T={bfloat16,bool,float32,int32,int8} Tlen={int32,int64}

Sqrt

T={bfloat16,float32}

SqrtGrad

T={bfloat16,float32}

Square

T={bfloat16,float32,int16,int32,int64,int8}

SquaredDifference

T={bfloat16,float32,int32}

Squeeze

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8}

Stage

StageClear

StagePeek

StageSize

StatefulPartitionedCall

StatefulStandardNormalV2

dtype={float32}

StatefulUniform

dtype={float32}

StatefulUniformFullInt

dtype={int32,uint32}

StatelessRandomGetAlg

StatelessRandomGetKeyCounter

StatelessRandomGetKeyCounterAlg

StatelessRandomUniform

dtype={bfloat16,float32} T={int32,int64} Tseed={int32,int64}

StatelessRandomUniformFullInt

dtype={int32,int64,uint32,uint64} T={int32,int64} Tseed={int32,int64}

StatelessRandomUniformFullIntV2

dtype={int32,int64,uint32,uint64} Tshape={int32,int64}

StatelessRandomUniformInt

dtype={int32,int64} T={int32,int64} Tseed={int32,int64}

StatelessRandomUniformIntV2

dtype={int32,int64} Tshape={int32,int64}

StatelessRandomUniformV2

dtype={bfloat16,float32} Tshape={int32,int64}

StopGradient

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8}

StridedSlice

T={bfloat16,bool,float32,int32,int8,uint8}

StridedSliceGrad

T={bfloat16,bool,float32,int32,int8,uint8}

Sub

T={bfloat16,float32,int32}

Sum

T={bfloat16,float32,int32} Tidx={int32,int64}

SymbolicGradient

Tan

T={float32}

Tanh

T={bfloat16,float32}

TanhGrad

T={bfloat16,float32}

TensorArrayCloseV3

TensorArrayConcatV3

dtype={float32,int32}

TensorArrayGatherV3

dtype={float32,int32}

TensorArrayGradV3

TensorArrayGradWithShape

TensorArrayReadV3

dtype={bfloat16,float32,int32}

TensorArrayScatterV3

T={float32,int32}

TensorArraySizeV3

TensorArraySplitV3

T={bfloat16,float32,int32}

TensorArrayV3

dtype={bfloat16,float32,int32}

TensorArrayWriteV3

T={float32,int32}

TensorListConcatLists

TensorListElementShape

TensorListFromTensor

element_dtype={bfloat16,bool,float32,int32,int64}

TensorListGetItem

element_dtype={bfloat16,bool,float32,int32,int64}

TensorListLength

TensorListPopBack

element_dtype={bfloat16,bool,float32,int32,int64}

TensorListPushBack

TensorListReserve

TensorListResize

TensorListSetItem

TensorListSplit

element_dtype={bfloat16,bool,float32,int32,int64}

TensorListStack

element_dtype={bfloat16,bool,float32,int32,int64}

TensorScatterAdd

T={bfloat16,float32} Tindices={int32}

TensorScatterMax

T={bfloat16,float32} Tindices={int32}

TensorScatterMin

T={bfloat16,float32} Tindices={int32}

TensorScatterSub

T={bfloat16,float32} Tindices={int32}

TensorScatterUpdate

T={bfloat16,bool,float32,int32,int64,int8,uint16,uint8} Tindices={int32,int64}

TensorStridedSliceUpdate

T={bfloat16,float32,int32} Index={int32,int64}

Tile

T={bfloat16,bool,float32,int32,int8}

TopK

T={float32,int32,int64}

TopKV2

T={float32,int32,int64}

Transpose

T={bfloat16,bool,float32,int16,int32,int8,uint8} Tperm={int32,int64}

TruncateDiv

T={int16,int32,int8,uint16,uint32,uint8}

TruncateMod

T={bfloat16,float32,int32}

TruncatedNormal

dtype={bfloat16,float32} T={int32,int64}

Unique

T={float32,int32} out_idx={int32}

UniqueV2

T={float32,int32} Taxis={int32} out_idx={int32}

UniqueWithCounts

T={float32,int32} out_idx={int32}

UniqueWithCountsV2

T={float32,int32} Taxis={int32} out_idx={int32}

Unpack

T={bfloat16,float32,int32}

UnravelIndex

Tidx={int32,int64}

UnsortedSegmentSum

T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} Tindices={int32} Tnumsegments={int32,int64}

Unstage

UnwrapDatasetVariant

VarHandleOp

dtype={bfloat16,float32,int32}

VarIsInitializedOp

Variable

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

VariableShape

out_type={int32,int64}

VariableV2

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

Where

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8}

WrapDatasetVariant

Xdivy

T={float32}

Xlog1py

T={float32}

Xlogy

T={float32}

ZerosLike

T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8}

_Arg

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

_DeviceArg

_DeviceRetval

_FusedBatchNormEx

T={bfloat16,float32} U={float32}

_FusedBatchNormGradEx

T={float32} U={float32}

_FusedConv2D

T={bfloat16,float32}

_FusedConv3D

T={bfloat16,float32}

_Retval

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

_ScopedAllocator

_ScopedAllocatorConcat

_ScopedAllocatorSplit

_TensorToHashBucketFast

T={int16,int32,int64,int8,uint16,uint32,uint64,uint8}

_VarHandlesOp

Note

Tensors of type int64 are internally downcast to int32.
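
Because of this narrowing, int64 values outside the int32 range cannot be preserved on the device. A hedged illustration (hypothetical user code, not part of SynapseAI) of making the narrowing explicit:

    import tensorflow as tf

    # int64 tensors are narrowed to int32 internally, so values beyond the
    # int32 range (about +/- 2**31) are not representable on HPU. Casting
    # explicitly documents the narrowing in the graph:
    indices = tf.constant([0, 5, 17], dtype=tf.int64)
    indices32 = tf.cast(indices, tf.int32)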

The following operator implementations are supplied by TensorFlow core (version 2.11.0) and are automatically registered for all available devices, including Habana® Gaudi®.

TF OP

Constraints

Notes / Limitations

Assert

BatchFunction

Case

CollectiveAssignGroupV2

ConsumeMutexLock

ControlTrigger

Copy

CopyHost

DebugGradientIdentity

T={bfloat16,bool,float32,int16,int64,int8,uint16,uint32,uint64,uint8}

DebugIdentity

DebugNanCount

T={float32}

DebugNumericSummary

T={bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

DeleteSessionTensor

DestroyTemporaryVariable

T={bool,float32,int64,uint32}

DeviceIndex

DevicePlacementOp

DisableCopyOnRead

EagerPyFunc

Enter

Exit

FIFOQueueV2

Fact

FakeParam

For

HostConst

If

IsVariableInitialized

dtype={bool,float32,int64,uint32}

LoopCond

MakeWeakResourceHandle

Merge

MutexLock

MutexV2

NextIteration

OrderedMapClear

OrderedMapIncompleteSize

OrderedMapPeek

OrderedMapSize

OrderedMapStage

OrderedMapUnstage

OrderedMapUnstageNoKey

Placeholder

PlaceholderV2

QueueCloseV2

QueueDequeueV2

QueueEnqueueV2

QueueIsClosedV2

QueueSizeV2

Recv

RefIdentity

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

RemoteCall

Send

Stack

StackClose

StackCloseV2

StackPop

elem_type={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

StackPopV2

elem_type={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

StackPush

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

StackPushV2

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

StackV2

StatelessCase

StatelessIf

StatelessWhile

Switch

TemporaryVariable

dtype={bool,float32,int64,uint32}

While

_ArrayToList

T={float32,int32}

_EagerConst

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

_HostCast

_HostRecv

_HostSend

_If

_ListToArray

T={float32,int32}

_ReadVariablesOp

_Recv

_Send

_SwitchN

_While

Custom Habana TensorFlow Operators Description

TensorFlow integration for Habana® Gaudi® adds a set of custom TensorFlow operators, all designed to improve overall topology performance by leveraging specialized TPC kernels.

Each custom operator is described below along with its purpose.

HabanaInstanceNorm

Implementation of the InstanceNormalization operation on the forward path. Performs normalization across all features of one channel. For small batch sizes, its accuracy is more stable than batch norm.

Inputs: ‘input’ must be 4D (NHWC) or 5D (NDHWC); ‘beta’ and ‘gamma’ must be 1D (C).

Outputs: ‘output’ must be 4D (NHWC) or 5D (NDHWC); ‘mean’ and ‘istd’ must be 2D (NC).

Attributes: ‘epsilon’ is added to the variance; ‘axis’ selects the axis to normalize (currently only the last axis is supported).
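
As a rough reference for the semantics above, a minimal NumPy sketch of the forward computation, assuming channels-last layout and normalization over the spatial axes; it illustrates the math only and is not the TPC implementation:

    import numpy as np

    def instance_norm(x, gamma, beta, epsilon=1e-5):
        # x: (N, H, W, C) or (N, D, H, W, C); gamma, beta: (C,)
        spatial_axes = tuple(range(1, x.ndim - 1))       # (1, 2) for NHWC
        mean = x.mean(axis=spatial_axes, keepdims=True)  # per sample, per channel
        var = x.var(axis=spatial_axes, keepdims=True)
        istd = 1.0 / np.sqrt(var + epsilon)              # inverse standard deviation
        out = gamma * (x - mean) * istd + beta
        n = x.shape[0]
        # 'mean' and 'istd' outputs are 2D (NC), matching the description above.
        return out, mean.reshape(n, -1), istd.reshape(n, -1)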

HabanaInstanceNormGrad

Implementation of the InstanceNormalization operation on the backward path. Calculates the gradients for HabanaInstanceNorm.

Inputs: ‘input’ and ‘grad_in’ must be 4D (NHWC) or 5D (NDHWC); ‘mean’ and ‘istd’ must be 2D (NC); ‘gamma’ must be 1D (C).

Outputs: ‘grad_out’ must be 4D (NHWC) or 5D (NDHWC); ‘grad_beta’ and ‘grad_gamma’ must be 1D (C).

Attributes: ‘epsilon’ is added to the variance; ‘axis’ selects the axis to normalize (currently only the last axis is supported).
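
For reference, a NumPy sketch of the standard instance-norm backward formulas that such a gradient op computes (4D NHWC for brevity, same assumptions as the forward sketch; illustrative only):

    import numpy as np

    def instance_norm_grad(grad_in, x, mean, istd, gamma):
        # grad_in, x: (N, H, W, C); mean, istd: (N, C); gamma: (C,)
        n, c = mean.shape
        mu = mean.reshape(n, 1, 1, c)
        inv = istd.reshape(n, 1, 1, c)
        xhat = (x - mu) * inv                              # normalized input
        grad_gamma = (grad_in * xhat).sum(axis=(0, 1, 2))  # 1D (C)
        grad_beta = grad_in.sum(axis=(0, 1, 2))            # 1D (C)
        m1 = grad_in.mean(axis=(1, 2), keepdims=True)
        m2 = (grad_in * xhat).mean(axis=(1, 2), keepdims=True)
        grad_out = gamma * inv * (grad_in - m1 - xhat * m2)
        return grad_out, grad_beta, grad_gamma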

HabanaResampler

Implementation of the Resampler operation on the forward path. Replaces Addons>Resampler.

Inputs: the ‘warp’ tensor must have depth/channel size = 2; the batch_size of ‘data’ and ‘warp’ must match.

Outputs: ‘output’ is a tensor of values resampled from ‘data’; its shape is determined by the shape of ‘warp’.
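
A NumPy sketch of the resampling semantics, under the assumption that ‘warp’ carries (x, y) sampling coordinates and interpolation is bilinear, as in the Addons Resampler it replaces (illustrative only; border handling in the kernel may differ):

    import numpy as np

    def resample(data, warp):
        # data: (N, H, W, C); warp: (N, H_out, W_out, 2) -> (N, H_out, W_out, C)
        n, h, w, c = data.shape
        x, y = warp[..., 0], warp[..., 1]
        x0 = np.clip(np.floor(x).astype(int), 0, w - 2)   # left neighbor
        y0 = np.clip(np.floor(y).astype(int), 0, h - 2)   # top neighbor
        dx = (x - x0)[..., None]                          # fractional offsets
        dy = (y - y0)[..., None]
        b = np.arange(n)[:, None, None]                   # batch index broadcast
        top = data[b, y0, x0] * (1 - dx) + data[b, y0, x0 + 1] * dx
        bot = data[b, y0 + 1, x0] * (1 - dx) + data[b, y0 + 1, x0 + 1] * dx
        return top * (1 - dy) + bot * dy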

HabanaResamplerGrad

Implementation of the Resampler operation on the backward path. Calculates the gradients for HabanaResampler. Replaces Addons>ResamplerGrad.

Inputs: the ‘grad_warp’ and ‘warp’ tensors must have depth/channel size = 2; the batch_size of ‘data’ and ‘warp’ must match.