1. TensorFlow Operators

1.1. Overview

This document summarizes the TensorFlow operators supported by SynapseAI on Gaudi.
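
In the tables below, the Constraints column lists the supported values of each operator attribute; for example, T={bfloat16,float32,int32} means the operator is supported on HPU only when its T attribute is one of those data types. As a minimal sketch of what such a constraint means in practice (the dtypes are taken from the Cast row of Table 1.1):

    import tensorflow as tf

    # Cast lists float32 under SrcT and bfloat16 under DstT in Table 1.1,
    # so a float32 -> bfloat16 cast satisfies the HPU dtype constraints.
    x = tf.constant([1.5, 2.5], dtype=tf.float32)
    y = tf.cast(x, tf.bfloat16)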

1.2. TensorFlow Operators Support Summary

Table 1.1 Supported TensorFlow Operators

TF OP

Constraints

Notes / Limitations

Abs

T={bfloat16,float32,int32}

Acos

T={float32}

Acosh

T={float32}

Add

T={bfloat16,float32,int16,int32}

AddN

T={bfloat16,float32,int16,int32}

AddV2

T={bfloat16,float32,int16,int32}

Addons>Resampler

T={bfloat16,float32}

Addons>ResamplerGrad

T={bfloat16,float32}

AdjustContrastv2

T={float32}

All

Tidx={int32,int64}

AnonymousIterator

AnonymousIteratorV2

Any

Tidx={int32,int64}

ApplyAdaMax

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable (a usage sketch follows Table 1.1):

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyAdadelta

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyAdagrad

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyAdagradV2

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyAdam

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyAddSign

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyCenteredRMSProp

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyFtrl

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyFtrlV2

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyGradientDescent

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyMomentum

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyPowerSign

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ApplyRMSProp

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

ArgMax

T={bfloat16,float32} Tidx={int32} output_type={int32}

ArgMin

T={bfloat16,float32} Tidx={int32} output_type={int32}

Asin

T={float32}

Asinh

T={float32}

Assign

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

AssignAdd

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

AssignAddVariableOp

dtype={float32,int32}

AssignSub

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

AssignSubVariableOp

dtype={float32,int32}

AssignVariableOp

dtype={float32,int32}

Atan

T={float32}

Atanh

T={float32}

AvgPool

T={bfloat16,float32}

AvgPool3D

T={bfloat16,float32}

AvgPool3DGrad

T={bfloat16,float32}

AvgPoolGrad

T={bfloat16,float32}

BatchMatMul

T={bfloat16,float32}

BatchMatMulV2

T={bfloat16,float32}

BiasAdd

T={bfloat16,float32}

BiasAddGrad

T={bfloat16,float32}

BitwiseAnd

T={int16,int32,int8,uint16,uint32,uint8}

BitwiseOr

T={int16,int32,int8,uint16,uint32,uint8}

BitwiseXor

T={int16,int32,int8,uint16,uint32,uint8}

BroadcastArgs

T={int32,int64}

BroadcastGradientArgs

T={int32,int64}

BroadcastTo

T={bfloat16,float32,int32} Tidx={int32,int64}

Cast

SrcT={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint8} DstT={bfloat16,bool,float32,int16,int32,int8,uint16,uint8}

Ceil

T={bfloat16,float32}

ClipByValue

T={bfloat16,float32}

CollectiveBcastRecv

T={bfloat16,float32}

CollectiveBcastRecvV2

T={bfloat16,float32}

CollectiveBcastSend

T={bfloat16,float32}

CollectiveBcastSendV2

T={bfloat16,float32}

CollectiveGather

CollectiveGatherV2

T={bfloat16,float32}

CollectiveReduce

CollectiveReduceV2

T={bfloat16,float32}

CombinedNonMaxSuppression

Concat

T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8}

ConcatOffset

ConcatV2

T={bfloat16,bool,float32,int32} Tidx={int32,int64}

Const

dtype={bfloat16,bool,float32,int32,int64,int8}

ConsumeMutexLock

Conv2D

T={bfloat16,float32}

Conv2DBackpropFilter

T={bfloat16,float32}

Conv2DBackpropInput

T={bfloat16,float32}

Conv3D

T={bfloat16,float32}

Conv3DBackpropFilterV2

T={bfloat16,float32}

Conv3DBackpropInputV2

T={bfloat16,float32}

Cos

T={bfloat16,float32}

Cosh

T={bfloat16,float32}

CropAndResize

T={float32}

CropAndResizeGradImage

T={float32}

Cumprod

T={bfloat16,float32,int32} Tidx={int32,int64}

Cumsum

T={bfloat16,float32,int32} Tidx={int32,int64}

DataFormatVecPermute

T={int32}

DebugIdentityV2

DeleteIterator

DepthwiseConv2dNative

T={bfloat16,float32}

DepthwiseConv2dNativeBackpropFilter

T={bfloat16,float32}

  • data_format must be NHWC or NCHW.

DepthwiseConv2dNativeBackpropInput

T={bfloat16,float32}

  • data_format must be NHWC or NCHW.

DestroyResourceOp

DiagPart

T={bfloat16,float32}

Div

T={bfloat16,float32}

DivNoNan

T={float32}

DynamicStitch

T={bfloat16,float32,int32}

Einsum

T={bfloat16,float32}

Elu

T={bfloat16,float32}

EluGrad

T={bfloat16,float32}

EmptyTensorList

EnsureShape

T={float32,int16,int32,int8,uint16,uint32,uint8}

Equal

T={bfloat16,bool,float32,int32,int8}

Erf

T={bfloat16,float32}

EuclideanNorm

T={bfloat16,float32} Tidx={int32,int64}

Exp

T={bfloat16,float32}

ExpandDims

T={bfloat16,bool,float32,int32,int8} Tdim={int32,int64}

ExperimentalSleepDataset

Fill

T={bfloat16,float32,int32} index_type={int32,int64}

Floor

T={bfloat16,float32}

FloorDiv

T={bfloat16,float32,int32}

FloorMod

T={int32}

FusedBatchNorm

T={float32}

FusedBatchNormGrad

T={float32}

FusedBatchNormGradV2

T={bfloat16,float32}

FusedBatchNormGradV3

T={bfloat16,float32} U={float32}

FusedBatchNormV2

T={bfloat16,float32} U={float32}

FusedBatchNormV3

T={bfloat16,float32} U={float32}

GatherNd

Tparams={bfloat16,float32,int32} Tindices={int32,int64}

GatherV2

Tparams={bfloat16,bool,float32,int32,int8} Tindices={int32,int64} Taxis={int32,int64}

GeneratorDataset

Greater

T={bfloat16,float32,int32,int8}

GreaterEqual

T={bfloat16,float32,int32,int8}

HabanaInstanceNorm

HabanaInstanceNormGrad

Identity

T={bfloat16,bool,float32,int32,int64,int8}

Inv

T={bfloat16,float32}

Invert

T={int16,int32,int8,uint16,uint32,uint8}

InvertPermutation

T={int32,int64}

IsFinite

T={bfloat16,float32}

IsInf

T={bfloat16,float32}

IsNan

T={bfloat16,float32}

IteratorFromStringHandleV2

IteratorGetNext

IteratorGetNextAsOptional

IteratorGetNextSync

IteratorToStringHandle

IteratorV2

L2Loss

T={bfloat16,float32}

  • data_format must be NHWC.

LeakyRelu

T={bfloat16,float32}

LeakyReluGrad

T={bfloat16,float32}

LeftShift

T={int16,int32,int8,uint16,uint32,uint8}

Less

T={bfloat16,float32,int32,int8}

LessEqual

T={bfloat16,float32,int32,int8}

Log

T={bfloat16,float32}

Log1p

T={bfloat16,float32}

LogSoftmax

T={bfloat16,float32}

LogicalAnd

LogicalNot

LogicalOr

MakeIterator

MatMul

T={bfloat16,float32}

MatrixBandPart

T={bfloat16,float32} Tindex={int32,int64}

MatrixDiag

T={bfloat16,float32}

MatrixDiagPart

T={bfloat16,float32}

MatrixDiagPartV2

T={bfloat16,float32}

MatrixDiagPartV3

T={bfloat16,float32}

MatrixDiagV2

T={bfloat16,float32}

MatrixDiagV3

T={bfloat16,float32}

Max

T={bfloat16,float32} Tidx={int32,int64}

MaxPool

T={bfloat16,float32}

MaxPool3D

T={bfloat16,float32}

MaxPool3DGrad

T={bfloat16,float32}

MaxPoolGrad

T={bfloat16,float32}

MaxPoolGradV2

T={bfloat16,float32}

MaxPoolV2

T={bfloat16,float32}

Maximum

T={bfloat16,float32,int32}

Mean

T={bfloat16,float32} Tidx={int32,int64}

Min

T={bfloat16,float32} Tidx={int32,int64}

Minimum

T={bfloat16,float32,int32}

MirrorPad

T={bfloat16,float32,int32} Tpaddings={int32,int64}

MirrorPadGrad

T={bfloat16,float32,int32} Tpaddings={int32,int64}

Mod

T={bfloat16,float32,int16,int32,int8}

Mul

T={bfloat16,float32,int32}

MulNoNan

T={float32}

MutexLock

MutexV2

Neg

T={bfloat16,float32,int32}

NoOp

NonMaxSuppressionV3

T={float32}

NonMaxSuppressionV4

T={float32}

NotEqual

T={bfloat16,bool,float32,int32,int8}

OneHot

T={float32} TI={int32}

OnesLike

T={bfloat16,float32,int32,int8}

OptionalFromValue

OptionalGetValue

OptionalHasValue

OptionalNone

Pack

T={bfloat16,bool,float32,int32}

Pad

T={bfloat16,float32,int32} Tpaddings={int32,int64}

PadV2

T={bfloat16,float32,int32} Tpaddings={int32,int64}

PartitionedCall

PlaceholderWithDefault

dtype={bfloat16,bool,float32,int32,int8}

Pow

T={float32}

PrefetchDataset

PreventGradient

T={bfloat16,bool,float32,int32,int64,int8}

Prod

T={bfloat16,float32,int32} Tidx={int32,int64}

PyramidRoiAlign

T={bfloat16,float32}

PyramidRoiAlignGradImages

T={bfloat16,float32}

RandomShuffle

T={bfloat16,float32,int32,int8}

RandomStandardNormal

dtype={bfloat16,float32} T={int32,int64}

RandomUniform

dtype={bfloat16,float32} T={int32,int64}

RandomUniformInt

Tout={int32} T={int32}

Range

Tidx={int32}

Rank

ReadVariableOp

RealDiv

T={bfloat16,float32}

Reciprocal

T={bfloat16,float32}

Relu

T={bfloat16,float32}

Relu6

T={bfloat16,float32}

Relu6Grad

T={bfloat16,float32}

ReluGrad

T={bfloat16,float32}

RemoteCall

Reshape

T={bfloat16,bool,float32,int32} Tshape={int32,int64}

ResizeBilinear

T={bfloat16,float32}

ResizeBilinearGrad

T={bfloat16,float32}

ResizeNearestNeighbor

T={bfloat16,float32}

ResizeNearestNeighborGrad

T={bfloat16,float32}

ResourceApplyAdaMax

T={float32}

ResourceApplyAdadelta

T={float32}

ResourceApplyAdagradV2

T={float32}

ResourceApplyAdam

T={float32}

ResourceApplyAdamWithAmsgrad

T={float32}

ResourceApplyCenteredRMSProp

T={float32}

ResourceApplyFtrl

T={float32}

ResourceApplyFtrlV2

T={float32}

ResourceApplyGradientDescent

T={float32}

ResourceApplyKerasMomentum

T={float32}

ResourceApplyMomentum

T={float32}

ResourceApplyRMSProp

T={float32}

ResourceGather

dtype={float32} Tindices={int32,int64}

ResourceGatherNd

dtype={bfloat16,float32,int32} Tindices={int32}

ResourceScatterAdd

dtype={float32} Tindices={int32,int64}

ResourceScatterDiv

dtype={float32} Tindices={int32,int64}

ResourceScatterMax

dtype={float32} Tindices={int32,int64}

ResourceScatterMin

dtype={float32} Tindices={int32,int64}

ResourceScatterMul

dtype={float32} Tindices={int32,int64}

ResourceScatterSub

dtype={float32} Tindices={int32,int64}

ResourceScatterUpdate

dtype={float32} Tindices={int32,int64}

ResourceSparseApplyAdadelta

T={float32}

ResourceSparseApplyAdagradV2

T={float32}

ResourceSparseApplyCenteredRMSProp

T={float32}

ResourceSparseApplyFtrl

T={float32}

ResourceSparseApplyFtrlV2

T={float32}

ResourceSparseApplyKerasMomentum

T={float32}

ResourceSparseApplyRMSProp

T={float32}

ReverseV2

Tidx={int32} T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8}

RightShift

T={int16,int32,int8,uint16,uint32,uint8}

Rint

T={bfloat16,float32}

Round

T={bfloat16,float32}

Rsqrt

T={bfloat16,float32}

RsqrtGrad

T={bfloat16,float32}

ScatterNd

T={bfloat16,float32} Tindices={int32,int64}

Select

T={bfloat16,float32,int32}

SelectV2

T={bfloat16,float32,int32}

Selu

T={bfloat16,float32}

SeluGrad

T={bfloat16,float32}

Shape

T={bfloat16,float32,int32,int8} out_type={int32,int64}

ShapeN

T={bfloat16,float32,int32,int8} out_type={int32,int64}

Sigmoid

T={bfloat16,float32}

SigmoidGrad

T={bfloat16,float32}

Sign

T={bfloat16,float32}

Sin

T={bfloat16,float32}

Sinh

T={float32}

Size

T={bfloat16,float32,int32,int8} out_type={int32,int64}

SleepDataset

Slice

T={bfloat16,float32,int32,int8}

Snapshot

T={bfloat16,bool,float32,int32}

Softmax

T={bfloat16,float32}

SoftmaxCrossEntropyWithLogits

T={bfloat16,float32}

Softplus

T={bfloat16,float32,int32,int8}

SoftplusGrad

T={bfloat16,float32}

Softsign

T={bfloat16,float32}

SoftsignGrad

T={bfloat16,float32}

SparseMatMul

Ta={bfloat16,float32} Tb={bfloat16,float32}

SparseSegmentSum

T={bfloat16,float32} Tidx={int32} Tsegmentids={int32}

SparseSegmentSumWithNumSegments

T={bfloat16,float32} Tidx={int32} Tnumsegments={int32,int64} Tsegmentids={int32}

SparseSoftmaxCrossEntropyWithLogits

T={bfloat16,float32} Tlabels={int32,int64}

Split

T={bfloat16,float32}

SplitV

T={bfloat16,bool,float32,int32,int8} Tlen={int32,int64}

Sqrt

T={bfloat16,float32}

SqrtGrad

T={bfloat16,float32}

Square

T={bfloat16,float32}

SquaredDifference

T={bfloat16,float32,int32}

Squeeze

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8}

Stage

StatefulPartitionedCall

StopGradient

T={bfloat16,bool,float32,int32,int64,int8}

StridedSlice

T={bfloat16,bool,float32,int32,int8}

StridedSliceGrad

T={bfloat16,bool,float32,int32,int8}

Sub

T={bfloat16,float32,int32}

Sum

T={bfloat16,float32,int32} Tidx={int32,int64}

Tan

T={float32}

Tanh

T={bfloat16,float32}

TanhGrad

T={bfloat16,float32}

TensorListConcatLists

TensorListElementShape

TensorListFromTensor

element_dtype={bfloat16,bool,float32,int32,int64}

TensorListGetItem

element_dtype={bfloat16,bool,float32,int32,int64}

TensorListLength

TensorListPopBack

element_dtype={bfloat16,bool,float32,int32,int64}

TensorListPushBack

TensorListReserve

TensorListResize

TensorListSetItem

TensorListSplit

element_dtype={bfloat16,bool,float32,int32,int64}

TensorScatterUpdate

T={bfloat16,float32} Tindices={int32,int64}

Tile

T={bfloat16,bool,float32,int32,int8}

TopK

T={float32,int32}

TopKV2

T={float32,int32}

Transpose

T={bfloat16,bool,float32,int16,int32} Tperm={int32,int64}

TruncateDiv

T={int16,int32,int8}

TruncatedNormal

dtype={bfloat16,float32} T={int32,int64}

Unpack

T={bfloat16,float32,int32}

UnravelIndex

Tidx={int32,int64}

UnsortedSegmentSum

T={bfloat16,float32} Tindices={int32,int64} Tnumsegments={int32,int64}

Unstage

UnwrapDatasetVariant

VarHandleOp

dtype={float32,int32}

VarIsInitializedOp

Variable

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

VariableShape

out_type={int32,int64}

VariableV2

  • Users should refrain from using legacy variables on HPU.

  • To explicitly use legacy variables, this restriction can be overridden with the following environment variable:

  • TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true

Where

WrapDatasetVariant

Xdivy

T={float32}

ZerosLike

T={bfloat16,bool,float32,int16,int32,int8,uint16,uint8}

_Arg

_DeviceArg

_DeviceRetval

_FusedBatchNormEx

T={bfloat16,float32} U={float32}

_FusedConv2D

T={bfloat16,float32}

_Retval

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

_ScopedAllocator

_ScopedAllocatorConcat

_ScopedAllocatorSplit
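
Several rows in Table 1.1 note that legacy (reference) variables should be avoided on HPU. Below is a minimal sketch of explicitly opting in via the environment variable from those notes. Setting the variable before TensorFlow is imported, and forcing reference variables through the tf.compat.v1 API, are assumptions made for illustration, not SynapseAI requirements:

    import os

    # Assumed to be read at initialization, so set it before TensorFlow
    # or the Habana module is loaded.
    os.environ["TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU"] = "true"

    import tensorflow.compat.v1 as tf

    tf.disable_eager_execution()      # v1 graph mode
    tf.disable_resource_variables()   # force legacy (reference) variables
    v = tf.get_variable("w", shape=[2, 2], dtype=tf.float32)

Note that in TF2, tf.Variable creates resource variables by default; that path is covered by the Resource* operator rows above and avoids this restriction entirely.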

The following operator implementations are supplied by TensorFlow core and are automatically registered for all available devices, including Gaudi.

Table 1.2 TensorFlow Operators with a default implementation in TensorFlow core, automatically registered for all devices

TF OP

Constraints

Notes / Limitations

Assert

ControlTrigger

DebugGradientIdentity

T={bfloat16,bool,float32,int16,int64,int8,uint16,uint32,uint64,uint8}

DestroyTemporaryVariable

T={bool,float32,int64,uint32}

Enter

Exit

FIFOQueueV2

HostConst

IdentityN

IsVariableInitialized

dtype={bool,float32,int64,uint32}

LoopCond

Merge

NextIteration

Placeholder

PlaceholderV2

QueueCloseV2

QueueDequeueV2

QueueEnqueueV2

QueueIsClosedV2

QueueSizeV2

Recv

RefIdentity

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

Send

Stack

StackClose

StackCloseV2

StackPop

elem_type={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

StackPopV2

elem_type={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

StackPush

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

StackPushV2

T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8}

StackV2

Switch

TemporaryVariable

dtype={bool,float32,int64,uint32}

_HostCast

_HostRecv

_HostSend

_ReadVariablesOp

_Recv

_Send

_SwitchN

_VarHandlesOp
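
To verify which device actually executes a given operator, TensorFlow's device placement logging can be used. A minimal sketch, assuming the standard SynapseAI Python package layout (habana_frameworks.tensorflow and its load_habana_module entry point):

    import tensorflow as tf
    from habana_frameworks.tensorflow import load_habana_module

    load_habana_module()                         # registers the HPU device
    tf.debugging.set_log_device_placement(True)  # log each op's assigned device

    x = tf.constant([-1.0, 2.0, -3.0])  # float32 is listed for Abs in Table 1.1
    print(tf.abs(x))                    # placement log shows HPU vs. CPU

Operators outside the tables above, or instantiated with unsupported attribute values, would be expected to show up on the CPU in this log rather than on the HPU.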