1. TensorFlow Operators

1.1. Overview

This document summarizes the TensorFlow operators supported by SynapseAI for GAUDI.

1.2. TensorFlow Operators Support Summary

Table 1.1 Supported TensorFlow Operators

| TF OP | Constraints | Notes / Limitations |
|-------|-------------|----------------------|
| Abs | T={bfloat16,float32,int32} | |
| Acos | T={float32} | |
| Acosh | T={float32} | |
| Add | T={bfloat16,float32,int16,int32} | |
| AddN | T={bfloat16,float32,int16,int32} | |
| AddV2 | T={bfloat16,float32,int16,int32} | |
| Addons>Resampler | T={bfloat16,float32} | |
| Addons>ResamplerGrad | T={bfloat16,float32} | |
| AdjustContrastv2 | T={float32} | |
| All | Tidx={int32,int64} | |
| AnonymousIterator | | |
| AnonymousIteratorV2 | | |
| Any | Tidx={int32,int64} | |
| ApplyAdaMax | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true (see the example following this table). |
| ApplyAdadelta | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyAdagrad | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyAdagradV2 | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyAdam | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyAddSign | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyCenteredRMSProp | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyFtrl | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyFtrlV2 | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyGradientDescent | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyMomentum | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyPowerSign | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ApplyRMSProp | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| ArgMax | T={bfloat16,float32} Tidx={int32,int64} output_type={int32,int64} | Tidx=int64 is downcast to int32. |
| ArgMin | T={bfloat16,float32} Tidx={int32,int64} output_type={int32} | Tidx=int64 is downcast to int32. |
| Asin | T={float32} | |
| Asinh | T={float32} | |
| Assign | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| AssignAdd | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| AssignAddVariableOp | dtype={bfloat16,float32,int32} | |
| AssignSub | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| AssignSubVariableOp | dtype={bfloat16,float32,int32} | |
| AssignVariableOp | dtype={bfloat16,float32,int32} | |
| Atan | T={float32} | |
| Atanh | T={float32} | |
| AvgPool | T={bfloat16,float32} | |
| AvgPool3D | T={bfloat16,float32} | |
| AvgPool3DGrad | T={bfloat16,float32} | |
| AvgPoolGrad | T={bfloat16,float32} | |
| BatchMatMul | T={bfloat16,float32} | |
| BatchMatMulV2 | T={bfloat16,float32} | |
| BatchToSpace | T={bfloat16,float32} Tidx={int32} | |
| BatchToSpaceND | T={bfloat16,float32} Tblock_shape={int32} Tcrops={int32} | |
| BiasAdd | T={bfloat16,float32} | |
| BiasAddGrad | T={bfloat16,float32} | |
| Bincount | T={float32,int32,int64} | |
| BitwiseAnd | T={int16,int32,int8,uint16,uint32,uint8} | |
| BitwiseOr | T={int16,int32,int8,uint16,uint32,uint8} | |
| BitwiseXor | T={int16,int32,int8,uint16,uint32,uint8} | |
| BroadcastArgs | T={int32,int64} | |
| BroadcastGradientArgs | T={int32,int64} | |
| BroadcastTo | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} Tidx={int32,int64} | |
| Cast | SrcT={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} DstT={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} | |
| Ceil | T={bfloat16,float32} | |
| ClipByValue | T={bfloat16,float32} | |
| CollectiveBcastRecv | T={float32,int32,int64} | |
| CollectiveBcastRecvV2 | T={float32,int32,int64} | |
| CollectiveBcastSend | T={float32,int32,int64} | |
| CollectiveBcastSendV2 | T={float32,int32,int64} | |
| CollectiveGather | T={float32,int32,int64} | |
| CollectiveGatherV2 | T={float32,int32,int64} | |
| CollectiveReduce | T={float32} | |
| CollectiveReduceV2 | T={bfloat16,float32} | |
| CombinedNonMaxSuppression | | |
| Concat | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| ConcatOffset | | |
| ConcatV2 | T={bfloat16,bool,float32,int32,int8,uint8} Tidx={int32,int64} | |
| Const | dtype={bfloat16,bool,float32,int32,int64,int8,uint8} | |
| ConsumeMutexLock | | |
| Conv2D | T={bfloat16,float32} | |
| Conv2DBackpropFilter | T={bfloat16,float32} | |
| Conv2DBackpropInput | T={bfloat16,float32} | |
| Conv3D | T={bfloat16,float32} | |
| Conv3DBackpropFilterV2 | T={bfloat16,float32} | |
| Conv3DBackpropInputV2 | T={bfloat16,float32} | |
| Cos | T={bfloat16,float32} | |
| Cosh | T={bfloat16,float32} | |
| CropAndResize | T={float32} | |
| CropAndResizeGradImage | T={float32} | |
| Cumprod | T={bfloat16,float32,int32} Tidx={int32,int64} | |
| Cumsum | T={bfloat16,float32,int32} Tidx={int32,int64} | |
| DataFormatVecPermute | T={int32} | |
| DebugIdentityV2 | | |
| DeleteIterator | | |
| DepthwiseConv2dNative | T={bfloat16,float32} | |
| DepthwiseConv2dNativeBackpropFilter | T={bfloat16,float32} | data_format must be NHWC or NCHW. |
| DepthwiseConv2dNativeBackpropInput | T={bfloat16,float32} | data_format must be NHWC or NCHW. |
| DestroyResourceOp | | |
| DiagPart | T={bfloat16,float32,int32} | |
| Div | T={bfloat16,float32} | |
| DivNoNan | T={float32} | |
| DynamicPartition | T={bfloat16,bool,float32,int32,int8} | |
| DynamicStitch | T={bfloat16,float32,int32} | |
| Einsum | T={bfloat16,float32} | |
| Elu | T={bfloat16,float32} | |
| EluGrad | T={bfloat16,float32} | |
| EmptyTensorList | | |
| EnsureShape | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| Equal | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| Erf | T={bfloat16,float32} | |
| EuclideanNorm | T={bfloat16,float32} Tidx={int32,int64} | |
| Exp | T={bfloat16,float32} | |
| ExpandDims | T={bfloat16,bool,float32,int32,int8,uint8} Tdim={int32,int64} | |
| ExperimentalSleepDataset | | |
| Fill | T={bfloat16,float32,int32} index_type={int32,int64} | |
| Floor | T={bfloat16,float32} | |
| FloorDiv | T={bfloat16,float32,int32} | |
| FloorMod | T={int16,int32,int8,uint16,uint32,uint8} | |
| FusedBatchNorm | T={float32} | |
| FusedBatchNormGrad | T={float32} | |
| FusedBatchNormGradV2 | T={bfloat16,float32} | |
| FusedBatchNormGradV3 | T={bfloat16,float32} U={float32} | |
| FusedBatchNormV2 | T={bfloat16,float32} U={float32} | |
| FusedBatchNormV3 | T={bfloat16,float32} U={float32} | |
| Gather | Tparams={bfloat16,float32,int32,int8} Tindices={int32,int64} | |
| GatherNd | Tparams={bfloat16,float32,int32} Tindices={int32,int64} | |
| GatherV2 | Tparams={bfloat16,bool,float32,int32,int8} Tindices={int32,int64} Taxis={int32,int64} | |
| GeneratorDataset | | |
| Greater | T={bfloat16,float32,int32,int8,uint8} | |
| GreaterEqual | T={bfloat16,float32,int32,int8} | |
| HabanaInstanceNorm | | |
| HabanaInstanceNormGrad | | |
| Identity | T={bfloat16,bool,float32,int32,int64,int8,uint8} | |
| InTopK | T={int32,int64} | |
| InTopKV2 | T={int32,int64} | |
| Inv | T={bfloat16,float32} | |
| InvGrad | T={bfloat16,float32} | |
| Invert | T={int16,int32,int8,uint16,uint32,uint8} | |
| InvertPermutation | T={int32,int64} | |
| IsFinite | T={bfloat16,float32} | |
| IsInf | T={bfloat16,float32} | |
| IsNan | T={bfloat16,float32} | |
| IteratorFromStringHandleV2 | | |
| IteratorGetNext | | |
| IteratorGetNextAsOptional | | |
| IteratorGetNextSync | | |
| IteratorToStringHandle | | |
| IteratorV2 | | |
| L2Loss | T={bfloat16,float32} | data_format must be NHWC. |
| LRN | T={bfloat16,float32} | |
| LRNGrad | T={bfloat16,float32} | |
| LeakyRelu | T={bfloat16,float32} | |
| LeakyReluGrad | T={bfloat16,float32} | |
| LeftShift | T={int16,int32,int8,uint16,uint32,uint8} | |
| Less | T={bfloat16,float32,int32,int8} | |
| LessEqual | T={bfloat16,float32,int32,int8} | |
| LinSpace | T={bfloat16,float32} Tidx={int32,int64} | |
| Log | T={bfloat16,float32} | |
| Log1p | T={bfloat16,float32} | |
| LogSoftmax | T={bfloat16,float32} | |
| LogicalAnd | | |
| LogicalNot | | |
| LogicalOr | | |
| MakeIterator | | |
| MatMul | T={bfloat16,float32} | |
| MatrixBandPart | T={bfloat16,float32} Tindex={int32,int64} | |
| MatrixDiag | T={bfloat16,float32} | |
| MatrixDiagPart | T={bfloat16,float32} | |
| MatrixDiagPartV2 | T={bfloat16,float32} | |
| MatrixDiagPartV3 | T={bfloat16,float32} | |
| MatrixDiagV2 | T={bfloat16,float32} | |
| MatrixDiagV3 | T={bfloat16,float32} | |
| MatrixSetDiagV3 | T={bfloat16,float32} | |
| Max | T={bfloat16,float32,int32,int8,uint8} Tidx={int32,int64} | |
| MaxPool | T={bfloat16,float32} | |
| MaxPool3D | T={bfloat16,float32} | |
| MaxPool3DGrad | T={bfloat16,float32} TInput={bfloat16,float32} | |
| MaxPoolGrad | T={bfloat16,float32} | |
| MaxPoolGradV2 | T={bfloat16,float32} | |
| MaxPoolV2 | T={bfloat16,float32} | |
| Maximum | T={bfloat16,float32,int32} | |
| Mean | T={bfloat16,float32} Tidx={int32,int64} | |
| Min | T={bfloat16,float32,int32} Tidx={int32,int64} | |
| Minimum | T={bfloat16,float32,int32} | |
| MirrorPad | T={bfloat16,float32,int32} Tpaddings={int32,int64} | |
| MirrorPadGrad | T={bfloat16,float32,int32} Tpaddings={int32,int64} | |
| Mod | T={bfloat16,float32,int32} | |
| Mul | T={bfloat16,float32,int32} | |
| MulNoNan | T={float32} | |
| MutexLock | | |
| MutexV2 | | |
| Neg | T={bfloat16,float32,int32} | |
| NoOp | | |
| NonMaxSuppressionV3 | T={float32} | |
| NonMaxSuppressionV4 | T={float32} | |
| NonMaxSuppressionV5 | T={float32} | |
| NotEqual | T={bfloat16,bool,float32,int32,int8} | |
| OneHot | T={float32} TI={int32} | |
| OnesLike | T={bfloat16,float32,int32,int8} | |
| OptionalFromValue | | |
| OptionalGetValue | | |
| OptionalHasValue | | |
| OptionalNone | | |
| Pack | T={bfloat16,bool,float32,int32} | |
| Pad | T={bfloat16,float32,int32} Tpaddings={int32,int64} | |
| PadV2 | T={bfloat16,float32,int32} Tpaddings={int32,int64} | |
| PartitionedCall | | |
| PlaceholderWithDefault | dtype={bfloat16,bool,float32,int32,int8} | |
| Pow | T={float32} | |
| PrefetchDataset | | |
| PreventGradient | T={bfloat16,bool,float32,int32,int64,int8,uint8} | |
| Prod | T={bfloat16,float32,int32} Tidx={int32,int64} | |
| PyramidRoiAlign | T={bfloat16,float32} | |
| PyramidRoiAlignGradImages | T={bfloat16,float32} | |
| Qr | T={float32} | |
| RandomShuffle | T={bfloat16,float32,int32,int8} | |
| RandomStandardNormal | dtype={bfloat16,float32} T={int32,int64} | |
| RandomUniform | dtype={bfloat16,float32} T={int32,int64} | |
| RandomUniformInt | Tout={int32} T={int32} | |
| Range | Tidx={int32} | |
| Rank | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| ReadVariableOp | | |
| RealDiv | T={bfloat16,float32} | |
| Reciprocal | T={bfloat16,float32} | |
| ReciprocalGrad | T={bfloat16,float32} | |
| Relu | T={bfloat16,float32} | |
| Relu6 | T={bfloat16,float32} | |
| Relu6Grad | T={bfloat16,float32} | |
| ReluGrad | T={bfloat16,float32} | |
| RemoteCall | | |
| Reshape | T={bfloat16,bool,float32,int32} Tshape={int32,int64} | |
| ResizeArea | T={bfloat16,float32} | |
| ResizeBicubic | T={bfloat16,float32} | half_pixel_centers mode is currently unsupported. |
| ResizeBicubicGrad | T={float32} | Supports only inputs smaller than 16 KB. half_pixel_centers mode is currently unsupported. |
| ResizeBilinear | T={bfloat16,float32} | |
| ResizeBilinearGrad | T={bfloat16,float32} | |
| ResizeNearestNeighbor | T={bfloat16,float32} | |
| ResizeNearestNeighborGrad | T={bfloat16,float32} | |
| ResourceApplyAdaMax | T={float32} | |
| ResourceApplyAdadelta | T={float32} | |
| ResourceApplyAdagradV2 | T={float32} | |
| ResourceApplyAdam | T={float32} | |
| ResourceApplyAdamWithAmsgrad | T={float32} | |
| ResourceApplyCenteredRMSProp | T={float32} | |
| ResourceApplyFtrl | T={float32} | |
| ResourceApplyFtrlV2 | T={float32} | |
| ResourceApplyGradientDescent | T={float32} | |
| ResourceApplyKerasMomentum | T={float32} | |
| ResourceApplyMomentum | T={float32} | |
| ResourceApplyRMSProp | T={float32} | |
| ResourceGather | dtype={bfloat16,float32} Tindices={int32,int64} | |
| ResourceGatherNd | dtype={bfloat16,float32,int32} Tindices={int32} | |
| ResourceScatterAdd | dtype={bfloat16,float32} Tindices={int32,int64} | |
| ResourceScatterDiv | dtype={bfloat16,float32} Tindices={int32,int64} | |
| ResourceScatterMax | dtype={bfloat16,float32} Tindices={int32,int64} | |
| ResourceScatterMin | dtype={bfloat16,float32} Tindices={int32,int64} | |
| ResourceScatterMul | dtype={bfloat16,float32} Tindices={int32,int64} | |
| ResourceScatterSub | dtype={bfloat16,float32} Tindices={int32,int64} | |
| ResourceScatterUpdate | dtype={bfloat16,float32} Tindices={int32,int64} | |
| ResourceSparseApplyAdadelta | T={float32} | |
| ResourceSparseApplyAdagradV2 | T={float32} | |
| ResourceSparseApplyCenteredRMSProp | T={float32} | |
| ResourceSparseApplyFtrl | T={float32} | |
| ResourceSparseApplyFtrlV2 | T={float32} | |
| ResourceSparseApplyKerasMomentum | T={float32} | |
| ResourceSparseApplyRMSProp | T={float32} | |
| Reverse | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| ReverseV2 | Tidx={int32} T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| RightShift | T={int16,int32,int8,uint16,uint32,uint8} | |
| Rint | T={bfloat16,float32} | |
| Roll | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} Tshift={int32,int64} Taxis={int32,int64} | |
| Round | T={bfloat16,float32} | |
| Rsqrt | T={bfloat16,float32} | |
| RsqrtGrad | T={bfloat16,float32} | |
| ScatterNd | T={bfloat16,float32} Tindices={int32,int64} | |
| Select | T={bfloat16,float32,int32} | |
| SelectV2 | T={bfloat16,bool,float32,int32} | |
| Selu | T={bfloat16,float32} | |
| SeluGrad | T={bfloat16,float32} | |
| Shape | T={bfloat16,bool,float32,int32,int8} out_type={int32,int64} | |
| ShapeN | T={bfloat16,float32,int32,int8} out_type={int32,int64} | |
| Sigmoid | T={bfloat16,float32} | |
| SigmoidGrad | T={bfloat16,float32} | |
| Sign | T={bfloat16,float32} | |
| Sin | T={bfloat16,float32} | |
| Sinh | T={float32} | |
| Size | T={bfloat16,float32,int32,int8} out_type={int32,int64} | |
| SleepDataset | | |
| Slice | T={bfloat16,float32,int32,int8} | |
| Snapshot | T={bfloat16,bool,float32,int32} | |
| Softmax | T={bfloat16,float32} | |
| SoftmaxCrossEntropyWithLogits | T={bfloat16,float32} | |
| Softplus | T={bfloat16,float32,int32,int8} | |
| SoftplusGrad | T={bfloat16,float32} | |
| Softsign | T={bfloat16,float32} | |
| SoftsignGrad | T={bfloat16,float32} | |
| SpaceToBatchND | T={bfloat16,float32} Tblock_shape={int32} Tpaddings={int32} | |
| SparseMatMul | Ta={bfloat16,float32} Tb={bfloat16,float32} | |
| SparseSegmentMean | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSegmentMeanGrad | T={bfloat16,float32} Tidx={int32,int64} Tsegmentids={int32,int64} | |
| SparseSegmentMeanWithNumSegments | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSegmentSqrtN | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSegmentSqrtNGrad | T={bfloat16,float32} Tidx={int32,int64} Tsegmentids={int32,int64} | |
| SparseSegmentSqrtNWithNumSegments | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSegmentSum | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSegmentSumGrad | T={bfloat16,float32} Tidx={int32,int64} Tsegmentids={int32,int64} | |
| SparseSegmentSumWithNumSegments | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSoftmaxCrossEntropyWithLogits | T={bfloat16,float32} Tlabels={int32,int64} | |
| SparseTensorDenseAdd | T={bfloat16,float32} Tindices={int32} | |
| Split | T={bfloat16,float32} | |
| SplitV | T={bfloat16,bool,float32,int32,int8} Tlen={int32,int64} | |
| Sqrt | T={bfloat16,float32} | |
| SqrtGrad | T={bfloat16,float32} | |
| Square | T={bfloat16,float32,int16,int32,int8} | |
| SquaredDifference | T={bfloat16,float32,int32} | |
| Squeeze | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} | |
| Stage | | |
| StageClear | | |
| StagePeek | | |
| StageSize | | |
| StatefulPartitionedCall | | |
| StopGradient | T={bfloat16,bool,float32,int32,int64,int8,uint8} | |
| StridedSlice | T={bfloat16,bool,float32,int32,int8,uint8} | |
| StridedSliceGrad | T={bfloat16,bool,float32,int32,int8,uint8} | |
| Sub | T={bfloat16,float32,int32} | |
| Sum | T={bfloat16,float32,int32} Tidx={int32,int64} | |
| SymbolicGradient | | |
| Tan | T={float32} | |
| Tanh | T={bfloat16,float32} | |
| TanhGrad | T={bfloat16,float32} | |
| TensorArrayGatherV3 | dtype={float32,int32} | |
| TensorArrayGradV3 | | |
| TensorArrayReadV3 | dtype={float32,int32} | |
| TensorArrayScatterV3 | T={float32,int32} | |
| TensorArrayV3 | dtype={float32,int32} | |
| TensorArrayWriteV3 | T={float32,int32} | |
| TensorListConcatLists | | |
| TensorListElementShape | | |
| TensorListFromTensor | element_dtype={bfloat16,bool,float32,int32,int64} | |
| TensorListGetItem | element_dtype={bfloat16,bool,float32,int32,int64} | |
| TensorListLength | | |
| TensorListPopBack | element_dtype={bfloat16,bool,float32,int32,int64} | |
| TensorListPushBack | | |
| TensorListReserve | | |
| TensorListResize | | |
| TensorListSetItem | | |
| TensorListSplit | element_dtype={bfloat16,bool,float32,int32,int64} | |
| TensorListStack | element_dtype={bfloat16,bool,float32,int32,int64} | |
| TensorScatterAdd | T={bfloat16,float32} Tindices={int32} | |
| TensorScatterMax | T={bfloat16,float32} Tindices={int32} | |
| TensorScatterSub | T={bfloat16,float32} Tindices={int32} | |
| TensorScatterUpdate | T={bfloat16,float32} Tindices={int32,int64} | |
| Tile | T={bfloat16,bool,float32,int32,int8} | |
| TopK | T={float32,int32} | |
| TopKV2 | T={float32,int32} | |
| Transpose | T={bfloat16,bool,float32,int16,int32} Tperm={int32,int64} | |
| TruncateDiv | T={int16,int32,int8} | |
| TruncateMod | T={bfloat16,float32,int32} | |
| TruncatedNormal | dtype={bfloat16,float32} T={int32,int64} | |
| Unpack | T={bfloat16,float32,int32} | |
| UnravelIndex | Tidx={int32,int64} | |
| UnsortedSegmentSum | T={bfloat16,float32} Tindices={int32} Tnumsegments={int32,int64} | |
| Unstage | | |
| UnwrapDatasetVariant | | |
| VarHandleOp | dtype={bfloat16,float32,int32} | |
| VarIsInitializedOp | | |
| Variable | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| VariableShape | out_type={int32,int64} | |
| VariableV2 | | Avoid legacy variables on HPU. To use them explicitly, set TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. |
| Where | | |
| WrapDatasetVariant | | |
| Xdivy | T={float32} | |
| ZerosLike | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint8} | |
| _Arg | | |
| _DeviceArg | | |
| _DeviceRetval | | |
| _FusedBatchNormEx | T={bfloat16,float32} U={float32} | |
| _FusedConv2D | T={bfloat16,float32} | |
| _Retval | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| _ScopedAllocator | | |
| _ScopedAllocatorConcat | | |
| _ScopedAllocatorSplit | | |
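For the legacy-variables restriction noted in several rows of Table 1.1, the following is a minimal sketch of applying the override. The environment variable name comes from the table; the habana_frameworks.tensorflow import and load_habana_module() call are assumptions based on the typical SynapseAI TensorFlow setup.

```python
import os

# Must be set before the Habana module is loaded; legacy (reference)
# variables are then allowed and placed on CPU instead of HPU.
os.environ["TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU"] = "true"

import tensorflow as tf
from habana_frameworks.tensorflow import load_habana_module  # assumed integration API

load_habana_module()  # registers the HPU device and the operators in Table 1.1
```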

The following operator implementations are supplied by TensorFlow core and are automatically registered for all available devices, including Habana® Gaudi®.

Table 1.2 TensorFlow operators with default implementations in TensorFlow core, automatically registered for all devices

| TF OP | Constraints | Notes / Limitations |
|-------|-------------|----------------------|
| Assert | | |
| ControlTrigger | | |
| DebugGradientIdentity | T={bfloat16,bool,float32,int16,int64,int8,uint16,uint32,uint64,uint8} | |
| DestroyTemporaryVariable | T={bool,float32,int64,uint32} | |
| Enter | | |
| Exit | | |
| FIFOQueueV2 | | |
| HostConst | | |
| IdentityN | | |
| IsVariableInitialized | dtype={bool,float32,int64,uint32} | |
| LoopCond | | |
| Merge | | |
| NextIteration | | |
| Placeholder | | |
| PlaceholderV2 | | |
| QueueCloseV2 | | |
| QueueDequeueV2 | | |
| QueueEnqueueV2 | | |
| QueueIsClosedV2 | | |
| QueueSizeV2 | | |
| Recv | | |
| RefIdentity | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| Send | | |
| Stack | | |
| StackClose | | |
| StackCloseV2 | | |
| StackPop | elem_type={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| StackPopV2 | elem_type={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| StackPush | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| StackPushV2 | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| StackV2 | | |
| Switch | | |
| TemporaryVariable | dtype={bool,float32,int64,uint32} | |
| _EagerConst | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| _HostCast | | |
| _HostRecv | | |
| _HostSend | | |
| _ReadVariablesOp | | |
| _Recv | | |
| _Send | | |
| _SwitchN | | |
| _VarHandlesOp | | |

1.3. Custom Habana TensorFlow Operators Description

TensorFlow integration for Habana® Gaudi® adds a set of custom TensorFlow operators. All of them are designed to improve overall topology performance by leveraging specialized TPC kernels.

The following sections describe these custom operators and their purpose.

1.3.1. HabanaInstanceNorm

Forward-path implementation of the InstanceNormalization operation. It normalizes across all features of one channel, independently for each sample. For small batch sizes, its accuracy is more stable than that of batch normalization.

Inputs: ‘input’ must be 4D (NHWC) or 5D (NDHWC), ‘beta’ and ‘gamma’ must be 1D (C).

Outputs: ‘output’ must be 4D (NHWC) or 5D (NDHWC), ‘mean’ and ‘istd’ must be 2D (NC).

Attributes: ‘epsilon' is added to the variance; ‘axis' indicates the axis to be normalized (currently only the last axis is supported).
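To make these semantics concrete, below is a minimal sketch in plain TensorFlow of what the op computes according to the description above. The function name, shapes, and epsilon value are illustrative assumptions, not the registered op's API.

```python
import tensorflow as tf

def instance_norm_reference(x, gamma, beta, epsilon=1e-5):
    """Reference semantics of instance normalization on an NHWC tensor.

    Normalizes each channel of each sample over its spatial axes (H, W),
    then applies the per-channel scale (gamma) and shift (beta). Returns
    the normalized tensor plus the per-(N, C) mean and inverse standard
    deviation, mirroring the op's 'mean' and 'istd' outputs.
    """
    mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)   # shape (N,1,1,C)
    istd = tf.math.rsqrt(var + epsilon)                        # epsilon added to variance
    y = (x - mean) * istd * gamma + beta
    return y, tf.squeeze(mean, [1, 2]), tf.squeeze(istd, [1, 2])  # mean/istd: (N, C)

x = tf.random.normal([2, 8, 8, 4])   # 'input', 4D (NHWC)
gamma = tf.ones([4])                 # per-channel scale, 1D (C)
beta = tf.zeros([4])                 # per-channel shift, 1D (C)
y, mean, istd = instance_norm_reference(x, gamma, beta)
```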

1.3.2. HabanaInstanceNormGrad

Backward-path implementation of the InstanceNormalization operation. It calculates the gradients for HabanaInstanceNorm.

Inputs: ‘input’ and ‘grad_in’ must be 4D (NHWC) or 5D (NDHWC), ‘mean’ and ‘istd’ must be 2D (NC), ‘gamma’ must be 1D (C).

Outputs: ‘grad_out’ must be 4D (NHWC) or 5D (NDHWC), ‘grad_beta’ and ‘grad_gamma’ must be 1D (C).

Attributes: ‘epsilon' is added to the variance; ‘axis' indicates the axis to be normalized (currently only the last axis is supported).
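The gradients this op produces can be illustrated with TensorFlow's autodiff. A short sketch, reusing the hypothetical instance_norm_reference function from the previous example:

```python
import tensorflow as tf

x = tf.random.normal([2, 8, 8, 4])    # 'input', NHWC
gamma = tf.Variable(tf.ones([4]))     # 1D (C)
beta = tf.Variable(tf.zeros([4]))     # 1D (C)

with tf.GradientTape() as tape:
    tape.watch(x)
    y, mean, istd = instance_norm_reference(x, gamma, beta)
    loss = tf.reduce_sum(y)  # stand-in for an upstream 'grad_in' of ones

# grad_x corresponds to 'grad_out' (NHWC); grad_gamma and grad_beta are
# the 1D (C) parameter gradients described in the output specification.
grad_x, grad_gamma, grad_beta = tape.gradient(loss, [x, gamma, beta])
```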