TensorFlow Operators
Overview
This document summarizes the TensorFlow operators supported by SynapseAI for Gaudi.
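The tables below can also be cross-checked at runtime by enabling TensorFlow's device placement logging. The following is only a minimal sketch, assuming the habana_frameworks.tensorflow package is installed and that load_habana_module() is the call that registers the HPU device in your environment:

```python
import tensorflow as tf
import habana_frameworks.tensorflow as htf  # assumed Habana TensorFlow integration package

htf.load_habana_module()                     # register the HPU device with TensorFlow
tf.debugging.set_log_device_placement(True)  # log the device chosen for every op

with tf.device("/device:HPU:0"):
    x = tf.random.uniform((4, 8), dtype=tf.float32)  # RandomUniform: float32 listed in the table
    y = tf.abs(x) + tf.math.cumsum(x, axis=1)        # Abs and Cumsum are listed in the table

# Ops listed below should be placed on HPU:0; anything else falls back to CPU:0.
```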
TensorFlow Operators Support Summary

| TF OP | Constraints | Notes / Limitations |
|---|---|---|
| Abs | T={bfloat16,float32,int32} | |
| Acos | T={float32} | |
| Acosh | T={float32} | |
| Add | T={bfloat16,float32,int16,int32,int64} | |
| AddN | T={bfloat16,float32,int16,int32,int64,int8,uint8} | |
| AddV2 | T={bfloat16,float32,int16,int32,int64} | |
| AdjustContrastv2 | T={float32} | |
| All | Tidx={int32,int64} | |
| AnonymousIterator | | |
| AnonymousIteratorV2 | | |
| AnonymousIteratorV3 | | |
| Any | Tidx={int32,int64} | |
| ApplyAdaMax | | |
| ApplyAdadelta | | |
| ApplyAdagrad | | |
| ApplyAdagradV2 | | |
| ApplyAdam | | |
| ApplyAddSign | | |
| ApplyCenteredRMSProp | | |
| ApplyFtrl | | |
| ApplyFtrlV2 | | |
| ApplyGradientDescent | | |
| ApplyMomentum | | |
| ApplyPowerSign | | |
| ApplyRMSProp | | |
| ApproximateEqual | T={float32} | |
| ArgMax | T={bfloat16,float32,int32,int8} Tidx={int32,int64} output_type={int32,int64} | |
| ArgMin | T={bfloat16,float32} Tidx={int32,int64} output_type={int32,int64} | |
| Asin | T={float32} | |
| Asinh | T={float32} | |
| Assign | | |
| AssignAdd | | |
| AssignAddVariableOp | dtype={bfloat16,float32,int32} | |
| AssignSub | | |
| AssignSubVariableOp | dtype={bfloat16,float32,int32} | |
| AssignVariableOp | dtype={bfloat16,float32,int32} | |
| Atan | T={float32} | |
| Atan2 | T={float32} | |
| Atanh | T={float32} | |
| AvgPool | T={bfloat16,float32} | |
| AvgPool3D | T={bfloat16,float32} | |
| AvgPool3DGrad | T={bfloat16,float32} | |
| AvgPoolGrad | T={bfloat16,float32} | |
| BatchMatMul | T={bfloat16,float32} | |
| BatchMatMulV2 | T={bfloat16,float32} | |
| BatchMatMulV3 | Ta={bfloat16,float32} Tb={bfloat16,float32} Tout={bfloat16,float32} | |
| BatchToSpace | T={bfloat16,float32} Tidx={int32} | |
| BatchToSpaceND | T={bfloat16,float32} Tblock_shape={int32} Tcrops={int32} | |
| BesselI0 | T={float32} | |
| BiasAdd | T={bfloat16,float32} | |
| BiasAddGrad | T={bfloat16,float32} | |
| Bincount | T={float32,int32} | |
| Bitcast | T={bfloat16,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| BitwiseAnd | T={int16,int32,int8,uint16,uint32,uint8} | |
| BitwiseOr | T={int16,int32,int8,uint16,uint32,uint8} | |
| BitwiseXor | T={int16,int32,int8,uint16,uint32,uint8} | |
| BlockLSTM | T={float32} | |
| BlockLSTMGrad | T={float32} | |
| BlockLSTMGradV2 | T={float32} | |
| BlockLSTMV2 | T={float32} | |
| BroadcastArgs | T={int32,int64} | |
| BroadcastGradientArgs | T={int32,int64} | |
| BroadcastTo | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} Tidx={int32,int64} | |
| Cast | SrcT={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} DstT={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| Ceil | T={bfloat16,float32} | |
| ClipByValue | T={bfloat16,float32} | |
| CollectiveBcastRecv | T={float32,int32,int64} | |
| CollectiveBcastRecvV2 | T={float32,int32,int64} | |
| CollectiveBcastSend | T={float32,int32,int64} | |
| CollectiveBcastSendV2 | T={float32,int32,int64} | |
| CollectiveGather | T={float32,int32,int64} | |
| CollectiveGatherV2 | T={float32,int32,int64} | |
| CollectiveInitializeCommunicator | | |
| CollectiveReduce | T={bfloat16,float32} | |
| CollectiveReduceV2 | T={bfloat16,float32} | |
| CollectiveReduceV3 | T={bfloat16,float32} | |
| CombinedNonMaxSuppression | | |
| Concat | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| ConcatOffset | | |
| ConcatV2 | T={bfloat16,bool,float32,int32,int8,uint8} Tidx={int32,int64} | |
| Const | dtype={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint8} | |
| Conv2D | T={bfloat16,float32} | |
| Conv2DBackpropFilter | T={bfloat16,float32} | |
| Conv2DBackpropInput | T={bfloat16,float32} | |
| Conv3D | T={bfloat16,float32} | |
| Conv3DBackpropFilterV2 | T={bfloat16,float32} | |
| Conv3DBackpropInputV2 | T={bfloat16,float32} Tshape={int32,int64} | |
| Cos | T={bfloat16,float32} | |
| Cosh | T={bfloat16,float32} | |
| CropAndResize | T={float32} | |
| CropAndResizeGradBoxes | T={float32,int16,int32,int8,uint16,uint8} | |
| CropAndResizeGradImage | T={float32} | |
| Cross | T={bfloat16,float32,int32} | |
| Cumprod | T={bfloat16,float32,int32} Tidx={int32,int64} | |
| Cumsum | T={bfloat16,float32,int32} Tidx={int32,int64} | |
| DataFormatDimMap | T={int32,int64} | |
| DataFormatVecPermute | T={int32} | |
| DebugIdentityV2 | | |
| DeepCopy | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} | |
| DeleteIterator | | |
| DenseBincount | Tidx={int32} T={float32,int32} | |
| DepthToSpace | T={bfloat16,float32} | |
| DepthwiseConv2dNative | T={bfloat16,float32} | |
| DepthwiseConv2dNativeBackpropFilter | T={bfloat16,float32} | |
| DepthwiseConv2dNativeBackpropInput | T={bfloat16,float32} | |
| DestroyResourceOp | | |
| Diag | T={bfloat16,float32,int32} | |
| DiagPart | T={bfloat16,float32,int32} | |
| Digamma | T={float32} | |
| Div | T={bfloat16,float32} | |
| DivNoNan | T={bfloat16,float32} | |
| DynamicPartition | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| DynamicStitch | T={bfloat16,bool,float32,int32,int8,uint8} | |
| Einsum | T={bfloat16,float32} | |
| Elu | T={bfloat16,float32} | |
| EluGrad | T={bfloat16,float32} | |
| Empty | dtype={bfloat16,bool,float32,int32,int64,uint8} | |
| EmptyTensorList | | |
| EnsureShape | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| Equal | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} | |
| Erf | T={bfloat16,float32} | |
| Erfinv | T={float32} | |
| EuclideanNorm | T={bfloat16,float32} Tidx={int32,int64} | |
| Exp | T={bfloat16,float32} | |
| ExpandDims | T={bfloat16,bool,float32,int32,int8,uint8} Tdim={int32,int64} | |
| ExperimentalMapDataset | | |
| ExperimentalSleepDataset | | |
| Expm1 | T={bfloat16,float32} | |
| Fill | T={bfloat16,float32,int32} index_type={int32,int64} | |
| FinalizeDataset | | |
| Floor | T={bfloat16,float32} | |
| FloorDiv | T={bfloat16,float32,int32} | |
| FloorMod | T={int16,int32,int8,uint16,uint32,uint8} | |
| FusedBatchNorm | T={float32} | |
| FusedBatchNormGrad | T={float32} | |
| FusedBatchNormGradV2 | T={bfloat16,float32} U={float32} | |
| FusedBatchNormGradV3 | T={bfloat16,float32} U={float32} | |
| FusedBatchNormV2 | T={bfloat16,float32} U={float32} | |
| FusedBatchNormV3 | T={bfloat16,float32} U={float32} | |
| GRUBlockCell | T={float32} | |
| GRUBlockCellGrad | T={float32} | |
| Gather | Tparams={bfloat16,float32,int32,int8} Tindices={int32,int64} | |
| GatherNd | Tparams={bfloat16,float32,int32} Tindices={int32,int64} | |
| GatherV2 | Tparams={bfloat16,bool,float32,int32,int64,int8} Tindices={int32,int64} Taxis={int32,int64} | |
| GeneratorDataset | | |
| GetOptions | | |
| Greater | T={bfloat16,float32,int16,int32,int8,uint16,uint32,uint8} | |
| GreaterEqual | T={bfloat16,float32,int32,int8,uint8} | |
| Identity | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} | |
| IdentityN | | |
| ImageProjectiveTransformV2 | dtype={bfloat16,float32} | |
| ImageProjectiveTransformV3 | dtype={bfloat16,float32} | |
| InTopK | T={int32,int64} | |
| InTopKV2 | T={int32,int64} | |
| Inv | T={bfloat16,float32} | |
| InvGrad | T={bfloat16,float32} | |
| Invert | T={int16,int32,int8,uint16,uint32,uint8} | |
| InvertPermutation | T={int32,int64} | |
| IsFinite | T={bfloat16,float32} | |
| IsInf | T={bfloat16,float32} | |
| IsNan | T={bfloat16,float32} | |
| IteratorFromStringHandleV2 | | |
| IteratorGetNext | | |
| IteratorGetNextAsOptional | | |
| IteratorGetNextSync | | |
| IteratorToStringHandle | | |
| IteratorV2 | | |
| L2Loss | T={bfloat16,float32} | |
| LRN | T={bfloat16,float32} | |
| LRNGrad | T={bfloat16,float32} | |
| LSTMBlockCell | T={float32} | |
| LSTMBlockCellGrad | T={float32} | |
| LeakyRelu | T={bfloat16,float32} | |
| LeakyReluGrad | T={bfloat16,float32} | |
| LeftShift | T={int16,int32,int8,uint16,uint32,uint8} | |
| Less | T={bfloat16,float32,int16,int32,int8,uint16,uint32,uint8} | |
| LessEqual | T={bfloat16,float32,int16,int32,int8,uint16,uint32,uint8} | |
| Lgamma | T={float32} | |
| LinSpace | T={bfloat16,float32} Tidx={int32,int64} | |
| Log | T={bfloat16,float32} | |
| Log1p | T={bfloat16,float32} | |
| LogSoftmax | T={bfloat16,float32} | |
| LogicalAnd | | |
| LogicalNot | | |
| LogicalOr | | |
| Lu | T={float32} output_idx_type={int32} | |
| MakeIterator | | |
| MapClear | | |
| MapIncompleteSize | | |
| MapPeek | | |
| MapSize | | |
| MapStage | | |
| MapUnstage | | |
| MapUnstageNoKey | | |
| MatMul | T={bfloat16,float32} | |
| MatrixBandPart | T={bfloat16,float32} Tindex={int32,int64} | |
| MatrixDiag | T={bfloat16,float32} | |
| MatrixDiagPart | T={bfloat16,float32} | |
| MatrixDiagPartV2 | T={bfloat16,float32} | |
| MatrixDiagPartV3 | T={bfloat16,float32} | |
| MatrixDiagV2 | T={bfloat16,float32} | |
| MatrixDiagV3 | T={bfloat16,float32} | |
| MatrixSetDiag | T={bfloat16,float32} | |
| MatrixSetDiagV2 | T={bfloat16,float32} | |
| MatrixSetDiagV3 | T={bfloat16,float32} | |
| MatrixTriangularSolve | T={bfloat16,float32} | |
| Max | T={bfloat16,float32,int32,int64,int8,uint8} Tidx={int32,int64} | |
| MaxPool | T={bfloat16,float32} | |
| MaxPool3D | T={bfloat16,float32} | |
| MaxPool3DGrad | T={bfloat16,float32} TInput={bfloat16,float32} | |
| MaxPoolGrad | T={bfloat16,float32} | |
| MaxPoolGradV2 | T={bfloat16,float32} | |
| MaxPoolV2 | T={bfloat16,float32} | |
| Maximum | T={bfloat16,float32,int32,int64} | |
| Mean | T={bfloat16,float32} Tidx={int32,int64} | |
| Min | T={bfloat16,float32,int32} Tidx={int32,int64} | |
| Minimum | T={bfloat16,float32,int32} | |
| MirrorPad | T={bfloat16,float32,int32} Tpaddings={int32,int64} | |
| MirrorPadGrad | T={bfloat16,float32,int16,int32,uint16,uint32} Tpaddings={int32,int64} | |
| Mod | T={bfloat16,float32,int32} | |
| Mul | T={bfloat16,float32,int16,int32,int64,int8,uint16,uint32,uint8} | |
| MulNoNan | T={float32} | |
| Multinomial | T={bfloat16,float32} output_dtype={int32,int64} | |
| Ndtri | T={float32} | |
| Neg | T={bfloat16,float32,int32} | |
| NextAfter | T={bfloat16,float32} | |
| NoOp | | |
| NonMaxSuppressionV2 | T={float32} T_threshold={float32} | |
| NonMaxSuppressionV3 | T={float32} T_threshold={float32} | |
| NonMaxSuppressionV4 | T={float32} T_threshold={float32} | |
| NonMaxSuppressionV5 | T={float32} | |
| NotEqual | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| OneHot | T={bfloat16,float32} TI={int32} | |
| OnesLike | T={bfloat16,float32,int32,int8} | |
| OptionalFromValue | | |
| OptionalGetValue | | |
| OptionalHasValue | | |
| OptionalNone | | |
| OptionsDataset | | |
| Pack | T={bfloat16,bool,float32,int32} | |
| Pad | T={bfloat16,float32,int32} Tpaddings={int32,int64} | |
| PadV2 | T={bfloat16,float32,int32} Tpaddings={int32,int64} | |
| PartitionedCall | | |
| PlaceholderWithDefault | dtype={bfloat16,bool,float32,int32,int8} | |
| Pow | T={float32} | |
| PrefetchDataset | | |
| PreventGradient | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} | |
| Prod | T={bfloat16,float32,int32} Tidx={int32,int64} | |
| PyramidRoiAlign | T={bfloat16,float32} | |
| PyramidRoiAlignGradImages | T={bfloat16,float32} | |
| Qr | T={float32} | |
| RaggedTensorToTensor | T={float32} Tindex={int32,int64} Tshape={int32,int64} | |
| RandomShuffle | T={bfloat16,float32,int32,int8} | |
| RandomStandardNormal | dtype={bfloat16,float32} T={int32,int64} | |
| RandomUniform | dtype={bfloat16,float32} T={int32,int64} | |
| RandomUniformInt | Tout={int32} T={int32} | |
| Range | Tidx={bfloat16,float32,int32} | |
| Rank | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| ReadVariableOp | | |
| RealDiv | T={bfloat16,float32} | |
| Reciprocal | T={bfloat16,float32} | |
| ReciprocalGrad | T={bfloat16,float32} | |
| Relu | T={bfloat16,float32} | |
| Relu6 | T={bfloat16,float32} | |
| Relu6Grad | T={bfloat16,float32} | |
| ReluGrad | T={bfloat16,float32} | |
| Reshape | T={bfloat16,bool,float32,int32} Tshape={int32,int64} | |
| ResizeArea | T={bfloat16,float32} | |
| ResizeBicubic | T={bfloat16,float32} | |
| ResizeBicubicGrad | T={float32} | |
| ResizeBilinear | T={bfloat16,float32} | |
| ResizeBilinearGrad | T={bfloat16,float32} | |
| ResizeNearestNeighbor | T={bfloat16,float32} | |
| ResizeNearestNeighborGrad | T={bfloat16,float32} | |
| ResourceApplyAdaMax | T={bfloat16,float32} | |
| ResourceApplyAdadelta | T={bfloat16,float32} | |
| ResourceApplyAdagrad | T={bfloat16,float32} | |
| ResourceApplyAdagradV2 | T={bfloat16,float32} | |
| ResourceApplyAdam | T={bfloat16,float32} | |
| ResourceApplyAdamWithAmsgrad | T={bfloat16,float32} | |
| ResourceApplyAddSign | T={bfloat16,float32} | |
| ResourceApplyCenteredRMSProp | T={bfloat16,float32} | |
| ResourceApplyFtrl | T={bfloat16,float32} | |
| ResourceApplyFtrlV2 | T={bfloat16,float32} | |
| ResourceApplyGradientDescent | T={bfloat16,float32} | |
| ResourceApplyKerasMomentum | T={bfloat16,float32} | |
| ResourceApplyMomentum | T={float32} | |
| ResourceApplyPowerSign | T={bfloat16,float32} | |
| ResourceApplyProximalAdagrad | T={bfloat16,float32} | |
| ResourceApplyRMSProp | T={bfloat16,float32} | |
| ResourceGather | dtype={bfloat16,float32} Tindices={int32,int64} | |
| ResourceGatherNd | dtype={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterAdd | dtype={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterDiv | dtype={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterMax | dtype={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterMin | dtype={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterMul | dtype={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterNdAdd | T={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterNdMax | T={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterNdMin | T={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterNdSub | T={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterNdUpdate | T={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterSub | dtype={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceScatterUpdate | dtype={bfloat16,float32,int32} Tindices={int32,int64} | |
| ResourceSparseApplyAdadelta | T={bfloat16,float32} | |
| ResourceSparseApplyAdagrad | T={bfloat16,float32} | |
| ResourceSparseApplyAdagradV2 | T={bfloat16,float32} | |
| ResourceSparseApplyCenteredRMSProp | T={bfloat16,float32} | |
| ResourceSparseApplyFtrl | T={bfloat16,float32} | |
| ResourceSparseApplyFtrlV2 | T={bfloat16,float32} | |
| ResourceSparseApplyKerasMomentum | T={bfloat16,float32} | |
| ResourceSparseApplyProximalAdagrad | T={bfloat16,float32} | |
| ResourceSparseApplyRMSProp | T={bfloat16,float32} | |
| Reverse | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| ReverseSequence | T={float32} | |
| ReverseV2 | Tidx={int32} T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| RightShift | T={int16,int32,int8,uint16,uint32,uint8} | |
| Rint | T={bfloat16,float32} | |
| Roll | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} Tshift={int32,int64} Taxis={int32,int64} | |
| Round | T={bfloat16,float32} | |
| Rsqrt | T={bfloat16,float32} | |
| RsqrtGrad | T={bfloat16,float32} | |
| ScatterNd | T={bfloat16,float32,int32} Tindices={int32,int64} | |
| Select | T={bfloat16,float32,int32} | |
| SelectV2 | T={bfloat16,bool,float32,int32,int64} | |
| Selu | T={bfloat16,float32} | |
| SeluGrad | T={bfloat16,float32} | |
| Shape | T={bfloat16,bool,float32,int32,int8} out_type={int32,int64} | |
| ShapeN | T={bfloat16,float32,int32,int8} out_type={int32,int64} | |
| Sigmoid | T={bfloat16,float32} | |
| SigmoidGrad | T={bfloat16,float32} | |
| Sign | T={bfloat16,float32} | |
| Sin | T={bfloat16,float32} | |
| Sinh | T={float32} | |
| Size | T={bfloat16,float32,int32,int8} out_type={int32,int64} | |
| SleepDataset | | |
| Slice | T={bfloat16,float32,int32,int8} Index={int32} | |
| Snapshot | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint8} | |
| Softmax | T={bfloat16,float32} | |
| SoftmaxCrossEntropyWithLogits | T={bfloat16,float32} | |
| Softplus | T={bfloat16,float32,int32,int8} | |
| SoftplusGrad | T={bfloat16,float32} | |
| Softsign | T={bfloat16,float32} | |
| SoftsignGrad | T={bfloat16,float32} | |
| SpaceToBatch | T={bfloat16,float32} Tpaddings={int32} | |
| SpaceToBatchND | T={bfloat16,float32} Tblock_shape={int32} Tpaddings={int32} | |
| SpaceToDepth | T={bfloat16,float32,uint8} | |
| SparseMatMul | Ta={bfloat16,float32} Tb={bfloat16,float32} | |
| SparseSegmentMean | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSegmentMeanGrad | T={bfloat16,float32} Tidx={int32,int64} Tsegmentids={int32,int64} | |
| SparseSegmentMeanWithNumSegments | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSegmentSqrtN | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32,int64} | |
| SparseSegmentSqrtNGrad | T={bfloat16,float32} Tidx={int32,int64} Tsegmentids={int32,int64} | |
| SparseSegmentSqrtNWithNumSegments | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSegmentSum | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSegmentSumGrad | T={bfloat16,float32} Tidx={int32,int64} Tsegmentids={int32,int64} | |
| SparseSegmentSumWithNumSegments | T={bfloat16,float32} Tidx={int32} Tsegmentids={int32} | |
| SparseSoftmaxCrossEntropyWithLogits | T={bfloat16,float32} Tlabels={int32,int64} | |
| SparseTensorDenseAdd | T={bfloat16,float32} Tindices={int32} | |
| SparseToDense | T={bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} Tindices={int32,int64} | |
| Split | T={bfloat16,float32} | |
| SplitV | T={bfloat16,bool,float32,int32,int8} Tlen={int32,int64} | |
| Sqrt | T={bfloat16,float32} | |
| SqrtGrad | T={bfloat16,float32} | |
| Square | T={bfloat16,float32,int16,int32,int64,int8} | |
| SquaredDifference | T={bfloat16,float32,int32} | |
| Squeeze | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} | |
| Stage | | |
| StageClear | | |
| StagePeek | | |
| StageSize | | |
| StatefulPartitionedCall | | |
| StatefulStandardNormalV2 | dtype={float32} | |
| StatefulUniform | dtype={float32} | |
| StatefulUniformFullInt | dtype={int32,uint32} | |
| StatelessRandomGetAlg | | |
| StatelessRandomGetKeyCounter | | |
| StatelessRandomGetKeyCounterAlg | | |
| StatelessRandomUniform | dtype={bfloat16,float32} T={int32,int64} Tseed={int32,int64} | |
| StatelessRandomUniformFullInt | dtype={int32,int64,uint32,uint64} T={int32,int64} Tseed={int32,int64} | |
| StatelessRandomUniformFullIntV2 | dtype={int32,int64,uint32,uint64} Tshape={int32,int64} | |
| StatelessRandomUniformInt | dtype={int32,int64} T={int32,int64} Tseed={int32,int64} | |
| StatelessRandomUniformIntV2 | dtype={int32,int64} Tshape={int32,int64} | |
| StatelessRandomUniformV2 | dtype={bfloat16,float32} Tshape={int32,int64} | |
| StopGradient | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} | |
| StridedSlice | T={bfloat16,bool,float32,int32,int8,uint8} | |
| StridedSliceGrad | T={bfloat16,bool,float32,int32,int8,uint8} | |
| Sub | T={bfloat16,float32,int32} | |
| Sum | T={bfloat16,float32,int32} Tidx={int32,int64} | |
| SymbolicGradient | | |
| Tan | T={float32} | |
| Tanh | T={bfloat16,float32} | |
| TanhGrad | T={bfloat16,float32} | |
| TensorArrayCloseV3 | | |
| TensorArrayConcatV3 | dtype={float32,int32} | |
| TensorArrayGatherV3 | dtype={float32,int32} | |
| TensorArrayGradV3 | | |
| TensorArrayGradWithShape | | |
| TensorArrayReadV3 | dtype={bfloat16,float32,int32} | |
| TensorArrayScatterV3 | T={float32,int32} | |
| TensorArraySizeV3 | | |
| TensorArraySplitV3 | T={bfloat16,float32,int32} | |
| TensorArrayV3 | dtype={bfloat16,float32,int32} | |
| TensorArrayWriteV3 | T={float32,int32} | |
| TensorListConcatLists | | |
| TensorListElementShape | | |
| TensorListFromTensor | element_dtype={bfloat16,bool,float32,int32,int64} | |
| TensorListGetItem | element_dtype={bfloat16,bool,float32,int32,int64} | |
| TensorListLength | | |
| TensorListPopBack | element_dtype={bfloat16,bool,float32,int32,int64} | |
| TensorListPushBack | | |
| TensorListReserve | | |
| TensorListResize | | |
| TensorListSetItem | | |
| TensorListSplit | element_dtype={bfloat16,bool,float32,int32,int64} | |
| TensorListStack | element_dtype={bfloat16,bool,float32,int32,int64} | |
| TensorScatterAdd | T={bfloat16,float32} Tindices={int32} | |
| TensorScatterMax | T={bfloat16,float32} Tindices={int32} | |
| TensorScatterMin | T={bfloat16,float32} Tindices={int32} | |
| TensorScatterSub | T={bfloat16,float32} Tindices={int32} | |
| TensorScatterUpdate | T={bfloat16,bool,float32,int32,int64,int8,uint16,uint8} Tindices={int32,int64} | |
| TensorStridedSliceUpdate | T={bfloat16,float32,int32} Index={int32,int64} | |
| Tile | T={bfloat16,bool,float32,int32,int8} | |
| TopK | T={float32,int32,int64} | |
| TopKV2 | T={float32,int32,int64} | |
| Transpose | T={bfloat16,bool,float32,int16,int32,int8,uint8} Tperm={int32,int64} | |
| TruncateDiv | T={int16,int32,int8,uint16,uint32,uint8} | |
| TruncateMod | T={bfloat16,float32,int32} | |
| TruncatedNormal | dtype={bfloat16,float32} T={int32,int64} | |
| Unique | T={float32,int32} out_idx={int32} | |
| UniqueV2 | T={float32,int32} Taxis={int32} out_idx={int32} | |
| UniqueWithCounts | T={float32,int32} out_idx={int32} | |
| UniqueWithCountsV2 | T={float32,int32} Taxis={int32} out_idx={int32} | |
| Unpack | T={bfloat16,float32,int32} | |
| UnravelIndex | Tidx={int32,int64} | |
| UnsortedSegmentSum | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} Tindices={int32} Tnumsegments={int32,int64} | |
| Unstage | | |
| UnwrapDatasetVariant | | |
| VarHandleOp | dtype={bfloat16,float32,int32} | |
| VarIsInitializedOp | | |
| Variable | | |
| VariableShape | out_type={int32,int64} | |
| VariableV2 | | |
| Where | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint8} | |
| WrapDatasetVariant | | |
| Xdivy | T={float32} | |
| Xlog1py | T={float32} | |
| Xlogy | T={float32} | |
| ZerosLike | T={bfloat16,bool,float32,int16,int32,int8,uint16,uint32,uint8} | |
| _Arg | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| _DeviceArg | | |
| _DeviceRetval | | |
| _FusedBatchNormEx | T={bfloat16,float32} U={float32} | |
| _FusedBatchNormGradEx | T={float32} U={float32} | |
| _FusedConv2D | T={bfloat16,float32} | |
| _FusedConv3D | T={bfloat16,float32} | |
| _Retval | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| _ScopedAllocator | | |
| _ScopedAllocatorConcat | | |
| _ScopedAllocatorSplit | | |
| _TensorToHashBucketFast | T={int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| _VarHandlesOp | | |
Note
Int64 tensors are internally downcast to Int32.
The following operator implementations are supplied by TensorFlow core (version 2.11.0) and are automatically registered for all available devices, including Habana® Gaudi®.

| TF OP | Constraints | Notes / Limitations |
|---|---|---|
| Assert | | |
| BatchFunction | | |
| Case | | |
| CollectiveAssignGroupV2 | | |
| ConsumeMutexLock | | |
| ControlTrigger | | |
| Copy | | |
| CopyHost | | |
| DebugGradientIdentity | T={bfloat16,bool,float32,int16,int64,int8,uint16,uint32,uint64,uint8} | |
| DebugIdentity | | |
| DebugNanCount | T={float32} | |
| DebugNumericSummary | T={bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| DeleteSessionTensor | | |
| DestroyTemporaryVariable | T={bool,float32,int64,uint32} | |
| DeviceIndex | | |
| DevicePlacementOp | | |
| DisableCopyOnRead | | |
| EagerPyFunc | | |
| Enter | | |
| Exit | | |
| FIFOQueueV2 | | |
| Fact | | |
| FakeParam | | |
| For | | |
| HostConst | | |
| If | | |
| IsVariableInitialized | dtype={bool,float32,int64,uint32} | |
| LoopCond | | |
| MakeWeakResourceHandle | | |
| Merge | | |
| MutexLock | | |
| MutexV2 | | |
| NextIteration | | |
| OrderedMapClear | | |
| OrderedMapIncompleteSize | | |
| OrderedMapPeek | | |
| OrderedMapSize | | |
| OrderedMapStage | | |
| OrderedMapUnstage | | |
| OrderedMapUnstageNoKey | | |
| Placeholder | | |
| PlaceholderV2 | | |
| QueueCloseV2 | | |
| QueueDequeueV2 | | |
| QueueEnqueueV2 | | |
| QueueIsClosedV2 | | |
| QueueSizeV2 | | |
| Recv | | |
| RefIdentity | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| RemoteCall | | |
| Send | | |
| Stack | | |
| StackClose | | |
| StackCloseV2 | | |
| StackPop | elem_type={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| StackPopV2 | elem_type={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| StackPush | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| StackPushV2 | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| StackV2 | | |
| StatelessCase | | |
| StatelessIf | | |
| StatelessWhile | | |
| Switch | | |
| TemporaryVariable | dtype={bool,float32,int64,uint32} | |
| While | | |
| _ArrayToList | T={float32,int32} | |
| _EagerConst | T={bfloat16,bool,float32,int16,int32,int64,int8,uint16,uint32,uint64,uint8} | |
| _HostCast | | |
| _HostRecv | | |
| _HostSend | | |
| _If | | |
| _ListToArray | T={float32,int32} | |
| _ReadVariablesOp | | |
| _Recv | | |
| _Send | | |
| _SwitchN | | |
| _While | | |

Custom Habana TensorFlow Operators Description
TensorFlow integration for Habana® Gaudi® adds a set of custom TensorFlow operators. All of them are designed to improve overall topology performance by leveraging specialized TPC kernels.
The following sections describe each custom operator and its purpose.
HabanaInstanceNorm
Implementation of the InstanceNormalization operation on the forward path. Performs normalization across all features of one channel. For small batch sizes, its accuracy is more stable than that of batch normalization.
Inputs: ‘input’ must be 4D (NHWC) or 5D (NDHWC); ‘beta’ and ‘gamma’ must be 1D (C).
Outputs: ‘output’ must be 4D (NHWC) or 5D (NDHWC); ‘mean’ and ‘istd’ must be 2D (NC).
Attributes: ‘epsilon’ is added to the variance; ‘axis’ selects the axis to normalize (currently only the last axis is supported).
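The computation can be sketched with stock TensorFlow primitives. This is only an illustrative reference of the NHWC math under the shapes listed above, not the TPC kernel; the function name and epsilon default are placeholders:

```python
import tensorflow as tf

def instance_norm_reference(x, gamma, beta, epsilon=1e-5):
    # x: [N, H, W, C]; gamma, beta: [C]
    # Normalize each (sample, channel) pair over the spatial axes H and W,
    # then apply the per-channel scale and shift.
    mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
    istd = tf.math.rsqrt(var + epsilon)
    y = (x - mean) * istd * gamma + beta
    # The custom op also returns 'mean' and 'istd', here reshaped to [N, C].
    return y, tf.squeeze(mean, axis=[1, 2]), tf.squeeze(istd, axis=[1, 2])
```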
HabanaInstanceNormGrad
Implementation of the InstanceNormalization operation on the backward path. Calculates the gradients for HabanaInstanceNorm.
Inputs: ‘input’ and ‘grad_in’ must be 4D (NHWC) or 5D (NDHWC); ‘mean’ and ‘istd’ must be 2D (NC); ‘gamma’ must be 1D (C).
Outputs: ‘grad_out’ must be 4D (NHWC) or 5D (NDHWC); ‘grad_beta’ and ‘grad_gamma’ must be 1D (C).
Attributes: ‘epsilon’ is added to the variance; ‘axis’ selects the axis to normalize (currently only the last axis is supported).
HabanaResampler
Implementation of the Resampler operation on the forward path. Replaces Addons>Resampler.
Inputs: the ‘warp’ tensor must have depth/channel size = 2; the batch sizes of ‘data’ and ‘warp’ must match.
Outputs: ‘output’ is a tensor of values resampled from ‘data’; its shape is determined by the shape of ‘warp’.
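To illustrate the shape contract, the snippet below calls the TensorFlow Addons op that HabanaResampler replaces; it assumes tensorflow_addons is installed and is only meant to show how the ‘warp’ shape determines the output shape:

```python
import tensorflow as tf
import tensorflow_addons as tfa  # provides tfa.image.resampler (Addons>Resampler)

data = tf.random.uniform((2, 16, 16, 3))       # [batch, H, W, C] source values
warp = tf.random.uniform((2, 8, 8, 2)) * 15.0  # [batch, out_H, out_W, 2] (x, y) sampling coordinates

out = tfa.image.resampler(data, warp)          # bilinear sampling of 'data' at the 'warp' coordinates
print(out.shape)                               # (2, 8, 8, 3): output shape follows 'warp', channels follow 'data'
```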
HabanaResamplerGrad
Implementation of the Resampler operation on the backward path. Calculates the gradients for HabanaResampler. Replaces Addons>ResamplerGrad.
Inputs: the ‘grad_warp’ and ‘warp’ tensors must have depth/channel size = 2; the batch sizes of ‘data’ and ‘warp’ must match.