Habana Communication Library (HCL) API Reference

Overview

The Habana Communication Library (HCL) enables efficient scale-up communication between Habana® Gaudi® processors within a single node and scale-out across nodes for distributed training, leveraging Gaudi’s high-performance RDMA communication capabilities. It has an MPI look-and-feel and supports point-to-point operations (for example, Write, Send) and collective operations (for example, AllReduce, AlltoAll) that are performance-optimized for Gaudi.

C API

Infrastructure

Typedefs

  • typedef uint16_t HCL_Rank

  • typedef uint32_t HCL_Comm

  • typedef uint64_t HCL_Request

Constants

  • HCL_COMM_WORLD: Global HCL_Comm

#define HCL_COMM_WORLD 0

  • HCL_INVALID_RANK: Invalid HCL_Rank value

#define HCL_INVALID_RANK (HCL_Rank)(-1)    // 0xFFFF
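
As a quick check of the definitions above, the following sketch mirrors the documented types and constants locally instead of including the real HCL header. It assumes HCL_Rank is a 16-bit unsigned integer, as the 0xFFFF comment suggests; the helper is_valid_rank is hypothetical and not part of HCL.

```cpp
#include <cassert>
#include <cstdint>

// Local mirrors of the documented definitions, for illustration only.
// Real code would take these from the HCL header instead.
typedef uint16_t HCL_Rank;
typedef uint32_t HCL_Comm;

#define HCL_COMM_WORLD 0
#define HCL_INVALID_RANK (HCL_Rank)(-1)    // 0xFFFF

// Hypothetical helper (not part of HCL): true if the rank is not the
// invalid-rank sentinel.
inline bool is_valid_rank(HCL_Rank rank) {
    return rank != HCL_INVALID_RANK;
}

// Casting -1 to a 16-bit unsigned type wraps around to the largest value.
static_assert(HCL_INVALID_RANK == 0xFFFF, "HCL_INVALID_RANK should be 0xFFFF");
```

A caller could use such a check before passing a rank to a point-to-point operation.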

HCLStatus

  • eHCLSuccess

  • eHCLFail

  • eHCLInvalidArgumemt

  • eHCLBusy

  • eHCLNotSupported
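
Since every HCL call returns one of these status values, a small name-lookup helper can make logs easier to read. The sketch below declares a local mirror of the enum; the numeric values are an assumption (the list above does not give them), and hcl_status_name is a hypothetical helper, not an HCL function.

```cpp
#include <cassert>
#include <string>

// Local mirror of the documented status values, for illustration only.
// The numeric values are an assumption; real code should use the HCL header.
enum HCLStatus {
    eHCLSuccess,
    eHCLFail,
    eHCLInvalidArgumemt,  // spelled as in the list above
    eHCLBusy,
    eHCLNotSupported
};

// Hypothetical helper (not part of HCL): maps a status to its name for logging.
std::string hcl_status_name(HCLStatus status) {
    switch (status) {
        case eHCLSuccess:         return "eHCLSuccess";
        case eHCLFail:            return "eHCLFail";
        case eHCLInvalidArgumemt: return "eHCLInvalidArgumemt";
        case eHCLBusy:            return "eHCLBusy";
        case eHCLNotSupported:    return "eHCLNotSupported";
    }
    return "unknown";
}
```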

HCL_Op

  • eHCLNone

  • eHCLSum

  • eHCLMul

HCL_Request

Parameter

Type

event

uint64_t

index

uint64_t

pIndex

uint64_t

HCL_CollectiveOp

  • eHCLReduce

  • eHCLAllReduce

  • eHCLReduceScatter

  • eHCLAll2All

  • eHCLBroadcast

  • eHCLAllGather

  • eHCLAll2AllV

HCL_Flags

  • eHCLNoFlag

  • eHCLWeakOrder

HCL JSON Configuration File Format

The configuration file is a JSON file that describes the network topology.

The configuration file name is passed to HCL as a parameter of HCL_Init. For a single box, no JSON configuration file is required; pass NULL or an empty string to HCL_Init.

The file structure and supported keys are described in the sections below.

HCL_PORT

Type: int32_t

Mandatory: No

Default value: 53432

DISABLED_PORTS

Type: set<uint8_t>

Mandatory: No

DISABLED_RANKS

Type: set<uint8_t>

Mandatory: No

HCL_COUNT

Type: int32_t

Mandatory: No

MAX Value: 512

HCL_COUNT specifies the number of HCL devices. The devices are addressed using the local host address (“127.0.0.1” or “::1”).

HCL_RANKS

Type: vector<string>

Mandatory: This field is mandatory when using multiple boxes.

HCL_RANKS contains the IP addresses of all HCL devices and overrides HCL_COUNT.

On Habana’s HLS-1, the host IP of each box appears eight times in this field, once per device. Nodes with fewer than eight devices may need to adjust the list accordingly.
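
Because HCL_RANKS repeats each host IP once per device, the list is easy to generate programmatically. The following is a minimal sketch under that assumption; build_hcl_config is a hypothetical helper (not part of HCL) that only emits the HCL_PORT and HCL_RANKS keys described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of HCL): builds a JSON configuration string
// in which each host IP is repeated devicesPerHost times in HCL_RANKS.
std::string build_hcl_config(int port,
                             const std::vector<std::string>& hostIps,
                             int devicesPerHost) {
    assert(devicesPerHost > 0);
    std::string json = "{\n  \"HCL_PORT\": " + std::to_string(port) +
                       ",\n  \"HCL_RANKS\": [";
    bool first = true;
    for (const std::string& ip : hostIps) {
        for (int device = 0; device < devicesPerHost; ++device) {
            if (!first) {
                json += ", ";  // separate entries, no trailing comma
            }
            json += "\"" + ip + "\"";
            first = false;
        }
    }
    json += "]\n}\n";
    return json;
}
```

For an HLS-1 style setup, the call would pass 8 as devicesPerHost; the resulting string can be written to a file whose name is then given to HCL_Init.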

HCL_COMMUNICATORS

Type: vector<json_object>

Mandatory: No

The communicators vector must be sequential and ordered by ID.

The communicator structure is shown below. None of the fields are mandatory.

  • ID - Type: uint32_t

  • DEVICES - Type: vector<int>

  • MODULE_OFFSET - Type: int
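
The ordering requirement on HCL_COMMUNICATORS can be checked before the file is handed to HCL. The sketch below reads "sequential and ordered by ID" as "each ID is exactly one greater than the previous one"; that reading, and the helper name communicator_ids_sequential, are assumptions. The required starting ID is not stated above, so it is not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of HCL): returns true if the communicator IDs,
// taken in file order, are consecutive and ascending.
bool communicator_ids_sequential(const std::vector<uint32_t>& ids) {
    for (std::size_t i = 1; i < ids.size(); ++i) {
        if (ids[i] != ids[i - 1] + 1) {
            return false;  // gap, duplicate, or out-of-order ID
        }
    }
    return true;  // empty and single-entry lists are trivially sequential
}
```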

IPv6

Type: int32_t

Mandatory: No

The presence of the IPv6 key, with any value, indicates that IPv6 addresses are used.

HCL JSON Configuration File Format Examples

This section provides several examples of the HCL JSON configuration file format.

{
    "HCL_PORT": 5332,
    "HCL_RANKS": ["192.168.10.166", "192.168.10.165", "192.168.10.164"]
}

Example 1 - Single node format:

{
    "HCL_PORT": 5332,
    "HCL_COUNT": 8
}

{
    "HCL_PORT": 5332,
    "HCL_COUNT": 4
}

Example 2 - Multiple nodes format (4 nodes):

{
    "HCL_PORT": 5332,
    "HCL_RANKS": ["192.168.16.49", "192.168.16.49", "192.168.16.49", "192.168.16.49",
                  "192.168.16.49", "192.168.16.49", "192.168.16.49", "192.168.16.49",
                  "192.168.16.107", "192.168.16.107", "192.168.16.107", "192.168.16.107",
                  "192.168.16.107", "192.168.16.107", "192.168.16.107", "192.168.16.107",
                  "192.168.16.112", "192.168.16.112", "192.168.16.112", "192.168.16.112",
                  "192.168.16.112", "192.168.16.112", "192.168.16.112", "192.168.16.112",
                  "192.168.16.114", "192.168.16.114", "192.168.16.114", "192.168.16.114",
                  "192.168.16.114", "192.168.16.114", "192.168.16.114", "192.168.16.114"]
}

Example 3 - Multiple nodes with IPv6 format (4 nodes):

{
    "IPv6": 1,
    "HCL_PORT": 5332,
    "HCL_RANKS": ["fd5d:12c9:2205:1::1ed", "fd5d:12c9:2205:1::1ed", "fd5d:12c9:2205:1::1ed", "fd5d:12c9:2205:1::1ed",
                  "fd5d:12c9:2205:1::1ed", "fd5d:12c9:2205:1::1ed", "fd5d:12c9:2205:1::1ed", "fd5d:12c9:2205:1::1ed",
                  "fd5d:12c9:2205:1::11a", "fd5d:12c9:2205:1::11a", "fd5d:12c9:2205:1::11a", "fd5d:12c9:2205:1::11a",
                  "fd5d:12c9:2205:1::11a", "fd5d:12c9:2205:1::11a", "fd5d:12c9:2205:1::11a", "fd5d:12c9:2205:1::11a",
                  "fd5d:12c9:2205:1::14b", "fd5d:12c9:2205:1::14b", "fd5d:12c9:2205:1::14b", "fd5d:12c9:2205:1::14b",
                  "fd5d:12c9:2205:1::14b", "fd5d:12c9:2205:1::14b", "fd5d:12c9:2205:1::14b", "fd5d:12c9:2205:1::14b",
                  "fd5d:12c9:2205:1::12f", "fd5d:12c9:2205:1::12f", "fd5d:12c9:2205:1::12f", "fd5d:12c9:2205:1::12f",
                  "fd5d:12c9:2205:1::12f", "fd5d:12c9:2205:1::12f", "fd5d:12c9:2205:1::12f", "fd5d:12c9:2205:1::12f"]
}

HCL API

HCL_Init

HCLStatus HCL_Init(const synDeviceId deviceId, const char* configFileName)

Operation:

Initializes HCL. Parameters are provided in a configuration file.

Parameters:

Parameter

Description

deviceId

[in] The ID of the device on which to initialize HCL.

configFileName

[in] Configuration file name. See HCL JSON Configuration File Format.

Return Value:

The status of the operation.

HCL_Comm_Rank

HCLStatus HCL_Comm_Rank(HCL_Comm comm, HCL_Rank* rank)

Operation:

Provides the rank of a given process within the communicator.

Parameters:

Parameter

Description

comm

[in] The communicator the rank is a part of.

rank

[out] The HCL rank of the calling process.

Return Value:

The status of the operation.

HCL_Comm_Ranks

HCLStatus HCL_Comm_Ranks(HCL_Comm comm, HCL_Rank* rankList, int count)

Operation:

Provides all the HCL ranks within the communicator.

Parameters:

Parameter

Description

comm

[in] The communicator the rank is a part of.

rankList

[out] List of all the HCL ranks within the communicator.

The user should allocate the list.

count

[in] Number of elements in the list.

Return Value:

The status of the operation.

HCL_Comm_Size

HCLStatus HCL_Comm_Size(HCL_Comm comm, int* size)

Operation:

Provides the size of the communicator.

Parameters:

Parameter

Description

comm

[in] The communicator whose size is being requested.

size

[out] The size of the communicator.

Return Value:

The status of the operation.

HCL_Destroy

HCLStatus HCL_Destroy()

Operation:

Destroys HCL.

Parameters:

None.

Return Value:

The status of the operation.

HCL_Write_Tag

HCLStatus HCL_Write_Tag(uint64_t localAddress, HCL_Rank remoteRank,
    uint64_t remoteAddress, uint64_t size, uint32_t tag)

Operation:

Writes data to a remote memory while signaling the receive operation at the remote side waiting on the tag value.

This function is blocking.

Parameters:

Parameter

Description

localAddress

[in] The local HBM address to read from.

remoteRank

[in] The ID of the remote rank.

remoteAddress

[in] The remote HBM address to write to.

size

[in] The size of the buffer that is being sent.

tag

[in] The tag of the operation.

Return Value:

The status of the operation.

HCL_IWrite_Tag

HCLStatus HCL_IWrite_Tag(HCL_Request* phRequest, uint64_t localAddress,
    HCL_Rank remoteRank, uint64_t *remoteAddress, uint64_t size,
    uint32_t tag)

Operation:

Writes data to a remote memory while signaling the receive operation at the remote side waiting on the tag value.

This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the request.

localAddress

[in] The local HBM address to read from.

remoteRank

[in] The ID of the remote rank.

remoteAddress

[in] The remote HBM address to write to.

size

[in] The size of the buffer that is being sent.

tag

[in] The tag of the operation.

Return Value:

The status of the operation.

HCL_Write

HCLStatus HCL_Write(synStreamHandle streamHandle,
    uint64_t localAddress, HCL_Rank remoteRank,
    uint64_t remoteAddress, uint64_t size)

Operation:

Writes data to a remote memory. This function is blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to. Use nullptr for non-stream implementation.

localAddress

[in] The local HBM address to read from.

remoteRank

[in] The ID of the remote rank.

remoteAddress

[in] The remote HBM address to write to.

size

[in] The size of the buffer that is being sent.

Return Value:

The status of the operation.

HCL_IWrite

HCLStatus HCL_IWrite(HCL_Request* phRequest, uint64_t localAddress, HCL_Rank remoteRank,
    uint64_t remoteAddress, uint64_t size)

Operation:

Writes data to a remote memory. This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the request.

localAddress

[in] The local HBM address to read from.

remoteRank

[in] The ID of the remote rank.

remoteAddress

[in] The remote HBM address to write to.

size

[in] The size of the buffer that is being sent.

Return Value:

The status of the operation.

HCL_ISend_Tag

HCLStatus HCL_ISend_Tag(HCL_Request* phRequest, uint64_t localAddress,
    uint64_t size, HCL_Rank remoteRank, uint32_t tag)

Operation:

Sends data to a remote buffer while signaling the receive operation at the remote side waiting on the tag value.

This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the request.

localAddress

[in] The local HBM address to read from.

size

[in] The size of the buffer that is being sent.

remoteRank

[in] The ID of the remote rank.

tag

[in] The tag of the operation.

Return Value:

The status of the operation.

HCL_Send_Tag

HCLStatus HCL_Send_Tag(uint64_t localAddress, uint64_t size,
    HCL_Rank remoteRank, uint32_t tag)

Operation:

Sends data to a remote buffer while signaling the receive operation at the remote side waiting on the tag value.

This function is blocking.

Parameters:

Parameter

Description

localAddress

[in] The local HBM address to read from.

size

[in] The size of the buffer that is being sent.

remoteRank

[in] The ID of the remote rank.

tag

[in] The tag of the operation.

Return Value:

The status of the operation.

HCL_Send

HCLStatus HCL_Send(synStreamHandle streamHandle, uint64_t localAddress,
uint64_t size, HCL_Rank remoteRank)

Operation:

Sends data to a remote buffer.

This function is blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to.

localAddress

[in] The local HBM address to read from.

size

[in] The size of the buffer that is being sent.

remoteRank

[in] The ID of the remote rank.

Return Value:

The status of the operation.

HCL_IReceive_Tag

HCLStatus HCL_IReceive_Tag(HCL_Request* phRequest, uint64_t localAddress, uint64_t size,
    HCL_Rank remoteRank, uint32_t tag)

Operation:

Receives data that was sent using the SEND/ISEND command. This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the receive-request.

localAddress

[in] The local HBM address to redirect the SEND data into.

size

[in] The size in bytes of the buffer that is being sent.

remoteRank

[in] The ID of the remote rank. Use RANK_NONE to accept data from any rank.

tag

[in] The tag of the operation.

Return Value:

The status of the operation.

HCL_Receive_Tag

HCLStatus HCL_Receive_Tag(uint64_t localAddress, uint64_t size, HCL_Rank
remoteRank, uint32_t tag)

Operation:

Receives data that was sent using the SEND/ISEND command.

This function is blocking.

Parameters:

Parameter

Description

localAddress

[in] The local HBM address to redirect the SEND data into.

size

[in] The size in bytes of the buffer that is being sent.

remoteRank

[in] The ID of the remote rank. Use RANK_NONE to accept data from any rank.

tag

[in] The tag of the operation.

Return Value:

The status of the operation.

HCL_Receive

HCLStatus HCL_Receive(synStreamHandle streamHandle, uint64_t
localAddress, uint64_t size, HCL_Rank remoteRank)

Operation:

Receives data that was sent using the Send command.

This function is blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to.

localAddress

[in] The local HBM address to redirect the SEND data into.

size

[in] The size in bytes of the buffer that is being sent.

remoteRank

[in] The ID of the remote rank. Use RANK_NONE to accept data from any rank.

Return Value:

The status of the operation.

HCL_IReceive_Write_Tag

HCLStatus HCL_IReceive_Write_Tag(HCL_Request* phRequest, HCL_Rank remoteRank,
    uint32_t tag)

Operation:

Receives data that was sent using the WRITE_IMMEDIATE/IWRITE_IMMEDIATE command. This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the receive-request.

remoteRank

[in] The ID of the remote rank. Use RANK_NONE to accept data from any rank.

tag

[in] The tag of the operation.

Return Value:

The status of the operation.

HCL_Receive_Write_Tag

HCLStatus HCL_Receive_Write_Tag(HCL_Rank remoteRank, uint32_t tag)

Operation:

Receives data that was sent using the WRITE_IMMEDIATE/IWRITE_IMMEDIATE command. This function is blocking.

Parameters:

Parameter

Description

remoteRank

[in] The ID of the remote rank. Use RANK_NONE to accept data from any rank.

tag

[in] The tag of the operation.

Return Value:

The status of the operation.

HCL_Receive_Write

HCLStatus HCL_Receive_Write(synStreamHandle streamHandle, HCL_Rank
remoteRank)

Operation:

Receives data that was sent using the WRITE_IMMEDIATE/IWRITE_IMMEDIATE command. This function is blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to.

remoteRank

[in] The ID of the remote rank. Use RANK_NONE to accept data from any rank.

Return Value:

The status of the operation.

HCL_Sync

HCLStatus HCL_Sync(HCL_Comm communicator, uint32_t tag)

Operation:

Waits on syncing between all the ranks in the communicator.

This function is blocking.

Parameters:

Parameter

Description

communicator

[in] The communicator the rank is a part of.

tag

[in] The tag of the operation.

Return Value:

The status of the operation.

HCL_Wait

HCLStatus HCL_Wait(HCL_Request phRequest, uint64_t microSeconds =
HCL_InfinityWait)

Operation:

Waits on an async request to complete.

This function is blocking.

Parameters:

Parameter

Description

phRequest

[in] Handle to the request to wait on.

microSeconds

[in] Timeout in microseconds. Use HCL_InfinityWait to wait indefinitely, or 0 to return without waiting.

Return Value:

  • eHCLSuccess: Request was completed.

  • eHCLBusy: Reached the timeout and request was not completed.
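
Because a timeout is reported as eHCLBusy rather than as a failure, a caller can wait in short slices and do other work between attempts. The sketch below illustrates that pattern under stated assumptions: the enum and HCL_Request typedef are local mirrors of the documented types, WaitFn stands in for a call such as HCL_Wait(request, microSeconds), and wait_in_slices is a hypothetical helper, not part of HCL.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Local mirrors of the documented types, for illustration only.
enum HCLStatus { eHCLSuccess, eHCLFail, eHCLInvalidArgumemt, eHCLBusy, eHCLNotSupported };
typedef uint64_t HCL_Request;

// Stand-in for a call such as HCL_Wait(request, microSeconds).
using WaitFn = std::function<HCLStatus(HCL_Request, uint64_t)>;

// Hypothetical helper (not part of HCL): waits in slices of sliceUs
// microseconds and gives up after maxSlices timeouts.
HCLStatus wait_in_slices(const WaitFn& waitFn, HCL_Request request,
                         uint64_t sliceUs, int maxSlices) {
    assert(maxSlices > 0);
    for (int i = 0; i < maxSlices; ++i) {
        HCLStatus status = waitFn(request, sliceUs);
        if (status != eHCLBusy) {
            return status;  // completed, or failed for another reason
        }
        // Still busy: the caller could log progress or check for cancellation here.
    }
    return eHCLBusy;  // every slice timed out
}
```

With the real library, waitFn could be a lambda that forwards its two arguments to HCL_Wait.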

HCL_WaitList

HCLStatus HCL_WaitList(HCL_Request* phRequest, int count, uint64_t
microSeconds = HCL_InfinityWait)

Operation:

Waits on a list of async requests to complete.

This function is blocking.

Parameters:

Parameter

Description

phRequest

[in] List of handles to the requests to wait on.

count

[in] Number of elements in the requests list.

microSeconds

[in] Timeout in microseconds. Use HCL_InfinityWait to wait indefinitely, or 0 to return without waiting.

Return Value:

  • eHCLSuccess: Request was completed.

  • eHCLBusy: Reached the timeout and request was not completed.

HCL_NetworkFlush

HCLStatus HCL_NetworkFlush(HCL_Request* phRequest, synStreamHandle
streamHandle)

Operation:

Waits on an async request to complete.

This function is blocking.

Parameters:

Parameter

Description

phRequest

[in] Returned a list of handles to the request.

streamHandle

[in] Stream to record write events from.

Return Value:

The status of the operation.

HCL_Get_Intermediate_Tensor_size

HCLStatus HCL_Get_Intermediate_Tensor_size(uint64_t* intermediateSize,
    const HCL_CollectiveOp collectiveOp,
    const synTensorDescriptor tensorDescriptor,
    const HCL_Comm communicator)

Operation:

Queries HCL for the required size of the intermediate buffer for a given collective operation. On Gaudi, HBM management is strictly the responsibility of the user. For that reason, and so that HCL does not statically reserve HBM, the user rather than HCL allocates the intermediate buffers.

This function is blocking.

Parameters:

Parameter

Description

intermediateSize

[out] The required size of the intermediate buffer for the given operation.

collectiveOp

[in] The operation that is being queried.

tensorDescriptor

[in] The tensor descriptor describing the operation.

communicator

[in] The communicator on which to operate.

Return Value:

The status of the operation.

HCL_Get_Intermediate_Buffer_size

HCLStatus HCL_Get_Intermediate_Buffer_size(uint64_t* intermediateSize,
    const HCL_CollectiveOp collectiveOp,
    const uint64_t count, synDataType dataType,
    const HCL_Comm communicator)

Operation:

Queries HCL for the required size of the intermediate buffer for a given collective operation. On Gaudi, HBM management is strictly the responsibility of the user. For that reason, and so that HCL does not statically reserve HBM, the user rather than HCL allocates the intermediate buffers.

This function is blocking.

Parameters:

Parameter

Description

intermediateSize

[out] The required size of the intermediate buffer for the given operation.

collectiveOp

[in] The operation that is being queried.

count

[in] The number of elements.

dataType

[in] The datatype of the elements.

communicator

[in] The communicator on which to operate.

Return Value:

The status of the operation.

HCL_Bcast

HCLStatus HCL_Bcast(synStreamHandle streamHandle, uint64_t sendBuffAddr,
uint64_t receiveBuffAddr, uint64_t count, synDataType dataType,
HCL_Rank root, HCL_Comm communicator, const uint32_t flags)

Operation:

Broadcasts data from a single rank to all ranks in the communicator.

This function is blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to. Use nullptr for non-stream implementation.

sendBuffAddr

[in] Buffer that will be sent.

receiveBuffAddr

[in] Buffer that will be written into.

count

[in] The number of elements in the buffer that is broadcast.

dataType

[in] The datatype of the operation.

root

[in] The rank of the root of the broadcast operation.

communicator

[in] The communicator on which to broadcast.

flags

[in] eHCLSameAddress - The HBM address in all ranks (including root) is the same. No address resolution step is required. eHCLWeakOrder - next op on same stream can start before all data is received on the receive buffer.

Return Value:

The status of the operation.

HCL_IBcast

HCLStatus HCL_IBcast(HCL_Request* phRequest, uint64_t sendBuffAddr,
uint64_t receiveBuffAddr, uint64_t count, synDataType dataType, HCL_Rank
root, HCL_Comm communicator, const uint32_t flags)

Operation:

Broadcasts data from a single rank to all ranks in the communicator.

This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the broadcast request.

sendBuffAddr

[in] Buffer that will be sent.

receiveBuffAddr

[in] Buffer that will be written into.

count

[in] The number of elements in the buffer that is broadcast.

dataType

[in] The datatype of the operation.

root

[in] The rank of the root of the broadcast operation.

communicator

[in] The communicator on which to broadcast.

flags

[in] eHCLSameAddress - The HBM address in all ranks (including root) is the same. No address resolution step is required. eHCLWeakOrder - next op on same stream can start before all data is received on the receive buffer.

Return Value:

The status of the operation.

HCL_Allreduce

HCLStatus HCL_Allreduce(synStreamHandle streamHandle, uint64_t
sendBuffAddr, uint64_t receiveBuffAddr, uint64_t count, synDataType
dataType, uint64_t intermediateBufferAddr, uint64_t intermediateSize,
HCL_Op op, HCL_Comm communicator, const uint32_t flags)

Operation:

Reduces data across the entire communicator.

Memory allocation: Reduction requires intermediate buffers. Since the user is responsible for HBM allocation, the user must call HCL_Get_Intermediate_Buffer_size to get the required buffer size, allocate it and provide it to the HCL for the function to succeed.

This function is blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to. Use nullptr for non-stream implementation.

sendBuffAddr

[in] The HBM address of the send buffer.

receiveBuffAddr

[in] The HBM address of the receive buffer.

count

[in] The number of elements of the buffer that is reduced.

dataType

[in] The datatype of the operation.

intermediateBufferAddr

[in] The HBM address of the intermediate buffer.

intermediateSize

[in] The size of the intermediate buffer allocated by the user.

op

[in] The reduction operation to perform (summation for example).

communicator

[in] The communicator on which to perform the reduction operation.

flags

[in] eHCLSameAddress - The HBM address in all ranks (including root) is the same. No address resolution step is required. eHCLWeakOrder - next op on same stream can start before all data is received on the receive buffer.

Return Value:

The status of the operation.

HCL_IAllreduce

HCLStatus HCL_IAllreduce(HCL_Request* phRequest, uint64_t sendBuffAddr,
    uint64_t receiveBuffAddr, uint64_t count, synDataType dataType,
    uint64_t intermediateBufferAddr, uint64_t intermediateSize,
    HCL_Op op, HCL_Comm communicator, const uint32_t flags)

Operation:

Reduces data across the entire communicator.

Memory allocation: Reduction requires intermediate buffers. Since the user is in charge of HBM allocation, the user is required to call HCL_Get_Intermediate_Buffer_size to get the required buffer size, allocate it and provide it to the HCL for the function to succeed.

This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the reduction request.

sendBuffAddr

[in] The HBM address of the send buffer.

receiveBuffAddr

[in] The HBM address of the receive buffer.

count

[in] The number of elements of the buffer that is reduced.

dataType

[in] The datatype of the operation.

intermediateBufferAddr

[in] The HBM address of the intermediate buffer.

intermediateSize

[in] The size of the intermediate buffer allocated by the user.

op

[in] The reduction operation to perform (summation for example).

communicator

[in] The communicator on which to perform the reduction operation.

flags

[in] eHCLSameAddress - The HBM address in all ranks (including root) is the same. No address resolution step is required. eHCLWeakOrder - next op on same stream can start before all data is received on the receive buffer.

Return Value:

The status of the operation.

HCL_Reduce

HCLStatus HCL_Reduce(synStreamHandle streamHandle, uint64_t
sendBuffAddr, uint64_t receiveBuffAddr, uint64_t count, synDataType
dataType, uint64_t intermediateBufferAddr, uint64_t intermediateSize,
uint16_t root, HCL_Op op, HCL_Comm communicator, const uint32_t flags);

Operation:

Reduces data across the entire communicator.

Memory allocation: As reduction requires intermediate buffers, and since the user owns HBM allocation, the user must call HCL_Get_Intermediate_Buffer_size to get the required buffer size, allocate it and provide it to the HCL for the function to succeed.

This function is blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to. Use nullptr for non-stream implementation.

sendBuffAddr

[in] The HBM address of the send buffer.

receiveBuffAddr

[in] The HBM address of the receive buffer.

count

[in] The number of elements of the buffer that is reduced.

dataType

[in] The datatype of the operation.

intermediateBufferAddr

[in] The HBM address of the intermediate buffer.

intermediateSize

[in] The size of the intermediate buffer allocated by the user.

root

[in] The rank of the root of the reduction operation.

op

[in] The reduction operation to perform (summation for example).

communicator

[in] The communicator on which to perform the reduction operation.

flags

[in] eHCLSameAddress - The HBM address in all ranks (including root) is the same. No address resolution step is required. eHCLWeakOrder - next op on same stream can start before all data is received on the receive buffer.

Return Value:

The status of the operation.

HCL_IReduce

HCLStatus HCL_IReduce(HCL_Request* phRequest, uint64_t sendBuffAddr,
uint64_t receiveBuffAddr, uint64_t count, synDataType dataType, uint64_t
intermediateBufferAddr, uint64_t intermediateSize, uint16_t root, HCL_Op
op, HCL_Comm communicator, const uint32_t flags);

Operation:

Reduces data across the whole communicator.

Memory allocation: As reduction requires intermediate buffers, and since the user owns HBM allocation, the user must call HCL_Get_Intermediate_Buffer_size to get the required buffer size, allocate it and provide it to HCL for the function to succeed.

This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the reduction request.

sendBuffAddr

[in] The HBM address of the send buffer.

receiveBuffAddr

[in] The HBM address of the receive buffer.

count

[in] The number of elements of the buffer that is reduced.

dataType

[in] The datatype of the operation.

intermediateBufferAddr

[in] The HBM address of the intermediate buffer.

intermediateSize

[in] The size of the intermediate buffer allocated by the user.

root

[in] The rank of the root of the reduction operation.

op

[in] The reduction operation to perform (summation for example).

communicator

[in] The communicator on which to perform the reduction operation.

flags

[in] eHCLSameAddress - The HBM address in all ranks (including root) is the same. No address resolution step is required. eHCLWeakOrder - next op on same stream can start before all data is received on the receive buffer.

Return Value:

The status of the operation.

HCL_Reduce_Scatter

HCLStatus HCL_Reduce_Scatter(uint64_t sendBuffAddr, uint64_t
receiveBuffAddr, uint64_t count,
synDataType dataType, uint64_t intermediateBufferAddr,
uint64_t intermediateSize, HCL_Op op, HCL_Comm communicator,
const uint32_t flags)

Operation:

Performs a reduce-scatter of data across the entire communicator.

Memory allocation: As reduction requires intermediate buffers, and since the user owns HBM allocation, the user must call HCL_Get_Intermediate_Buffer_size to get the required buffer size, allocate it and provide it to the HCL for the function to succeed. The user is required to allocate a receive buffer in the format receive[count * dataType] and a send buffer in the format [communicator_size][count * dataType].

This function is blocking.

Parameters:

Parameter

Description

sendBuffAddr

[in] The HBM address of the send buffer.

receiveBuffAddr

[in] The HBM address of the receive buffer.

count

[in] The number of elements of the buffer that is reduced.

dataType

[in] The datatype of the operation.

intermediateBufferAddr

[in] The HBM address of the intermediate buffer.

intermediateSize

[in] The size of the intermediate buffer allocated by the user.

op

[in] The reduction operation to perform (summation for example).

communicator

[in] The communicator on which to perform the reduction operation.

flags

[in] eHCLSameAddress - The HBM address in all ranks (including root) is the same. No address resolution step is required. eHCLWeakOrder - next op on same stream can start before all data is received on the receive buffer.

Return Value:

The status of the operation.
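
The buffer formats described above translate into simple size arithmetic. The sketch below assumes that "count * dataType" means the element count multiplied by the size in bytes of one element of the given data type; the struct and function names are hypothetical and not part of HCL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type (not part of HCL): byte sizes of the two user buffers.
struct ReduceScatterSizes {
    uint64_t sendBytes;     // [communicator_size][count * element size]
    uint64_t receiveBytes;  // [count * element size]
};

// Hypothetical helper: computes the send and receive buffer sizes in bytes
// for a reduce-scatter, given the per-rank element count, the size of one
// element in bytes, and the number of ranks in the communicator.
ReduceScatterSizes reduce_scatter_sizes(uint64_t count, uint64_t elementBytes,
                                        uint64_t communicatorSize) {
    assert(elementBytes > 0 && communicatorSize > 0);
    ReduceScatterSizes sizes;
    sizes.receiveBytes = count * elementBytes;
    sizes.sendBytes = communicatorSize * sizes.receiveBytes;
    return sizes;
}
```

For example, 1024 four-byte elements on an 8-rank communicator give a 4096-byte receive buffer and a 32768-byte send buffer. The intermediate buffer is sized separately, through HCL_Get_Intermediate_Buffer_size.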

HCL_IReduce_Scatter

HCLStatus HCL_IReduce_Scatter(synStreamHandle streamHandle,
HCL_Request* phRequest, uint64_t sendBufAddr, uint64_t receiveBuffAddr,
uint64_t count, synDataType dataType, uint64_t intermediateBufferAddr,
uint64_t intermediateSize, HCL_Op op, HCL_Comm communicator, const uint32_t
flags)

Operation:

Performs a reduce-scatter of data across the entire communicator.

Memory allocation: As reduction requires intermediate buffers, and since the user owns HBM allocation, the user must call HCL_Get_Intermediate_Buffer_size to get the required buffer size, allocate it and provide it to the HCL for the function to succeed.

The user is required to allocate a receive buffer in the format receive[count * dataType] and a send buffer in the format [communicator_size][count * dataType].

This function is non-blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to. Use nullptr for non-stream implementation.

phRequest

[out] Returned handle to the reduction request.

sendBufAddr

[in] The HBM address of the send buffer.

receiveBuffAddr

[in] The HBM address of the receive buffer.

count

[in] The number of elements of the buffer that is reduced.

dataType

[in] The datatype of the operation.

intermediateBufferAddr

[in] The HBM address of the intermediate buffer.

intermediateSize

[in] The size of the intermediate buffer allocated by the user.

op

[in] The reduction operation to perform (summation for example).

communicator

[in] The communicator on which to perform the reduction operation.

flags

[in] eHCLSameAddress - The HBM address in all ranks (including root) is the same. No address resolution step is required. eHCLWeakOrder - next op on same stream can start before all data is received on the receive buffer.

Return Value:

The status of the operation.

HCL_AlltoAll

HCLStatus HCL_AlltoAll(synStreamHandle streamHandle, uint64_t
sendBufAddr, uint64_t receiveBuffAddr, uint64_t count, synDataType
dataType, uint64_t intermediateBufferAddr, uint64_t intermediateSize,
HCL_Comm communicator, const uint32_t flags)

Operation:

Performs all-to-all communication across the entire communicator.

This function is blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to. Use nullptr for non-stream implementation.

sendBufAddr

[in] The HBM address of the send buffer. The size of this buffer is count * dataType.

receiveBuffAddr

[in] The HBM address of the receive buffer. The size of this buffer is count * dataType.

count

[in] The number of elements of the buffer that is moved.

dataType

[in] The datatype of the operation.

intermediateBufferAddr

[in] The HBM address of the intermediate buffer.

intermediateSize

[in] The size of the intermediate buffer allocated by the user.

communicator

[in] The communicator on which to perform the all-to-all operation.

flags

[in] eHCLSameAddress - The HBM address in all ranks (including root) is the same. No address resolution step is required. eHCLWeakOrder - next op on same stream can start before all data is received on the receive buffer.

Return Value:

The status of the operation.

HCL_IAlltoAll

HCLStatus HCL_IAlltoAll(HCL_Request* phRequest, uint64_t sendBufAddr,
uint64_t receiveBuffAddr, uint64_t count, synDataType dataType,
uint64_t intermediateBufferAddr, uint64_t intermediateSize,
HCL_Comm communicator, const uint32_t flags)

Operation:

Performs all-to-all communication across the entire communicator.

This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the all-to-all request.

sendBufAddr

[in] The HBM address of the send buffer. The size of this buffer is communicator_size * count * dataType.

receiveBuffAddr

[in] The HBM address of the receive buffer. The size of this buffer is communicator_size * count * dataType.

count

[in] The number of elements of the buffer that is moved.

dataType

[in] The datatype of the operation.

intermediateBufferAddr

[in] The HBM address of the intermediate buffer.

intermediateSize

[in] The size of the intermediate buffer allocated by the user.

communicator

[in] The communicator on which to perform the all-to-all operation.

flags

[in] eHCLSameAddress - The HBM address in all ranks (including root) is the same. No address resolution step is required. eHCLWeakOrder - next op on same stream can start before all data is received on the receive buffer.

Return Value:

The status of the operation.

HCL_AlltoAllv

HCLStatus HCL_AlltoAllv(synStreamHandle streamHandle, uint64_t
sendBufAddr, uint64_t receiveBuffAddr, uint64_t* sendCnts, uint64_t*
sDispls, uint64_t* recvCnts, uint64_t* rDispls, synDataType dataType,
uint64_t intermediateBufferAddr, uint64_t intermediateSize, HCL_Comm
communicator, const uint32_t flags)

Operation:

Performs all to all V communication across the entire communicator.

This function is blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to. Use nullptr for non-stream implementation.

sendBufAddr

[in] The HBM address of the send buffer.

receiveBuffAddr

[in] The HBM address of the receive buffer.

sendCnts

[in] An array whose length equals the group size, specifying the number of elements to send to each processor.

sDispls

[in] An array holding the displacements (relative to sendBufAddr) from which to take the outgoing data destined for each processor.

recvCnts

[in] An array whose length equals the group size, specifying the maximum number of elements that can be received from each processor.

rDispls

[in] An array holding the displacements (relative to receiveBuffAddr) at which to place the incoming data from each processor.

dataType

[in] The datatype of the operation.

intermediateBufferAddr

[in] The HBM address of the intermediate buffer.

intermediateSize

[in] The size of the intermediate buffer allocated by the user.

communicator

[in] The communicator on which to perform the operation.

flags

[in] eHCLSameAddress: the HBM address is the same in all ranks (including root), so no address resolution step is required. eHCLWeakOrder: the next operation on the same stream can start before all data has been received in the receive buffer.

Return Value:

The status of the operation.
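
When the per-peer blocks are packed back to back, the sDispls and rDispls arrays follow from the count arrays as running sums. The sketch below shows that computation on the host. The helper name packed_displacements is hypothetical, and the code assumes displacements are expressed in elements; the text above does not state whether HCL expects elements or bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills displs[i] with the sum of counts[0..i-1], so that consecutive
 * blocks are packed without gaps. Returns the total element count.
 * Assumption: displacements are measured in elements, not bytes. */
uint64_t packed_displacements(const uint64_t *counts, uint64_t *displs,
                              size_t group_size)
{
    assert(counts != NULL && displs != NULL);
    uint64_t offset = 0;
    for (size_t i = 0; i < group_size; ++i) {
        displs[i] = offset;
        offset += counts[i];
    }
    return offset;
}
```

The returned total is the number of elements the packed buffer must hold, which can be used to size the send or receive allocation.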

HCL_IAlltoAllv

HCLStatus HCL_IAlltoAllv(HCL_Request* phRequest, uint64_t sendBufAddr,
uint64_t receiveBuffAddr, uint64_t* sendCnts, uint64_t* sDispls,
uint64_t* recvCnts, uint64_t* rDispls, synDataType dataType,
uint64_t intermediateBufferAddr, uint64_t intermediateSize,
HCL_Comm communicator, const uint32_t flags)

Operation:

Performs all to all V communication across the entire communicator.

This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the request.

sendBufAddr

[in] The HBM address of the send buffer.

receiveBuffAddr

[in] The HBM address of the receive buffer.

sendCnts

[in] An array whose length equals the group size, specifying the number of elements to send to each processor.

sDispls

[in] An array holding the displacements (relative to sendBufAddr) from which to take the outgoing data destined for each processor.

recvCnts

[in] An array whose length equals the group size, specifying the maximum number of elements that can be received from each processor.

rDispls

[in] An array holding the displacements (relative to receiveBuffAddr) at which to place the incoming data from each processor.

dataType

[in] The datatype of the operation.

intermediateBufferAddr

[in] The HBM address of the intermediate buffer.

intermediateSize

[in] The size of the intermediate buffer allocated by the user.

communicator

[in] The communicator on which to perform the operation.

flags

[in] eHCLSameAddress: the HBM address is the same in all ranks (including root), so no address resolution step is required. eHCLWeakOrder: the next operation on the same stream can start before all data has been received in the receive buffer.

Return Value:

The status of the operation.
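
Because recvCnts gives the maximum number of elements a rank can accept from each peer, a mismatch between one rank's sendCnts and another rank's recvCnts is a likely source of errors. The following host-side check is a sketch under the assumption that the count arrays of all ranks have been collected into two row-major tables for testing; the function name alltoallv_counts_fit and the table layout are hypothetical and are not part of HCL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* send_cnts and recv_cnts are group_size x group_size row-major tables:
 * row r holds rank r's sendCnts (or recvCnts) array.
 * Returns true when every amount sent from s to d fits within the limit
 * that d declared for data coming from s. */
bool alltoallv_counts_fit(const uint64_t *send_cnts,
                          const uint64_t *recv_cnts, size_t group_size)
{
    assert(send_cnts != NULL && recv_cnts != NULL);
    for (size_t s = 0; s < group_size; ++s) {
        for (size_t d = 0; d < group_size; ++d) {
            uint64_t sent = send_cnts[s * group_size + d];
            uint64_t limit = recv_cnts[d * group_size + s];
            if (sent > limit) {
                return false;
            }
        }
    }
    return true;
}
```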

HCL_AllGather

HCLStatus HCL_AllGather(synStreamHandle streamHandle, uint64_t
sendBufAddr, uint64_t receiveBuffAddr, uint64_t count, synDataType
dataType, HCL_Comm communicator, const uint32_t flags)

Operation:

Performs all gather communication across the whole communicator.

This function is blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to. Use nullptr for non-stream implementation.

sendBufAddr

[in] The HBM address of the send buffer. The size of this buffer is count * dataType.

receiveBuffAddr

[in] The HBM address of the receive buffer. The size of this buffer is count * dataType.

count

[in] The number of elements of the buffer that is moved.

dataType

[in] The datatype of the operation.

communicator

[in] The communicator on which to perform the operation.

flags

[in] eHCLSameAddress: the HBM address is the same in all ranks (including root), so no address resolution step is required. eHCLWeakOrder: the next operation on the same stream can start before all data has been received in the receive buffer.

Return Value:

The status of the operation.
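
For checking results, the effect of an all gather can be modeled on the host. The sketch below is not an HCL call. It assumes the conventional all-gather layout, in which every rank contributes count elements and every receive buffer ends up with the contributions of all ranks in rank order, so the model's receive buffers hold comm_size * count elements. That layout is an assumption here; buffer sizes for the HCL call itself should follow the parameter descriptions above. The name allgather_reference is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side reference (illustration only).
 * send[r] holds rank r's `count` floats; recv[r] has room for
 * comm_size * count floats. After the call, block s of every receive
 * buffer is a copy of rank s's send buffer (assumed layout). */
void allgather_reference(float *const *send, float *const *recv,
                         size_t comm_size, size_t count)
{
    assert(send != NULL && recv != NULL);
    for (size_t d = 0; d < comm_size; ++d) {
        for (size_t s = 0; s < comm_size; ++s) {
            memcpy(recv[d] + s * count, send[s], count * sizeof(float));
        }
    }
}
```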

HCL_IAllGather

HCLStatus HCL_IAllGather(HCL_Request* phRequest, uint64_t sendBufAddr,
uint64_t receiveBuffAddr, uint64_t count, synDataType dataType,
HCL_Comm communicator, const uint32_t flags)

Operation:

Performs all gather communication across the whole communicator.

This function is non-blocking.

Parameters:

Parameter

Description

phRequest

[out] Returned handle to the request.

sendBufAddr

[in] The HBM address of the send buffer. The size of this buffer is communicator_size * count * dataType.

receiveBuffAddr

[in] The HBM address of the receive buffer. The size of this buffer is communicator_size * count * dataType.

count

[in] The number of elements of the buffer that is moved.

dataType

[in] The datatype of the operation.

communicator

[in] The communicator on which to perform the operation.

flags

[in] eHCLSameAddress: the HBM address is the same in all ranks (including root), so no address resolution step is required. eHCLWeakOrder: the next operation on the same stream can start before all data has been received in the receive buffer.

Return Value:

The status of the operation.

HCL_DoReduction

HCLStatus HCL_DoReduction(synStreamHandle streamHandle, uint64_t destBufferAddr,
uint64_t* sourceBufferArray, uint64_t sourceArraySize, uint32_t count, synDataType dataType,
HCL_Op op);

Operation:

Performs a reduction on the device.

This function is non-blocking.

Parameters:

Parameter

Description

streamHandle

[in] Stream to enqueue operation to. Use nullptr for non-stream implementation.

destBufferAddr

[in] The HBM address in which the reduction result is placed. The size of this buffer is count * dataType.

sourceBufferArray

[in] Array of HBM addresses to perform the reduction on. The size of each buffer is count * dataType.

sourceArraySize

[in] The number of addresses in sourceBufferArray.

count

[in] The number of elements of the buffer that is reduced.

dataType

[in] The datatype of the operation.

op

[in] The reduction operation to perform (summation, for example).

Return Value:

The status of the operation.
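
The arithmetic performed by a reduction over several source buffers can be sketched on the host as an element-wise fold. The code below is an illustration only and does not call HCL: ref_op is a hypothetical stand-in for HCL_Op (covering the sum and multiply cases), reduce_reference is a hypothetical name, and float is assumed as the element type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for HCL_Op, limited to sum and multiply. */
typedef enum { REF_OP_SUM, REF_OP_MUL } ref_op;

/* Host-side reference (illustration only).
 * For each element index i, combines sources[0][i] .. sources[n-1][i]
 * with `op` and stores the result in dest[i]. Every buffer holds
 * `count` floats. */
void reduce_reference(float *dest, const float *const *sources,
                      size_t source_count, size_t count, ref_op op)
{
    assert(dest != NULL && sources != NULL && source_count > 0);
    for (size_t i = 0; i < count; ++i) {
        float acc = sources[0][i];
        for (size_t k = 1; k < source_count; ++k) {
            if (op == REF_OP_SUM) {
                acc += sources[k][i];
            } else {
                acc *= sources[k][i];
            }
        }
        dest[i] = acc;
    }
}
```

Here source_count plays the role of sourceArraySize and count the role of the per-buffer element count described above.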