From a hardware perspective, the Habana® Gaudi® HPU supports RDMA over Converged Ethernet version 2 (RoCEv2). Each Gaudi natively integrates ten 100 Gigabit Ethernet ports that support RoCEv2. Fig. 6 shows a server with eight Gaudi devices. Within each Gaudi device, seven of the ten NIC ports connect to the other seven Gaudis in the server in an all-to-all configuration for scale-up, and the remaining three are used for scale-out across servers.


Figure 6 Server Block Diagram
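
The port allocation described for this server can be checked with a little arithmetic. The following is a minimal sketch, not Habana tooling: the constant and function names are hypothetical, the 100 Gb/s figure is the nominal line rate of each port, and the sketch assumes exactly one port per peer Gaudi, which is how the seven scale-up ports map onto the seven other devices.

```python
from itertools import combinations

# Figures taken from the text; all names here are illustrative only.
NUM_GAUDIS = 8          # Gaudi devices per server
PORTS_PER_GAUDI = 10    # integrated 100 GbE RoCEv2 ports per device
PORT_GBPS = 100         # nominal line rate of one port, in Gb/s


def port_budget(num_devices=NUM_GAUDIS, ports=PORTS_PER_GAUDI):
    """Split a device's ports into scale-up and scale-out counts.

    Assumes one port per peer for the all-to-all scale-up fabric.
    """
    scale_up = num_devices - 1          # one link to every other device
    scale_out = ports - scale_up        # whatever is left leaves the server
    if scale_out < 0:
        raise ValueError("not enough ports for an all-to-all topology")
    return scale_up, scale_out


scale_up_ports, scale_out_ports = port_budget()

# Every unordered pair of devices shares one direct intra-server link.
intra_server_links = list(combinations(range(NUM_GAUDIS), 2))

# Aggregate scale-out capacity of the whole server (nominal, one direction).
server_scale_out_ports = NUM_GAUDIS * scale_out_ports
server_scale_out_gbps = server_scale_out_ports * PORT_GBPS
```

Under these assumptions the sketch yields seven scale-up and three scale-out ports per device, 28 direct intra-server links, and 24 scale-out ports per server (2400 Gb/s nominal). Real throughput depends on protocol overhead and on the network the scale-out ports attach to.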

From a software perspective, data-parallel scaling on Gaudi in the TensorFlow framework can be achieved with either of two methods:

  • By using Habana Horovod (see Fig. 7).

  • By using HPUStrategy integrated with tf.distribute API.
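
Both methods implement the same underlying idea: each worker computes gradients on its own shard of the batch, and the gradients are averaged across workers before the weights are updated. The following framework-free sketch illustrates that idea only. It does not use the Habana Horovod or HPUStrategy APIs, and the function names and the toy one-parameter model are assumptions made for illustration.

```python
def local_gradient(w, shard):
    """Mean gradient of 0.5 * (w * x - y) ** 2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for the all-reduce collective: average across workers."""
    return sum(values) / len(values)


def data_parallel_step(w, shards, lr):
    """One synchronous data-parallel SGD step over all workers."""
    grads = [local_gradient(w, shard) for shard in shards]  # one per worker
    return w - lr * all_reduce_mean(grads)


# Toy dataset for y = 2x, split evenly across four simulated workers.
data = [(float(x), 2.0 * x) for x in range(1, 9)]
num_workers = 4
shards = [data[i::num_workers] for i in range(num_workers)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
```

Because the shards are equal in size, the averaged gradient matches the gradient over the full batch, so the simulated workers follow the same trajectory as single-device training on the combined batch. In a real deployment the averaging step is performed by the collective communication layer rather than by a Python function.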

The following versions are supported:

  • TensorFlow: all versions listed in the Support Matrix.

  • Habana Horovod: a fork of the official Horovod release. See the Support Matrix for more details.


Figure 7 Gaudi Distributed Training Software Stack