MPI Operator for Kubernetes
Habana uses the standard MPI Operator from Kubeflow, which runs MPI allreduce-style workloads in Kubernetes on Gaudi accelerators. Combined with Habana's hardware and software, it enables large-scale distributed training through a simple Kubernetes job distribution model.
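The sketch below simulates the collective in plain Python to show what "allreduce-style" means. Each rank contributes a local gradient vector. The vectors are summed element-wise and averaged, and every rank receives the same result. This only illustrates the semantics in memory, and the function name `allreduce_mean` is invented here. Real workloads perform this step through MPI or a collective communication library across Gaudi devices, not through Python lists.

```python
from typing import List


def allreduce_mean(per_rank_grads: List[List[float]]) -> List[List[float]]:
    """Simulate an allreduce (sum, then divide by rank count) over in-memory lists.

    Each inner list stands in for one rank's local gradient vector. After the
    simulated collective, every rank holds its own copy of the element-wise mean.
    """
    world_size = len(per_rank_grads)
    if world_size == 0:
        raise ValueError("need at least one rank")
    length = len(per_rank_grads[0])
    if any(len(g) != length for g in per_rank_grads):
        raise ValueError("all ranks must contribute vectors of the same length")
    # Reduce step: element-wise sum across ranks, then average.
    reduced = [sum(values) / world_size for values in zip(*per_rank_grads)]
    # Distribution step: every rank receives an independent copy of the result.
    return [list(reduced) for _ in range(world_size)]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(allreduce_mean(grads))
```

With four ranks, every rank ends up holding `[4.0, 5.0]`. This is the property data-parallel training relies on: all replicas apply the same averaged update.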
Prerequisites
The following prerequisites are required to run the Habana MPI Operator on Habana hardware:
Kubernetes version 1.10 or later and earlier than 1.22 (1.10 <= version < 1.22).
SynapseAI software drivers loaded on the system.
The habana-container-runtime package installed and set up.
The installation instructions for the above components are detailed in the Installation Guide.
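Before you install the operator, you can confirm that the cluster falls inside the supported Kubernetes range. The following minimal sketch takes the server version string as input, for example the `gitVersion` value reported by `kubectl version`. The helper names `parse_version` and `is_supported` are hypothetical and are not part of any Habana tooling.

```python
import re
from typing import Tuple

MIN_VERSION = (1, 10)  # inclusive lower bound from the prerequisites
MAX_VERSION = (1, 22)  # exclusive upper bound from the prerequisites


def parse_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.21.3' or '1.20+'."""
    match = re.match(r"v?(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(text: str) -> bool:
    """Return True if the version satisfies 1.10 <= version < 1.22."""
    return MIN_VERSION <= parse_version(text) < MAX_VERSION


if __name__ == "__main__":
    for candidate in ["v1.9.11", "v1.10.0", "v1.21.3", "v1.22.0"]:
        print(candidate, is_supported(candidate))
```

For example, `is_supported("v1.21.3")` returns `True`, while `v1.22.0` and `v1.9.11` fall outside the range. Comparing `(major, minor)` tuples avoids the pitfall of string comparison, where `"1.9"` would sort after `"1.10"`.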
Running Multi-Gaudi Workloads
For more details on how to deploy and run workloads at scale using the MPI Operator, refer to the MPI Operator documentation.
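As a starting point, the sketch below builds an MPIJob manifest in Python and prints it as JSON, a format `kubectl apply -f` accepts. It makes three assumptions:

- the `kubeflow.org/v1` MPIJob API,
- one MPI rank per Gaudi card,
- the `habana.ai/gaudi` resource name exposed by the Habana device plugin.

The helper `build_mpijob` and the job name, image and training command are hypothetical placeholders. Check field names and the API version against the MPI Operator release installed in your cluster.

```python
import json
from typing import Any, Dict, List


def build_mpijob(
    name: str,
    image: str,
    workers: int,
    gaudis_per_worker: int,
    train_command: List[str],
) -> Dict[str, Any]:
    """Return an MPIJob manifest as a plain dict (hypothetical helper)."""
    if workers < 1 or gaudis_per_worker < 1:
        raise ValueError("workers and gaudis_per_worker must be positive")
    # One MPI rank per Gaudi card: total ranks = workers * cards per worker.
    total_ranks = workers * gaudis_per_worker
    launcher_container = {
        "name": f"{name}-launcher",
        "image": image,
        # The launcher pod starts mpirun, which spawns ranks on the worker pods.
        "command": ["mpirun", "-np", str(total_ranks), *train_command],
    }
    worker_container = {
        "name": f"{name}-worker",
        "image": image,
        # Assumption: resource name exposed by the Habana device plugin.
        "resources": {"limits": {"habana.ai/gaudi": gaudis_per_worker}},
    }
    return {
        # Assumption: the API version depends on the installed MPI Operator.
        "apiVersion": "kubeflow.org/v1",
        "kind": "MPIJob",
        "metadata": {"name": name},
        "spec": {
            "slotsPerWorker": gaudis_per_worker,
            "mpiReplicaSpecs": {
                "Launcher": {
                    "replicas": 1,
                    "template": {"spec": {"containers": [launcher_container]}},
                },
                "Worker": {
                    "replicas": workers,
                    "template": {"spec": {"containers": [worker_container]}},
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = build_mpijob(
        name="example-job",  # hypothetical job name
        image="example.registry/gaudi-train:latest",  # hypothetical image
        workers=2,
        gaudis_per_worker=8,
        train_command=["python3", "train.py"],  # hypothetical training script
    )
    print(json.dumps(manifest, indent=2))
```

To submit the job, save the printed output to a file and run `kubectl apply -f <file>`. The launcher pod then runs `mpirun` across the worker pods. Setting `slotsPerWorker` to the number of cards per worker keeps the `-np` value consistent with the allocated hardware.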