MPI Operator for Kubernetes

Intel® Gaudi® uses the standard MPI Operator from Kubeflow, which allows running MPI allreduce-style workloads in Kubernetes while leveraging Gaudi accelerators. Combined with Intel Gaudi hardware and software, it enables large-scale distributed training with a simple Kubernetes job distribution model.

Prerequisites

  • A Kubernetes version listed in the Support Matrix.

  • Intel Gaudi software drivers loaded on the system. For more details, refer to the Installation Guide.

  • The habana-container-runtime package installed and configured. For more details, refer to the Installation Guide.
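
As a quick sanity check, you can verify both prerequisites on each node before installing the operator. The commands below are a minimal sketch assuming a Docker-based setup; on containerd-based nodes the runtime is registered in containerd's own configuration instead:

    # Confirm the Gaudi driver stack is loaded and the devices are visible
    hl-smi

    # Confirm habana-container-runtime is registered as a Docker runtime
    # (path assumes a Docker-based setup)
    grep -A 3 '"habana"' /etc/docker/daemon.json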

Installing MPI Operator

Follow the MPI Operator documentation for instructions on setting up the MPI Operator on your Kubernetes cluster.
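
For reference, a common installation path is to apply the release manifest from the mpi-operator repository, following the pattern shown in the upstream README. The sketch below assumes the v2beta1 deployment manifest layout; check the MPI Operator documentation for the release tag matching your Kubernetes version:

    # Deploy the MPI Operator (use the release tag recommended for your cluster)
    kubectl apply --server-side -f https://raw.githubusercontent.com/kubeflow/mpi-operator/master/deploy/v2beta1/mpi-operator.yaml

    # Verify the operator pod is running
    kubectl get pods -n mpi-operator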

Running Multi-Gaudi Workloads Example

Below is an example of an MPIJob that runs an MNIST model on 16 Gaudi devices (two worker nodes with 8 cards each).

  1. Create the mpijob-mnist.yaml file. Make sure to set the number of Gaudi worker nodes in Worker -> replicas:

    apiVersion: kubeflow.org/v2beta1
    kind: MPIJob
    metadata:
      name: mnist-run
    spec:
      slotsPerWorker: 8
      runPolicy:
        cleanPodPolicy: Running
      mpiReplicaSpecs:
        Launcher:
          replicas: 1
          template:
            spec:
              containers:
                - image: vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest
                  name: mnist-launcher
                  command: ["/bin/bash", "-c"]
                  args:
                    - >-
                      HOSTSFILE=$OMPI_MCA_orte_default_hostfile;
                      MASTER_ADDR="$(head -n 1 $HOSTSFILE | sed -n s/[[:space:]]slots.*//p)";
    
                      NUM_NODES=$(wc -l < $HOSTSFILE);
                      CARDS_PER_NODE=8;
                      N_CARDS=$((NUM_NODES*CARDS_PER_NODE));
    
                      SETUP_CMD="git clone --branch 1.16.2 https://github.com/HabanaAI/Model-References.git /Model-References";
                      $SETUP_CMD;
                      mpirun --npernode 1 \
                        --tag-output \
                        --allow-run-as-root \
                        --prefix $MPI_ROOT \
                        $SETUP_CMD;
    
                      MODEL_PATH=/Model-References/PyTorch/examples/computer_vision/hello_world;
                      MNIST_CMD="python $MODEL_PATH/mnist.py \
                        --batch-size=64 \
                        --epochs=1 \
                        --lr=1.0 \
                        --gamma=0.7 \
                        --hpu";
    
                      cd $MODEL_PATH;
                      mpirun -np ${N_CARDS} \
                        --allow-run-as-root \
                        --bind-to core \
                        --map-by ppr:4:socket:PE=6 \
                        --rank-by core --report-bindings \
                        --tag-output \
                        --merge-stderr-to-stdout --prefix $MPI_ROOT \
                        -x MASTER_ADDR=$MASTER_ADDR \
                        $MNIST_CMD;
        Worker:
          replicas: 2
          template:
            spec:
              hostIPC: true
              containers:
                - image: vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest
                  name: mnist-worker
                  resources:
                    limits:
                      habana.ai/gaudi: 8
                      memory: 409Gi
                      hugepages-2Mi: 95000Mi
                    requests:
                      habana.ai/gaudi: 8
                      memory: 409Gi
                      hugepages-2Mi: 95000Mi
    

    Note

    • PyTorch uses shared memory buffers to communicate between processes. By default, Docker containers are allocated 64MB of shared memory, which can be insufficient when more than one HPU is used. Setting hostIPC: true allows the container to reuse the host's shared memory space (an alternative using a memory-backed /dev/shm volume is sketched after the steps below).

    • According to Kubernetes' backoff policy, the job is automatically restarted if a failure occurs, such as when the worker pods are not yet running. This is useful for resuming long-running training from a checkpoint if an error causes the job to crash. For more information, refer to the Kubernetes backoff failure policy.

  2. Run the job:

    kubectl apply -f mpijob-mnist.yaml
    
  3. Check the pod status:

    kubectl get pods -A
    
  4. Retrieve the name of the launcher pod and view the training results in its logs:

    kubectl logs <pod-name>
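
The training output is printed by the launcher pod. As a convenience, and assuming the training.kubeflow.org labels that recent MPI Operator releases attach to the pods they create, the launcher pod name can be retrieved with a label selector:

    # Label names assume a recent MPI Operator release (v2beta1 API)
    LAUNCHER_POD=$(kubectl get pods \
      -l training.kubeflow.org/job-name=mnist-run,training.kubeflow.org/job-role=launcher \
      -o jsonpath='{.items[0].metadata.name}')
    kubectl logs -f $LAUNCHER_POD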
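
If hostIPC: true is not acceptable in your environment, a commonly used alternative is to mount a memory-backed emptyDir volume at /dev/shm in the Worker pod template. This is a generic Kubernetes pattern rather than part of the example above, and the sizeLimit value is illustrative:

    spec:
      containers:
        - image: vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest
          name: mnist-worker
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm    # backs PyTorch shared memory buffers
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory           # tmpfs-backed volume
            sizeLimit: 4Gi           # illustrative; size to your workload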