Multiple Dockers Each with a Single Workload

Multi-tenancy also supports running workloads in multiple Docker containers, with a single workload per container. Each container exclusively owns a subset of the Gaudi processors, and one distributed workload runs in each container.

Figure: Multiple Dockers Each with a Single Workload

Setting Up the Docker Container

The Gaudi processors used by the workload must be mounted in the Docker container. With the Habana container runtime, you can choose which Gaudi processors to mount by setting HABANA_VISIBLE_DEVICES when starting the container. The following docker run command mounts the four Gaudi processors with indices 0, 1, 2, and 3:

docker run … --runtime=habana -e HABANA_VISIBLE_DEVICES=0,1,2,3 ...

Before applying the guidelines for setting HABANA_VISIBLE_DEVICES, you need to know how to find the mapping between the index and the module ID of each Gaudi processor. The following command prints this mapping (sample output shown):

$ hl-smi -Q index,module_id -f csv
index, module_id
3, 6
1, 4
2, 7
0, 5
4, 2
6, 0
7, 3
5, 1

Using this mapping, set HABANA_VISIBLE_DEVICES according to the following guidelines:

  • Mount either two or four Gaudi processors in the Docker container. Although a distributed workload can use a subset of the Gaudi processors, only the 2-Gaudi and 4-Gaudi configurations are supported. We recommend using one of the following module ID combinations:

    • 2-Gaudi: “0,1”, “2,3”, “4,5” or “6,7”

    • 4-Gaudi: “0,1,2,3” or “4,5,6,7”

  • Because HABANA_VISIBLE_DEVICES takes indices rather than module IDs, use the hl-smi command above to find the indices that correspond to the chosen module IDs.

  • Do not mount the same index in more than one container. Because multiple workloads may run in parallel, mounting each Gaudi processor in only one container prevents different workloads from using the same Gaudi processor.
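The module-ID-to-index lookup described in the guidelines can be scripted. The sketch below is an illustration under stated assumptions, not a Habana-provided tool: is_recommended_combo and modules_to_indices are hypothetical helper names, the input is assumed to have the same "index, module_id" CSV layout as the sample output above, and a POSIX awk is required. is_recommended_combo expects module IDs in ascending order without spaces.

```shell
# Hypothetical helpers (not part of hl-smi) for choosing HABANA_VISIBLE_DEVICES.

# Succeeds if the comma-separated module IDs in $1 are one of the
# recommended 2-Gaudi or 4-Gaudi combinations.
is_recommended_combo() {
    case "$1" in
        0,1|2,3|4,5|6,7|0,1,2,3|4,5,6,7) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads "index, module_id" CSV on stdin (assumed to match the output of
# `hl-smi -Q index,module_id -f csv`) and prints the indices for the
# comma-separated module IDs in $1, in the same order. Exits with 1 if a
# module ID is missing from the mapping.
modules_to_indices() {
    awk -F',' -v wanted="$1" '
        BEGIN {
            n = split(wanted, ids, ",")
            for (i = 1; i <= n; i++) order[ids[i]] = i
        }
        NR > 1 {
            # Drop spaces, tabs and carriage returns around each field
            gsub(/[ \t\r]/, "", $1); gsub(/[ \t\r]/, "", $2)
            if ($2 in order) idx[order[$2]] = $1
        }
        END {
            out = ""
            for (i = 1; i <= n; i++) {
                if (!(i in idx)) {
                    print "module ID " ids[i] " not found" | "cat 1>&2"
                    exit 1
                }
                out = out (i > 1 ? "," : "") idx[i]
            }
            print out
        }'
}
```

With the sample mapping shown earlier, `hl-smi -Q index,module_id -f csv | modules_to_indices 4,5,6,7` prints 1,0,3,2, which is the value to pass as HABANA_VISIBLE_DEVICES for module IDs 4 to 7.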

Running a Distributed Workload Inside the Docker Container

Although only one workload runs in each container in this scenario, you still need to set the environment variable HABANA_VISIBLE_MODULES, as in the Multiple Workloads on a Single Docker scenario.

If you created the Docker container, you can look up the module IDs of the devices specified in HABANA_VISIBLE_DEVICES with the hl-smi command from the previous section, and set HABANA_VISIBLE_MODULES accordingly.
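For the container creator, the reverse lookup (indices to module IDs) can also be scripted. This is a minimal sketch under assumptions: indices_to_modules is a hypothetical helper name, it reads the "index, module_id" CSV printed on the host by `hl-smi -Q index,module_id -f csv` (layout assumed to match the sample output shown earlier), and it requires a POSIX awk. The module IDs are printed in ascending order.

```shell
# Hypothetical helper (not part of hl-smi): converts the comma-separated
# indices given in HABANA_VISIBLE_DEVICES ($1) into module IDs for
# HABANA_VISIBLE_MODULES. Reads "index, module_id" CSV on stdin.
# Exits with 1 if any index is missing from the mapping.
indices_to_modules() {
    awk -F',' -v wanted="$1" '
        BEGIN {
            n = split(wanted, ids, ",")
            for (i = 1; i <= n; i++) want[ids[i]] = 1
        }
        NR > 1 {
            # Drop spaces, tabs and carriage returns around each field
            gsub(/[ \t\r]/, "", $1); gsub(/[ \t\r]/, "", $2)
            if ($1 in want) mods[++count] = $2 + 0   # force numeric for sorting
        }
        END {
            if (count != n) {
                print "some indices in " wanted " were not found" | "cat 1>&2"
                exit 1
            }
            # Insertion sort so the module IDs come out in ascending order
            for (i = 2; i <= count; i++) {
                v = mods[i]
                for (j = i - 1; j >= 1 && mods[j] > v; j--) mods[j + 1] = mods[j]
                mods[j + 1] = v
            }
            out = ""
            for (i = 1; i <= count; i++) out = out (i > 1 ? "," : "") mods[i]
            print out
        }'
}
```

With the sample mapping shown earlier, `hl-smi -Q index,module_id -f csv | indices_to_modules 0,1,2,3` prints 4,5,6,7, the module IDs for a container started with HABANA_VISIBLE_DEVICES=0,1,2,3.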

If you are using a Docker container created by someone else, run the following command inside the container to list the module IDs of the Gaudi processors available in it:

$ hl-smi -Q module_id -f csv
module_id
4
6
5
7

Based on the output above, set HABANA_VISIBLE_MODULES as follows:

export HABANA_VISIBLE_MODULES="4,5,6,7"
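Instead of copying the module IDs by hand, the value can be derived from the same hl-smi output inside the container. This is a sketch under assumptions: modules_from_csv is a hypothetical helper name, the output is assumed to consist of a "module_id" header line followed by one ID per line as in the example above, and awk, sort and paste are assumed to be available in the container image.

```shell
# Hypothetical helper (not part of hl-smi): turns the output of
# `hl-smi -Q module_id -f csv` (read from stdin) into a sorted,
# comma-separated list suitable for HABANA_VISIBLE_MODULES.
modules_from_csv() {
    # Skip the header line, strip whitespace, and ignore blank lines
    awk 'NR > 1 { gsub(/[ \t\r]/, ""); if ($0 != "") print $0 }' |
        sort -n |        # ascending numeric order
        paste -sd, -     # join lines with commas
}
```

Usage inside the container: `export HABANA_VISIBLE_MODULES="$(hl-smi -Q module_id -f csv | modules_from_csv)"`. For the example output 4, 6, 5, 7 this yields 4,5,6,7, the same value as the manual export.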