Multiple Dockers Each with a Single Workload
Multiple Dockers Each with a Single Workload¶
Multi-tenancy also allows running multiple docker containers, each with a single workload. Each docker container exclusively owns a subset of the Gaudi processors, and one distributed workload runs in each container.
Setting the Docker Container¶
You need to mount the Gaudi processors that the workload will use into the docker container. With the habana container runtime, you select them by setting HABANA_VISIBLE_DEVICES when starting the docker container. Below is an example of a docker run command which mounts four Gaudi processors with indices 0, 1, 2, and 3:
docker run … --runtime=habana -e HABANA_VISIBLE_DEVICES=0,1,2,3 ...
Guidelines for setting HABANA_VISIBLE_DEVICES are given below. Before applying them, you need to know how to find the mapping between the index and the module ID of the Gaudi processors.
The command below prints this mapping, shown here with sample output:
$ hl-smi -Q index,module_id -f csv
index, module_id
3, 6
1, 4
2, 7
0, 5
4, 2
6, 0
7, 3
5, 1
With the mapping between index and module ID, you can set HABANA_VISIBLE_DEVICES according to the guidelines below:
Mount either two or four Gaudi processors in the docker container. Although a distributed workload can use only part of the Gaudi processors, only the 2-Gaudi and 4-Gaudi scenarios are allowed. We recommend choosing module IDs from the combinations below:
2-Gaudi: “0,1”, “2,3”, “4,5” or “6,7”
4-Gaudi: “0,1,2,3” or “4,5,6,7”
Since HABANA_VISIBLE_DEVICES accepts indices rather than module IDs, use the hl-smi command above to find the indices that correspond to a set of module IDs.
Avoid mounting the same index in multiple containers. Because multiple workloads might run in parallel, mounting each Gaudi in only one docker container prevents different workloads from using the same Gaudi.
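The lookup from module IDs to indices can be scripted. The sketch below is one possible approach, not part of the Habana tooling: modules_to_indices is a hypothetical helper name, and it assumes the CSV layout shown in the sample output above (a header line followed by "index, module_id" rows).

```shell
# modules_to_indices (hypothetical helper, not a Habana tool):
# reads "index, module_id" CSV on stdin, in the layout printed by
# `hl-smi -Q index,module_id -f csv`, and prints the indices for the
# comma-separated module IDs passed as the first argument.
modules_to_indices() {
  awk -F',' -v wanted="$1" '
    BEGIN { n = split(wanted, mods, ",") }
    NR > 1 {                      # skip the CSV header line
      gsub(/[ \t\r]/, "", $1)     # strip whitespace around each field
      gsub(/[ \t\r]/, "", $2)
      index_of[$2] = $1           # remember module_id -> index
    }
    END {
      for (i = 1; i <= n; i++) {
        if (!(mods[i] in index_of)) {
          printf "module ID %s not found\n", mods[i] > "/dev/stderr"
          exit 1
        }
        out = out (i > 1 ? "," : "") index_of[mods[i]]
      }
      print out
    }'
}

# On a Gaudi host (requires hl-smi):
#   devices=$(hl-smi -Q index,module_id -f csv | modules_to_indices "0,1,2,3")
#   then pass -e HABANA_VISIBLE_DEVICES="$devices" to docker run
```

With the sample mapping above, module IDs "0,1,2,3" resolve to indices "6,5,4,7", and module IDs "4,5,6,7" resolve to indices "1,0,3,2".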
Running Distributed Workload Inside the Docker Container¶
Though there is only one workload running in the container in this scenario, it’s necessary to set the environment variable HABANA_VISIBLE_MODULES as in the Multiple Workloads on a Single Docker scenario.
If you are the creator of the docker container, you should be able to get the corresponding module_id of the devices specified in HABANA_VISIBLE_DEVICES with the hl-smi command mentioned in the above section and set the environment variable HABANA_VISIBLE_MODULES accordingly.
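For the container creator, the reverse lookup might be sketched as follows. indices_to_modules is a hypothetical helper name, and the input is assumed to be the "index, module_id" CSV from the earlier section. The result is sorted in ascending order only to mirror the example value used on this page; whether the order matters is not stated here.

```shell
# indices_to_modules (hypothetical helper, not a Habana tool):
# reads "index, module_id" CSV on stdin and prints the module IDs of the
# comma-separated indices passed as the first argument, sorted ascending.
indices_to_modules() {
  awk -F',' -v wanted="$1" '
    BEGIN {
      n = split(wanted, list, ",")
      for (i = 1; i <= n; i++) keep[list[i]] = 1
    }
    NR > 1 {                      # skip the CSV header line
      gsub(/[ \t\r]/, "", $1)     # strip whitespace around each field
      gsub(/[ \t\r]/, "", $2)
      if ($1 in keep) print $2    # one module ID per line
    }' | sort -n | paste -sd, -
}

# On a Gaudi host (requires hl-smi), for a container started with
# HABANA_VISIBLE_DEVICES=0,1,2,3:
#   hl-smi -Q index,module_id -f csv | indices_to_modules "0,1,2,3"
```

With the sample mapping from the earlier section, indices "0,1,2,3" give module IDs "4,5,6,7".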
If you are using a docker container created by others, you can run the command below inside the container to get the module_id of each Gaudi available in the container:
$ hl-smi -Q module_id -f csv
module_id
4
6
5
7
According to the output in the above example, you can set the environment variable HABANA_VISIBLE_MODULES as below:
export HABANA_VISIBLE_MODULES="4,5,6,7"
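Building the value by hand can be avoided with a small helper. This is a minimal sketch, assuming the single-column CSV layout shown in the sample output above; modules_from_csv is a hypothetical name, not part of the Habana tooling.

```shell
# modules_from_csv (hypothetical helper, not a Habana tool):
# reads the single-column CSV printed by `hl-smi -Q module_id -f csv`
# on stdin and prints the module IDs as one sorted, comma-separated list.
modules_from_csv() {
  awk 'NR > 1 {                   # skip the "module_id" header line
         gsub(/[ \t\r]/, "")      # strip stray whitespace
         if ($0 != "") print
       }' | sort -n | paste -sd, -
}

# Inside the container (requires hl-smi):
#   export HABANA_VISIBLE_MODULES="$(hl-smi -Q module_id -f csv | modules_from_csv)"
```

For the sample output above, the helper prints "4,5,6,7", matching the export line.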