Amazon ECS with Habana Getting Started Guide
This document describes how to set up a Habana Deep Learning AMI on Amazon ECS. For those familiar with the AWS platform, launching ECS is as simple as setting up a cluster and deploying jobs to it. After setting up the cluster and the node, you can deploy images to the cluster and run your deep learning, machine learning, and data science applications on Gaudi hardware for accelerated training and development.
Find the Latest AMI ID
To set up the node, you first need the ID of the latest Habana ECS AMI:
Open the EC2 homepage and choose the region you want to work in.
Click AMIs and search for “habana” and “ecs”.
Select the AMI whose Source is aws-marketplace and note its AMI ID (for example, ami-0c3a07f4b95ef293c).
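
The console steps above can also be approximated programmatically. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The name pattern "*habana*ecs*" is an assumption about how the Marketplace AMIs are named (EC2 name filters are case-sensitive) and may need adjusting. The helper names pick_latest_ami and find_habana_ecs_amis are hypothetical.

```python
from typing import Iterable, List, Mapping, Optional


def pick_latest_ami(images: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Return the ImageId of the most recently created image, or None."""
    # CreationDate is an ISO 8601 UTC string such as "2022-06-01T12:00:00.000Z",
    # so lexicographic order matches chronological order.
    newest = max(images, key=lambda img: img["CreationDate"], default=None)
    return newest["ImageId"] if newest else None


def find_habana_ecs_amis(region: str) -> List[dict]:
    """Query EC2 for AWS Marketplace AMIs whose name mentions habana and ecs."""
    import boto3  # imported lazily so pick_latest_ami works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_images(
        Owners=["aws-marketplace"],
        # Assumed naming pattern; adjust it to match the console search results.
        Filters=[{"Name": "name", "Values": ["*habana*ecs*"]}],
    )
    return response["Images"]
```

For example, pick_latest_ami(find_habana_ecs_amis("us-east-1")) would return the newest matching AMI ID in that region, or None if the pattern matches nothing.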

Create a Cluster
Follow the AWS guide “Tutorial: Creating a Cluster with an EC2 Task Using the Amazon ECS CLI” to learn how to create an ECS cluster. To use Habana devices, make the following changes:
In Step 2 of the tutorial, set the instance type to dl1.24xlarge and add the parameter ‘--image-id ami-0c3a07f4b95ef293c’ (use the latest Habana ECS AMI ID, found as described in the section above).
In Step 3 of the tutorial, set the environment variable HABANA_VISIBLE_DEVICES=all.
In Step 3 of the tutorial, set the runtime to habana.
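
The Step 2 change can be sketched as a small helper that assembles the ecs-cli up command line. This is an illustration only: the helper name build_ecs_cli_up is hypothetical, and the key pair, cluster configuration, and profile names are the placeholder values used in the AWS tutorial, so substitute your own.

```python
import shlex
from typing import List


def build_ecs_cli_up(image_id: str,
                     keypair: str,
                     cluster_config: str,
                     ecs_profile: str,
                     size: int = 1) -> List[str]:
    """Assemble the `ecs-cli up` arguments for a Gaudi-backed cluster."""
    return [
        "ecs-cli", "up",
        "--keypair", keypair,
        "--capability-iam",
        "--size", str(size),
        "--instance-type", "dl1.24xlarge",  # Gaudi instance type from this guide
        "--image-id", image_id,             # Habana ECS AMI ID
        "--cluster-config", cluster_config,
        "--ecs-profile", ecs_profile,
    ]


if __name__ == "__main__":
    cmd = build_ecs_cli_up(
        image_id="ami-0c3a07f4b95ef293c",    # replace with the latest AMI ID
        keypair="id_rsa",                     # placeholder from the AWS tutorial
        cluster_config="ec2-tutorial",        # placeholder from the AWS tutorial
        ecs_profile="ec2-tutorial-profile",   # placeholder from the AWS tutorial
    )
    # Print the command for review; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to review the exact command before any instances are launched.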
For example, with TensorFlow 2.9.1 and Habana version 1.5.0, the docker-compose.yml is:
version: '3'
services:
  web:
    image: vault.habana.ai/gaudi-docker/1.5.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.9.1:1.5.0-610
    entrypoint: ["hl-smi"]
    environment:
      - HABANA_VISIBLE_DEVICES=all
    logging:
      driver: awslogs
      options:
        awslogs-group: ec2-tutorial
        awslogs-region: us-east-1
        awslogs-stream-prefix: web
The runtime and resources should be specified as well. For example, the ecs-params.yml is:
version: 1
task_definition:
  services:
    resources:
      limits:
        hugepages-2Mi: "21000Mi"
        memory: 720Gi
      requests:
        hugepages-2Mi: "21000Mi"
        memory: 720Gi
    web:
      cpu_shares: 100
      mem_limit: 524288000
      runtime: habana
View Results
Because hl-smi is the entrypoint, the terminal prints the device information for all 8 cards.
You can change memory, cpu_shares, hugepages-2Mi, and other settings according to your model’s needs.
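
When adjusting these values, it can help to convert them to a common unit first. The sketch below rests on two assumptions: the Ki/Mi/Gi suffixes are binary units (powers of 1024), and a bare integer such as the mem_limit value is a byte count. The helper name to_bytes is hypothetical.

```python
from typing import Union

# Binary unit suffixes, assumed to follow the powers-of-1024 convention.
_UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def to_bytes(quantity: Union[int, str]) -> int:
    """Convert a quantity such as "21000Mi", "720Gi", or 524288000 to bytes."""
    if isinstance(quantity, int):
        return quantity  # bare integers are assumed to already be bytes
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


if __name__ == "__main__":
    # Values taken from the ecs-params.yml example in this guide.
    mem_limit_mib = to_bytes(524288000) // to_bytes("1Mi")
    hugepage_count = to_bytes("21000Mi") // to_bytes("2Mi")
    print(f"mem_limit: {mem_limit_mib} MiB")
    print(f"hugepages-2Mi: {hugepage_count} pages of 2 MiB")
    print(f"memory: {to_bytes('720Gi')} bytes")
```

Under these assumptions, the example mem_limit corresponds to 500 MiB and the hugepages-2Mi setting to 10500 pages of 2 MiB each.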