4. Getting Started Guide: ECS with Habana

4.1. Introduction

This document explains how to set up a Habana Deep Learning AMI on Amazon ECS. If you are familiar with the AWS platform, launching on ECS is as simple as setting up a cluster and deploying jobs to it. Once the cluster and its node are set up, you can deploy images to the cluster and run deep learning, machine learning and data science applications on Gaudi hardware to accelerate training and development.

4.2. Find the Latest AMI ID

To set up the node, first find the latest Habana ECS AMI ID:

  1. Open the EC2 console and select the region where you will run the cluster.

  2. Click AMIs and search for “habana” and “ecs”.

  3. Select the AMI ID whose Source is aws-marketplace (for example, ami-0c3a07f4b95ef293c; AMI IDs differ between regions).
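The console lookup above can also be scripted. The sketch below is an illustration, not part of the official setup: it assumes boto3 and AWS credentials are available, and the name pattern `*habana*ecs*` is a guess based on the console search terms, so adjust it to the AMI names you actually see. Only `latest_image()` runs without AWS access; it picks the newest image from a DescribeImages response.

```python
"""Pick the newest Habana ECS AMI from an EC2 DescribeImages response (sketch).

The name filter is an assumption based on searching for "habana" and "ecs"
in the console; verify it against the AMI names in your region.
"""
from typing import Any, Dict, List, Optional


def latest_image(images: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the image with the most recent CreationDate, or None if empty."""
    if not images:
        return None
    # CreationDate is an ISO 8601 string such as "2021-11-30T12:34:56.000Z",
    # so lexicographic order matches chronological order.
    return max(images, key=lambda img: img["CreationDate"])


def find_habana_ecs_ami(region: str) -> Optional[str]:
    """Query EC2 for marketplace AMIs whose name matches the assumed pattern."""
    import boto3  # imported lazily so latest_image() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_images(
        Owners=["aws-marketplace"],
        Filters=[{"Name": "name", "Values": ["*habana*ecs*"]}],  # assumed pattern
    )
    image = latest_image(response["Images"])
    return image["ImageId"] if image else None
```

Sorting on CreationDate rather than on the AMI ID matters because AMI IDs carry no ordering information.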


4.3. Create a Cluster

Follow the AWS guide "Tutorial: Creating a Cluster with an EC2 Task Using the Amazon ECS CLI" to learn how to create an ECS cluster. To use Habana devices, you need to:

For example, with TensorFlow 2.7.0 and Habana software version 1.1.1, the docker-compose.yml is:

version: '3'
services:
  web:  # service name assumed; it must match the name used in ecs-params.yml
    image: vault.habana.ai/gaudi-docker/1.1.1/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.7.0:1.1.1-94
    entrypoint: ["hl-smi"]
    logging:
      driver: awslogs
      options:
        awslogs-group: ec2-tutorial
        awslogs-region: us-east-1
        awslogs-stream-prefix: web
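The image path encodes both the Habana software version and the framework version, which is a quick way to confirm you are pulling the release you expect. The sketch below reads them out; the regular expression is inferred from the single image reference in this guide and is an assumption that may not hold for other Habana image names.

```python
"""Read the Habana and framework versions out of an image reference (sketch).

The pattern is inferred from the one image reference in this guide and
may not match other Habana image names.
"""
import re
from typing import NamedTuple

IMAGE = (
    "vault.habana.ai/gaudi-docker/1.1.1/ubuntu20.04/habanalabs/"
    "tensorflow-installer-tf-cpu-2.7.0:1.1.1-94"
)

# gaudi-docker/<habana>/<os>/habanalabs/<framework>-installer-<abbr>-cpu-<version>:<tag>
_IMAGE_RE = re.compile(
    r"/gaudi-docker/(?P<habana>[\d.]+)/(?P<os>[^/]+)/habanalabs/"
    r"(?P<framework>[a-z]+)-installer-[a-z]+-cpu-(?P<framework_version>[\d.]+):"
)


class ImageVersions(NamedTuple):
    habana: str
    os: str
    framework: str
    framework_version: str


def parse_image(ref: str) -> ImageVersions:
    """Extract version fields from an image reference or raise ValueError."""
    match = _IMAGE_RE.search(ref)
    if match is None:
        raise ValueError(f"unrecognized image reference: {ref!r}")
    return ImageVersions(**match.groupdict())


if __name__ == "__main__":
    print(parse_image(IMAGE))
```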

The runtime and resources must be specified as well. For example, the ecs-params.yml is:

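version: 1
task_definition:
  services:
    web:  # assumed; must match the service name in docker-compose.yml
      resources:  # resources/limits/requests keys assumed (Kubernetes-style)
        limits:
          hugepages-2Mi: "21000Mi"
          memory: 720Gi
        requests:
          hugepages-2Mi: "21000Mi"
          memory: 720Gi
      cpu_shares: 100
      mem_limit: 524288000
      runtime: habana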
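The ECS CLI pairs each service under task_definition.services in ecs-params.yml with the service of the same name in docker-compose.yml, so a mismatched name silently leaves the Habana settings unapplied. The sketch below checks both files after they have been parsed into dicts (for example with PyYAML's yaml.safe_load, which is not in the standard library). The service name "web" is hypothetical, and the runtime check simply mirrors the runtime: habana line above.

```python
"""Cross-check docker-compose.yml against ecs-params.yml (sketch).

Assumes both files are already parsed into dicts; "web" is a hypothetical
service name.
"""
from typing import Any, Dict, List

COMPOSE = {
    "version": "3",
    "services": {"web": {"entrypoint": ["hl-smi"]}},
}

ECS_PARAMS = {
    "version": 1,
    "task_definition": {
        "services": {
            "web": {"cpu_shares": 100, "mem_limit": 524288000, "runtime": "habana"},
        },
    },
}


def _configured(ecs_params: Dict[str, Any]) -> Dict[str, Any]:
    """Services listed under task_definition.services in ecs-params.yml."""
    return ecs_params.get("task_definition", {}).get("services", {})


def unmatched_services(compose: Dict[str, Any], ecs_params: Dict[str, Any]) -> List[str]:
    """Names configured in ecs-params.yml that docker-compose.yml does not define."""
    defined = set(compose.get("services", {}))
    return sorted(name for name in _configured(ecs_params) if name not in defined)


def services_without_habana_runtime(ecs_params: Dict[str, Any]) -> List[str]:
    """Names in ecs-params.yml that do not set runtime: habana."""
    return sorted(
        name
        for name, opts in _configured(ecs_params).items()
        if opts.get("runtime") != "habana"
    )


if __name__ == "__main__":
    print("unmatched:", unmatched_services(COMPOSE, ECS_PARAMS))
    print("missing runtime:", services_without_habana_runtime(ECS_PARAMS))
```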

4.4. View Results

Because hl-smi is the entrypoint, the container prints device information for all eight Gaudi cards. You can adjust memory, cpu_shares, hugepages-2Mi and other settings to suit your model's needs.
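When tuning these values it helps to translate them into plain numbers. The sketch below does that for the example configuration. It assumes Mi and Gi are binary (1024-based) units, that mem_limit is given in bytes, and that cpu_shares is expressed in ECS CPU units (1024 units per vCPU); treat these as assumptions to verify against the ECS CLI documentation.

```python
"""Convert the example resource values into plain numbers (sketch).

Assumes Mi/Gi are binary units, mem_limit is in bytes, and cpu_shares
uses ECS CPU units (1024 units = 1 vCPU).
"""
MIB = 1024 * 1024
HUGEPAGE_MIB = 2  # size of one page for the hugepages-2Mi resource
CPU_UNITS_PER_VCPU = 1024

_SUFFIX_TO_MIB = {"Mi": 1, "Gi": 1024}


def to_mib(quantity: str) -> int:
    """Parse a quantity such as "21000Mi" or "720Gi" into MiB."""
    for suffix, factor in _SUFFIX_TO_MIB.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


def hugepage_count(quantity: str) -> int:
    """Number of 2 MiB huge pages in `quantity`; must divide evenly."""
    total = to_mib(quantity)
    if total % HUGEPAGE_MIB:
        raise ValueError(f"{quantity!r} is not a multiple of {HUGEPAGE_MIB}Mi")
    return total // HUGEPAGE_MIB


def cpu_shares_to_vcpus(cpu_shares: int) -> float:
    """Fraction of a vCPU reserved by `cpu_shares` ECS CPU units."""
    return cpu_shares / CPU_UNITS_PER_VCPU


if __name__ == "__main__":
    print("huge pages:", hugepage_count("21000Mi"))   # 10500
    print("memory (MiB):", to_mib("720Gi"))           # 737280
    print("mem_limit (MiB):", 524288000 // MIB)       # 500
    print("vCPUs:", cpu_shares_to_vcpus(100))         # 0.09765625
```

For instance, raising hugepages-2Mi to "32000Mi" would request 16000 pages under these assumptions.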