Creating Cluster and Node Group

Follow the Getting Started with Amazon EKS - eksctl guide to learn how to create an EKS cluster.

Prerequisites

The steps below are the prerequisites for setting up the Habana EKS AMI on Amazon EKS.

Create a Cluster

  1. To use Gaudi devices, it is recommended not to create any node groups in this step. To skip node group creation, add --without-nodegroup to the command shown in Step 1 of the guide, under the Managed Nodes Linux tab.

  2. To create an EKS cluster, run the following command:

eksctl create cluster --name my-cluster --region us-east-1 --without-nodegroup

Note

Running kubectl get nodes -o wide, as described in Step 2 of the guide, will not show any nodes because no node groups have been added yet.

Determine the Zone Where DL1 Is Available for Your Account

Run the following command to determine which availability zones offer dl1.24xlarge for your account, and change the --region value as needed:

aws ec2 describe-instance-type-offerings \
  --location-type availability-zone \
  --filters Name=instance-type,Values=dl1.24xlarge \
  --region us-east-1 \
  --output table
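
To use the result in a script instead of reading it from a table, the same query can be narrowed to just the zone names. The following is a minimal sketch: the function name dl1_zones is hypothetical, and the --query expression assumes the usual InstanceTypeOfferings[].Location shape of the describe-instance-type-offerings response.

```shell
# Hypothetical helper: print the availability zones in a region that offer dl1.24xlarge.
# Usage: dl1_zones <region>
dl1_zones() {
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters Name=instance-type,Values=dl1.24xlarge \
    --region "$1" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text
}
```

Calling dl1_zones us-east-1 prints the matching zone names on one line. Empty output suggests that the instance type is not offered in that region for your account.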

Note

Make sure that one of the two zones in the node group configuration below is set to the zone (location) discovered in this step.

Find the Latest AMI ID

To set up the node, find the latest Habana EKS AMI ID. AMI IDs are region-specific, so query the same region as the cluster:

aws ec2 describe-images --region us-east-1 --filters "Name=name,Values=habanalabs-eks*" --query 'Images[].{Name: Name, ImageID: ImageId}'
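
The command above lists every matching image without ordering them. As a sketch of one way to pick out the newest one, the query can sort by creation date and keep the last entry. The function name latest_habana_eks_ami is hypothetical, and the sketch assumes that the most recently created image is the one you want, which may not hold if several image variants are published.

```shell
# Hypothetical helper: print the ID of the most recently created Habana EKS AMI in a region.
# Usage: latest_habana_eks_ami <region>
latest_habana_eks_ami() {
  aws ec2 describe-images \
    --region "$1" \
    --filters "Name=name,Values=habanalabs-eks*" \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
    --output text
}
```

Review the image names listed by the full query before relying on the returned ID, and pass the same region that the cluster uses.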

Add Node Group for Multi-node Cluster

  1. Follow the Creating a managed node group guide to learn how to create a node group. You can create the node group with a launch template, saved here as launch-template.yaml:

Note

The Habana EKS AMI supports the containerd runtime only.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
availabilityZones: ["us-east-1a", "us-east-1b"]
metadata:
  name: my-cluster
  region: us-east-1
iam:
  withOIDC: true
managedNodeGroups:
  - name: ng01
    ami: AMI_ID
    availabilityZones: ["us-east-1b"]
    volumeSize: 500
    minSize: 0
    desiredCapacity: 2
    maxSize: 4
    privateNetworking: true
    efaEnabled: true
    ssh:
      publicKeyName: PEM_KEY_NAME
    tags:
      k8s.io/cluster-autoscaler/node-template/resources/habana.ai/gaudi: "8"
      k8s.io/cluster-autoscaler/node-template/resources/hugepages-2Mi: "42000Mi"
      k8s.io/cluster-autoscaler/enabled: "true"
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh my-cluster --container-runtime containerd

The node group is created with the name ng01, using the Habana EKS AMI, in my-cluster in region us-east-1. It starts with 2 active nodes and can scale up to a maximum of 4 nodes.

Note

The launch-template.yaml file has Elastic Fabric Adapter (EFA) enabled for optimal distributed training, which incurs an additional cost. To disable EFA, set efaEnabled and privateNetworking to false. For more information on EFA, refer to the EFA User Guide.

  2. Replace the placeholders listed in the table below with the values for your configuration:

Placeholder     Replace with

AMI_ID          ami-xxxxxxxxxxxxxxxxx

PEM_KEY_NAME    key_name_no_extension
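
The placeholder substitution can also be scripted rather than done by hand. The following is a minimal sketch using sed. The function name fill_placeholders is hypothetical, and the sketch assumes that neither value contains a / or & character, which sed would interpret specially.

```shell
# Hypothetical helper: replace the AMI_ID and PEM_KEY_NAME placeholders in a config file
# and write the result to standard output. The input file is left unchanged.
# Usage: fill_placeholders <template-file> <ami-id> <key-name>
fill_placeholders() {
  sed -e "s/AMI_ID/$2/" -e "s/PEM_KEY_NAME/$3/" "$1"
}
```

Redirect the output to a new file and pass that file to --config-file when creating the node group, so the original template stays reusable.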

  3. Create the node group with the following command:

eksctl create nodegroup --config-file launch-template.yaml
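
Once the command finishes, it can be useful to confirm that the expected number of nodes (2, per desiredCapacity in the configuration) has joined the cluster. The following is a small sketch built on kubectl get nodes. The function name ready_node_count is hypothetical, and it counts Ready nodes across the whole cluster, not just the ng01 node group.

```shell
# Hypothetical helper: count the nodes that kubectl reports with STATUS exactly "Ready".
ready_node_count() {
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

Nodes can take several minutes to register, so a count lower than expected right after creation is not necessarily an error.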

Clean Up and Delete the Cluster

To delete the node group and then the cluster, run the following commands:

eksctl delete nodegroup --cluster=my-cluster --name=ng01 --region us-east-1
eksctl delete cluster --name my-cluster --region us-east-1