Creating Cluster and Node Group

Follow the Getting Started with Amazon EKS eksctl guide to learn how to create an EKS cluster.

Prerequisites

The following prerequisites are needed to set up an AMI on Amazon EKS:

Create a Cluster

  1. To use Gaudi devices, do not create node groups in this step. Add --without-nodegroup to the command shown in the Step 1 Managed Nodes Linux tab of the guide.

  2. To create an EKS cluster, run the following command:

    eksctl create cluster --name my-cluster --region us-east-1 --without-nodegroup
    

Note

Running kubectl get nodes -o wide in Step 2 shows no nodes, because no node groups have been added yet.
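One way to confirm the note above is to count the nodes that kubectl reports. The following is a minimal sketch, assuming a bash-compatible shell. The helper name count_nodes is hypothetical and not part of kubectl or eksctl. It only counts non-empty lines on standard input.

```shell
#!/usr/bin/env bash
# count_nodes: hypothetical helper that counts non-empty lines on stdin.
# Intended input is the output of `kubectl get nodes --no-headers`.
count_nodes() {
  # grep -c prints the number of matching lines. It exits non-zero when
  # that number is 0, so "|| true" keeps the function from failing then.
  grep -c . || true
}

# Usage (assumes kubectl is configured for my-cluster):
#   kubectl get nodes --no-headers 2>/dev/null | count_nodes
```

A result of 0 is what you would expect until a node group is added.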

Determine the Zone Where DL1 Is Available for Your Account

Run the following command to determine which Availability Zones offer dl1.24xlarge for your account. Change the --region value to match your region:

aws ec2 describe-instance-type-offerings \
  --location-type availability-zone \
  --filters Name=instance-type,Values=dl1.24xlarge \
  --region us-east-1 \
  --output table

Note

In the node group configuration below, make sure that one of the two zones in availabilityZones is a zone (Location) returned by this command, and place the node group in that zone.
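If you only need the zone names, the command above can be narrowed with --query and --output text. The following is a sketch under assumptions: the helper list_zones is a hypothetical name, and it only reformats whatever zone names arrive on standard input into a sorted list with one zone per line.

```shell
#!/usr/bin/env bash
# list_zones: hypothetical helper that turns whitespace-separated zone
# names (as printed by the AWS CLI with --output text) into a sorted,
# de-duplicated list with one zone per line.
list_zones() {
  tr -s '[:space:]' '\n' | sed '/^$/d' | sort -u
}

# Usage (requires AWS CLI credentials):
#   aws ec2 describe-instance-type-offerings \
#     --location-type availability-zone \
#     --filters Name=instance-type,Values=dl1.24xlarge \
#     --region us-east-1 \
#     --query 'InstanceTypeOfferings[].Location' \
#     --output text | list_zones
```

Any zone printed this way is a candidate for the node group's availabilityZones entry.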

Find the Latest AMI ID

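To set up the node, find the latest Habana EKS AMI ID. AMI IDs are region-specific, so query the same region as your cluster (us-east-1 in this guide) and choose the newest image from the list:

aws ec2 describe-images \
  --region us-east-1 \
  --filters "Name=name,Values=habanalabs-eks*" \
  --query 'Images[].{Name: Name, ImageID: ImageId}'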
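The query above lists every matching image, so the newest one still has to be picked by hand. The following sketch shows one way to automate that choice, assuming the images are printed as tab-separated CreationDate and ImageId pairs. The helper latest_image_id is a hypothetical name, and the region us-east-1 in the usage comment is an assumption taken from the cluster examples in this guide.

```shell
#!/usr/bin/env bash
# latest_image_id: hypothetical helper. Reads lines of the form
#   <CreationDate><TAB><ImageId>
# from stdin and prints the ImageId with the most recent CreationDate.
latest_image_id() {
  # ISO 8601 timestamps sort chronologically when compared as strings,
  # so the last line after sorting is the newest image.
  LC_ALL=C sort -k1,1 | tail -n 1 | cut -f2
}

# Usage (requires AWS CLI credentials):
#   aws ec2 describe-images --region us-east-1 \
#     --filters "Name=name,Values=habanalabs-eks*" \
#     --query 'Images[].[CreationDate, ImageId]' \
#     --output text | latest_image_id
```

The printed ID is the value to use for the AMI_ID placeholder in the node group configuration.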

Add Node Group for Multi-Node Cluster

  1. Follow the Creating a managed node group guide to learn how to create a node group. You can create the node group with a launch template. Save the following configuration as launch-template.yaml:

    Note

    The Habana EKS AMI supports containerd only.

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    availabilityZones: ["us-east-1a", "us-east-1b"]
    metadata:
      name: my-cluster
      region: us-east-1
    iam:
      withOIDC: true
    managedNodeGroups:
      - name: ng01
        ami: AMI_ID
        availabilityZones: ["us-east-1b"]
        volumeSize: 500
        minSize: 0
        desiredCapacity: 2
        maxSize: 4
        privateNetworking: true
        efaEnabled: true
        ssh:
          publicKeyName: PEM_KEY_NAME
        tags:
          k8s.io/cluster-autoscaler/node-template/resources/habana.ai/gaudi: "8"
          k8s.io/cluster-autoscaler/node-template/resources/hugepages-2Mi: "42000Mi"
          k8s.io/cluster-autoscaler/enabled: "true"
        overrideBootstrapCommand: |
          #!/bin/bash
          /etc/eks/bootstrap.sh my-cluster --container-runtime containerd
    

This configuration creates a node group named ng01 that uses the Habana EKS AMI in my-cluster, in region us-east-1. The node group starts with 2 nodes and can scale up to a maximum of 4.

Note

The launch-template.yaml file has Elastic Fabric Adapter (EFA) enabled for optimal distributed training, which incurs an additional cost. To disable EFA, set efaEnabled and privateNetworking to false. For more information on EFA, refer to the EFA User Guide.

  2. Update the placeholders listed in the table to match your configuration:

    Placeholder     Replace With
    AMI_ID          ami-xxxxxxxxxxxxxxxxx
    PEM_KEY_NAME    key_name_no_extension

  3. Create the node group with the following command:

    eksctl create nodegroup --config-file launch-template.yaml
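The placeholder substitution in the steps above can also be scripted instead of edited by hand. The following is a minimal sketch, assuming the values contain no characters that are special to sed (such as / or &). The helper fill_placeholders and the output file name nodegroup.yaml are hypothetical.

```shell
#!/usr/bin/env bash
# fill_placeholders: hypothetical helper that replaces the AMI_ID and
# PEM_KEY_NAME placeholders in a configuration read from stdin.
#   $1 = AMI ID (for example ami-xxxxxxxxxxxxxxxxx)
#   $2 = key pair name without the file extension
fill_placeholders() {
  sed -e "s/AMI_ID/$1/g" -e "s/PEM_KEY_NAME/$2/g"
}

# Usage:
#   fill_placeholders ami-xxxxxxxxxxxxxxxxx key_name_no_extension \
#     < launch-template.yaml > nodegroup.yaml
#   eksctl create nodegroup --config-file nodegroup.yaml
```

Writing to a separate file keeps the original template reusable for other AMI IDs or key pairs.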
    

Clean Up and Delete the Cluster

To delete the node group and then the cluster, run the following commands:

eksctl delete nodegroup --cluster=my-cluster --name=ng01 --region us-east-1
eksctl delete cluster --name my-cluster --region us-east-1
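Because deletion is destructive, it can help to print the commands for review before running them. The following is a dry-run sketch only: the helper print_cleanup is a hypothetical name, and it prints the two eksctl commands in order (node group first, then cluster) without executing anything.

```shell
#!/usr/bin/env bash
# print_cleanup: hypothetical dry-run helper. Prints the delete commands
# in the order they should run, without executing them.
#   $1 = cluster name, $2 = node group name, $3 = region
print_cleanup() {
  printf 'eksctl delete nodegroup --cluster=%s --name=%s --region %s\n' "$1" "$2" "$3"
  printf 'eksctl delete cluster --name %s --region %s\n' "$1" "$3"
}

# Usage: review the output, then run the printed commands yourself.
#   print_cleanup my-cluster ng01 us-east-1
```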