Creating Cluster and Node Group¶
Follow the Getting Started with Amazon EKS eksctl guide to learn how to create an EKS cluster.
Prerequisites¶
The following prerequisites are required to set up the Habana AMI on Amazon EKS:
An AWS account.
Install the required CLIs and set up the proper AWS permissions. Refer to Getting Started with Amazon EKS eksctl.
Subscribe to the Habana EKS base AMI for the current release version. Refer to Habana Deep Learning EKS AMI.
Create a Cluster¶
To use Gaudi devices, it is recommended to avoid creating node groups in this step. You can add --without-nodegroup to the command in the Step 1 Managed Nodes Linux tab.
To create an EKS cluster, run the following command:
eksctl create cluster --name my-cluster --region us-east-1 --without-nodegroup
Note
Running kubectl get nodes -o wide in Step 2 will not show any nodes, since no node groups were added.
Determine the Zones Where DL1 Is Available for Your Account¶
Run the following command to determine which zones offer dl1.24xlarge for your account, and modify the region as needed:
aws ec2 describe-instance-type-offerings \
--location-type availability-zone \
--filters Name=instance-type,Values=dl1.24xlarge \
--region us-east-1 \
--output table
Note
Make sure that one of the two zones listed under availabilityZones in the node group configuration below is a zone (location) discovered in this step.
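As a rough sketch, the zone lookup can be wrapped in small shell helpers that print only the zone names, which makes it easier to check a candidate zone before writing it into the configuration. The function names dl1_zones and zone_offers_dl1 are illustrative and not part of the AWS CLI. The --query expression assumes the standard InstanceTypeOfferings[].Location shape of the describe-instance-type-offerings response.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not part of the AWS CLI).

# Print the zones in a region that offer dl1.24xlarge, one per line.
dl1_zones() {
  local region="${1:-us-east-1}"
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters Name=instance-type,Values=dl1.24xlarge \
    --region "$region" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text | tr '\t' '\n' | sort
}

# Succeed only if the given zone appears in that list.
zone_offers_dl1() {
  local zone="$1"
  local region="${2:-us-east-1}"
  dl1_zones "$region" | grep -qx -- "$zone"
}
```

For example, zone_offers_dl1 us-east-1b && echo "us-east-1b offers dl1.24xlarge" prints the message only when the zone is in the list returned for us-east-1.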
Find the Latest AMI ID¶
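To set up the node, find the latest Habana EKS AMI ID. AMI IDs are region-specific, so query the region in which the cluster runs (us-east-1 in this example):
aws ec2 describe-images --region us-east-1 --filters "Name=name,Values=habanalabs-eks*" --query 'Images[].{Name: Name, ImageID: ImageId}'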
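The command above lists every matching image, so the newest one still has to be picked by hand. A minimal sketch of narrowing the result to a single ID is shown below. It assumes that "latest" means the most recent CreationDate, which may not coincide with the release version you subscribed to. The function name latest_habana_ami is illustrative, and the default region is an assumption; pass the region of your own cluster.

```shell
#!/bin/bash
# Hypothetical helper (the function name is illustrative).
# Print the ID of the most recently created image whose name matches
# habanalabs-eks*, using a JMESPath sort on CreationDate.
latest_habana_ami() {
  local region="${1:-us-east-1}"
  aws ec2 describe-images \
    --region "$region" \
    --filters "Name=name,Values=habanalabs-eks*" \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
    --output text
}
```

The result can be captured with AMI_ID="$(latest_habana_ami us-east-1)". If no image matches the filter, the AWS CLI prints None instead of an ID, so it is worth checking the value before using it.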
Add Node Group for Multi-Node Cluster¶
Follow the Creating a managed node group guide to learn how to create a node group. You can create the node group with a launch template:
Note
The Habana EKS AMI supports only the containerd runtime.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
availabilityZones: ["us-east-1a", "us-east-1b"]
metadata:
  name: my-cluster
  region: us-east-1
iam:
  withOIDC: true
managedNodeGroups:
  - name: ng01
    ami: AMI_ID
    availabilityZones: ["us-east-1b"]
    volumeSize: 500
    minSize: 0
    desiredCapacity: 2
    maxSize: 4
    privateNetworking: true
    efaEnabled: true
    ssh:
      publicKeyName: PEM_KEY_NAME
    tags:
      k8s.io/cluster-autoscaler/node-template/resources/habana.ai/gaudi: "8"
      k8s.io/cluster-autoscaler/node-template/resources/hugepages-2Mi: "42000Mi"
      k8s.io/cluster-autoscaler/enabled: "true"
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh my-cluster --container-runtime containerd
The node group is created with the name ng01 and the Habana EKS AMI in my-cluster, in region us-east-1. It includes 2 active nodes and can scale up to a maximum of 4 nodes.
Note
The launch-template.yaml file has Elastic Fabric Adapter (EFA) enabled for optimal distributed training, which incurs an additional cost. To disable EFA, set efaEnabled and privateNetworking to false. For more information on EFA, refer to the EFA User Guide.
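If you prefer not to edit the file by hand, the two settings can be flipped with sed. The following is only a sketch: the function name disable_efa and the output file name are illustrative, and it assumes the YAML file is laid out with one key per line and the values written exactly as true.

```shell
#!/bin/bash
# Hypothetical helper: write a copy of an eksctl config file with
# efaEnabled and privateNetworking switched from true to false.
# Other keys, including other "true" values, are left untouched.
disable_efa() {
  local src="$1"
  local dst="$2"
  sed -E 's/^([[:space:]]*(efaEnabled|privateNetworking):[[:space:]]*)true[[:space:]]*$/\1false/' \
    "$src" > "$dst"
}
```

For example, disable_efa launch-template.yaml launch-template-no-efa.yaml leaves the original file unchanged and writes the modified copy next to it.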
Update the parameters listed in the table to run the desired configuration:
Placeholder     Replace with
AMI_ID          ami-xxxxxxxxxxxxxxxxx
PEM_KEY_NAME    key_name_no_extension
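The substitution can also be scripted. The sketch below fills both placeholders and writes the result to a new file. The function name fill_placeholders, the file names, and the sample values are illustrative, and it assumes neither value contains the characters |, & or a backslash, which have special meaning in the sed replacement.

```shell
#!/bin/bash
# Hypothetical helper: substitute the AMI_ID and PEM_KEY_NAME placeholders
# in a template file and write the result to a new file.
fill_placeholders() {
  local template="$1"
  local ami_id="$2"
  local key_name="$3"
  local out="$4"
  sed -e "s|AMI_ID|${ami_id}|g" -e "s|PEM_KEY_NAME|${key_name}|g" \
    "$template" > "$out"
}
```

For example, fill_placeholders launch-template.yaml ami-xxxxxxxxxxxxxxxxx key_name_no_extension launch-template.filled.yaml produces a file that can then be passed to eksctl with --config-file.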
Create the node group with the following command:
eksctl create nodegroup --config-file launch-template.yaml
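Once the command returns, it can be useful to confirm that the new nodes have joined the cluster. A minimal sketch, assuming kubectl is already configured for my-cluster: the function name wait_for_nodes and the default timeout are illustrative. Note that kubectl wait reports an error if no nodes have registered yet, in which case the call can simply be retried.

```shell
#!/bin/bash
# Hypothetical helper: block until every registered node reports Ready,
# then list the nodes. Returns non-zero if the wait fails or times out.
wait_for_nodes() {
  local timeout="${1:-600s}"
  kubectl wait --for=condition=Ready nodes --all --timeout="$timeout" &&
    kubectl get nodes -o wide
}
```

Calling wait_for_nodes 300s waits for up to five minutes before giving up.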
Clean and Delete the Cluster¶
To delete the node group and the cluster, run the following commands:
eksctl delete nodegroup --cluster=<clusterName> --name=<nodegroupName>
eksctl delete cluster --name my-cluster --region us-east-1