Creating Cluster and Node Group¶
Follow the guide Getting Started with Amazon EKS eksctl to learn how to create an EKS cluster.
Prerequisites¶
The following prerequisites are required to set up the AMI on Amazon EKS:
An Amazon Web Services (AWS) account.
Install the required CLIs and set up the appropriate AWS permissions. Refer to Getting Started with Amazon EKS eksctl.
Subscribe to the Habana EKS base AMI for the current release version. Refer to Habana Deep Learning EKS AMI.
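Before continuing, it can help to confirm that the CLIs are on your PATH. The following is a minimal sketch, not part of the referenced guide: the function name missing_clis is hypothetical, and the default tool list (aws, eksctl, kubectl) is an assumption based on the commands used on this page.

```shell
# Print the name of every required CLI that is not found on PATH.
# The default list (aws, eksctl, kubectl) is an assumption based on the
# commands used on this page; pass other names as arguments to check those.
missing_clis() {
    [ "$#" -gt 0 ] || set -- aws eksctl kubectl
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: list what still needs to be installed (no output means all found).
# missing_clis
```

The check only verifies that each binary can be found. It does not validate versions or AWS permissions.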
Create a Cluster¶
To use Habana devices, avoid creating node groups in this step. You can add --without-nodegroup to the command in the Step 1 Managed Nodes Linux tab. To create an EKS cluster, run the following command:
eksctl create cluster --name my-cluster --region us-east-1 --without-nodegroup
Note
kubectl get nodes -o wide in Step 2 will not show any nodes because no node groups have been added yet.
Determine the Zone Where DL1 Is Available for Your Account¶
Run the following command to determine which Availability Zones offer dl1.24xlarge for your account. Change --region to the region you plan to use:
aws ec2 describe-instance-type-offerings \
--location-type availability-zone \
--filters Name=instance-type,Values=dl1.24xlarge \
--region us-east-1 \
--output table
Note
Make sure that one of the two zones listed under availabilityZones in the step below is a zone (Location) returned by this command.
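If you want the zone names in a form that is easy to copy into the configuration, the same query can return only the Location field. This is a sketch under assumptions: the function name dl1_zones is hypothetical, and it relies on the AWS CLI --query and --output text options to reduce the response to plain zone names.

```shell
# List the Availability Zones that offer an instance type, one per line.
# Arguments: region (default us-east-1), instance type (default dl1.24xlarge).
dl1_zones() {
    region="${1:-us-east-1}"
    instance_type="${2:-dl1.24xlarge}"
    aws ec2 describe-instance-type-offerings \
        --location-type availability-zone \
        --filters "Name=instance-type,Values=${instance_type}" \
        --region "$region" \
        --query 'InstanceTypeOfferings[].Location' \
        --output text | tr '\t' '\n'
}

# Example:
# dl1_zones us-east-1
```

An empty result means that no zone in that region offers the instance type for your account.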
Find the Latest AMI ID¶
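To set up the node, find the latest Habana EKS AMI ID. AMI IDs are region-specific, so set --region to the region of your cluster (us-east-1 in this example):
aws ec2 describe-images --region us-east-1 --filters "Name=name,Values=habanalabs-eks*" --query 'Images[].{Name: Name, ImageID: ImageId}'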
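The command above lists every matching image, so you still have to pick the newest one yourself. As a hedged sketch, the variant below sorts by CreationDate with a JMESPath expression and prints only the most recent ImageId. The function name latest_habana_ami is hypothetical, and the sketch assumes that the most recently created image is the one you want for the current release.

```shell
# Print the ID of the most recently created image whose name matches
# habanalabs-eks*. Argument: region (default us-east-1).
# Assumption: "latest" means the newest CreationDate among the matches.
latest_habana_ami() {
    region="${1:-us-east-1}"
    aws ec2 describe-images \
        --region "$region" \
        --filters "Name=name,Values=habanalabs-eks*" \
        --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
        --output text
}

# Example:
# latest_habana_ami us-east-1
```

If no image matches in the chosen region, the AWS CLI prints None instead of an AMI ID, so check the value before using it.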
Add Node Group for Multi-node Cluster¶
Follow the Creating a managed node group guide to learn how to create a node group. You can create the node group with a launch template such as the following, saved as launch-template.yaml:
Note
Habana EKS AMI supports containerd only.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
availabilityZones: ["us-east-1a", "us-east-1b"]
metadata:
  name: my-cluster
  region: us-east-1
iam:
  withOIDC: true
managedNodeGroups:
  - name: ng01
    ami: AMI_ID
    availabilityZones: ["us-east-1b"]
    volumeSize: 500
    minSize: 0
    desiredCapacity: 2
    maxSize: 4
    privateNetworking: true
    efaEnabled: true
    ssh:
      publicKeyName: PEM_KEY_NAME
    tags:
      k8s.io/cluster-autoscaler/node-template/resources/habana.ai/gaudi: "8"
      k8s.io/cluster-autoscaler/node-template/resources/hugepages-2Mi: "42000Mi"
      k8s.io/cluster-autoscaler/enabled: "true"
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh my-cluster --container-runtime containerd
This configuration creates a node group named ng01 from the Habana EKS AMI in my-cluster, in region us-east-1. It starts with 2 active nodes and can scale up to a maximum of 4 nodes.
Note
The launch-template.yaml has Elastic Fabric Adapter (EFA) enabled for optimal distributed training, which incurs an additional cost. To disable EFA, set efaEnabled and privateNetworking to false.
For more information on EFA, refer to the EFA User Guide.
Update the parameters listed in the table to run the desired configuration:
Placeholder | Replace with |
---|---|
AMI_ID | ami-xxxxxxxxxxxxxxxxx |
PEM_KEY_NAME | key_name_no_extension |
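The two substitutions in the table can also be scripted instead of edited by hand. The following is a minimal sketch using sed. The function name fill_placeholders and the input file name in the example are hypothetical, and the sketch assumes that neither value contains the characters / or &, which are special to sed.

```shell
# Read a config on stdin and write it to stdout with both placeholders
# replaced. Arguments: AMI ID, EC2 key pair name (without file extension).
# Assumption: neither value contains "/" or "&", which sed treats specially.
fill_placeholders() {
    ami_id="$1"
    key_name="$2"
    sed -e "s/AMI_ID/${ami_id}/g" -e "s/PEM_KEY_NAME/${key_name}/g"
}

# Example (the input file name is hypothetical):
# fill_placeholders ami-xxxxxxxxxxxxxxxxx key_name_no_extension \
#     < launch-template.in.yaml > launch-template.yaml
```

Keeping the template with its placeholders in a separate file lets you regenerate launch-template.yaml when a new AMI is released.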
Create the node group with the following command:
eksctl create nodegroup --config-file launch-template.yaml
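Once the command finishes, you may want to confirm that the expected number of nodes joined the cluster. The sketch below counts the nodes that kubectl reports as Ready. The function name ready_node_count is hypothetical, and the sketch assumes that kubectl is already configured for my-cluster.

```shell
# Count the nodes whose STATUS column is exactly "Ready".
# Nodes reported as "NotReady" or "Ready,SchedulingDisabled" are not counted.
# Assumption: kubectl's current context points at the cluster created above.
ready_node_count() {
    kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example: compare the result with desiredCapacity from the config (2 here).
# ready_node_count
```

Nodes can take several minutes to register, so a lower count right after creation does not necessarily indicate a failure.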
Clean and Delete the Cluster¶
To delete the node group and then the cluster, run the following commands:
eksctl delete nodegroup --cluster my-cluster --name ng01 --region us-east-1
eksctl delete cluster --name my-cluster --region us-east-1