Create and Submit AWS Batch Job¶
Job definitions are the blueprints for how jobs run. This section covers:
Creating a job definition to run MNIST training
Submitting a job and viewing the results
Create AWS Batch Job Definition¶
Job definitions specify how jobs run. Many job definition parameters can be overridden when a job is submitted. Follow the steps below:
Create dl1_batch_jd.json with the following configuration, and update the placeholders as described in the table below:

{
  "jobDefinitionName": "dl1_mnist_batch_jd",
  "type": "multinode",
  "nodeProperties": {
    "numNodes": 2,
    "mainNode": 0,
    "nodeRangeProperties": [
      {
        "targetNodes": "0:",
        "container": {
          "image": "IMAGE_NAME",
          "command": [],
          "jobRoleArn": "TASK_EXEC_ROLE",
          "resourceRequirements": [
            { "type": "MEMORY", "value": "760000" },
            { "type": "VCPU", "value": "96" }
          ],
          "mountPoints": [],
          "volumes": [],
          "environment": [],
          "ulimits": [],
          "instanceType": "dl1.24xlarge",
          "linuxParameters": {
            "devices": [
              {
                "hostPath": "/dev/infiniband/uverbs0",
                "containerPath": "/dev/infiniband/uverbs0",
                "permissions": [ "READ", "WRITE", "MKNOD" ]
              },
              {
                "hostPath": "/dev/infiniband/uverbs1",
                "containerPath": "/dev/infiniband/uverbs1",
                "permissions": [ "READ", "WRITE", "MKNOD" ]
              },
              {
                "hostPath": "/dev/infiniband/uverbs2",
                "containerPath": "/dev/infiniband/uverbs2",
                "permissions": [ "READ", "WRITE", "MKNOD" ]
              },
              {
                "hostPath": "/dev/infiniband/uverbs3",
                "containerPath": "/dev/infiniband/uverbs3",
                "permissions": [ "READ", "WRITE", "MKNOD" ]
              }
            ],
            "sharedMemorySize": 258048
          },
          "privileged": true
        }
      }
    ]
  }
}
Placeholder       Replace with (example)
IMAGE_NAME        xxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/dl1_batch_training:v1
TASK_EXEC_ROLE    arn:aws:iam::xxxxxxx:role/ecsTaskExecutionRole
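
The placeholder substitution can also be scripted. The following is a minimal Python sketch, not part of the original instructions. `fill_placeholders` is a hypothetical helper that parses the template, swaps in the two values from the table, and fails early if the file is not valid JSON (for example, because of a stray trailing comma). It assumes the placeholders appear only in the `image` and `jobRoleArn` fields of each node range.

```python
import json


def fill_placeholders(template_text, image_name, task_exec_role):
    """Return the job definition as a dict with both placeholders replaced.

    Hypothetical helper: raises json.JSONDecodeError if template_text is not
    valid JSON, so syntax problems surface before the definition is registered.
    """
    job_def = json.loads(template_text)
    for node_range in job_def["nodeProperties"]["nodeRangeProperties"]:
        container = node_range["container"]
        # Only touch fields that still hold the literal placeholder text.
        if container.get("image") == "IMAGE_NAME":
            container["image"] = image_name
        if container.get("jobRoleArn") == "TASK_EXEC_ROLE":
            container["jobRoleArn"] = task_exec_role
    return job_def
```

A typical use would be to read dl1_batch_jd.json, pass its text to `fill_placeholders` together with your own image URI and role ARN, and write the result back with `json.dump` before registering it.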
Register the job definition:
aws batch register-job-definition --cli-input-json file://dl1_batch_jd.json
# Expected Results
{
"jobDefinitionName": "dl1_mnist_batch_jd",
"jobDefinitionArn": "arn:aws:batch:us-west-2:xxxxxxxxxxxx:job-definition/dl1_mnist_batch_jd:1",
"revision": 1
}
Submit AWS Batch Job¶
To submit a job, run the following command:
aws batch submit-job --job-name dl1_mnp_batch --job-definition dl1_mnist_batch_jd --job-queue dl1_mnp_jq --node-overrides numNodes=2
# Expected Results
{
"jobArn": "arn:aws:batch:us-west-2:xxxxxxxxxxxx:job/a434b6e9-5fda-415d-befb-079b04c95a97",
"jobName": "dl1_mnp_batch",
"jobId": "a434b6e9-5fda-415d-befb-079b04c95a97"
}
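
Because job definition parameters can be overridden at submission, the same command can carry more than `numNodes`. The sketch below builds the argument list for `aws batch submit-job` with `--node-overrides` passed as JSON. `build_submit_command` is a hypothetical helper, and any container command you pass in is your own choice; the job definition above leaves `command` empty, and whether a `0:` override range suits your job is an assumption to verify.

```python
import json


def build_submit_command(job_name, job_definition, job_queue, num_nodes, command=None):
    """Build the argv list for `aws batch submit-job` (hypothetical helper)."""
    node_overrides = {"numNodes": num_nodes}
    if command is not None:
        # Optional: override the container command on all nodes ("0:").
        node_overrides["nodePropertyOverrides"] = [
            {"targetNodes": "0:", "containerOverrides": {"command": list(command)}}
        ]
    return [
        "aws", "batch", "submit-job",
        "--job-name", job_name,
        "--job-definition", job_definition,
        "--job-queue", job_queue,
        "--node-overrides", json.dumps(node_overrides),
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` submits the job. With no `command` argument, the result is equivalent to the command shown above.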
Note
Jobs can also be submitted, and their status viewed, through the AWS Batch console.
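
Job status can also be checked from a script. The following is a minimal sketch that polls `aws batch describe-jobs` until the job reaches a terminal state. The helper names, the 30-second interval, and the injectable `fetch` and `sleep` parameters are illustrative assumptions; it also assumes the AWS CLI is installed and configured.

```python
import json
import subprocess
import time

# AWS Batch jobs end in one of these two states.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def fetch_status(job_id):
    """Return the current status string of a job via the AWS CLI."""
    result = subprocess.run(
        ["aws", "batch", "describe-jobs", "--jobs", job_id],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["jobs"][0]["status"]


def wait_for_job(job_id, fetch=fetch_status, interval_s=30, sleep=time.sleep):
    """Poll until the job succeeds or fails, then return the final status."""
    while True:
        status = fetch(job_id)
        print(job_id, status)
        if status in TERMINAL_STATES:
            return status
        sleep(interval_s)
```

Calling `wait_for_job` with the `jobId` returned by `submit-job` blocks until the job finishes.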
Observe Submitted AWS Batch Job Logs¶
AWS Batch writes job logs to Amazon CloudWatch Logs. For specific instructions, see "View Log Data Sent to CloudWatch Logs".
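
The logs can also be located from the command line. The sketch below rests on two assumptions that should be checked against your setup: each node of a multi-node job is addressable as `<jobId>#<nodeIndex>` in `aws batch describe-jobs`, and logs go to the default `/aws/batch/job` log group. All three helpers are hypothetical and only prepare the inputs for `aws logs get-log-events`.

```python
def node_job_ids(job_id, num_nodes):
    """Per-node job IDs of a multi-node job (assumed '<jobId>#<index>' form)."""
    return [f"{job_id}#{index}" for index in range(num_nodes)]


def log_stream_name(describe_jobs_response):
    """Extract the log stream name from a parsed describe-jobs response.

    Returns None if the node has not started and no stream exists yet.
    """
    job = describe_jobs_response["jobs"][0]
    return job.get("container", {}).get("logStreamName")


def get_log_events_command(stream_name, log_group="/aws/batch/job"):
    """Build the argv list for `aws logs get-log-events`."""
    return [
        "aws", "logs", "get-log-events",
        "--log-group-name", log_group,
        "--log-stream-name", stream_name,
    ]
```

For the two-node job above, you would describe each ID from `node_job_ids(job_id, 2)`, read its stream name, and run the resulting `get-log-events` command.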