Create and Submit AWS Batch Job¶
Job definitions are blueprints that describe how jobs run. This section covers:
Creating a job definition to run MNIST training
Submitting a job and viewing the results
Create AWS Batch Job Definition¶
A job definition specifies how jobs run. Many job definition parameters can be overridden at submission time. Follow the steps below:
Create dl1_batch_jd.json with the following configuration and update the placeholders:
{
  "jobDefinitionName": "dl1_mnist_batch_jd",
  "type": "multinode",
  "nodeProperties": {
    "numNodes": 2,
    "mainNode": 0,
    "nodeRangeProperties": [
      {
        "targetNodes": "0:",
        "container": {
          "image": "IMAGE_NAME",
          "command": [],
          "jobRoleArn": "TASK_EXEC_ROLE",
          "resourceRequirements": [
            {
              "type": "MEMORY",
              "value": "760000"
            },
            {
              "type": "VCPU",
              "value": "96"
            }
          ],
          "mountPoints": [],
          "volumes": [],
          "environment": [],
          "ulimits": [],
          "instanceType": "dl1.24xlarge",
          "linuxParameters": {
            "devices": [
              {
                "hostPath": "/dev/infiniband/uverbs0",
                "containerPath": "/dev/infiniband/uverbs0",
                "permissions": [
                  "READ",
                  "WRITE",
                  "MKNOD"
                ]
              },
              {
                "hostPath": "/dev/infiniband/uverbs1",
                "containerPath": "/dev/infiniband/uverbs1",
                "permissions": [
                  "READ",
                  "WRITE",
                  "MKNOD"
                ]
              },
              {
                "hostPath": "/dev/infiniband/uverbs2",
                "containerPath": "/dev/infiniband/uverbs2",
                "permissions": [
                  "READ",
                  "WRITE",
                  "MKNOD"
                ]
              },
              {
                "hostPath": "/dev/infiniband/uverbs3",
                "containerPath": "/dev/infiniband/uverbs3",
                "permissions": [
                  "READ",
                  "WRITE",
                  "MKNOD"
                ]
              }
            ],
            "sharedMemorySize": 258048
          },
          "privileged": true
        }
      }
    ]
  }
}
Placeholder | Replace With
---|---
IMAGE_NAME | xxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/dl1_batch_training:v1
TASK_EXEC_ROLE | arn:aws:iam::xxxxxxx:role/ecsTaskExecutionRole
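Substituting the placeholders by hand is easy to get wrong, and a stray comma makes the CLI reject the file. The following is a minimal sketch, not part of the original procedure, that fills in the two placeholders and confirms the result parses as JSON. The helper name `fill_placeholders` and the trimmed-down template are assumptions made for illustration; in practice the template text would be read from dl1_batch_jd.json.

```python
import json


def fill_placeholders(template_text, replacements):
    """Replace each placeholder token, then parse the result as JSON.

    json.loads raises json.JSONDecodeError if the filled-in text is malformed,
    so a successful return means the file is at least syntactically valid.
    """
    filled = template_text
    for token, value in replacements.items():
        filled = filled.replace(token, value)
    return json.loads(filled)


# Trimmed-down template for illustration only (hypothetical); the real
# template is the full dl1_batch_jd.json file.
template = '{"container": {"image": "IMAGE_NAME", "jobRoleArn": "TASK_EXEC_ROLE"}}'

job_def = fill_placeholders(template, {
    "IMAGE_NAME": "xxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/dl1_batch_training:v1",
    "TASK_EXEC_ROLE": "arn:aws:iam::xxxxxxx:role/ecsTaskExecutionRole",
})
print(json.dumps(job_def, indent=2))
```

The `xxxxxxxxx` account fields are kept exactly as in the table and still need to be replaced with real values.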
Run the following AWS CLI command to register the job definition:
aws batch register-job-definition --cli-input-json file://dl1_batch_jd.json
# Expected Results
{
"jobDefinitionName": "dl1_mnist_batch_jd",
"jobDefinitionArn": "arn:aws:batch:us-west-2:xxxxxxxxxxxx:job-definition/dl1_mnist_batch_jd:1",
"revision": 1
}
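Each registration of the same name produces a new revision, and `submit-job` accepts a job definition as `name:revision` as well as by name alone. As a small sketch (the helper name `job_definition_ref` is an assumption), the response shown above can be turned into such a pinned reference:

```python
import json


def job_definition_ref(response_text):
    """Build a 'name:revision' reference from register-job-definition output."""
    response = json.loads(response_text)
    return f'{response["jobDefinitionName"]}:{response["revision"]}'


# Sample response copied from the expected results above.
sample_response = """
{
  "jobDefinitionName": "dl1_mnist_batch_jd",
  "jobDefinitionArn": "arn:aws:batch:us-west-2:xxxxxxxxxxxx:job-definition/dl1_mnist_batch_jd:1",
  "revision": 1
}
"""

print(job_definition_ref(sample_response))  # prints dl1_mnist_batch_jd:1
```

Pinning the revision is optional; it only matters once the definition has been registered more than once and a specific revision is wanted.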
Submit AWS Batch Job¶
Run the following AWS CLI command to submit a job:
aws batch submit-job --job-name dl1_mnp_batch --job-definition dl1_mnist_batch_jd --job-queue dl1_mnp_jq --node-overrides numNodes=2
# Expected Results
{
"jobArn": "arn:aws:batch:us-west-2:xxxxxxxxxxxx:job/a434b6e9-5fda-415d-befb-079b04c95a97",
"jobName": "dl1_mnp_batch",
"jobId": "a434b6e9-5fda-415d-befb-079b04c95a97"
}
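After submission, the returned `jobId` can be used to follow the job until it finishes. The sketch below is one possible approach, assuming the AWS CLI is installed and configured; the function names and the 30-second polling interval are illustrative choices, not part of the original instructions. It polls `aws batch describe-jobs` until the status is `SUCCEEDED` or `FAILED`.

```python
import subprocess
import time

# AWS Batch job states that will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def cli_job_status(job_id):
    """Ask the AWS CLI for the current status of one job."""
    result = subprocess.run(
        ["aws", "batch", "describe-jobs", "--jobs", job_id,
         "--query", "jobs[0].status", "--output", "text"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_job(job_id, get_status=cli_job_status, poll_seconds=30):
    """Poll until the job reaches a terminal state and return that state.

    get_status is injectable so the loop can be exercised without AWS access.
    """
    while True:
        status = get_status(job_id)
        print(f"{job_id}: {status}")
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
```

For the job submitted above, the call would be `wait_for_job("a434b6e9-5fda-415d-befb-079b04c95a97")`.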
Note
Jobs can also be submitted, and their status viewed, through the AWS Batch console.
Observe Submitted AWS Batch Job Logs¶
AWS Batch writes each job's output to a log hosted in CloudWatch Logs. Follow View Log Data Sent to CloudWatch Logs for specific instructions.
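The logs can also be read from the command line. The sketch below only builds the two AWS CLI commands involved and prints them; it rests on several assumptions that should be checked against your environment: that the job writes to the default `/aws/batch/job` log group, that each node of a multi-node job is addressed as `<jobId>#<nodeIndex>`, and that the stream name is reported under `container.logStreamName`. The helper names are hypothetical.

```python
import shlex

# Assumption: the job uses the default AWS Batch log group.
DEFAULT_LOG_GROUP = "/aws/batch/job"


def describe_log_stream_cmd(job_id, node_index=0):
    """Command that prints the CloudWatch log stream name for one node.

    Assumption: nodes of a multi-node job are addressed as '<jobId>#<index>'.
    """
    return ["aws", "batch", "describe-jobs",
            "--jobs", f"{job_id}#{node_index}",
            "--query", "jobs[0].container.logStreamName",
            "--output", "text"]


def get_log_events_cmd(log_stream, log_group=DEFAULT_LOG_GROUP):
    """Command that prints the messages stored in one log stream."""
    return ["aws", "logs", "get-log-events",
            "--log-group-name", log_group,
            "--log-stream-name", log_stream,
            "--query", "events[*].message",
            "--output", "text"]


# Print shell-quoted commands so they can be reviewed before being run.
print(shlex.join(describe_log_stream_cmd("a434b6e9-5fda-415d-befb-079b04c95a97")))
print(shlex.join(get_log_events_cmd("LOG_STREAM_NAME")))  # placeholder stream name
```

`LOG_STREAM_NAME` is a placeholder for the value returned by the first command.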