I use a compute environment of 0-256 m3.medium On-Demand instances. My job definition requires 1 CPU and 3 GB of RAM, which m3.medium has.
What are possible reasons wh
I just fought with this for a while, and found the answer.
One possible reason jobs can get stuck in RUNNABLE is that there are no instances to run them on. If this is the case, looking at the Auto Scaling group, as mentioned in the answer above, can show you the actual error that's preventing instances from being started, guiding you to the exact problem rather than leaving you to try any number of solutions to problems you don't have. Error messages are our friends.
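If you would rather pull those errors from a script than click through the console, here is a minimal sketch. It assumes you pass in a boto3 Auto Scaling client and already know the name of the Auto Scaling group behind your compute environment; the function name and the `asg_name` value are placeholders of mine, not something from the original answer.

```python
def failed_scaling_activities(autoscaling_client, asg_name, max_records=20):
    """Return (status, message) pairs for recent failed or cancelled activities.

    `autoscaling_client` is expected to behave like boto3.client("autoscaling").
    `asg_name` is a placeholder: look up the real group name in the EC2 console.
    """
    response = autoscaling_client.describe_scaling_activities(
        AutoScalingGroupName=asg_name,
        MaxRecords=max_records,
    )
    failures = []
    for activity in response.get("Activities", []):
        if activity.get("StatusCode") in ("Failed", "Cancelled"):
            # StatusMessage usually carries the actual launch error, when present.
            message = activity.get("StatusMessage") or activity.get("Description", "")
            failures.append((activity["StatusCode"], message))
    return failures


# Example use (needs boto3 installed and AWS credentials configured):
#   import boto3
#   client = boto3.client("autoscaling")
#   for status, message in failed_scaling_activities(client, "my-batch-asg"):
#       print(status, message)
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to try against a stub before pointing it at a real account.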
There are other reasons why a Job can get stuck in RUNNABLE:
Also, make sure to read the AWS Batch troubleshooting guide.
Your compute environment might be invalid. Check AWS Batch -> Compute Environments -> Status column. Mine said INVALID, and a symbol was displayed next to the compute environment name.
Clicking on the compute environment gave me more information: my AMI ID was wrong.
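The same check can be done from code. This is a rough sketch, assuming a boto3 Batch client is passed in; the function name is mine, and it reads only the first page of results, so it is a starting point rather than a complete tool.

```python
def unhealthy_compute_environments(batch_client, names=None):
    """Return (name, status, reason) for compute environments that are not VALID.

    `batch_client` is expected to behave like boto3.client("batch").
    `names` optionally restricts the check to specific compute environments.
    """
    kwargs = {"computeEnvironments": list(names)} if names else {}
    response = batch_client.describe_compute_environments(**kwargs)
    problems = []
    for env in response.get("computeEnvironments", []):
        if env.get("status") != "VALID":
            # statusReason is where details such as a bad AMI ID show up.
            problems.append(
                (env["computeEnvironmentName"], env.get("status"), env.get("statusReason", ""))
            )
    return problems


# Example use (needs boto3 installed and AWS credentials configured):
#   import boto3
#   for name, status, reason in unhealthy_compute_environments(boto3.client("batch")):
#       print(name, status, reason)
```

Note that transitional statuses such as CREATING or UPDATING are reported too, since the sketch flags anything other than VALID.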
In case it is useful, I wanted to share this really helpful video from an AWS Cloud Support Engineer:
https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-stuck-runnable-status/
The roles should be defined with at least the following policies and trust relationships. Otherwise, jobs will get stuck in RUNNABLE because they lack the privileges needed to start:
AWSBatchServiceRole
Trust relationship: batch.amazonaws.com
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "batch.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
AmazonEC2ContainerServiceforEC2Role
Trust relationship: ec2.amazonaws.com
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
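To double-check a trust relationship without eyeballing the JSON, something like the following could help. It is only a sketch: the function name is mine, and it deliberately ignores `Deny` statements, conditions, and wildcards, so treat a `True` as a hint rather than a full IAM evaluation.

```python
import json


def allows_assume_role(trust_policy, service):
    """Return True if an Allow statement lets `service` call sts:AssumeRole.

    `trust_policy` may be a dict or a JSON string in the shape shown above.
    Simplification: Deny statements, Condition blocks and wildcards are ignored.
    """
    if isinstance(trust_policy, str):
        trust_policy = json.loads(trust_policy)
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        principal = statement.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        if isinstance(services, str):
            services = [services]
        if "sts:AssumeRole" in actions and service in services:
            return True
    return False


# The Batch service role trust policy from above, as a Python dict.
BATCH_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "batch.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(allows_assume_role(BATCH_TRUST_POLICY, "batch.amazonaws.com"))
print(allows_assume_role(BATCH_TRUST_POLICY, "ec2.amazonaws.com"))
```

The first call should report that the Batch service is trusted and the second that EC2 is not, which is the kind of mix-up that is easy to miss when the two roles are set up side by side.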