Why are AWS Batch Jobs stuck in RUNNABLE?

I use a compute environment of 0-256 m3.medium On-Demand instances. My job definition requires 1 CPU and 3 GB of RAM, which an m3.medium has.

What are possible reasons why my jobs are stuck in RUNNABLE?
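
As a rough sketch of the fit check the question describes: assuming an m3.medium offers 1 vCPU and 3.75 GiB (3840 MiB), and that the ECS agent registers somewhat less memory than the nominal amount (the 256 MiB `reserved_mib` default below is an illustrative guess, not an AWS figure), a job can only be placed if both its vCPU and memory requests fit within what the instance actually registers. `job_fits` is a hypothetical helper, not an AWS API.

```python
def job_fits(job_vcpus: int, job_memory_mib: int,
             instance_vcpus: int, instance_memory_mib: int,
             reserved_mib: int = 256) -> bool:
    """Return True if one job can be placed on one instance.

    Assumption: reserved_mib approximates memory used by the OS and the
    ECS agent, so it is not available to jobs. The default is a guess.
    """
    usable_mib = instance_memory_mib - reserved_mib
    return job_vcpus <= instance_vcpus and job_memory_mib <= usable_mib


# Job definitions express memory in MiB, so "3 GB" is typically 3072.
print(job_fits(1, 3072, 1, 3840))  # True
print(job_fits(1, 3840, 1, 3840))  # False: the full nominal memory is not usable
```

The second call illustrates why requesting the full nominal memory of an instance type can leave a job in RUNNABLE: placement is based on the memory the container instance registers, not the EC2 specification.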

5 Answers

心在旅途

2021-02-05 04:05

I just fought with this for a while and found the answer.

One possible reason jobs get stuck in RUNNABLE is that there are no instances to run them on. If this is the case, looking at the Auto Scaling group, as mentioned in the answer above, can show you the actual error that is preventing instances from starting. That points you to the exact problem, rather than leaving you to try any number of solutions to problems you don't have. Error messages are our friends.
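
The Auto Scaling check can also be scripted. A minimal sketch, assuming the compute environment launches instances through an Auto Scaling group whose name you have looked up in the EC2 console, and assuming `autoscaling` behaves like `boto3.client("autoscaling")`. `failed_scaling_activities` is a hypothetical helper name; it only filters the output of the real `DescribeScalingActivities` call.

```python
from typing import Any, List, Tuple


def failed_scaling_activities(autoscaling: Any, asg_name: str) -> List[Tuple[str, str]]:
    """Return (Description, StatusMessage) for each failed scaling activity.

    `autoscaling` is assumed to behave like boto3.client("autoscaling").
    """
    failures = []
    kwargs = {"AutoScalingGroupName": asg_name}
    while True:
        page = autoscaling.describe_scaling_activities(**kwargs)
        for activity in page.get("Activities", []):
            # StatusMessage carries the launch error, e.g. a limit or AMI problem.
            if activity.get("StatusCode") == "Failed":
                failures.append((activity.get("Description", ""),
                                 activity.get("StatusMessage", "")))
        token = page.get("NextToken")
        if not token:
            return failures
        kwargs["NextToken"] = token
```

For example, `failed_scaling_activities(boto3.client("autoscaling"), "my-batch-asg")`, where `my-batch-asg` is a placeholder for your group's name.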

不思量自难忘°

2021-02-05 04:14
There are other reasons why a job can get stuck in RUNNABLE:
- Insufficient permissions for the role associated with the compute environment.
- No internet access from the compute environment's instances. You will need to associate a NAT gateway or an internet gateway with the compute environment's subnet.
  - Make sure to check the "Enable auto-assign public IPv4 address" setting on your compute environment's subnet. (Pointed out by @thisisbrians in the comments.)
- Problems with your image. You need to use an Amazon ECS-optimized AMI, or make sure the ECS container agent is working on your own AMI. More info in the AWS docs.
- You're trying to launch an instance type for which your account is limited to 0 instances (EC2 console > Limits, in the left menu). (Read more in gergely-danyi's comment.)
- And, as mentioned, insufficient resources.

Also, make sure to read the AWS Batch troubleshooting guide.
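
The subnet setting from the list above can be checked in code. A minimal sketch, assuming `batch` and `ec2` behave like `boto3.client("batch")` and `boto3.client("ec2")`; the helper name is hypothetical. It is only a heuristic: a subnet whose traffic goes out through a NAT gateway does not need public IPs, so a non-empty result is a hint to look closer, not proof of the problem.

```python
from typing import Any, List


def subnets_without_public_ip(batch: Any, ec2: Any, ce_name: str) -> List[str]:
    """List subnets of a managed compute environment that don't auto-assign public IPv4."""
    resp = batch.describe_compute_environments(computeEnvironments=[ce_name])
    envs = resp.get("computeEnvironments", [])
    if not envs:
        raise ValueError(f"compute environment not found: {ce_name}")
    # Unmanaged compute environments have no computeResources block.
    subnet_ids = envs[0].get("computeResources", {}).get("subnets", [])
    if not subnet_ids:
        return []
    subnets = ec2.describe_subnets(SubnetIds=subnet_ids)["Subnets"]
    return [s["SubnetId"] for s in subnets if not s.get("MapPublicIpOnLaunch", False)]
```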

栀梦

2021-02-05 04:16

Your compute environment might be invalid. Check AWS Batch > Compute environments > Status column. Mine said INVALID.

Clicking on the compute environment gave me more information: my AMI ID was wrong.
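
The same status, and the explanation the console shows, are available programmatically: `DescribeComputeEnvironments` returns a `status` and a `statusReason` for each environment. A minimal sketch, assuming `batch` behaves like `boto3.client("batch")`; the helper name is hypothetical.

```python
from typing import Any, List, Tuple


def invalid_compute_environments(batch: Any) -> List[Tuple[str, str]]:
    """Return (name, statusReason) for every compute environment marked INVALID.

    `batch` is assumed to behave like boto3.client("batch").
    """
    invalid = []
    kwargs = {}
    while True:
        page = batch.describe_compute_environments(**kwargs)
        for env in page.get("computeEnvironments", []):
            if env.get("status") == "INVALID":
                invalid.append((env["computeEnvironmentName"],
                                env.get("statusReason", "")))
        token = page.get("nextToken")
        if not token:
            return invalid
        # Follow pagination until the API stops returning a nextToken.
        kwargs["nextToken"] = token
```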

萌比男神i

2021-02-05 04:20

In case it is useful, I wanted to share this really helpful video from an AWS Cloud Support Engineer:

https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-stuck-runnable-status/

说谎

2021-02-05 04:29
The roles should be defined with at least the following policies and trust relationships. Otherwise, jobs will get stuck in RUNNABLE because they don't have enough privileges to start:

AWSBatchServiceRole
- Attached policies: AWSBatchServiceRole
- Trust relationship: batch.amazonaws.com
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "batch.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
ecsInstanceRole
- Attached policies: AmazonEC2ContainerServiceforEC2Role
- Trust relationship: ec2.amazonaws.com
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
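
Both requirements above (trusted service principal and attached managed policy) can be verified for an existing role. A minimal sketch, assuming `iam` behaves like `boto3.client("iam")`, which returns the trust policy already parsed into a dict; `check_role` is a hypothetical helper, not part of boto3.

```python
from typing import Any, List


def _as_list(value: Any) -> list:
    """IAM policy fields may be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_role(iam: Any, role_name: str, service: str, policy_name: str) -> List[str]:
    """Return a list of problems with a role's trust policy and attached policies."""
    problems = []
    doc = iam.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    trusted = False
    for stmt in _as_list(doc.get("Statement")):
        principal = stmt.get("Principal", {})
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        if (stmt.get("Effect") == "Allow"
                and service in services
                and "sts:AssumeRole" in _as_list(stmt.get("Action"))):
            trusted = True
    if not trusted:
        problems.append(f"{role_name}: trust policy does not let {service} assume it")

    # Collect attached managed policy names, following Marker pagination.
    attached = set()
    kwargs = {"RoleName": role_name}
    while True:
        page = iam.list_attached_role_policies(**kwargs)
        attached.update(p["PolicyName"] for p in page.get("AttachedPolicies", []))
        if not page.get("IsTruncated"):
            break
        kwargs["Marker"] = page["Marker"]
    if policy_name not in attached:
        problems.append(f"{role_name}: managed policy {policy_name} is not attached")
    return problems
```

For the two roles above, the calls would be `check_role(iam, "AWSBatchServiceRole", "batch.amazonaws.com", "AWSBatchServiceRole")` and `check_role(iam, "ecsInstanceRole", "ec2.amazonaws.com", "AmazonEC2ContainerServiceforEC2Role")`; an empty list means both checks passed.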