Pod stuck in Pending state when trying to schedule it on AWS Fargate

Submitted by 痞子三分冷 on 2021-02-07 12:25:36

Question


I have an EKS cluster to which I've added support for running in hybrid mode (in other words, I've added a Fargate profile to it). My intention is to run only specific workloads on AWS Fargate while keeping the EKS worker nodes for other kinds of workloads.

To test this out, my Fargate profile is defined as follows (a rough Terraform sketch of such a profile is shown after the list):

  • Restricted to a specific namespace (let's say: mynamespace)
  • Requires a specific label that pods need to match in order to be scheduled on Fargate (the label is: fargate: myvalue)
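
For reference, a profile like this could be declared in Terraform roughly as in the sketch below. The cluster name, IAM role, and subnet references are placeholders, not values from my actual setup:

resource "aws_eks_fargate_profile" "myprofile" {
  # Placeholder cluster name and pod execution role.
  cluster_name           = "my-eks-cluster"
  fargate_profile_name   = "myprofile"
  pod_execution_role_arn = aws_iam_role.fargate_pod_execution.arn

  # Fargate profiles can only be associated with private subnets.
  subnet_ids = var.private_subnet_ids

  # Pods are scheduled on Fargate only if they are in this namespace
  # and carry the fargate: myvalue label.
  selector {
    namespace = "mynamespace"
    labels = {
      fargate = "myvalue"
    }
  }
}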

To test the k8s resources, I'm trying to deploy a simple nginx Deployment, which looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: mynamespace
  labels:
    fargate: myvalue
spec:
  selector:
    matchLabels:
      app: nginx
      version: 1.7.9
      fargate: myvalue
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
        version: 1.7.9
        fargate: myvalue
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

When I apply this resource, I get the following:

$ kubectl get pods -n mynamespace -o wide
NAME                                                        READY   STATUS      RESTARTS   AGE     IP            NODE                          NOMINATED NODE                                READINESS GATES
nginx-deployment-596c594988-x9s6n                           0/1     Pending     0          10m     <none>        <none>                        07c651ad2b-7cf85d41b2424e529247def8bda7bf38   <none>

The pod stays in the Pending state and is never scheduled onto AWS Fargate.

This is the pod describe output:

$ kubectl describe pod nginx-deployment-596c594988-x9s6n -n mynamespace
Name:               nginx-deployment-596c594988-x9s6n
Namespace:          mynamespace
Priority:           2000001000
PriorityClassName:  system-node-critical
Node:               <none>
Labels:             app=nginx
                    eks.amazonaws.com/fargate-profile=myprofile
                    fargate=myvalue
                    pod-template-hash=596c594988
                    version=1.7.9
Annotations:        kubernetes.io/psp: eks.privileged
Status:             Pending
IP:
Controlled By:      ReplicaSet/nginx-deployment-596c594988
NominatedNodeName:  9e418415bf-8259a43075714eb3ab77b08049d950a8
Containers:
  nginx:
    Image:        nginx:1.7.9
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-784d2 (ro)
Volumes:
  default-token-784d2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-784d2
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

One thing I can conclude from this output is that the correct Fargate profile was chosen:

eks.amazonaws.com/fargate-profile=myprofile

Also, I see that some value is set in the NOMINATED NODE field, but I'm not sure what it represents.

Any ideas, or common problems that might be worth troubleshooting in this case? Thanks


Answer 1:


It turns out the problem was in the networking setup of the private subnets associated with the Fargate profile all along.

To give more info, here is what I initially had:

  1. An EKS cluster with several worker nodes, where I had assigned only public subnets to the EKS cluster itself
  2. When I tried to add a Fargate profile to the EKS cluster, I ran into the current Fargate limitation that a profile cannot be associated with public subnets. To work around this, I created private subnets with the same tags as the public ones so that the EKS cluster is aware of them
  3. What I forgot was that I needed to enable connectivity from the VPC's private subnets to the outside world (I was missing a NAT gateway). So I created a NAT gateway in a public subnet associated with EKS and added an additional entry to the route table associated with the private subnets, which looks like this:

    0.0.0.0/0 nat-xxxxxxxx

This solved the problem I had above, although I'm not sure of the real reason why an AWS Fargate profile needs to be associated only with private subnets.
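
For reference, the NAT gateway and default route described above could be expressed in Terraform roughly like this sketch; the EIP, public subnet, and private route table references are placeholders for the actual resources:

# Elastic IP for the NAT gateway (on AWS provider v5+ use `domain = "vpc"` instead).
resource "aws_eip" "nat" {
  vpc = true
}

# The NAT gateway itself must live in a public subnet.
resource "aws_nat_gateway" "this" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id   # placeholder public subnet
}

# Default route for the private subnets through the NAT gateway,
# i.e. the "0.0.0.0/0 nat-xxxxxxxx" entry shown above.
resource "aws_route" "private_default" {
  route_table_id         = aws_route_table.private.id   # placeholder route table
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this.id
}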




Answer 2:


If you use the community terraform-aws-modules/vpc module, all of this can be taken care of by the following config:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "2.44.0"

  name = "vpc-module-demo"
  cidr = "10.0.0.0/16"

  azs             = slice(data.aws_availability_zones.available.names, 0, 3)
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  single_nat_gateway = true # needed for fargate (https://docs.aws.amazon.com/eks/latest/userguide/eks-ug.pdf#page=135&zoom=100,96,764)
  enable_nat_gateway = true # needed for fargate (https://docs.aws.amazon.com/eks/latest/userguide/eks-ug.pdf#page=135&zoom=100,96,764)
  enable_vpn_gateway = false
  enable_dns_hostnames = true # needed for fargate (https://docs.aws.amazon.com/eks/latest/userguide/eks-ug.pdf#page=135&zoom=100,96,764)
  enable_dns_support = true # needed for fargate (https://docs.aws.amazon.com/eks/latest/userguide/eks-ug.pdf#page=135&zoom=100,96,764)

  tags = {
    "Name"                                      = "terraform-eks-demo-node"
    "kubernetes.io/cluster/${var.cluster-name}" = "shared"
  }
}
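
The private subnets created by this module are then what the Fargate profile should be associated with. A minimal sketch of the wiring, assuming the EKS cluster and a pod execution IAM role are defined elsewhere in the same configuration:

resource "aws_eks_fargate_profile" "myprofile" {
  cluster_name           = var.cluster-name
  fargate_profile_name   = "myprofile"
  pod_execution_role_arn = aws_iam_role.fargate_pod_execution.arn   # hypothetical IAM role
  subnet_ids             = module.vpc.private_subnets                # private subnets from the module above

  # Same namespace/label restriction as described in the question.
  selector {
    namespace = "mynamespace"
    labels = {
      fargate = "myvalue"
    }
  }
}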


Source: https://stackoverflow.com/questions/59649086/pod-stuck-in-pending-state-when-trying-to-schedule-it-on-aws-fargate
