GKE Cluster can't pull (ErrImagePull) from GCR Registry in same project (GitLab Kubernetes Integration): Why?

滥情空心 2021-02-05 19:59

So after googling a little bit (the results are polluted by people having trouble with Pull Secrets), I am posting this here, and to GCP Support (will update as I hear back).

I crea

2 Answers
  •  面向向阳花
    2021-02-05 20:46

    TL;DR: Clusters created by the GitLab CI Kubernetes Integration will not be able to pull images from a GCR registry in the same project unless you modify the nodes' permissions (scopes).

    While you CAN manually modify the permissions on individual node machine(s) to grant the Application Default Credentials (see: https://developers.google.com/identity/protocols/application-default-credentials) the proper scopes in real time, doing it this way means that if your node is re-created at some point in the future it will NOT have your modified scopes, and things will break.
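
    If you want to confirm which scopes a node currently has, one quick check (just a sketch; run it from a shell on the node or from a pod scheduled onto it) is to query the GCE metadata server:

    curl -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"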

    Instead of modifying the permissions manually, create a new node pool that has the proper scope(s) to access your required GCP services.

    Here are some resources I used for reference:

    1. https://medium.com/google-cloud/updating-google-container-engine-vm-scopes-with-zero-downtime-50bff87e5f80
    2. https://adilsoncarvalho.com/changing-a-running-kubernetes-cluster-permissions-a-k-a-scopes-3e90a3b95636

    Creating a properly scoped node pool generally looks like this:

    gcloud container node-pools create [new pool name] \
     --cluster [cluster name] \
     --machine-type [your desired machine type] \
     --num-nodes [same-number-nodes] \
     --scopes [your new set of scopes]
    

    If you aren't sure what your required scopes are called, you can see a full list of scopes AND scope aliases here: https://cloud.google.com/sdk/gcloud/reference/container/node-pools/create

    For me I did gke-default (same as my other cluster) and sql-admin. The reason for this is that I need to access a SQL database in Cloud SQL during part of my build, and I don't want to have to connect to a public IP to do that.
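
    As a concrete sketch (the pool name, cluster name, zone, machine type, and node count below are just example values; the scope aliases are the two mentioned above), the create command would look something like:

    gcloud container node-pools create scoped-pool \
        --cluster my-cluster \
        --zone us-central1-a \
        --machine-type n1-standard-2 \
        --num-nodes 3 \
        --scopes gke-default,sql-admin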

    gke-default Scopes (for reference)

    1. https://www.googleapis.com/auth/devstorage.read_only (allows you to pull)
    2. https://www.googleapis.com/auth/logging.write
    3. https://www.googleapis.com/auth/monitoring
    4. https://www.googleapis.com/auth/service.management.readonly
    5. https://www.googleapis.com/auth/servicecontrol
    6. https://www.googleapis.com/auth/trace.append

    Contrast the above with the more locked-down permissions of a GitLab-CI created cluster, which gets ONLY these two: https://www.googleapis.com/auth/logging.write and https://www.googleapis.com/auth/monitoring.

    Obviously, configuring your cluster with ONLY the minimum permissions needed is for sure the way to go here. Once you figure out what that is and create your new properly scoped node pool...

    List your nodes with:

    kubectl get nodes
    

    The node(s) you just created (most recent) have the new settings, while the older node(s) belong to the default GitLab pool that cannot pull from GCR.
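
    If it isn't obvious which node belongs to which pool, GKE labels every node with its pool name, so (as a sketch) you can show that label as an extra column:

    kubectl get nodes -L cloud.google.com/gke-nodepool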

    Then:

    kubectl cordon [your-node-name-here]
    
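    If the old pool has more than one node, it may be quicker to cordon them all at once by the pool label (a sketch; assumes your kubectl version supports a label selector on cordon):

    kubectl cordon -l cloud.google.com/gke-nodepool=default-pool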

    After that you want to drain:

    kubectl drain [your-node-name-here] --force
    

    I ran into issues here: because I had a GitLab Runner installed, I couldn't drain the pods normally due to the local data and the DaemonSet used to control it.
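
    If you hit the same errors, kubectl has flags that may let the drain complete anyway (the emptydir flag is named --delete-local-data on older kubectl versions); something like:

    kubectl drain [your-node-name-here] --force \
        --ignore-daemonsets \
        --delete-emptydir-data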

    For that reason, once I cordoned my node I just deleted it from kubectl (not sure if this will cause problems, but it was fine for me). Once your node is deleted you need to delete the 'default-pool' node pool created by GitLab.
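
    For reference, deleting a node from the cluster that way is just:

    kubectl delete node [your-node-name-here]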

    List your node-pools:

    gcloud container node-pools list --cluster [CLUSTER_NAME]
    

    See the old scopes created by GitLab:

    gcloud container node-pools describe default-pool \
        --cluster [CLUSTER_NAME]
    

    Check to see if you have the correct new scopes (that you just added):

    gcloud container node-pools describe [NEW_POOL_NAME] \
        --cluster [CLUSTER_NAME]
    
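    If you only care about the scopes, you can narrow that output (a sketch using gcloud's --format flag; config.oauthScopes is the field that holds the scopes in the node pool description):

    gcloud container node-pools describe [NEW_POOL_NAME] \
        --cluster [CLUSTER_NAME] \
        --format="value(config.oauthScopes)"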

    If your new node pool has the right scopes for your deployments, you can now delete the default pool with:

    gcloud container node-pools delete default-pool \
        --cluster [CLUSTER_NAME] --zone [ZONE]
    

    In my personal case I am still trying to figure out how to allow access to the private network (i.e. get to Cloud SQL via private IP), but I can pull my images now, so I am halfway there.

    I think that's it — hope it saved you a few minutes!
