Scheduling and scaling pods in Kubernetes

Question

I am running a Kubernetes (k8s) cluster on GKE.

It has 4 node pools with different configurations:

Node pool 1 (single node, cordoned)

Running Redis & RabbitMQ

Node pool 2 (single node, cordoned)

Running monitoring & Prometheus

Node pool 3 (single large node)

Application pods

Node pool 4 (single node with autoscaling enabled)

Application pods

Currently, I am running a single replica of each service on GKE. The exception is the main service, which manages most things and runs 3 replicas.

Sometimes the HPA scales this main service and the node crashes. Other times the kubelet restarts frequently and the pods go into the Unknown state.

How should I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.

Question 2

Node pools 3 and 4 run the application pods. The application includes 3-4 memory-intensive microservices. I am also thinking of using a node selector to pin them to one node.

The small node pool would then run only the main service, which has the HPA, and node autoscaling would work for that node pool.

However, I feel that using a node selector is not the best way to do this.

It's always best to run more than one replica of each service. We currently run only a single replica of each, so please take that into account in your suggestions.

Answer 1:

As Patrick W rightly suggested in his comment:

if you have a single node, you leave yourself with a single point of failure. Also keep in mind that autoscaling takes time to kick in and is based on resource requests. If your node suffers OOM because of memory intensive workloads, you need to readjust your memory requests and limits – Patrick W Oct 10 at

You may need to redesign your infrastructure a bit so that every node pool has more than a single node. You may also need to readjust your memory requests and limits.
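Both the HPA and the scheduler work from requests rather than actual usage. A small calculation shows why low memory requests can end with a crashed node. The sketch below is a simplified model, not Kubernetes code:

- It applies the HPA formula from the Kubernetes docs, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), with the default 10% tolerance.
- It assumes the scheduler packs pods by memory request alone, ignoring system-reserved memory and other pods.

All numbers and function names here are hypothetical.

```python
import math
from typing import Tuple

# Default HPA tolerance (--horizontal-pod-autoscaler-tolerance): no scaling while
# the current/target ratio stays within 10% of 1.0.
HPA_TOLERANCE = 0.1


def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as the HPA computes it: usage relative to the CPU *request*."""
    if request_millicores <= 0:
        raise ValueError("utilization targets need a non-zero request")
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    """desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).

    Clamping to minReplicas/maxReplicas is omitted in this sketch.
    """
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= HPA_TOLERANCE:
        return current_replicas  # within tolerance: the HPA leaves the count alone
    return math.ceil(current_replicas * ratio)


def packed_by_request(node_allocatable_mib: int, request_mib: int,
                      actual_usage_mib: int) -> Tuple[int, int]:
    """Simplified scheduler model: pods fit by request only.

    Returns (pods placed on the node, memory those pods actually use).
    """
    pods = node_allocatable_mib // request_mib
    return pods, pods * actual_usage_mib


if __name__ == "__main__":
    util = cpu_utilization_percent(usage_millicores=450, request_millicores=250)
    print(f"utilization {util:.0f}% -> replicas {desired_replicas(3, util, 80.0)}")
    pods, used = packed_by_request(8192, request_mib=512, actual_usage_mib=1536)
    print(f"{pods} pods scheduled, {used} MiB actually needed on an 8192 MiB node")
```

With these hypothetical numbers, 3 replicas at 180% of their CPU request against an 80% target scale to 7. A node with 8192 MiB allocatable accepts 16 pods requesting 512 MiB each, even though they would actually use 24576 MiB. Raising the requests toward real usage makes the scheduler, and the cluster autoscaler, which reacts to pods that cannot be scheduled, add capacity before the node runs out of memory.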

You may want to take a look at the following sections in the official Kubernetes docs and the Google Cloud blog:

Managing Resources for Containers
Assign CPU Resources to Containers and Pods
Configure Default Memory Requests and Limits for a Namespace
Resource Quotas
Kubernetes best practices: Resource requests and limits
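To make the requests-and-limits advice concrete, here is a minimal sketch that builds two manifests as Python dicts and prints them as JSON, which `kubectl apply -f` accepts:

- A container spec for a memory-intensive service.
- A namespace-wide `LimitRange` with default memory values, as in "Configure Default Memory Requests and Limits for a Namespace".

The service name, image, namespace and memory sizes are hypothetical placeholders. Setting the memory request equal to the memory limit is one common choice, not the only correct one.

```python
import json


def container_with_resources(name: str, image: str, cpu_request: str, memory: str) -> dict:
    """Container spec whose memory request equals its memory limit, so the
    scheduler reserves everything the container is allowed to use."""
    return {
        "name": name,
        "image": image,
        "resources": {
            "requests": {"cpu": cpu_request, "memory": memory},
            # Going over the memory limit OOM-kills this container rather than
            # letting it eat memory that other pods on the node rely on.
            "limits": {"memory": memory},
        },
    }


def namespace_memory_defaults(namespace: str, default_request: str, default_limit: str) -> dict:
    """LimitRange that gives containers without explicit values a default
    memory request and limit in this namespace."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "mem-defaults", "namespace": namespace},
        "spec": {
            "limits": [{
                "type": "Container",
                "defaultRequest": {"memory": default_request},
                "default": {"memory": default_limit},
            }]
        },
    }


if __name__ == "__main__":
    # Hypothetical values for one memory-intensive microservice.
    print(json.dumps(container_with_resources(
        "memory-heavy-svc", "example.com/memory-heavy:1.0", "500m", "2Gi"), indent=2))
    print(json.dumps(namespace_memory_defaults("apps", "256Mi", "512Mi"), indent=2))
```

The right sizes come from observing real usage, for example in the Prometheus setup the question already runs. The values above only illustrate the shape of the configuration.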

How should I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.

That's why having more than one node in each node pool can be a much better option: it greatly reduces the likelihood that you'll end up in the situation described above. GKE's auto-repair feature needs some time (usually a few minutes). If the affected node is your only node, there isn't much you can do about it, and you need to accept possible downtime.
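Extra nodes only help if the replicas of the main service actually land on different nodes. One way to ask for that is pod anti-affinity keyed on `kubernetes.io/hostname`. Below is a minimal sketch that builds such a Deployment as a Python dict and prints it as JSON; the label, name and image are hypothetical placeholders.

```python
import json


def main_service_deployment(app_label: str, image: str, replicas: int = 3) -> dict:
    """Deployment whose pods prefer not to share a node with each other."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app_label},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": app_label}},
            "template": {
                "metadata": {"labels": {"app": app_label}},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            # "preferred" rather than "required": with only one
                            # node, replicas still schedule instead of staying Pending.
                            "preferredDuringSchedulingIgnoredDuringExecution": [{
                                "weight": 100,
                                "podAffinityTerm": {
                                    "labelSelector": {"matchLabels": {"app": app_label}},
                                    # One topology domain per node.
                                    "topologyKey": "kubernetes.io/hostname",
                                },
                            }]
                        }
                    },
                    "containers": [{"name": app_label, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(main_service_deployment("main-service", "example.com/main:1.0"), indent=2))
```

The soft (preferred) rule is a deliberate trade-off. With a single node it changes nothing. Once a pool has several nodes, the scheduler spreads the replicas, so a single node crash no longer takes all of them down at once.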

Node pools 3 and 4 run the application pods. The application includes 3-4 memory-intensive microservices. I am also thinking of using a node selector to pin them to one node.

The small node pool would then run only the main service, which has the HPA, and node autoscaling would work for that node pool.

However, I feel that using a node selector is not the best way to do this.

You may also want to take a look at node affinity and anti-affinity, as well as taints and tolerations.
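As an alternative to a plain node selector for the memory-intensive microservices, node affinity and tolerations can be combined:

- The node affinity attracts these pods to a dedicated node pool, using the `cloud.google.com/gke-nodepool` label that GKE puts on its nodes.
- A taint on that pool keeps other pods out.
- A matching toleration lets these pods in.

This is a minimal sketch that assumes a hypothetical pool named `memory-pool` tainted with `workload=memory-intensive:NoSchedule` (for example via `--node-taints` when creating the pool with gcloud). The pool name, taint key/value and image are placeholders.

```python
import json
from typing import List


def memory_intensive_pod_spec(image: str, pool_names: List[str],
                              taint_key: str = "workload",
                              taint_value: str = "memory-intensive") -> dict:
    """Pod spec that must run on one of the given GKE node pools and
    tolerates the (hypothetical) taint placed on those pools."""
    return {
        "affinity": {
            "nodeAffinity": {
                # Hard requirement, like nodeSelector, but "In" accepts several pools.
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "cloud.google.com/gke-nodepool",
                            "operator": "In",
                            "values": list(pool_names),
                        }]
                    }]
                }
            }
        },
        # Lets these pods onto nodes tainted workload=memory-intensive:NoSchedule;
        # pods without this toleration are kept off those nodes.
        "tolerations": [{
            "key": taint_key,
            "operator": "Equal",
            "value": taint_value,
            "effect": "NoSchedule",
        }],
        "containers": [{"name": "app", "image": image}],
    }


if __name__ == "__main__":
    spec = memory_intensive_pod_spec("example.com/memory-heavy:1.0", ["memory-pool"])
    print(json.dumps(spec, indent=2))
```

Unlike a node selector that pins pods to one node, this targets a whole pool. That pool can have several nodes and its own autoscaling, in line with the advice above about avoiding single-node pools.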

Source: https://stackoverflow.com/questions/64287099/scheduling-and-scaling-pods-in-kubernetes

Tags

Docker

Kubernetes

google-cloud-platform

google-kubernetes-engine