Question
I am running a Kubernetes cluster on GKE.
It has 4 node pools with different configurations:
Node pool 1 (single node, cordoned)
Running Redis & RabbitMQ
Node pool 2 (single node, cordoned)
Running Monitoring & Prometheus
Node pool 3 (single large node)
Application pods
Node pool 4 (single node with autoscaling enabled)
Application pods
Currently, I am running a single replica of each service on GKE, except for the main service, which manages almost everything and runs 3 replicas.
When this main service scales up via HPA, I sometimes see the node crash or the kubelet restart frequently, and the pods go into the Unknown state.
How should I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.
Question 2
Node pools 3 and 4 run the application pods. The application includes 3-4 memory-intensive microservices, and I am thinking of using a node selector to pin them to one node,
while only the small node pool would run the main service, which has HPA, with node autoscaling working for that node pool.
However, I feel that a node selector is not the best way to do it.
It's always best to run more than one replica of each service, but currently we run only a single replica of each, so please take that into account in your suggestions.
Answer 1:
As Patrick W rightly suggested in his comment:
if you have a single node, you leave yourself with a single point of failure. Also keep in mind that autoscaling takes time to kick in and is based on resource requests. If your node suffers OOM because of memory intensive workloads, you need to readjust your memory requests and limits – Patrick W Oct 10 at
You may need to redesign your infrastructure a bit so that every node pool has more than a single node, and readjust your memory requests and limits.
You may want to take a look at the following sections of the official Kubernetes docs and the Google Cloud blog:
- Managing Resources for Containers
- Assign CPU Resources to Containers and Pods
- Configure Default Memory Requests and Limits for a Namespace
- Resource Quotas
- Kubernetes best practices: Resource requests and limits
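To make the point about requests concrete, here is a rough Python sketch (not the actual scheduler or autoscaler logic) of two calculations: the replica count formula documented for the Horizontal Pod Autoscaler, and a simple check of whether the resulting memory requests still fit on one node. The function names, the 13000 MiB allocatable figure and the per-pod numbers are made-up illustrations, and the real HPA additionally applies a tolerance and stabilization window that are ignored here.

```python
import math


def hpa_desired_replicas(current_replicas, current_utilization, target_utilization):
    """Replica count per the formula in the Kubernetes HPA docs:
    ceil(currentReplicas * currentMetricValue / desiredMetricValue).
    Utilization is measured relative to the pods' resource *requests*."""
    return math.ceil(current_replicas * current_utilization / target_utilization)


def fits_on_node(pod_memory_requests_mib, node_allocatable_mib):
    """The scheduler places pods by their requests, not by actual usage,
    so a node is 'full' once the summed requests exceed its allocatable memory."""
    return sum(pod_memory_requests_mib) <= node_allocatable_mib


# Hypothetical numbers: 3 replicas of the main service at 90% CPU utilization
# with a 60% target, each requesting 2048 MiB, next to 4 other pods of 1024 MiB.
replicas = hpa_desired_replicas(3, 90, 60)
requests = [2048] * replicas + [1024] * 4
print(replicas, fits_on_node(requests, node_allocatable_mib=13000))
```

With these numbers the HPA would ask for 5 replicas, whose requests no longer fit on the single node, so the extra pods stay Pending until the node autoscaler adds a node. If requests are set far below real usage, the check passes on paper but the node can still run out of memory.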
How should I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.
That's why having more than one node in each node pool can be a much better option. It greatly reduces the likelihood that you'll end up in the situation described above. The GKE auto-repair feature needs time (usually a few minutes), and if this is your only node, you cannot do much about it and have to accept possible downtime.
Node pools 3 and 4 run the application pods. The application includes 3-4 memory-intensive microservices, and I am thinking of using a node selector to pin them to one node,
while only the small node pool would run the main service, which has HPA, with node autoscaling working for that node pool.
However, I feel that a node selector is not the best way to do it.
You may also take a look at node affinity and anti-affinity, as well as taints and tolerations.
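As a hedged illustration of those options, the sketch below builds the scheduling part of a pod spec as a plain Python dictionary and prints it as JSON; it is the same structure you would write in YAML under the pod template's `spec`. The helper name, the pool name `memory-pool`, the `app: memory-service` label and the `dedicated=memory:NoSchedule` taint are hypothetical; `cloud.google.com/gke-nodepool` is the label GKE sets on the nodes of a node pool.

```python
import json


def scheduling_spec(pool_name, app_label, taint_key, taint_value):
    """Return the affinity/tolerations part of a pod spec (hypothetical helper)."""
    return {
        "affinity": {
            # Node affinity: only schedule onto nodes of the given GKE node pool.
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "cloud.google.com/gke-nodepool",
                            "operator": "In",
                            "values": [pool_name],
                        }]
                    }]
                }
            },
            # Pod anti-affinity (soft): prefer not to co-locate two pods
            # carrying the same app label on one node.
            "podAntiAffinity": {
                "preferredDuringSchedulingIgnoredDuringExecution": [{
                    "weight": 100,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }]
            },
        },
        # Toleration: lets these pods onto nodes tainted
        # <taint_key>=<taint_value>:NoSchedule, which repels all other pods.
        "tolerations": [{
            "key": taint_key,
            "operator": "Equal",
            "value": taint_value,
            "effect": "NoSchedule",
        }],
    }


spec = scheduling_spec("memory-pool", "memory-service", "dedicated", "memory")
print(json.dumps(spec, indent=2))
```

Compared with a plain node selector, affinity rules can be soft (preferred rather than required), and a taint on the pool keeps unrelated pods off it instead of only steering the chosen pods onto it.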
Source: https://stackoverflow.com/questions/64287099/scheduling-and-scaling-pods-in-kubernetes