The status of nodes is reported as unknown
\"conditions\": [
{
\"type\": \"Ready\",
\"status\": \"Unknown\",
kubectl get nodes
Result:
NAME STATUS AGE
192.168.1.157 NotReady 42d
192.168.1.158 Ready 42d
192.168.1.159 Ready 42d
Here the node 192.168.1.157 is NotReady. To debug this NotReady node, you can also read the official documentation: Application Introspection and Debugging.
kubectl describe node 192.168.1.157
Partial Result:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk Unknown Sat, 28 Dec 2016 12:56:01 +0000 Sat, 28 Dec 2016 12:56:41 +0000 NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown Sat, 28 Dec 2016 12:56:01 +0000 Sat, 28 Dec 2016 12:56:41 +0000 NodeStatusUnknown Kubelet stopped posting node status.
There is an OutOfDisk condition on my node, and then the kubelet stopped posting node status.
So I must free some disk space. On my Ubuntu 14.04 node I can check disk usage with the df command, and as root I can remove unused images with docker rmi image_id/image_name.
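For instance (a sketch; the image ID is a placeholder):
# check free disk space per filesystem
df -h
# list images, then remove ones that are no longer needed
docker images
docker rmi <image_id>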
Log in to 192.168.1.157 via SSH, e.g. ssh administrator@192.168.1.157, switch to root with sudo su, and restart the kubelet:
/etc/init.d/kubelet restart
Result:
stop: Unknown instance:
kubelet start/running, process 59261
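The stop: Unknown instance message just means the kubelet was not running before the restart. On an Upstart-based Ubuntu 14.04 node you can confirm it is up afterwards with:
service kubelet status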
On the master:
kubectl get nodes
Result:
NAME STATUS AGE
192.168.1.157 Ready 42d
192.168.1.158 Ready 42d
192.168.1.159 Ready 42d
Ok, that node works fine.
Here is a reference: Kubernetes
I had this problem too, but it looks like it depends on the Kubernetes offering and how everything was installed. In Azure, if you are using an acs-engine install, you can find the shell script that is actually being run to provision it at:
/opt/azure/containers/provision.sh
To get a more fine-grained understanding, just read through it and run the commands that it specifies. For me, I had to run as root:
systemctl enable kubelet
systemctl restart kubelet
I don't know if the enable is necessary and I can't say if these will work with your particular installation, but it definitely worked for me.
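If the restart alone doesn't help, the unit's recent logs usually show why the kubelet keeps failing (assuming a systemd-based node):
journalctl -u kubelet --no-pager -n 50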
I had an on-premises HA installation; a master and a worker stopped working, returning a NotReady status. Checking the kubelet logs on the nodes, I found this problem:
failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false
Disabling swap on nodes with
swapoff -a
and restarting the kubelet
systemctl restart kubelet
did the trick.
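Note that swapoff -a only lasts until the next reboot. To keep swap off permanently, you also need to comment out the swap entry in /etc/fstab; a blunt sketch that comments every fstab line mentioning swap:
sed -i '/swap/ s/^/#/' /etc/fstab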
If a node is so unhealthy that the master can't get status from it -- Kubernetes may not be able to restart the node. And if health checks aren't working, what hope do you have of accessing the node by SSH?
In this case, you may have to hard-reboot -- or, if your hardware is in the cloud, let your provider do it.
For example, the AWS EC2 Dashboard allows you to right-click an instance to pull up an "Instance State" menu -- from which you can reboot/terminate an unresponsive node.
Before doing this, you might choose to kubectl cordon node for good measure. And you may find kubectl delete node to be an important part of the process for getting things back to normal -- if the node doesn't automatically rejoin the cluster after a reboot.
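Concretely, with a node name like the ones from the question (substitute your own):
# keep new pods off the suspect node
kubectl cordon 192.168.1.157
# after the hard reboot, if the node never rejoins on its own
kubectl delete node 192.168.1.157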
Why would a node become unresponsive? Probably some resource has been exhausted in a way that prevents the host operating system from handling new requests in a timely manner. This could be disk, or network -- but the more insidious case is out-of-memory (OOM), which Linux handles poorly.
To help Kubernetes manage node memory safely, it's a good idea to both set memory requests and limits on your workloads and reserve memory for system daemons (for example via the kubelet's --kube-reserved and --system-reserved flags). The idea here is to avoid the complications associated with memory overcommit, because memory is incompressible, and both Linux's and Kubernetes' OOM killers may not trigger before the node has already become unhealthy and unreachable.
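As a minimal sketch of the first measure (the deployment name my-app is hypothetical), setting the memory request equal to the limit means the scheduler only places the pod where its full memory use is accounted for:
kubectl set resources deployment my-app --requests=memory=256Mi --limits=memory=256Mi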
In my case I am running 3 nodes in VMs on Hyper-V. Using the following steps, I was able to "restart" the cluster after restarting all the VMs.
1. (Optional) Turn swap off
$ swapoff -a
2. Restart all Docker containers
$ docker restart $(docker ps -a -q)
3. Check the node status after you have performed steps 1 and 2 on all nodes (the status will be NotReady)
$ kubectl get nodes
4. Restart the kubelet
$ systemctl restart kubelet
5. Check the status again (it should now be Ready)
Note: I do not know whether the order in which the nodes are restarted matters, but I chose to start with the k8s master node and then the minions. Also, it will take a little while for the node state to change from NotReady to Ready.
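You can watch the nodes flip from NotReady to Ready with kubectl's watch flag:
$ kubectl get nodes -w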
You can delete the node from the master by issuing:
kubectl delete node hostname.company.net
The NotReady status probably means that the master can't reach the kubelet service on the node. Check that everything is OK on that node.
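A quick sanity check on the affected node (assuming a systemd-managed kubelet; the master address and port are placeholders):
# is the kubelet running?
systemctl status kubelet
# can the node reach the API server?
curl -k https://<master-ip>:6443/healthz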