How to debug when Kubernetes nodes are in 'Not Ready' state

青春惊慌失措 · 2021-02-02 05:18

I initialized the master node and added 2 worker nodes, but only the master and one of the worker nodes show up when I run the following command:

kubectl get nodes


        
5 Answers
  • 2021-02-02 05:53

    Steps to debug:

    If you face any issue in Kubernetes, the first step is to check whether the Kubernetes system applications themselves are running fine.

    Command to check: kubectl get pods -n kube-system

    If you see any pod crashing, check its logs.

    If you get the NotReady state error, verify the network pod logs.

    If the above does not resolve it, follow the steps below:

    1. kubectl get nodes # Check which node is not in Ready state

    2. kubectl describe node nodename # The node that is not in Ready state

    3. ssh to that node

    4. execute systemctl status kubelet # Make sure kubelet is running

    5. systemctl status docker # Make sure the docker service is running

    6. journalctl -u kubelet # To check the logs in depth

    Most probably you will find the error here. After fixing it, reload and restart kubelet with the commands below:

    1. systemctl daemon-reload
    2. systemctl restart kubelet

    If you still cannot find the root cause, check the following:

    1. Make sure your node has enough disk space and memory; check the /var directory especially. Commands to check: df -kh, free -m

    2. Verify CPU utilization with the top command, and make sure no process is consuming an unexpected amount of memory.
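The node-discovery step above can be automated with a small filter; a minimal sketch (the column position used by awk is an assumption about the `kubectl get nodes` output layout):

```shell
# Print the names of nodes whose STATUS column is not "Ready".
# On a real cluster: kubectl get nodes --no-headers | not_ready
not_ready() { awk '$2 != "Ready" {print $1}'; }

# Example against sample output captured from a cluster:
printf '%s\n' \
  'node1   Ready      master   17h     v1.13.5' \
  'node2   Ready      <none>   17h     v1.13.5' \
  'node3   NotReady   <none>   9m48s   v1.13.5' | not_ready
# prints: node3
```

Each printed name can then be passed to kubectl describe node and the ssh checks above.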

  • 2021-02-02 06:04

    First, describe nodes and see if it reports anything:

    $ kubectl describe nodes

    Look for conditions, capacity and allocatable:

    Conditions:
      Type              Status
      ----              ------
      OutOfDisk         False
      MemoryPressure    False
      DiskPressure      False
      Ready             True
    Capacity:
     cpu:       2
     memory:    2052588Ki
     pods:      110
    Allocatable:
     cpu:       2
     memory:    1950188Ki
     pods:      110
    

    If everything is alright here, SSH into the node and observe the kubelet logs to see if they report anything, like certificate errors, authentication errors, etc.

    If kubelet is running as a systemd service, you can use

    $ journalctl -u kubelet
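A quick way to surface the interesting lines is to grep the journal for common failure signatures; a small sketch (the signature list is an assumption, extend it for your environment):

```shell
# Filter kubelet journal output down to likely root-cause lines.
# On the node: journalctl -u kubelet --no-pager | kubelet_errors
kubelet_errors() { grep -Ei 'certificat|x509|auth|cni|NetworkPluginNotReady'; }

# Example against two log lines (one error, one informational):
printf '%s\n' \
  'E0418 kubelet.go:2192] network plugin is not ready: cni config uninitialized' \
  'I0418 setters.go:72] Using node IP: "192.168.2.6"' | kubelet_errors
# prints only the first line
```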

  • 2021-02-02 06:05

    I recently started using VMware Octant (https://github.com/vmware-tanzu/octant). It is a better UI than the Kubernetes Dashboard: you can view the cluster, inspect its details and pods, check logs, and open a terminal into a pod.

  • 2021-02-02 06:06

    I was having a similar issue for a different reason:

    Error:

    cord@node1:~$ kubectl get nodes
    NAME    STATUS     ROLES    AGE     VERSION
    node1   Ready      master   17h     v1.13.5
    node2   Ready      <none>   17h     v1.13.5
    node3   NotReady   <none>   9m48s   v1.13.5
    
    cord@node1:~$ kubectl describe node node3
    Name:               node3
    Conditions:
      Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
      ----             ------  -----------------                 ------------------                ------                       -------
      Ready            False   Thu, 18 Apr 2019 01:15:46 -0400   Thu, 18 Apr 2019 01:03:48 -0400   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
    Addresses:
      InternalIP:  192.168.2.6
      Hostname:    node3
    

    cord@node3:~$ journalctl -u kubelet

    Apr 18 01:24:50 node3 kubelet[54132]: W0418 01:24:50.649047   54132 cni.go:149] Error loading CNI config list file /etc/cni/net.d/10-calico.conflist: error parsing configuration list: no 'plugins' key
    Apr 18 01:24:50 node3 kubelet[54132]: W0418 01:24:50.649086   54132 cni.go:203] Unable to update cni config: No valid networks found in /etc/cni/net.d
    Apr 18 01:24:50 node3 kubelet[54132]: E0418 01:24:50.649402   54132 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
    Apr 18 01:24:55 node3 kubelet[54132]: W0418 01:24:55.650816   54132 cni.go:149] Error loading CNI config list file /etc/cni/net.d/10-calico.conflist: error parsing configuration list: no 'plugins' key
    Apr 18 01:24:55 node3 kubelet[54132]: W0418 01:24:55.650845   54132 cni.go:203] Unable to update cni config: No valid networks found in /etc/cni/net.d
    Apr 18 01:24:55 node3 kubelet[54132]: E0418 01:24:55.651056   54132 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
    Apr 18 01:24:57 node3 kubelet[54132]: I0418 01:24:57.248519   54132 setters.go:72] Using node IP: "192.168.2.6"
    

    Issue:

    My 10-calico.conflist file was incorrect. I verified this against a different node and against the sample file "calico.conflist.template" in the same directory.

    Resolution:

    Correcting the file "10-calico.conflist" and restarting the service with "systemctl restart kubelet" resolved my issue:

    NAME    STATUS   ROLES    AGE   VERSION
    node1   Ready    master   18h   v1.13.5
    node2   Ready    <none>   18h   v1.13.5
    node3   Ready    <none>   48m   v1.13.5
    
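The broken conflist can also be caught mechanically: per the kubelet error above, a valid .conflist must have a top-level "plugins" array. A minimal check (python3 on the node is an assumption; on a real node pass /etc/cni/net.d/10-calico.conflist):

```shell
# Exit 0 if the given CNI conflist has a "plugins" array, non-zero otherwise.
check_conflist() {
  python3 -c 'import json, sys
cfg = json.load(open(sys.argv[1]))
sys.exit(0 if isinstance(cfg.get("plugins"), list) else 1)' "$1"
}

# Example with a minimal well-formed conflist written to a temp file:
tmp=$(mktemp)
printf '{"name":"k8s-pod-network","cniVersion":"0.3.1","plugins":[{"type":"calico"}]}' > "$tmp"
check_conflist "$tmp" && echo "conflist OK"
# prints: conflist OK
```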
  • 2021-02-02 06:06

    I found that applying the pod network and rebooting both nodes did the trick for me.

    kubectl apply -f [podnetwork].yaml
