Question
I installed the metrics server on my local k8s cluster (running on VirtualBox) following https://github.com/kubernetes-sigs/metrics-server#installation.
But the metrics-server pod is stuck in CrashLoopBackOff (output of kubectl get pods -n kube-system -o wide):
metrics-server-844d9574cf-bxdk7 0/1 CrashLoopBackOff 28 12h 10.46.0.1 kubenode02 <none> <none>
Events from kubectl describe pod:
Events:
Type     Reason          Age                    From                 Message
----     ------          ----                   ----                 -------
Normal   Scheduled       <unknown>                                   Successfully assigned kube-system/metrics-server-844d9574cf-bxdk7 to kubenode02
Normal   Created         12h (x3 over 12h)      kubelet, kubenode02  Created container metrics-server
Normal   Started         12h (x3 over 12h)      kubelet, kubenode02  Started container metrics-server
Normal   Killing         12h (x2 over 12h)      kubelet, kubenode02  Container metrics-server failed liveness probe, will be restarted
Warning  Unhealthy       12h (x7 over 12h)      kubelet, kubenode02  Liveness probe failed: HTTP probe failed with statuscode: 500
Warning  Unhealthy       12h (x7 over 12h)      kubelet, kubenode02  Readiness probe failed: HTTP probe failed with statuscode: 500
Normal   Pulled          12h (x7 over 12h)      kubelet, kubenode02  Container image "k8s.gcr.io/metrics-server/metrics-server:v0.4.0" already present on machine
Warning  BackOff         12h (x35 over 12h)     kubelet, kubenode02  Back-off restarting failed container
Normal   SandboxChanged  55m (x22 over 59m)     kubelet, kubenode02  Pod sandbox changed, it will be killed and re-created.
Normal   Pulled          55m                    kubelet, kubenode02  Container image "k8s.gcr.io/metrics-server/metrics-server:v0.4.0" already present on machine
Normal   Created         55m                    kubelet, kubenode02  Created container metrics-server
Normal   Started         55m                    kubelet, kubenode02  Started container metrics-server
Warning  Unhealthy       29m (x35 over 55m)     kubelet, kubenode02  Liveness probe failed: HTTP probe failed with statuscode: 500
Warning  BackOff         4m45s (x202 over 54m)  kubelet, kubenode02  Back-off restarting failed container
Logs from the metrics-server deployment (via kubectl logs deployment/metrics-server -n kube-system):
E1110 12:56:25.249873 1 pathrecorder.go:107] registered "/metrics" from goroutine 1 [running]:
runtime/debug.Stack(0x1942e80, 0xc0006e8db0, 0x1bb58b5)
	/usr/local/go/src/runtime/debug/stack.go:24 +0x9d
k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).trackCallers(0xc0004f73b0, 0x1bb58b5, 0x8)
	/go/pkg/mod/k8s.io/apiserver@v0.19.2/pkg/server/mux/pathrecorder.go:109 +0x86
k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).Handle(0xc0004f73b0, 0x1bb58b5, 0x8, 0x1e96f00, 0xc0005dc8d0)
	/go/pkg/mod/k8s.io/apiserver@v0.19.2/pkg/server/mux/pathrecorder.go:173 +0x84
k8s.io/apiserver/pkg/server/routes.MetricsWithReset.Install(0xc0004f73b0)
	/go/pkg/mod/k8s.io/apiserver@v0.19.2/pkg/server/routes/metrics.go:43 +0x5d
k8s.io/apiserver/pkg/server.installAPI(0xc00000a1e0, 0xc00013d8c0)
	/go/pkg/mod/k8s.io/apiserver@v0.19.2/pkg/server/config.go:711 +0x6c
k8s.io/apiserver/pkg/server.completedConfig.New(0xc00013d8c0, 0x1f099c0, 0xc000697090, 0x1bbdb5a, 0xe, 0x1ef29e0, 0x2cef248, 0x0, 0x0, 0x0)
	/go/pkg/mod/k8s.io/apiserver@v0.19.2/pkg/server/config.go:657 +0xb45
sigs.k8s.io/metrics-server/pkg/server.Config.Complete(0xc00013d8c0, 0xc00013cb40, 0xc00013d680, 0xdf8475800, 0xc92a69c00, 0x0, 0x0, 0xdf8475800)
	/go/src/sigs.k8s.io/metrics-server/pkg/server/config.go:52 +0x312
sigs.k8s.io/metrics-server/cmd/metrics-server/app.runCommand(0xc0001140b0, 0xc0000a65a0, 0x0, 0x0)
	/go/src/sigs.k8s.io/metrics-server/cmd/metrics-server/app/start.go:66 +0x157
sigs.k8s.io/metrics-server/cmd/metrics-server/app.NewMetricsServerCommand.func1(0xc000618b00, 0xc0002c3a80, 0x0, 0x4, 0x0, 0x0)
	/go/src/sigs.k8s.io/metrics-server/cmd/metrics-server/app/start.go:37 +0x33
github.com/spf13/cobra.(*Command).execute(0xc000618b00, 0xc000100060, 0x4, 0x4, 0xc000618b00, 0xc000100060)
	/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842 +0x453
github.com/spf13/cobra.(*Command).ExecuteC(0xc000618b00, 0xc00012a120, 0x0, 0x0)
	/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
	/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main()
	/go/src/sigs.k8s.io/metrics-server/cmd/metrics-server/metrics-server.go:38 +0xae
I1110 12:56:25.384926 1 secure_serving.go:197] Serving securely on [::]:4443
I1110 12:56:25.384972 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1110 12:56:25.384979 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1110 12:56:25.384996 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I1110 12:56:25.385018 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I1110 12:56:25.385069 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1110 12:56:25.385083 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1110 12:56:25.385105 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1110 12:56:25.385117 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E1110 12:56:25.385521 1 server.go:132] unable to fully scrape metrics: [unable to fully scrape metrics from node kubenode02: unable to fetch metrics from node kubenode02: Get "https://192.168.56.4:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 192.168.56.4 because it doesn't contain any IP SANs, unable to fully scrape metrics from node kubenode01: unable to fetch metrics from node kubenode01: Get "https://192.168.56.3:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 192.168.56.3 because it doesn't contain any IP SANs, unable to fully scrape metrics from node kubemaster: unable to fetch metrics from node kubemaster: Get "https://192.168.56.2:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 192.168.56.2 because it doesn't contain any IP SANs]
I1110 12:56:25.485100 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I1110 12:56:25.485359 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1110 12:56:25.485398 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
Answer 1:
The error is caused by the kubelet's self-signed TLS certificate, which (as the log shows) does not contain the node IPs as SANs. Adding --kubelet-insecure-tls to the metrics-server container args in components.yaml and re-applying the manifest fixes the issue; a sketch of the edited args follows the reference.
Ref: https://github.com/kubernetes-sigs/metrics-server#configuration
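A minimal sketch of the relevant container spec in components.yaml after the change; the other args shown are the defaults shipped with the v0.4.x manifest and may differ in your copy:

containers:
- name: metrics-server
  image: k8s.gcr.io/metrics-server/metrics-server:v0.4.0
  args:
  - --cert-dir=/tmp
  - --secure-port=4443
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  # Skip verification of the kubelet's self-signed serving certificate.
  # Reasonable for a local/lab cluster; avoid in production.
  - --kubelet-insecure-tls

Re-apply with kubectl apply -f components.yaml; once the pod reports Ready, kubectl top nodes should return node metrics.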
Answer 2:
In my opinion, it is better to reissue the certificates for the (worker) nodes, adding each node's IP to the SANs. cat w2k.csr.json:
{
  "hosts": [
    "w2k",
    "w2k.rezerw.at",
    "172.16.8.113"
  ],
  "CN": "system:node:w2k",
  "key": {
    "algo": "ecdsa",
    "size": 256
  },
  "names": [
    {
      "O": "system:nodes"
    }
  ]
}
Then generate the key and CSR, and base64-encode the CSR (stripping newlines so the value fits on a single line):
cat w2k.csr.json | cfssl genkey - | cfssljson -bare w2k
cat w2k.csr | base64 | tr -d '\n'
This outputs a string to put into spec.request in a new YAML file:
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: w2k
spec:
  request: "LS0tLS1CRUdJ0tLS0tCg=="
  signerName: kubernetes.io/kubelet-serving
  usages:
  - digital signature
  - key encipherment
  - server auth
Apply it:
kubectl apply -f w2k.csr.yaml
certificatesigningrequest.certificates.k8s.io/w2k configured
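As a quick check (not part of the original answer), you can list the CSR and confirm it is in the Pending state before approving it:

kubectl get csr w2k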
Approve the CSR:
kubectl certificate approve w2k
certificatesigningrequest.certificates.k8s.io/w2k approved
Get the signed certificate and put it, together with its key, on the node in /var/lib/kubelet/pki:
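One way to extract the signed certificate from the approved CSR object, following the pattern in the Kubernetes docs linked below (the output file name is chosen to match the mv command that follows):

kubectl get csr w2k -o jsonpath='{.status.certificate}' | base64 -d > w2k-cert.pem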
root@w2k:/var/lib/kubelet/pki# mv w2k-key.pem kubelet.key
root@w2k:/var/lib/kubelet/pki# mv w2k-cert.pem kubelet.crt
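After the files are in place, the kubelet typically needs a restart to pick up the new serving certificate (assuming a systemd-managed kubelet):

systemctl restart kubelet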
https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/#create-a-certificate-signing-request-object-to-send-to-the-kubernetes-api
Source: https://stackoverflow.com/questions/64767239/kubernetes-metrics-server-not-running