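Alerting: Prometheus sending a detected abnormal event to Alertmanager. This does not mean sending an email.
Notification: Alertmanager sending a notice about the abnormal event (email, webhook, etc.). Alertmanager applies silencing and inhibition, aggregates the alerts, and then sends notifications via email, PagerDuty, HipChat, Slack, and similar channels.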
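In practice the alerting step is an HTTP request: recent Prometheus versions POST a JSON list of alerts to Alertmanager, which then decides whether and how to notify. The Python sketch below builds such a list by hand, roughly in the shape accepted by Alertmanager's v2 API (`/api/v2/alerts`). The alert name, instance, and `NODE_IP` are hypothetical placeholders. Posting a hand-made alert is also a quick way to test the email setup without waiting for a real outage.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_alert(alertname: str, instance: str, summary: str) -> list:
    """Build a one-alert payload for Alertmanager's /api/v2/alerts endpoint."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        {
            # Labels identify the alert; Alertmanager groups and routes on them.
            "labels": {"alertname": alertname, "instance": instance, "severity": "page"},
            # Annotations carry the human-readable text used in notifications.
            "annotations": {"summary": summary},
            "startsAt": now,
        }
    ]


def post_alert(base_url: str, payload: list) -> int:
    """POST the payload to <base_url>/api/v2/alerts and return the HTTP status."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    alert = build_alert("TestAlert", "demo:9100", "Manual test alert")
    print(json.dumps(alert, indent=2))
    # To actually send it (NODE_IP is a placeholder for one of your node IPs):
    # print(post_alert("http://NODE_IP:31000", alert))
```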
Configure Alertmanager: set up how notifications are delivered
# alert-cm.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager-config
  namespace: kube-system
data:
  config.yml: |-
    global:
      smtp_smarthost: 'smtp.163.com:25'   # SMTP server (here: 163 Mail)
      smtp_from: 'username@163.com'
      smtp_auth_username: 'username@163.com'
      smtp_auth_password: "password"      # mailbox password or client authorization code
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 10m
      receiver: default-receiver
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: '*************'
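To see what the route timers mean for the inbox, here is a simplified Python model. It is an assumption for illustration, not Alertmanager's code, and covers a group whose alerts keep firing unchanged: the first email goes out after `group_wait`, the group is re-checked every `group_interval`, and an unchanged group is re-sent only once `repeat_interval` has passed since the last email.

```python
from typing import List


def to_seconds(duration: str) -> int:
    """Parse simple durations such as '30s', '5m', '1h' into seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(duration[:-1]) * units[duration[-1]]


def notification_times(group_wait: str, group_interval: str,
                       repeat_interval: str, horizon: str) -> List[int]:
    """Seconds after the first alert at which an unchanged, still-firing
    group is notified, under a simplified model of Alertmanager's timers."""
    wait = to_seconds(group_wait)
    interval = to_seconds(group_interval)
    repeat = to_seconds(repeat_interval)
    end = to_seconds(horizon)

    sent = [wait]              # first notification after group_wait
    tick = wait + interval     # afterwards the group is re-checked every group_interval
    while tick <= end:
        # Nothing changed, so only resend once repeat_interval has elapsed.
        if tick - sent[-1] >= repeat:
            sent.append(tick)
        tick += interval
    return sent


if __name__ == "__main__":
    # Values from the route section above, over a 30-minute window.
    print(notification_times("30s", "5m", "10m", "30m"))
```

With the values above it prints `[30, 630, 1230]`: the first email 30 seconds after the alert arrives, then a repeat about every 10 minutes. Because repeats can only happen on a `group_interval` check in this model, a `repeat_interval` shorter than `group_interval` would not make emails more frequent.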
Install Alertmanager
# alert-de.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: alertmanager-deployment
  name: alertmanager
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: prom/alertmanager
        imagePullPolicy: IfNotPresent
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        args:
        - "--config.file=/etc/alertmanager/config.yml"   # path to the Alertmanager config file
        - "--storage.path=/alertmanager/data"            # data storage path
        - "--cluster.listen-address=$(POD_IP):6783"
        ports:
        - containerPort: 9093
          name: http
        volumeMounts:
        - mountPath: "/etc/alertmanager"
          name: alertcfg
        - mountPath: "/alertmanager/data"                # keep data in the 'data' emptyDir volume
          name: data
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 100m
            memory: 256Mi
      serviceAccountName: prometheus   # reuse the prometheus ServiceAccount (see the Prometheus installation guide)
      volumes:
      - name: alertcfg
        configMap:
          name: alertmanager-config
      - name: data
        emptyDir: {}
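The `$(POD_IP)` in `args` is expanded by Kubernetes itself (no shell is involved), using the `POD_IP` variable that the `env` section fills from `status.podIP`. The Python function below is a simplified re-implementation of the documented expansion rules, for illustration only and not the kubelet's code: defined variables are substituted, `$$` becomes a literal `$`, and references to undefined variables are left unchanged. The pod IP is a made-up value.

```python
from typing import Dict


def expand_k8s_refs(arg: str, env: Dict[str, str]) -> str:
    """Expand $(VAR) references in a container argument: defined variables
    are substituted, "$$" yields a literal "$", unknown refs stay as-is."""
    out = []
    i = 0
    while i < len(arg):
        ch = arg[i]
        if ch == "$" and i + 1 < len(arg):
            nxt = arg[i + 1]
            if nxt == "$":                      # "$$" -> literal "$"
                out.append("$")
                i += 2
                continue
            if nxt == "(":
                close = arg.find(")", i + 2)
                if close != -1:
                    name = arg[i + 2:close]
                    # Undefined variables are kept as the literal "$(NAME)".
                    out.append(env.get(name, arg[i:close + 1]))
                    i = close + 1
                    continue
        out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    pod_env = {"POD_IP": "10.244.1.7"}  # hypothetical pod IP
    print(expand_k8s_refs("--cluster.listen-address=$(POD_IP):6783", pod_env))
```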
# alert-svc.yaml
# Service that exposes the Alertmanager port
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: alertmanager
  name: alertmanager
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 9093
    targetPort: 9093
    nodePort: 31000
  selector:
    app: alertmanager
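This Service gives Alertmanager two addresses that are easy to mix up: inside the cluster it answers on its ClusterIP (or DNS name) at `port` 9093, while `nodePort` 31000 is opened on every node's IP for access from outside. The Python sketch below derives both from the manifest's values, assuming the default cluster DNS setup and the default NodePort range 30000-32767; the node IP is hypothetical.

```python
from dataclasses import dataclass

# Default --service-node-port-range on the API server is 30000-32767.
NODE_PORT_RANGE = range(30000, 32768)


@dataclass
class ServicePorts:
    name: str
    namespace: str
    port: int        # spec.ports[].port: served on the ClusterIP / DNS name
    node_port: int   # spec.ports[].nodePort: opened on every node's IP


def endpoints(svc: ServicePorts, node_ip: str) -> dict:
    """Return the in-cluster and external addresses for a NodePort Service."""
    if svc.node_port not in NODE_PORT_RANGE:
        raise ValueError(f"nodePort {svc.node_port} is outside 30000-32767")
    return {
        # From inside the cluster (e.g. the Prometheus pod): DNS name + port.
        "in_cluster": f"{svc.name}.{svc.namespace}.svc:{svc.port}",
        # From outside the cluster (e.g. a browser): any node IP + nodePort.
        "external": f"{node_ip}:{svc.node_port}",
    }


if __name__ == "__main__":
    am = ServicePorts("alertmanager", "kube-system", 9093, 31000)
    print(endpoints(am, "192.168.0.10"))
```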
Configure Prometheus to talk to Alertmanager (add the following to the Prometheus config in prome-cm.yaml)
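rule_files:
- /etc/prometheus/rules.yml
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["SVC_IP:9093"]   # ClusterIP of the alertmanager Service with its port 9093 (or alertmanager.kube-system.svc:9093); nodePort 31000 only answers on node IPs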
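After Prometheus restarts with the new config, its HTTP API can confirm that it picked up the Alertmanager target: `GET /api/v1/alertmanagers` lists active and dropped Alertmanagers. The Python sketch below parses that response; the sample body and `PROMETHEUS_URL` are placeholders, not real output from this setup.

```python
import json
import urllib.request
from typing import List


def active_alertmanagers(body: str) -> List[str]:
    """Extract active Alertmanager URLs from a /api/v1/alertmanagers response."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise RuntimeError(f"unexpected status: {doc.get('status')}")
    return [am["url"] for am in doc["data"]["activeAlertmanagers"]]


def fetch(prometheus_url: str) -> str:
    """Fetch the raw JSON from a live Prometheus server."""
    url = prometheus_url.rstrip("/") + "/api/v1/alertmanagers"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Illustrative response shape; the address is made up.
    sample = ('{"status":"success","data":{"activeAlertmanagers":'
              '[{"url":"http://10.96.0.20:9093/api/v2/alerts"}],'
              '"droppedAlertmanagers":[]}}')
    print(active_alertmanagers(sample))
    # Against a live server (PROMETHEUS_URL is a placeholder):
    # print(active_alertmanagers(fetch("http://PROMETHEUS_URL")))
```

An empty list means Prometheus has no reachable Alertmanager configured, which usually points back at the `targets` address.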
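Create alerting rules in Prometheus (add to prome-cm.yaml as a separate data key next to the Prometheus config; the rule_files entry above expects it at /etc/prometheus/rules.yml)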
rules.yml: |
  groups:
  - name: example
    rules:
    - alert: InstanceDown
      expr: up == 0
      for: 5m
      labels:
        severity: page
      annotations:
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
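`for: 5m` means `up == 0` must hold on every rule evaluation for five minutes before the alert fires; until then it is only pending, and pending alerts are not sent to Alertmanager. Below is a simplified Python model of that state machine, assuming a 60-second evaluation interval; it is an illustration, not Prometheus's implementation.

```python
from typing import List, Optional, Tuple


def alert_states(samples: List[Tuple[int, bool]], hold_seconds: int) -> List[str]:
    """Given (timestamp, expr_is_true) pairs at each rule evaluation, return
    the alert state after each one: inactive, pending, or firing."""
    active_at: Optional[int] = None
    states = []
    for ts, is_true in samples:
        if not is_true:
            active_at = None                # condition cleared: reset
            states.append("inactive")
            continue
        if active_at is None:
            active_at = ts                  # first evaluation where up == 0
        states.append("firing" if ts - active_at >= hold_seconds else "pending")
    return states


if __name__ == "__main__":
    # Target goes down at t=60s; rules evaluated every 60s; for: 5m = 300s.
    evals = [(0, False)] + [(t, True) for t in range(60, 421, 60)]
    print(alert_states(evals, 300))
```

Here the alert turns firing at t=360s, five minutes after `up == 0` was first seen at t=60s; a single healthy evaluation in between would reset it to inactive. The first email then follows after Alertmanager's `group_wait` on top of that.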
Create the resources:
kubectl create -f alert-cm.yaml
kubectl create -f alert-de.yaml
kubectl create -f alert-svc.yaml
# Prometheus: apply the updated ConfigMap
kubectl apply -f prome-cm.yaml
Delete the Prometheus pod so that it is recreated and loads the updated configuration.
Web UI: http://node_IP:31000
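Besides the web UI, the same NodePort serves Alertmanager's HTTP API, which is handy for scripted checks: `/-/ready` returns 200 once Alertmanager is ready, and `GET /api/v2/alerts` lists the alerts it currently holds. The Python sketch below summarizes that list into (alertname, state) pairs; `NODE_IP` and the sample body are placeholders.

```python
import json
import urllib.request
from typing import List, Tuple


def summarize(body: str) -> List[Tuple[str, str]]:
    """Turn an /api/v2/alerts response into (alertname, state) pairs."""
    return [
        (a["labels"].get("alertname", "?"), a["status"]["state"])
        for a in json.loads(body)
    ]


def get(url: str) -> Tuple[int, str]:
    """Return (HTTP status, body) for a GET request."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8")


if __name__ == "__main__":
    base = "http://NODE_IP:31000"  # placeholder: any node IP plus the nodePort
    # status, _ = get(base + "/-/ready")
    # print(summarize(get(base + "/api/v2/alerts")[1]))
    sample = ('[{"labels":{"alertname":"InstanceDown"},'
              '"status":{"state":"active","silencedBy":[],"inhibitedBy":[]}}]')
    print(summarize(sample))
```

A state of `suppressed` instead of `active` indicates the alert was silenced or inhibited, which explains a missing email.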
The alert email looks like this: [screenshot: example alert email]
Source: https://blog.csdn.net/weixin_39816723/article/details/99679592