Prometheus + Alertmanager monitoring and alerting example

Submitted by 不羁岁月 on 2020-11-30 01:24:27

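Alertmanager receives the alerts that Prometheus sends. It supports a wide range of notification channels and makes it easy to deduplicate alerts, reduce noise (silencing and inhibition), group them, and route them according to policy, which makes it a capable, modern alert notification system.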
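To make "deduplication" and "grouping" concrete, here is a minimal Go sketch of the idea, not Alertmanager's actual code: alerts with an identical label set are treated as one alert, and the remaining alerts are bucketed by the labels listed in group_by, so each bucket can go out as a single notification. The Alert type, the labelKey and groupAlerts functions, and the sample label values are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Alert is a hypothetical, simplified alert: only its label set matters here.
type Alert struct {
	Labels map[string]string
}

// labelKey turns a label set into a stable string such as "a=1,b=2".
func labelKey(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

// groupAlerts drops alerts whose full label set was already seen, then buckets
// the rest by the values of the groupBy labels (compare route.group_by).
func groupAlerts(alerts []Alert, groupBy []string) map[string][]Alert {
	seen := make(map[string]bool)
	groups := make(map[string][]Alert)
	for _, a := range alerts {
		id := labelKey(a.Labels)
		if seen[id] {
			continue // same labels = same alert, so it is not added twice
		}
		seen[id] = true
		groupLabels := make(map[string]string, len(groupBy))
		for _, name := range groupBy {
			groupLabels[name] = a.Labels[name]
		}
		key := labelKey(groupLabels)
		groups[key] = append(groups[key], a)
	}
	return groups
}

func main() {
	// Hypothetical sample alerts; the second one duplicates the first.
	alerts := []Alert{
		{Labels: map[string]string{"alertname": "NodeMemoryUsage", "instance": "host1:9100"}},
		{Labels: map[string]string{"alertname": "NodeMemoryUsage", "instance": "host1:9100"}},
		{Labels: map[string]string{"alertname": "NodeMemoryUsage", "instance": "host2:9100"}},
		{Labels: map[string]string{"alertname": "NodeCpuUsage", "instance": "host1:9100"}},
	}
	groups := groupAlerts(alerts, []string{"alertname"})

	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s -> %d alert(s), one notification\n", k, len(groups[k]))
	}
}
```

With group_by: ['alertname'], the two distinct NodeMemoryUsage alerts end up in one bucket, and the duplicate is not counted twice.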

Installing Alertmanager

# Install Go 1.11
$ wget https://studygolang.com/dl/golang/go1.11.linux-amd64.tar.gz
$ tar zxvf go1.11.linux-amd64.tar.gz && mv go /opt/go    # the tarball unpacks into a directory named go
 
$ vi /etc/profile    # append the following lines:
   export GOROOT=/opt/go
   export PATH=$GOROOT/bin:$PATH
   export GOPATH=/opt/go-project
   export PATH=$PATH:$GOPATH/bin
 
$ source /etc/profile
 
$ go version
 
# Build Alertmanager from source (or install it from the release tarball instead)
$ git clone https://github.com/prometheus/alertmanager.git
$ cd alertmanager/
$ make build
 
Once the build succeeds, you can edit the alerting configuration file.

 

The configuration file is alertmanager.yml. A sample configuration looks like this:

global:
  resolve_timeout: 2h
 
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'webhook'
 
receivers:
- name: 'webhook'
  webhook_configs:                # send alerts via a webhook
  - url: 'http://example.com/xxxx'
    send_resolved: true

  

Change the configuration to send alerts by email:

$ cat alertmanager.yml 
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.263.net:25'
  smtp_from: 'xxx@xxx.com'
  smtp_auth_username: 'xxx@xxx.com'
  smtp_auth_password: 'xxx'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 5m
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'manager'
  routes: 
  - match:
      severity: critical
    receiver: manager

templates:
- 'templates/wechat.tmpl'


receivers:
- name: 'manager'
  email_configs: 
  - to: 'xxx@xxx.com'
    send_resolved: true
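The route section is a tree: an alert enters at the top-level route, is handed to the first child route whose match labels all agree (with the default continue: false), and stays with the parent's receiver if no child matches; a child route without its own receiver inherits the parent's. Below is a simplified Go model of that lookup that only handles exact match labels. Route and receiverFor are hypothetical names, and the oncall receiver is invented so the two paths print different results (in the config above both paths lead to manager).

```go
package main

import "fmt"

// Route is a hypothetical, simplified route node: exact-match labels only.
type Route struct {
	Receiver string
	Match    map[string]string
	Routes   []Route
}

// matches reports whether every label in Match has the same value on the alert.
func (r Route) matches(labels map[string]string) bool {
	for k, v := range r.Match {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// receiverFor returns the receiver for an alert: the first matching child
// route wins (as with continue: false); otherwise this route's own receiver.
func (r Route) receiverFor(labels map[string]string) string {
	for _, child := range r.Routes {
		if child.matches(labels) {
			if child.Receiver == "" {
				child.Receiver = r.Receiver // child is a copy; inherit the parent's receiver
			}
			return child.receiverFor(labels)
		}
	}
	return r.Receiver
}

func main() {
	// Same shape as the route section above, with a hypothetical "oncall"
	// receiver on the critical child route so the two paths differ.
	root := Route{
		Receiver: "manager",
		Routes: []Route{
			{Receiver: "oncall", Match: map[string]string{"severity": "critical"}},
		},
	}
	fmt.Println(root.receiverFor(map[string]string{"alertname": "NodeMemoryUsage", "severity": "critical"})) // oncall
	fmt.Println(root.receiverFor(map[string]string{"alertname": "NodeMemoryUsage", "severity": "warning"}))  // manager
}
```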

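Notes

  • global holds the sender (SMTP) settings, where:
    • smtp_require_tls: false means TLS (STARTTLS) is not required for the SMTP connection
  • route holds the alert routing settings, where:
    • repeat_interval: 1h : how long to wait before re-sending a notification for an alert that is still firing
    • receiver: 'manager' : the receiver that gets the email; it must match a name defined under receivers below
  • templates loads the notification template file templates/wechat.tmpl:
    • {{ define "wechat.default.message" }}
      {{ range .Alerts }}
      Alert status: {{ .Status }}
      Alert type: {{ .Labels.alertname }}
      Affected host: {{ .Labels.instance }}
      Details: {{ .Annotations.description }}
      Fired at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
      {{ end }}
      {{ end }}
    • the templates directory is one you create yourself
    • wechat.default.message is the template name that WeChat receivers (wechat_configs) use by default; email_configs render email.default.html unless html is pointed at your own template, so this file does not change the email above on its own
  • receivers defines the receivers, where:
    • name must match the receiver value set in route above
    • email_configs: the email recipient settings
    • send_resolved: true also sends an email when the problem is resolved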

For more detailed configuration options, see the Alertmanager documentation on GitHub.

 

Starting Alertmanager

./alertmanager --config.file=/opt/prometheus-2.5.0.linux-amd64/conf/alertmanager.yml    # your own config file; it can live anywhere. Alertmanager listens on port 9093 by default

 

 

Configuring Prometheus

For installing Prometheus, see the previous section.

Once Alertmanager is configured and starts without errors, configure Prometheus.

Edit prometheus.yml; the two main changes are:

alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 10.10.10.12:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*"

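Notes

  • alerting: the address of Alertmanager; more than one can be listed
  • rule_files: the alerting rule files to load. rules is a directory you create yourself; in this example it sits in the same directory as prometheus.yml and can hold several rule files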

An example alert rule file under rules/:

$ cat rules/node.yml 
groups:
- name: node.rules
  rules:
  - alert: NodeDataDiskUsage
    expr: ceil((1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes))*100)> 80
    for: 5m
    labels:
      severity: critical
    annotations:
      description: "{{$labels.instance}} data disk usage is above 80% (current value: {{$value}}%)"


  - alert: NodeMemoryUsage
    expr: ceil(((node_memory_MemTotal_bytes- node_memory_MemAvailable_bytes)/(node_memory_MemTotal_bytes)*100)) > 80
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}}: High memory usage detected"
      description: "{{$labels.instance}}: Memory usage is above 80%  for 5m (current value is: {{ $value }} %)"

  - alert: NodeCpuLoad
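    # node:cpu_load15 is a recording rule that is not defined in this post; as written,
    # with no comparison, this alert fires for every series the expression returns
    expr: node:cpu_load15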
    for: 15s
    labels:
      severity: critical
    annotations:
      description: "{{$labels.instance}}: cpu load is {{ $value }}"

  - alert: NodeCpuUsage
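    # busy % per instance = 100 - idle %; "by (instance)" keeps the label used in the description
    expr: ceil(100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80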
    for: 5m
    labels:
      severity: critical
    annotations:
      description: "{{$labels.instance}}: High cpu usage is above 80% for 5m (current value is: {{$value}} %)"

Note:

  • Each expr above can be pasted into the Prometheus web UI and run directly to check that it returns values.
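As a quick sanity check on the arithmetic in the disk and memory rules, here are the same formulas in Go with made-up byte counts. Note that ceil rounds up, so a usage of 80.1% becomes 81 and already passes the > 80 threshold (the alert still has to hold for the for duration before it fires).

```go
package main

import (
	"fmt"
	"math"
)

// diskUsagePercent mirrors the NodeDataDiskUsage expression:
// ceil((1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)
func diskUsagePercent(availBytes, sizeBytes float64) float64 {
	return math.Ceil((1 - availBytes/sizeBytes) * 100)
}

// memoryUsagePercent mirrors the NodeMemoryUsage expression:
// ceil((MemTotal_bytes - MemAvailable_bytes) / MemTotal_bytes * 100)
func memoryUsagePercent(totalBytes, availableBytes float64) float64 {
	return math.Ceil((totalBytes - availableBytes) / totalBytes * 100)
}

func main() {
	const threshold = 80 // the "> 80" in both rules
	const gib = 1 << 30

	// Hypothetical readings: 100 GiB free on an 800 GiB filesystem,
	// 4 GiB available out of 16 GiB of RAM.
	disk := diskUsagePercent(100*gib, 800*gib)
	mem := memoryUsagePercent(16*gib, 4*gib)

	fmt.Printf("disk: %.0f%% -> over threshold: %v\n", disk, disk > threshold) // disk: 88% -> over threshold: true
	fmt.Printf("mem:  %.0f%% -> over threshold: %v\n", mem, mem > threshold)   // mem:  75% -> over threshold: false
}
```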

 

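After editing prometheus.yml, start Prometheus and let it evaluate a rule. [screenshot not preserved]

View the defined rules. [screenshot not preserved]

View the alerts. [screenshot not preserved]

The alert email. [screenshot not preserved]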
