Prometheus AlertManager - Send Alerts to different clients based on routes

Submitted by 孤街醉人 on 2019-12-10 15:45:02

Question


I have two services, A and B, that I want to monitor. I also have two notification channels, X and Y, defined as receivers in the AlertManager config file.

I want to notify X if service A goes down and notify Y if service B goes down. How can I achieve this in my configuration?

My AlertManager YAML file is:

route:
  receiver: X

receivers:
  - name: X
    email_configs:

  - name: Y
    email_configs:

And the alert.rule file is:

groups:

- name: A
  rules:
    - alert: A_down
      expr: expression
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "A is down"

- name: B
  rules:
    - alert: B_down
      expr: expression
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "B is down"

Answer 1:


The config should roughly look like this (not tested):

route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h

  receiver: 'default-receiver'

  routes:
  - match:
      alertname: A_down
    receiver: X
  - match:
      alertname: B_down
    receiver: Y

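The idea is that every route can have its own routes field containing child routes. A child route's settings (here, its receiver) apply when the alert's labels satisfy the conditions under match; an alert that matches no child route falls back to the receiver of the parent route. Note that 'default-receiver' must also be defined under receivers.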
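Applied to the setup in the question, the complete file might look like the following sketch. The SMTP host, the sender, and the to addresses are placeholders that I made up (they are not in the question), and X is kept as the fallback receiver for anything that matches neither child route.

```yaml
# alertmanager.yml (untested sketch; all addresses are placeholders)
global:
  smtp_smarthost: 'localhost:25'          # assumed SMTP relay
  smtp_from: 'alertmanager@example.com'   # assumed sender address

route:
  receiver: X              # fallback when no child route matches
  routes:
    - match:
        alertname: A_down  # alert name from the question's rule file
      receiver: X
    - match:
        alertname: B_down
      receiver: Y

receivers:
  - name: X
    email_configs:
      - to: 'team-x@example.com'   # placeholder recipient
  - name: Y
    email_configs:
      - to: 'team-y@example.com'   # placeholder recipient
```

As far as I know, Alertmanager 0.22 and later prefer a matchers list (for example `- alertname = "A_down"`) over match, so check the documentation for the version you run.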




Answer 2:


To clarify, the general flow for handling an alert in Prometheus (Prometheus and Alertmanager integration) is:

SomeErrorHappenInYourConfiguredRule(Rule) -> RouteToDestination(Route) -> TriggeringAnEvent(Receiver) -> GetAMessageInSlack/PagerDuty/Mail/etc...

For example:

If my AWS machine cluster production-a1 is down, I want to trigger an event that sends a PagerDuty and a Slack notification to my team with the relevant error.
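One way this could be expressed is a single receiver that carries both integrations, as in the rough sketch below. The cluster label, the receiver names, the Slack channel, and the two placeholder secrets are all assumptions for illustration; the alert rule would have to attach a matching cluster label for the route to fire.

```yaml
# alertmanager.yml fragment (untested sketch; names and secrets are placeholders)
route:
  receiver: default-receiver
  routes:
    - match:
        cluster: production-a1       # hypothetical label set by the alert rule
      receiver: team-pager-and-slack

receivers:
  - name: default-receiver           # no integrations: alerts landing here are not sent anywhere
  - name: team-pager-and-slack
    pagerduty_configs:
      - routing_key: <myPagerDutyToken>
    slack_configs:
      - api_url: <mySlackWebhookUrl>
        channel: '#my-team-alerts'   # hypothetical channel
```

Because both integrations sit under one receiver, a single matching route notifies both PagerDuty and Slack.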

Three files are important for configuring alerts in your Prometheus system:

  1. alertmanager.yml - configuration of your routes (which triggered alerts go where) and receivers (how those alerts are delivered)
  2. rules.yml - contains all the thresholds and alerting rules you define for your system
  3. prometheus.yml - global configuration that ties the two files above together: it loads the rule files and points Prometheus at Alertmanager

I'm attaching a dummy example to demonstrate the idea. In this example I watch for overload on my machine (using node exporter installed on it). In /var/data/prometheus-stack/alertmanager/alertmanager.yml:

global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'JohnDoe@gmail.com'

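route:
  receiver: defaultTrigger
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 6h
  routes:
  # Every label listed here must be present on the alert and match.
  - match_re:
      service: service_overload
      dev_team: myteam
    receiver: pagerDutyTrigger

receivers:
# The default receiver must be defined as well; with no integrations it sends nothing.
- name: 'defaultTrigger'
- name: 'pagerDutyTrigger'
  pagerduty_configs:
  - send_resolved: true
    routing_key: <myPagerDutyToken>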

Add a rule in /var/data/prometheus-stack/prometheus/yourRuleFile.yml:

groups:
- name: alerts
  rules:
  - alert: service_overload_more_than_5000
    expr: (node_network_receive_bytes_total{job="someJobOrService"} / 1000) >= 5000
    for: 10m
    labels:
      service: service_overload
      severity: pager
      dev_team: myteam
    annotations:
      dev_team: myteam
      priority: Blocker
      identifier: '{{ $labels.name }}'
      description: 'service overflow'
      value: '{{ humanize $value }}%'

In /var/data/prometheus-stack/prometheus/prometheus.yml, add this snippet to integrate Alertmanager:

global:

...

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - "alertmanager:9093"

rule_files:
  - "yourRuleFile.yml"

...

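Note that the key point of this example is the service_overload label value: the rule sets service: service_overload and the route matches on it, which binds the rule to the right receiver. Every label listed in the route's matcher must be present on the alert with a matching value, otherwise the alert falls through to the default receiver.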
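The same label-binding idea could be applied to the original question. The sketch below assumes a custom label named service (my choice, not from the question) on each rule, and routes on that label instead of on alertname, so that any further alerts added for service A later would reach X without new routes. The two YAML documents stand for two separate files.

```yaml
# Rule file: give each alert a routing label (hypothetical label name "service")
groups:
  - name: A
    rules:
      - alert: A_down
        expr: expression      # placeholder, as in the question
        for: 1m
        labels:
          service: A
  - name: B
    rules:
      - alert: B_down
        expr: expression      # placeholder, as in the question
        for: 1m
        labels:
          service: B
---
# alertmanager.yml fragment: route on the label set above
route:
  receiver: X                 # fallback receiver
  routes:
    - match:
        service: A
      receiver: X
    - match:
        service: B
      receiver: Y
```

Receivers X and Y still need to be defined under receivers, as in the question's file.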

Reload the config (restart the services, or stop and start your Docker containers) and test it. If everything is configured correctly, you can see the alerts at http://your-prometheus-url:9090/alerts



Source: https://stackoverflow.com/questions/51485580/prometheus-alertmanager-send-alerts-to-different-clients-based-on-routes
