How to make Prometheus alert description give both ratio and absolute numbers?

南方客

I currently have a Prometheus alert that fires when my success rate drops below 85%.

I would like to add the absolute numbers of the ratio to the alert description.

1 answer

情歌与酒

2021-01-15 21:56

Here's a simplified version of my TasksMissing alert, which outputs the number of tasks missing, the total number of tasks and the affected instances in the summary:

  - alert: TasksMissing
    expr: |
      job_env:up:ratio < .7
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Tasks missing for {{ $labels.job }} in {{ $labels.env }}
      description:
       '{{ with printf `job_env:up:count{job="%s",env="%s"} - job_env:up:sum{job="%s",env="%s"}` $labels.job $labels.env $labels.job $labels.env | query }}
          {{- . | first | value -}}
        {{ end }}
        of
        {{ with printf `job_env:up:count{job="%s",env="%s"}` $labels.job $labels.env | query }}
          {{- . | first | value -}}
        {{ end }}
        {{ $labels.job }} instances are missing in {{ $labels.env }}:
        {{ range printf `up{job="%s",env="%s"}==0` $labels.job $labels.env | query }}
          {{- .Labels.instance }}
        {{ end }}'

The resulting description is expected to read something like "2 of 3 foo-service instances are missing in prod: foo01.prod.foo.org:8080 foo02.prod.foo.org:8080".

The idea is to use Go templates to build a query: printf fills a query string with values from $labels, and the resulting string is piped into the query function that Prometheus provides to templates. The query returns a list of samples, which you can either treat as a single result (using with, then first) or iterate over (using range). You can then print either the sample's value directly or one of its labels (e.g. the instance name).
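Applied to the success-rate alert from the question, the same technique might look roughly like the sketch below. The metric name http_requests_total, its code label, the 5m window, the group name and the alert name are all assumptions made for illustration, since the question does not show the actual expression; substitute your own series.

```yaml
groups:
  - name: success-rate-example   # hypothetical group name
    rules:
      - alert: LowSuccessRate    # hypothetical alert name
        # Assumed metric and label names; replace with the ones behind your ratio.
        expr: |
          sum by (job) (rate(http_requests_total{code=~"2.."}[5m]))
            /
          sum by (job) (rate(http_requests_total[5m])) < 0.85
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Low success rate for {{ $labels.job }}
          # $value is the ratio computed by expr; the two queries fetch the
          # absolute request counts over the same (assumed) 5m window.
          description:
           'Success rate for {{ $labels.job }} is {{ $value | humanizePercentage }}:
            {{ with printf `sum(increase(http_requests_total{job="%s",code=~"2.."}[5m]))` $labels.job | query }}
              {{- . | first | value | printf "%.0f" -}}
            {{ end }}
            of
            {{ with printf `sum(increase(http_requests_total{job="%s"}[5m]))` $labels.job | query }}
              {{- . | first | value | printf "%.0f" -}}
            {{ end }}
            requests succeeded in the last 5 minutes.'
```

The ratio comes from $value, so only the absolute counts need extra queries. If a query returns no samples (for example, no successful requests at all), the corresponding with block is skipped and prints nothing rather than failing.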
