How to make Prometheus alert description give both ratio and absolute numbers?

南方客 2021-01-15 21:25

I currently have a Prometheus alert that fires when my success rate drops below 85%.

I would like to add the absolute numbers of the ratio to the alert description.

1 Answer
  • 2021-01-15 21:56

    Here's a simplified version of my TasksMissing alert, which outputs the number of tasks missing, the total number of tasks and the affected instances in the summary:

      - alert: TasksMissing
        expr: |
          job_env:up:ratio < .7
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Tasks missing for {{ $labels.job }} in {{ $labels.env }}
          description:
           '{{ with printf `job_env:up:count{job="%s",env="%s"} - job_env:up:sum{job="%s",env="%s"}` $labels.job $labels.env $labels.job $labels.env | query }}
              {{- . | first | value -}}
            {{ end }}
            of
            {{ with printf `job_env:up:count{job="%s",env="%s"}` $labels.job $labels.env | query }}
              {{- . | first | value -}}
            {{ end }}
            {{ $labels.job }} instances are missing in {{ $labels.env }}:
            {{ range printf `up{job="%s",env="%s"}==0` $labels.job $labels.env | query }}
              {{- .Labels.instance }}
            {{ end }}'
    

    The resulting description is expected to read something like "2 of 3 foo-service instances are missing in prod: foo01.prod.foo.org:8080 foo02.prod.foo.org:8080".

    The idea is to use Go templates to build a query: printf fills a query string with values from $labels, and the result is piped into the Prometheus-provided query function. The query returns a list of samples, which you can either narrow to a single result (using with and first) or iterate over (using range). You can then print either the sample value directly or one of its labels (e.g. the instance name).
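
    The same pattern could be applied to the success-rate alert from the question. The following is only a sketch: the alert name and the recording rules job:requests_success:increase5m and job:requests_total:increase5m are hypothetical and need to be replaced with your own series. It assumes both series carry a job label, and that your Prometheus version provides the humanizePercentage template function.

    ```yaml
    groups:
      - name: example
        rules:
          - alert: LowSuccessRate
            # Hypothetical recording rules; substitute your own series names.
            expr: |
              job:requests_success:increase5m / job:requests_total:increase5m < 0.85
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: Low success rate for {{ $labels.job }}
              # $value is the ratio computed by expr; the two query calls
              # fetch the absolute numerator and denominator for the same job.
              description: >-
                Success rate for {{ $labels.job }} is {{ $value | humanizePercentage }}:
                {{ with printf `job:requests_success:increase5m{job="%s"}` $labels.job | query }}{{ . | first | value | printf "%.0f" }}{{ end }}
                of
                {{ with printf `job:requests_total:increase5m{job="%s"}` $labels.job | query }}{{ . | first | value | printf "%.0f" }}{{ end }}
                requests succeeded in the last 5 minutes.
    ```

    Here the ratio comes from $value, so only the absolute numbers need extra queries. The folded block scalar (>-) is used instead of a quoted string so that the backticks and double quotes inside the template need no escaping.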
