How can I get a reliable alert via Stackdriver when there are no clients pulling from a Pub/Sub subscription?

Posted by 你离开我真会死。 on 2019-12-08 05:26:48

Question


I currently have some alerts set up to report when subscription/pull_request_count is 0. However, in a similar question about that metric, I found that metrics and alerting break once there is no data, which I believe happens when there are no subscriptions.

My intent is to figure out if my servers have stopped pulling messages. There are 2 scenarios I have in mind where the details are important.

  1. Even if there are no messages being published, I want to know if I'm no longer pulling from a subscription to make sure things are working properly.
  2. If a ton of unacknowledged messages are queued up only because I pulled them but didn't ack them (e.g. a partner API was down), I don't want this alert to be triggered.

Besides using subscription/pull_request_count as a condition, which won't work when no data is coming in (at least after a while), how can I set up an alert that notifies me that there are no clients pulling from a Pub/Sub subscription?


Answer 1:


Since you want to be alerted when there are no pull operations, you'll have to use the subscription/pull_request_count metric. If, after some time, the metric is dropped instead of reporting 0 pulls, you can combine two conditions: the metric is absent for 3 minutes OR it is below 1 for 1 minute.

However, the problem here is that the UI filters out all resources and metrics that have been unused for the past 6 weeks. While this greatly simplifies setting alerts and browsing metrics for running operations, it means you need a different approach to create alerts before a system is in production. The easiest solution is to make a dummy subscription and pull messages from it so that the metric appears.
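As a rough sketch of that dummy-subscription workaround, the function below creates a topic and a subscription, publishes one message, and pulls it once so that pull_request_count gets a data point. The function name warm_up_metric, the message text, and the topic and subscription names you pass in are placeholders of mine, not something from the original setup, and how long the metric takes to show up in the UI is not something I measured.

```shell
# warm_up_metric TOPIC SUBSCRIPTION
# Hypothetical helper: creates throwaway Pub/Sub resources and performs one pull.
# Requires an authenticated gcloud with a default project configured.
warm_up_metric() {
  gcloud pubsub topics create "$1" &&
  gcloud pubsub subscriptions create "$2" --topic="$1" &&
  gcloud pubsub topics publish "$1" --message="warm-up" &&
  gcloud pubsub subscriptions pull "$2" --auto-ack --limit=1
}

# Usage (names are examples only):
#   warm_up_metric dummy-topic dummy-subscription
```

Remember to delete the throwaway topic and subscription once the alert has been created.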

But you can still use the Stackdriver Monitoring API to set them up (I actually tested this with a Spanner metric in a workspace with no instances for the last few months). Keep in mind that the alerting policies API is in Beta so it's subject to non-backwards-compatible changes.

I'd recommend starting by inspecting an already existing policy with projects.alertPolicies/list to see how the AlertPolicy body is constructed.
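A minimal sketch of that listing call with curl follows. The helper names policies_url and list_policies are mine; the endpoint is the same alertPolicies collection used for creation further down, requested with GET instead of POST.

```shell
# Hypothetical helper: build the alertPolicies collection URL for a project ID.
policies_url() {
  printf 'https://monitoring.googleapis.com/v3/projects/%s/alertPolicies' "$1"
}

# List the existing alert policies of project $1 as JSON.
# Needs gcloud credentials and network access.
list_policies() {
  curl -s \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$(policies_url "$1")"
}

# Usage:
#   list_policies "$(gcloud config get-value project 2>/dev/null)"
```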

Then you can set some initial variables:

TOKEN="$(gcloud auth print-access-token)"
PROJECT="$(gcloud config get-value project 2>/dev/null)"
SUBSCRIPTION=PUBSUB_SUBSCRIPTION_ID
CHANNEL=NOTIFICATION_CHANNEL_ID

In my case I am monitoring only one specific Pub/Sub subscription throughout the example, and I already had a notification channel (for my email). As you also have an existing policy, you can take the notification channel ID from it.
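If it is easier to look the channel up directly, the sketch below lists the project's notification channels and strips the resource name down to the bare ID. It assumes TOKEN and PROJECT are set as above; list_channels and channel_id_from_name are hypothetical helper names.

```shell
# List notification channels for $PROJECT; each entry has a "name" field of the
# form projects/PROJECT/notificationChannels/CHANNEL_ID.
# Needs network access and a valid $TOKEN.
list_channels() {
  curl -s \
    -H "Authorization: Bearer $TOKEN" \
    "https://monitoring.googleapis.com/v3/projects/$PROJECT/notificationChannels"
}

# Hypothetical helper: keep only the part after the last slash of a channel name.
channel_id_from_name() {
  printf '%s' "${1##*/}"
}

# Usage:
#   list_channels
#   CHANNEL="$(channel_id_from_name "projects/my-project/notificationChannels/12345")"
```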

With projects.alertPolicies/create you can create the new alert policy:

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "https://monitoring.googleapis.com/v3/projects/$PROJECT/alertPolicies" \
  -d @alert.json

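where alert.json is the following (curl sends the file as-is and the shell does not expand variables inside it, so replace $PROJECT, $SUBSCRIPTION and $CHANNEL with real values before posting):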

{
  "displayName": "no-pull-alert",
  "combiner": "OR",
  "conditions": [
    {
      "conditionAbsent": {
        "filter": "metric.type=\"pubsub.googleapis.com/subscription/pull_request_count\" resource.type=\"pubsub_subscription\" resource.label.\"project_id\"=\"$PROJECT\" resource.label.\"subscription_id\"=\"$SUBSCRIPTION\"",
        "duration": "180s",
        "trigger": {
          "count": 1
        },
        "aggregations": [
          {
            "alignmentPeriod": "60s",
            "perSeriesAligner": "ALIGN_RATE"
          }
        ]
      },
      "displayName": "Pull requests absent for $PROJECT, $SUBSCRIPTION"
    },
    {
      "conditionThreshold": {
        "filter": "metric.type=\"pubsub.googleapis.com/subscription/pull_request_count\" resource.type=\"pubsub_subscription\" resource.label.\"project_id\"=\"$PROJECT\" resource.label.\"subscription_id\"=\"$SUBSCRIPTION\"",
        "comparison": "COMPARISON_LT",
        "thresholdValue": 1,
        "duration": "60s",
        "trigger": {
          "count": 1
        },
        "aggregations": [
          {
            "alignmentPeriod": "60s",
            "perSeriesAligner": "ALIGN_RATE"
          }
        ]
      },
      "displayName": "Pull requests are 0 for $PROJECT, $SUBSCRIPTION"
    }
  ],
  "documentation": {
    "content": "**ALERT**\n\nNo pull message operations",
    "mimeType": "text/markdown"
  },
  "notificationChannels": [
    "projects/$PROJECT/notificationChannels/$CHANNEL"
  ],
  "enabled": true
}

Briefly, you don't need to pass policy or condition IDs as those will be populated by the API. Use OR as the combiner (policy violates when ANY condition is met) to trigger the alert when the metric is either absent (conditionAbsent) or below 1 (conditionThreshold). And, of course, you can modify parameters to better suit your use case, display names, descriptions, etc.
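Because the $PROJECT, $SUBSCRIPTION and $CHANNEL placeholders inside alert.json are not expanded by the shell, one way to fill them in is a small sed wrapper like the sketch below. render_policy and the output file name are hypothetical, and the substitution assumes the IDs contain no | or & characters.

```shell
# render_policy TEMPLATE PROJECT SUBSCRIPTION CHANNEL
# Hypothetical helper: prints TEMPLATE with the three placeholders replaced.
# [$] matches a literal dollar sign, so the names are not treated as anchors.
render_policy() {
  sed \
    -e 's|[$]PROJECT|'"$2"'|g' \
    -e 's|[$]SUBSCRIPTION|'"$3"'|g' \
    -e 's|[$]CHANNEL|'"$4"'|g' \
    "$1"
}

# Usage, with the variables set earlier:
#   render_policy alert.json "$PROJECT" "$SUBSCRIPTION" "$CHANNEL" > alert.rendered.json
#   then pass -d @alert.rendered.json to the curl command instead of -d @alert.json
```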



Source: https://stackoverflow.com/questions/57505585/how-can-i-get-a-reliable-alert-via-stackdriver-when-there-are-no-clients-pulling
