Configure SQS Dead letter Queue to raise a cloud watch alarm on receiving a message

前提是你 提交于 2020-07-03 12:52:46

问题


I was working with Dead letter Queue in Amazon SQS. I want that whenever a new message is received by the queue it should raise a CloudWatch alarm. The problem is I configured an alarm on the metric: number_of_messages_sent of the queue but this metric don't work as expected in case of Dead letter Queues as mentioned in the Amazon SQS Dead-Letter Queues - Amazon Simple Queue Service documentation.

Now some suggestions on this were use number_of_messages_visible but I am not sure how to configure this in an alarm. So if i set that the value of this metric>0 then this is not same as getting a new message in the queue. If an old message is there then the metric value will always be >0. I can do some kind of mathematical expression to get the delta in this metric for some defined period (let's say a minute) but I am looking for some better solution.


回答1:


It is difficult to achieve what is being asked in the question. If the endpoint of cloudwatch alarm is to send Email or notify users about the DLQ message arrival you can do a similar thing with the help of SQS, SNS and Lambda. And from cloudwatch you can see how the DLQ messages grows on time whenever you receive any Email.

  1. Create a SQS DLQ for an existing queue.
  2. Create an SNS topic and subscribe the SNS topic to send Email.
  3. Create a small lambda function which listens the SQS queue for an incoming messages, if there is any new incoming messages, send it to SNS. Since SNS is subscribed to Email you will get the Email whenever any new messages comes to SQS queue. Obviously the trigger for the lambda function is SQS and batch size is 1.
#!/usr/bin/python3
import json
import boto3
import os

def lambda_handler(event, context):
    batch_processes=[]
    for record in event['Records']:
        send_request(record["body"])


def send_request(body):
    # Create SNS client
    sns = boto3.client('sns')

    # Publish messages to the specified SNS topic
    response = sns.publish(
        TopicArn=#YOUR_TOPIC_ARN
        Message=body,    
    )

    # Print out the response
    print(response)



回答2:


What you can do is create a lambda with event source as your DLQ. And from the Lambda you can post custom metric data to CloudWatch. Alarm will be triggered when your data meets the conditions.

Use this reference to configure your lambda such that it gets triggered when a message is sent to your DLQ: Using AWS Lambda with Amazon SQS - AWS Lambda

Here is a nice explanation with code that suggests how we can post custom metrics from Lambda to CloudWatch: Sending CloudWatch Custom Metrics From Lambda With Code Examples

Once the metrics are posted, CloudWatch alarm will trigger as it will match the metrics.




回答3:


I struggled with the same problem and the answer for me was to use NumberOfMessagesSent instead. Then I could set my criteria for new messages that came in during my configured period of time. Here is what worked for me in CloudFormation.

Note that individual alarms do not occur if the alarm stays in an alarm state from constant failure. You can setup another alarm to catch those. ie: Alarm when 100 errors occur in 1 hour using the same method.

Updated: Because the metrics for NumberOfMessagesReceived and NumberOfMessagesSent are dependent on how the message is queued, I have devised a new solutions for our needs using the metric ApproximateNumberOfMessagesDelayed after adding a delay to the dlq settings. If you are adding the messages to the queue manually then NumberOfMessagesReceived will work. Otherwise use ApproximateNumberOfMessagesDelayed after setting up a delay.

MyDeadLetterQueue:
    Type: AWS::SQS::Queue
    Properties:
      MessageRetentionPeriod: 1209600  # 14 days
      DelaySeconds: 60 #for alarms

DLQthresholdAlarm:
 Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: "Alarm dlq messages when we have 1 or more failed messages in 10 minutes"
      Namespace: "AWS/SQS"
      MetricName: "ApproximateNumberOfMessagesDelayed"
      Dimensions:
        - Name: "QueueName"
          Value:
            Fn::GetAtt:
              - "MyDeadLetterQueue"
              - "QueueName"
      Statistic: "Sum"
      Period: 300  
      DatapointsToAlarm: 1 
      EvaluationPeriods: 2       
      Threshold: 1
      ComparisonOperator: "GreaterThanOrEqualToThreshold"
      AlarmActions:
        - !Ref MyAlarmTopic


来源:https://stackoverflow.com/questions/60211243/configure-sqs-dead-letter-queue-to-raise-a-cloud-watch-alarm-on-receiving-a-mess

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!