SQS Lambda - retry logic?

忘了有多久 2021-02-12 22:21

A message is added to an SQS queue that is configured to trigger a Lambda function (Node.js).

When a lambda function is triggered - I may want to retry sa

4 answers
  •  失恋的感觉
    2021-02-12 22:35

    Both the number of retries and the retry "timeout" can be configured directly on the SQS queue.

    When you create a queue, set up the following attributes:

    The Default Visibility Timeout is how long a message stays hidden after your application receives it. If processing fails during the Lambda run and an exception is thrown, Lambda will not delete any of the messages in the batch, and all of them will eventually re-appear in the queue.

    If you only want to try 3 times, you must set the SQS re-drive policy (AKA Dead Letter Queue).

    The re-drive policy lets your queue redirect a message to a Dead Letter Queue (DLQ) after the message has been received N times without being deleted (the maxReceiveCount setting), where N is a number between 1 and 1000.
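
    As a rough illustration of the two settings above, the sketch below builds the attribute map for a source queue. The helper name buildQueueAttributes, the ARN, and the numeric values are assumptions made up for this example; only the attribute names VisibilityTimeout and RedrivePolicy (with deadLetterTargetArn and maxReceiveCount) come from SQS itself.

```javascript
// Sketch: build the attribute map for a source queue with a re-drive policy.
// buildQueueAttributes is a hypothetical helper, not part of any AWS SDK.
function buildQueueAttributes({ dlqArn, maxReceiveCount, visibilityTimeoutSeconds }) {
  // SQS accepts a maxReceiveCount between 1 and 1000.
  if (!Number.isInteger(maxReceiveCount) || maxReceiveCount < 1 || maxReceiveCount > 1000) {
    throw new RangeError("maxReceiveCount must be an integer between 1 and 1000");
  }
  return {
    // Queue attribute values are passed as strings.
    VisibilityTimeout: String(visibilityTimeoutSeconds),
    // RedrivePolicy is a JSON document serialized into a string.
    RedrivePolicy: JSON.stringify({
      deadLetterTargetArn: dlqArn,
      maxReceiveCount,
    }),
  };
}

const attributes = buildQueueAttributes({
  dlqArn: "arn:aws:sqs:us-east-1:123456789012:my-queue-dlq", // hypothetical ARN
  maxReceiveCount: 3, // "only try 3 times"
  visibilityTimeoutSeconds: 60, // illustrative value
});
console.log(attributes);
```

    An object of this shape is what you would pass as the Attributes parameter when creating the queue or updating its attributes.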

    It is essential to understand that Lambda will continue to process a failed message (a message that generates an exception in the code) until one of the following happens:

    1. It is processed without any errors (Lambda deletes the message)
    2. The Message Retention Period expires (SQS deletes the message)
    3. It is sent to the DLQ set in the SQS queue re-drive policy (SQS "moves" the message to the DLQ)
    4. You delete the message from the queue directly in your code (User deletes the message)

    Lambda will not dispose of this bad message otherwise.
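
    The loop described above can be pictured with a toy model. This is only a sketch of the behavior as described in this answer, not how AWS implements it: simulateDelivery is a made-up function, the handler is called synchronously, and neither the Message Retention Period nor manual deletion is modeled.

```javascript
// Toy model of the retry loop: a message is redelivered until the handler
// succeeds or its receive count exceeds maxReceiveCount.
// simulateDelivery is a hypothetical name used only for this illustration.
function simulateDelivery(handler, maxReceiveCount) {
  let receiveCount = 0;
  for (;;) {
    receiveCount += 1;
    // Once the receive count exceeds maxReceiveCount, the message goes to
    // the DLQ instead of being handed to the function again.
    if (receiveCount > maxReceiveCount) {
      return { outcome: "dead-lettered", attempts: receiveCount - 1 };
    }
    try {
      handler(receiveCount);
      // Success: the message is deleted from the queue.
      return { outcome: "deleted", attempts: receiveCount };
    } catch (err) {
      // Failure: the message stays in the queue and becomes visible again
      // after the visibility timeout, so the loop continues.
    }
  }
}

const alwaysFails = () => {
  throw new Error("boom");
};
const succeedsOnSecondTry = (attempt) => {
  if (attempt < 2) throw new Error("transient failure");
};

console.log(simulateDelivery(alwaysFails, 3)); // dead-lettered after 3 attempts
console.log(simulateDelivery(succeedsOnSecondTry, 3)); // deleted on attempt 2
```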


    Important observations

    Lambda will not deal with failed messages

    Based on several experiments I ran to understand the behavior of the SQS integration (the documentation on re-tries is ambiguous at the moment), Lambda will not delete failed messages and will keep re-trying them. Even if a Lambda DLQ is set up, failed messages will not be sent to it; Lambda relies entirely on the configuration of the SQS queue for this purpose, as stated in the Lambda DLQ documentation.

    Recommendation:

    • Always use a re-drive policy in your SQS queue.

    Exceptions will fail a whole batch of messages

    As stated earlier, if your code throws an exception while processing a message, the whole batch of messages is re-tried, regardless of whether some of them were processed correctly. If a downstream service keeps failing, messages that were already processed successfully may end up in the DLQ.

    Recommendation:

    • Manually delete messages that have been processed correctly.
    • Ensure that your Lambda function can process the same message more than once.
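
    A minimal sketch of both recommendations, under several assumptions: handleBatch, processMessage and deleteMessage are hypothetical names, the two callbacks are injected so the example stays self-contained, and the in-memory Set stands in for whatever durable store a real function would need (a Set does not survive across Lambda containers). The record fields messageId, receiptHandle and body are the ones found in an SQS event.

```javascript
// IDs of messages that were already handled. Assumption: in-memory only,
// for illustration; a real function would need a durable store.
const processedIds = new Set();

// Processes each record on its own, deletes the ones that succeed, and
// throws at the end if any failed so that only those are re-tried.
// processMessage and deleteMessage are hypothetical injected callbacks.
async function handleBatch(event, processMessage, deleteMessage) {
  const failures = [];
  for (const record of event.Records) {
    try {
      // Skip the work if this message was already processed (idempotency).
      if (!processedIds.has(record.messageId)) {
        await processMessage(record.body);
        processedIds.add(record.messageId);
      }
      // Delete the message so it does not come back with the failed batch.
      await deleteMessage(record.receiptHandle);
    } catch (err) {
      failures.push(record.messageId);
    }
  }
  if (failures.length > 0) {
    // Throwing keeps the remaining (undeleted) messages in the queue.
    throw new Error(`Failed messages: ${failures.join(", ")}`);
  }
}
```

    In a real handler, deleteMessage would wrap a call to the SQS DeleteMessage operation using the queue URL and the record's receipt handle.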

    Lambda concurrency limits and SQS side effects

    The blog post "Lambda Concurrency Limits and SQS Triggers Don’t Mix Well (Sometimes)" describes how, if your concurrency limit is set too low, Lambda may throttle batches of messages, so that their receive count is incremented without the messages ever being processed.

    Recommendation:

    The post and Amazon recommend the following:

    • Set the queue’s visibility timeout to at least 6 times the timeout that you configure on your function. The extra time allows Lambda to retry if your function execution is throttled while it is processing a previous batch.
    • Set the maxReceiveCount on the queue’s re-drive policy to at least 5. This helps avoid sending messages to the dead-letter queue due to throttling.
    • Configure the dead-letter queue to retain failed messages long enough that you can move them back later to be reprocessed.
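
    The first two recommendations are simple arithmetic, so they can be checked before deploying. The sketch below assumes a made-up helper, checkQueueSettings, and illustrative values; it only encodes the "6 times" and "at least 5" rules quoted above.

```javascript
// Returns a list of warnings for settings that fall short of the
// recommendations above. checkQueueSettings is a hypothetical helper.
function checkQueueSettings({ functionTimeoutSeconds, visibilityTimeoutSeconds, maxReceiveCount }) {
  const warnings = [];
  // Visibility timeout should be at least 6x the function timeout.
  const minVisibilityTimeout = 6 * functionTimeoutSeconds;
  if (visibilityTimeoutSeconds < minVisibilityTimeout) {
    warnings.push(
      `visibility timeout ${visibilityTimeoutSeconds}s is below the recommended ${minVisibilityTimeout}s`
    );
  }
  // maxReceiveCount should be at least 5 to absorb throttled receives.
  if (maxReceiveCount < 5) {
    warnings.push(`maxReceiveCount ${maxReceiveCount} is below the recommended 5`);
  }
  return warnings;
}

// Illustrative values: a 30s function timeout needs at least 180s.
console.log(
  checkQueueSettings({ functionTimeoutSeconds: 30, visibilityTimeoutSeconds: 60, maxReceiveCount: 3 })
);
```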
