DynamoDB not receiving the entire SQS message body

Submitted 2021-01-28 05:43:44

Question


I am pulling data from an API in batches and sending it to an SQS queue. The issue is in processing the messages and writing the data to DynamoDB. The dataset is supposed to contain 147,689 records, but when running the code the number of records put to DynamoDB varies: sometimes fewer than 147,689, sometimes more, and sometimes exactly 147,689. It is not consistently putting 147,689 records into the database.

I have tried everything I can think of to fix this, including using a FIFO queue instead of a standard queue, increasing the visibility timeout, increasing the delivery timeout, and using uuid.uuid1() instead of uuid.uuid4(). I am looping through the "Record" list, so I am not sure why it is not processing the entire batch. Below is my latest code to process the message and send the data to DynamoDB:

import boto3
import json
import uuid
import time

dynamo = boto3.client("dynamodb", "us-east-1")

def lambda_handler(event, context):
    for item in json.loads(event["Records"][0]["body"]):
        item["id"] = uuid.uuid1().bytes
        for key, value in item.items():
            if key == "id":
                item[key] = {"B": bytes(value)}
            elif key == "year":
                item[key] = {"N": str(value)}
            elif key == "amt_harvested":
                item[key] = {"N": str(value)}
            elif key == "consumed":
                item[key] = {"N": str(value)}
            else:
                item[key] = {"S": str(value)}

            time.sleep(0.001)

        dynamo.put_item(TableName="TableOne", Item=dict(item))

Answer 1:


The Lambda event source mapping for SQS polls for messages and invokes the Lambda function with a batch of records, up to the configured batch size (10 by default). The function should process the batch by looping over the event["Records"] array.
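A minimal sketch of that loop, without the DynamoDB call so it runs standalone (extract_items is a hypothetical helper name, not from the original post):

```python
import json

def extract_items(event):
    """Flatten every message body in the SQS batch into one list of items.

    Looping over event["Records"] (rather than only Records[0], as in the
    question's code) ensures no message in the batch is dropped.
    """
    items = []
    for record in event["Records"]:
        # Each SQS message body here is a JSON-encoded list of items.
        items.extend(json.loads(record["body"]))
    return items

# A simulated batch of two SQS messages carrying three items in total.
event = {
    "Records": [
        {"body": json.dumps([{"year": 2020}, {"year": 2021}])},
        {"body": json.dumps([{"year": 2022}])},
    ]
}
print(len(extract_items(event)))  # → 3
```

With a batch size greater than 1, the question's code silently discards every message after the first, which would explain record counts that drift below the expected total.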

Key factors to consider when setting the batch size:

  • If Lambda processing fails, the entire batch is resent and retried by AWS. If the function cannot tolerate processing duplicate records, set the batch size to 1.
  • If processing a single record takes 20 ms, AWS still bills the 100 ms minimum, so simply setting a batch size of 5 can cut the cost roughly 5x.

It is always recommended to:

  • Set a higher batch size and code the Lambda to be idempotent.
  • Code the Lambda to process all records regardless of the batch size.
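One way to make the writes idempotent (a sketch, not from the original post) is to derive the DynamoDB key from the record's content instead of generating a fresh uuid1() on every invocation. A retried batch then produces the same keys, and put_item overwrites the existing item rather than inserting a duplicate:

```python
import hashlib
import json

def deterministic_id(item):
    """Derive a stable 32-byte key from the record's content.

    json.dumps with sort_keys=True gives a canonical serialization, so the
    same record always hashes to the same key, even across Lambda retries.
    """
    canonical = json.dumps(item, sort_keys=True).encode()
    return hashlib.sha256(canonical).digest()

a = deterministic_id({"year": 2020, "consumed": 5})
b = deterministic_id({"consumed": 5, "year": 2020})
# a == b: same content yields the same key regardless of field order
```

This would also explain the runs that wrote more than 147,689 records: a uuid-per-invocation key means every retried batch inserts brand-new items.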


Source: https://stackoverflow.com/questions/65646101/dynamodb-not-receiving-the-entire-sqs-message-body
