Question
I want to do an incremental DynamoDB backup to S3 using DynamoDB Streams. I have a Lambda function that reads the DynamoDB stream and writes files to S3. To keep track of shards that have already been read, I store the ExclusiveStartShardId in a configuration file.
What I do is:
- Describe the stream (using the logged ExclusiveStartShardId)
- Get the stream's shards
- For every shard that is CLOSED (has an EndingSequenceNumber), do the following:
  - Get a shard iterator for that shard (shardIteratorType: 'TRIM_HORIZON')
  - Iterate through the shard and fetch records until NextShardIterator becomes null
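The steps above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the helper names `list_closed_shards` and `drain_closed_shard` are hypothetical, and `streams` is assumed to be a client exposing the DynamoDB Streams operations `describe_stream`, `get_shard_iterator` and `get_records` (for example `boto3.client("dynamodbstreams")`).

```python
def list_closed_shards(streams, stream_arn, start_shard_id=None):
    """Return the shards that have an EndingSequenceNumber (closed shards).

    `start_shard_id` plays the role of the logged ExclusiveStartShardId.
    """
    closed = []
    while True:
        kwargs = {"StreamArn": stream_arn}
        if start_shard_id:
            kwargs["ExclusiveStartShardId"] = start_shard_id
        description = streams.describe_stream(**kwargs)["StreamDescription"]
        for shard in description["Shards"]:
            if "EndingSequenceNumber" in shard["SequenceNumberRange"]:
                closed.append(shard)
        # DescribeStream is paginated; continue until no more pages remain.
        start_shard_id = description.get("LastEvaluatedShardId")
        if not start_shard_id:
            return closed


def drain_closed_shard(streams, stream_arn, shard_id):
    """Read a closed shard from its oldest record until the iterator ends."""
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    records = []
    while iterator:
        page = streams.get_records(ShardIterator=iterator)
        records.extend(page["Records"])  # a page may be empty ("gaps")
        # For a closed shard, NextShardIterator is eventually absent.
        iterator = page.get("NextShardIterator")
    return records
```

The loop in `drain_closed_shard` terminates only because a closed shard eventually stops returning a NextShardIterator.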
The problem is that I read only closed shards, so to get new records I must wait an undetermined amount of time for the current shard to be closed.
It seems that the last shard is usually in the OPEN state (it has no EndingSequenceNumber). If I remove the check for EndingSequenceNumber from the pseudocode above, I end up with an infinite loop, because when I reach the last shard NextShardIterator is always present. I also cannot stop when the number of fetched items is 0, because there can be "gaps" in the shard.
In this tutorial, numChanges is used to stop the infinite loop: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.LowLevel.Walkthrough.html#Streams.LowLevel.Walkthrough.Step5
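One possible way to bound the loop on an open shard, sketched here under assumptions: stop after a number of consecutive empty responses, and remember the last sequence number so that the next run resumes with the AFTER_SEQUENCE_NUMBER iterator type. The function name `read_open_shard` and the `max_empty_polls` threshold are hypothetical, and `streams` is again assumed to be a DynamoDB Streams client. The threshold is only a heuristic; a gap can be longer than the threshold, in which case the remaining records would be picked up by a later run rather than this one.

```python
def read_open_shard(streams, stream_arn, shard_id, last_seq=None, max_empty_polls=3):
    """Read an open shard, giving up after several consecutive empty pages.

    Returns (records, last_seq); persist last_seq and pass it to the next run.
    """
    if last_seq is None:
        iterator_args = {"ShardIteratorType": "TRIM_HORIZON"}
    else:
        # Resume just after the last record processed by a previous run.
        iterator_args = {
            "ShardIteratorType": "AFTER_SEQUENCE_NUMBER",
            "SequenceNumber": last_seq,
        }
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn, ShardId=shard_id, **iterator_args
    )["ShardIterator"]

    records = []
    empty_polls = 0
    while iterator and empty_polls < max_empty_polls:
        page = streams.get_records(ShardIterator=iterator)
        if page["Records"]:
            records.extend(page["Records"])
            last_seq = page["Records"][-1]["dynamodb"]["SequenceNumber"]
            empty_polls = 0  # a non-empty page resets the counter
        else:
            empty_polls += 1  # possibly a gap, possibly the end of the data
        iterator = page.get("NextShardIterator")
    return records, last_seq
```

Because the checkpoint is a sequence number rather than a shard ID, stopping early delays records instead of skipping them, as long as the next run happens before the stream trims them.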
What is the best approach in this situation?
I also found a similar question, "Reading data from dynamodb streams", but unfortunately it did not answer my question.
Answer 1:
Why not attach the DynamoDB stream as an event source for your Lambda function? Then Lambda will take care of polling the stream and calling your function when necessary. See this for details.
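A minimal sketch of what such a function might look like, assuming the stream is already configured as the function's event source. The `backup_records` helper, the `backup/` key prefix and the `BACKUP_BUCKET` environment variable are hypothetical names chosen for illustration; `put_object` is the standard S3 client call.

```python
import json
import os


def backup_records(event, s3, bucket):
    """Write each stream record in the event to S3 as one JSON object."""
    keys = []
    for record in event.get("Records", []):
        # Sequence numbers are unique within a shard; used here as the key.
        sequence_number = record["dynamodb"]["SequenceNumber"]
        key = "backup/%s.json" % sequence_number
        body = json.dumps(record, default=str).encode("utf-8")
        s3.put_object(Bucket=bucket, Key=key, Body=body)
        keys.append(key)
    return keys


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be tested without it.
    import boto3

    s3 = boto3.client("s3")
    keys = backup_records(event, s3, os.environ["BACKUP_BUCKET"])
    return {"written": len(keys)}
```

With this setup there is no shard bookkeeping in the function itself: Lambda delivers batches of records and the function only has to persist them.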
Source: https://stackoverflow.com/questions/37814516/reading-aws-dynamodb-stream