问题
My goal is to take daily snapshots of an RDS table and put it in a DynamoDB table. The table should only contain data from a single day.
For this have a Data Pipeline set up to query a RDS table and publish the results into S3 in CSV format.
Then a HiveActivity imports this CSV into a DynamoDB table by creating external tables for the file and an existing DynamoDB table.
This works great, but older entries from the previous day still exist in the DynamoDB table. I want to do this within Data Pipeline if at all possible. I need to:
1) Find a way to clear the DynamoDB table, or at least drop/recreate it, or 2) Include an extra column of the snapshot date and find a way to clear out all older entries.
Any ideas on how I can do this?
回答1:
You can use DynamoDb Time to Live(TTL) which allows you to set an expiration time after which items are auto deleted from the DynamoDb table. TTL is very useful for cases where data loses it's relevance after a specific time period and in your case it can be start time of next day.
来源:https://stackoverflow.com/questions/49963481/clear-all-existing-entries-in-dynamodb-table-in-aws-data-pipeline