I'm using DynamoDB pretty heavily for a service I'm building. A new client request has come in that requires CloudSearch. I see that a CloudSearch domain can be created from a DynamoDB table via the AWS console.
My question is this:
Is there a way to automatically offload data from a DynamoDB table into a CloudSearch domain, via the API or otherwise, at a specified time interval?
I'd prefer this to manually offloading DynamoDB documents to CloudSearch. All help greatly appreciated!
Here are two ideas.
The official AWS way of searching DynamoDB data with CloudSearch
This approach is described pretty thoroughly in the "Synchronizing a Search Domain with a DynamoDB Table" section of http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-dynamodb-data.html.
The downside is that it sounds like a huge pain: you have to either re-create the search domain or maintain an update table in order to sync, and you'd need a cron job or something similar to run the sync script periodically.
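To give a feel for what that cron job has to do, here is a minimal sketch of the "build and upload batches" step. It assumes you have already read the changed items (for example from the update table) as plain dicts with an `id` key and an optional `deleted` flag; `to_sdf_op`, `to_sdf_batches` and that item shape are hypothetical. It emits CloudSearch SDF JSON (a list of `add`/`delete` operations) and splits it so each batch stays under the 5 MB batch upload limit.

```python
import json
from typing import Iterable, List

# CloudSearch rejects batches larger than 5 MB; stay a little under it.
MAX_BATCH_BYTES = 5 * 1024 * 1024 - 1024


def to_sdf_op(item: dict) -> dict:
    """Turn one changed item into an SDF add or delete operation.

    Hypothetical item shape: an 'id' plus an optional 'deleted' flag;
    every other key is sent as a search field.
    """
    if item.get("deleted"):
        return {"type": "delete", "id": str(item["id"])}
    fields = {k: v for k, v in item.items() if k not in ("id", "deleted")}
    return {"type": "add", "id": str(item["id"]), "fields": fields}


def to_sdf_batches(items: Iterable[dict], max_bytes: int = MAX_BATCH_BYTES) -> List[str]:
    """Group SDF operations into JSON arrays of at most max_bytes each.

    json.dumps escapes non-ASCII by default, so characters == bytes here.
    A single operation larger than max_bytes still ends up alone in a batch.
    """
    batches: List[str] = []
    current: List[str] = []
    size = 2  # the enclosing "[" and "]"
    for item in items:
        op = json.dumps(to_sdf_op(item))
        extra = len(op) + (1 if current else 0)  # +1 for the comma separator
        if current and size + extra > max_bytes:
            batches.append("[" + ",".join(current) + "]")
            current, size = [], 2
            extra = len(op)
        current.append(op)
        size += extra
    if current:
        batches.append("[" + ",".join(current) + "]")
    return batches
```

Each returned string could then be sent with boto3's `cloudsearchdomain` client, e.g. `client.upload_documents(documents=batch.encode("utf-8"), contentType="application/json")`, where the client is created with `endpoint_url` set to the domain's document endpoint.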
The AWS Lambdas way
Use the newish AWS Lambda event-processing service. It is pretty simple to set up an event stream based on a DynamoDB table (see http://docs.aws.amazon.com/lambda/latest/dg/wt-ddb.html).
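A minimal sketch of that one-off setup with boto3, assuming the table does not have a stream yet and the Lambda function already exists; `wire_up` and the helper names are hypothetical. It enables a stream on the table (`UpdateTable` with a `StreamSpecification`) and then attaches the function to it (`CreateEventSourceMapping`). The parameter builders are kept separate so they can be checked without AWS access.

```python
from typing import Any, Dict

VALID_VIEW_TYPES = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}


def stream_specification(view_type: str = "NEW_AND_OLD_IMAGES") -> Dict[str, Any]:
    """StreamSpecification for DynamoDB UpdateTable; the sync needs new images."""
    if view_type not in VALID_VIEW_TYPES:
        raise ValueError(f"unknown StreamViewType: {view_type}")
    return {"StreamEnabled": True, "StreamViewType": view_type}


def event_source_mapping_params(
    stream_arn: str, function_name: str, batch_size: int = 100
) -> Dict[str, Any]:
    """Arguments for Lambda CreateEventSourceMapping on a DynamoDB stream."""
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",  # only changes made after the mapping exists
        "BatchSize": batch_size,
    }


def wire_up(table_name: str, function_name: str) -> Dict[str, Any]:
    """Enable the table's stream and attach the Lambda function (hypothetical helper)."""
    import boto3  # imported lazily so the builders above work without boto3

    dynamodb = boto3.client("dynamodb")
    # Likely rejected if the table already has an enabled stream;
    # in that case reuse the existing LatestStreamArn instead.
    table = dynamodb.update_table(
        TableName=table_name, StreamSpecification=stream_specification()
    )
    stream_arn = table["TableDescription"]["LatestStreamArn"]
    return boto3.client("lambda").create_event_source_mapping(
        **event_source_mapping_params(stream_arn, function_name)
    )
```

With `StartingPosition="LATEST"`, items that already exist in the table are not sent to the function, so they would still need a one-off initial load into the domain.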
Your Lambda function would then submit a search document to CloudSearch based on each DynamoDB event. For an example of submitting a document from a Lambda function, see https://gist.github.com/fzakaria/4f93a8dbf483695fb7d5
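A minimal Python sketch of such a function (not the code from the linked gist). It assumes the stream includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`), that attribute names are already valid CloudSearch field names configured on the domain, that only `S`, `N`, `SS` and `NS` attributes need indexing, and that key values are safe to use as document IDs. The environment variable `CLOUDSEARCH_DOC_ENDPOINT` (the domain's full `https://` document endpoint) and the helper names are hypothetical.

```python
import json
import os
from typing import Any, Dict, Optional


def _num(raw: str) -> Any:
    # DynamoDB sends numbers as strings; map them to int or float.
    return float(raw) if any(c in raw for c in ".eE") else int(raw)


def from_attr(value: Dict[str, Any]) -> Optional[Any]:
    """Unwrap a typed attribute such as {"S": "x"} into a plain value.

    Returns None for types this sketch does not map (BOOL, NULL, M, L, B, ...).
    """
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return _num(raw)
    if type_tag == "SS":
        return list(raw)
    if type_tag == "NS":
        return [_num(n) for n in raw]
    return None


def record_to_sdf(record: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one DynamoDB Streams record into a CloudSearch SDF operation."""
    keys = record["dynamodb"]["Keys"]
    # Hypothetical ID scheme: key values sorted by key name, joined by "_".
    doc_id = "_".join(str(from_attr(keys[name])) for name in sorted(keys))
    if record["eventName"] == "REMOVE":
        return {"type": "delete", "id": doc_id}
    fields = {}
    for name, value in record["dynamodb"]["NewImage"].items():
        converted = from_attr(value)
        if converted is not None:  # skip unmapped attribute types
            fields[name] = converted
    return {"type": "add", "id": doc_id, "fields": fields}


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    import boto3  # bundled with the AWS Lambda Python runtime

    batch = [record_to_sdf(r) for r in event["Records"]]
    client = boto3.client(
        "cloudsearchdomain",
        endpoint_url=os.environ["CLOUDSEARCH_DOC_ENDPOINT"],  # hypothetical env var
    )
    response = client.upload_documents(
        documents=json.dumps(batch).encode("utf-8"),
        contentType="application/json",
    )
    return {"status": response["status"]}
```

INSERT and MODIFY events both become `add` operations, since an SDF `add` replaces any existing document with the same ID; REMOVE becomes `delete`.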
In my opinion this approach is a lot nicer, as it continuously updates your search index without any involvement from you.
I'm not clear on how Lambda would always keep the search index in sync with the data in DynamoDB. Consider the following flow:
1. The application updates record A in a DynamoDB table (say, to A1).
2. Very shortly afterwards, the application updates the same record A (to A2).
3. The trigger for step 1 starts Lambda invocation 1.
4. The trigger for step 2 starts Lambda invocation 2.
5. Step 4 completes first, so CloudSearch sees A2.
6. Now step 3 completes, so CloudSearch sees A1.
Lambda triggers are not guaranteed to start only after the previous invocation has completed (correct me if I'm wrong, and please provide a link).
As we can see, the index goes out of sync: CloudSearch ends up with A1 while DynamoDB holds A2.
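A toy simulation of the six steps above, assuming (hypothetically) that DynamoDB applies writes in the order they are issued while the index simply keeps whichever Lambda write finishes last. The tuple layout and `simulate` are illustrative only.

```python
from typing import Dict, List, Tuple

# (record key, new value, time the write was issued, time its Lambda finished)
Update = Tuple[str, str, int, int]


def simulate(updates: List[Update]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (table state, index state) after all updates have landed."""
    table: Dict[str, str] = {}
    index: Dict[str, str] = {}
    for key, value, _issued, _finished in sorted(updates, key=lambda u: u[2]):
        table[key] = value  # DynamoDB: last issued write wins
    for key, value, _issued, _finished in sorted(updates, key=lambda u: u[3]):
        index[key] = value  # index: last *finished* Lambda write wins
    return table, index


# Invocation 1 (A1) finishes after invocation 2 (A2), as in steps 5 and 6.
table, index = simulate([("A", "A1", 1, 6), ("A", "A2", 2, 5)])
```

Here `table` ends up as `{"A": "A2"}` but `index` as `{"A": "A1"}`, which is exactly the stale state described.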
The closest thing I can think of that will work is AWS Kinesis Streams, and even then only with a single shard (which limits ingestion to 1 MB per second). If that restriction is acceptable, your consumer application can process records strictly sequentially, i.e., put the next record into CloudSearch only after the previous one has been put.
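A minimal sketch of that sequential consumer loop, assuming the records have already been read in order from the single shard (reading from Kinesis itself is not shown) and that `put_document` is a hypothetical callable that writes one record to CloudSearch and raises on failure. The key design choice is that a failing record is retried in place and never skipped, since skipping ahead would reintroduce the reordering problem.

```python
import time
from typing import Any, Callable, Iterable


def consume_sequentially(
    records: Iterable[Any],
    put_document: Callable[[Any], None],  # hypothetical: writes one record to CloudSearch
    max_retries: int = 3,
    backoff_seconds: float = 0.5,
) -> int:
    """Send records one at a time, strictly in order; return how many were sent.

    The next record is only attempted after put_document has returned for
    the previous one. If a record still fails after max_retries retries,
    the exception propagates and the loop stops instead of skipping it.
    """
    sent = 0
    for record in records:
        for attempt in range(max_retries + 1):
            try:
                put_document(record)
                break
            except Exception:
                if attempt == max_retries:
                    raise
                time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
        sent += 1
    return sent
```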
Source: https://stackoverflow.com/questions/30202956/automatically-offload-dynamo-table-to-cloud-search-domain