Automatically offload dynamo table to cloud search domain

痞子三分冷 submitted on 2020-01-10 19:54:14

Question


I'm using DynamoDB pretty heavily for a service I'm building. A new client request has come in that requires CloudSearch. I see that a CloudSearch domain can be created from a DynamoDB table via the AWS console.

My question is this:

Is there a way to automatically offload data from a DynamoDB table into a CloudSearch domain, via the API or otherwise, at a specified time interval?

I'd prefer this to manually offloading DynamoDB documents to CloudSearch. All help greatly appreciated!


Answer 1:


Here are two ideas.

  1. The official AWS way of searching DynamoDB data with CloudSearch

    This approach is described pretty thoroughly in the "Synchronizing a Search Domain with a DynamoDB Table" section of http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-dynamodb-data.html.

    The downside is that it sounds cumbersome: to stay in sync you have to either re-create the search domain each time or maintain an update table, and you'd need a cron job or something similar to run the sync script.

  2. The AWS Lambda way

    Use the newish Lambda event processing service. It is pretty simple to set up an event stream based on DynamoDB (see http://docs.aws.amazon.com/lambda/latest/dg/wt-ddb.html).

    Your Lambda function would then submit a search document to CloudSearch based on the DynamoDB event. For an example of submitting a document from a Lambda function, see https://gist.github.com/fzakaria/4f93a8dbf483695fb7d5

    This approach is a lot nicer in my opinion as it would continuously update your search index without any involvement from you.
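As a rough illustration of the second idea, here is a minimal Python sketch of a Lambda handler that turns DynamoDB stream records into a CloudSearch document batch. Several things are assumptions rather than anything stated above: the table has a single string hash key named `id`, the stream is configured to include the new item image, only string (`S`) and number (`N`) attributes are indexed, and the document endpoint is read from a hypothetical `CLOUDSEARCH_DOC_ENDPOINT` environment variable.

```python
import json
import os


def to_cloudsearch_batch(event):
    """Convert a DynamoDB Streams event into a CloudSearch document batch."""
    batch = []
    for record in event.get("Records", []):
        data = record["dynamodb"]
        # Assumption: the table's key is a single string attribute called "id".
        doc_id = data["Keys"]["id"]["S"]
        if record["eventName"] == "REMOVE":
            batch.append({"type": "delete", "id": doc_id})
            continue
        # INSERT or MODIFY. Assumption: the stream includes the new image.
        fields = {}
        for name, attr in data["NewImage"].items():
            # Keep only simple string/number attributes in this sketch.
            if "S" in attr:
                fields[name] = attr["S"]
            elif "N" in attr:
                fields[name] = attr["N"]
        batch.append({"type": "add", "id": doc_id, "fields": fields})
    return batch


def handler(event, context):
    """Lambda entry point: upload the converted batch to CloudSearch."""
    # Imported lazily so the conversion above can be exercised without AWS.
    import boto3

    batch = to_cloudsearch_batch(event)
    if not batch:
        return 0
    client = boto3.client(
        "cloudsearchdomain",
        # Hypothetical variable name; set it to the domain's document endpoint.
        endpoint_url=os.environ["CLOUDSEARCH_DOC_ENDPOINT"],
    )
    client.upload_documents(
        documents=json.dumps(batch).encode("utf-8"),
        contentType="application/json",
    )
    return len(batch)
```

Keeping the conversion in its own function means it can be checked locally with a hand-written event before the handler is wired to a real stream.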




Answer 2:


I'm not clear on how Lambda would always keep CloudSearch in sync with the data in DynamoDB. Consider the following flow:

  1. The application updates record A in a DynamoDB table (say, to A1).
  2. Very shortly afterwards, the application updates the same record A in the same table (to A2).
  3. The trigger for step 1 causes a Lambda invocation to start executing.
  4. The trigger for step 2 causes a second Lambda invocation to start executing.
  5. The invocation from step 4 completes first, so CloudSearch sees A2.
  6. The invocation from step 3 then completes, so CloudSearch sees A1.

A Lambda trigger is not guaranteed to start only after the previous invocation has completed (please correct me if I'm wrong, and provide a link).

As we can see, the search index ends up out of sync with the table.
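The flow above can be replayed as a tiny simulation. This is only a sketch of the concern as described, under the assumption that each update reaches the index when its invocation completes; the function name and the completion times are made up for illustration.

```python
def final_index_value(updates):
    """Apply updates in the order their invocations complete.

    `updates` is a list of (value, completion_time) pairs, listed in the
    order the writes were made to the table.
    """
    index = {}
    for value, _completion_time in sorted(updates, key=lambda u: u[1]):
        index["A"] = value  # each completed invocation overwrites the document
    return index["A"]


# A1 is written first, but its invocation finishes after the one for A2.
print(final_index_value([("A1", 2.0), ("A2", 1.0)]))  # prints A1 (stale)
```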

The closest thing I can think of that would work is AWS Kinesis Streams, and even then only with a single shard (which has a 1 MB per second ingestion limit). If that restriction is acceptable, your consumer application can be written to process records sequentially, i.e., the next record is put into CloudSearch only after the previous record has been put.
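The sequential consumer described here might look roughly like the following sketch. It leaves out the Kinesis client entirely: `records` stands for the records read in order from the single shard, and `put_document` is a hypothetical callable that writes one record to CloudSearch and returns only once the write has succeeded. The retry count is also an assumption.

```python
def consume_sequentially(records, put_document, max_attempts=3):
    """Put records into the search index strictly one after another.

    The next record is not touched until the previous one has been put
    successfully, so a later update can never be overwritten by an earlier one.
    """
    processed = 0
    for record in records:
        for attempt in range(1, max_attempts + 1):
            try:
                put_document(record)  # blocks until the put has finished
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up rather than skip ahead and reorder
        processed += 1
    return processed


# Usage with an in-memory stand-in for the search index.
index = {}
updates = [{"id": "A", "value": "A1"}, {"id": "A", "value": "A2"}]
consume_sequentially(updates, lambda r: index.__setitem__(r["id"], r["value"]))
print(index)  # {'A': 'A2'}
```

Raising after the last failed attempt, instead of moving on to the next record, is what preserves the ordering: skipping a failed record and retrying it later would bring back the same out-of-order problem.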



Source: https://stackoverflow.com/questions/30202956/automatically-offload-dynamo-table-to-cloud-search-domain