CloudSearch performance with frequent updates of small batches

心已入冬 submitted on 2019-12-24 13:05:06

Question


I have a use case where I need to upload small document batches (typically 1 to 10 documents of 1 KB each) to CloudSearch. A new batch is uploaded every 2 or 3 seconds. The CloudSearch docs for bulk uploads say:

Make sure your batches are as close to the 5 MB limit as possible. Uploading a larger amount of smaller batches slows down the upload and indexing process.

It's OK if there is a 30-second delay before the documents show up in search results. Will my implementation keep working well as my document count grows to, let's say, 500,000 docs?
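
One way to reconcile the 5 MB advice with a 30-second freshness window is to buffer documents on the client and flush when the serialized batch nears the size limit or when the oldest buffered document has waited a set time. At the stated rate (1 to 10 docs of about 1 KB every 2 to 3 seconds), 20 seconds of buffering collects roughly 7 KB to 100 KB, so this cuts the number of requests but will not come close to 5 MB. Below is a minimal Python sketch, assuming the CloudSearch SDF JSON batch format (a JSON array of `{"type": "add", "id": ..., "fields": {...}}` operations). The `SdfBatcher` class, the `send` callback and the 20-second `max_wait` are hypothetical choices for illustration; in practice `send` might wrap boto3's `cloudsearchdomain` client method `upload_documents(documents=..., contentType="application/json")`.

```python
import json
import time
from typing import Callable, Dict, List, Optional

# Batch size limit taken from the bulk-upload guidance quoted above.
MAX_BATCH_BYTES = 5 * 1024 * 1024


class SdfBatcher:
    """Hypothetical helper: buffers SDF "add" operations and flushes them
    when the batch nears max_bytes or the oldest entry is max_wait old."""

    def __init__(self, send: Callable[[bytes], None],
                 max_bytes: int = MAX_BATCH_BYTES,
                 max_wait: float = 20.0,
                 clock: Callable[[], float] = time.monotonic):
        self.send = send            # assumed wrapper around the upload call
        self.max_bytes = max_bytes
        self.max_wait = max_wait    # seconds; kept below the 30 s window
        self.clock = clock
        self._ops: List[str] = []   # serialized SDF operations
        self._size = 2              # bytes for the enclosing "[" and "]"
        self._oldest: Optional[float] = None

    def add(self, doc_id: str, fields: Dict) -> None:
        op = json.dumps({"type": "add", "id": doc_id, "fields": fields})
        op_bytes = len(op.encode("utf-8"))
        # Flush first if this op plus its comma would exceed the limit.
        if self._ops and self._size + 1 + op_bytes > self.max_bytes:
            self.flush()
        if self._ops:
            self._size += 1         # comma separator
        if self._oldest is None:
            self._oldest = self.clock()
        self._ops.append(op)
        self._size += op_bytes
        self.poll()

    def poll(self) -> None:
        """Flush if the oldest buffered document has waited max_wait seconds."""
        if self._oldest is not None and self.clock() - self._oldest >= self.max_wait:
            self.flush()

    def flush(self) -> None:
        if not self._ops:
            return
        body = ("[" + ",".join(self._ops) + "]").encode("utf-8")
        self.send(body)
        self._ops = []
        self._size = 2
        self._oldest = None


if __name__ == "__main__":
    sent: List[bytes] = []
    now = [0.0]  # fake clock so the demo runs instantly
    batcher = SdfBatcher(sent.append, max_wait=20.0, clock=lambda: now[0])
    for t in range(0, 21, 3):  # one small document every 3 seconds
        now[0] = float(t)
        batcher.add(f"doc-{t}", {"title": "hello"})
    now[0] = 21.0
    batcher.poll()  # oldest document has now waited 21 s
    print(len(sent), len(json.loads(sent[0])))  # prints "1 7"
```

The class has no background thread, so `poll()` has to be called periodically (for example from the loop that receives new documents, or from a timer) for the time-based flush to fire when no new documents arrive.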


Answer 1:


Indexing time should be well under your 30 second SLA even with 500k docs, regardless of how or whether you batch your submissions.

I say this based on my own testing with an index of 300k docs and 38 index fields on an m1.small instance type, where it takes less than 3 seconds for a document to become searchable. Many variables could affect your own situation, such as how many index fields you have, your instance size, etc., but I think my setup represents fairly unfavorable conditions (an m1.small instance with a complex indexing schema) and is still an order of magnitude faster than your SLA. It's anecdotal evidence, of course, but you should be fine.
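
To check the time-to-searchable figure against your own domain, one could time the gap between an upload and the first search that returns the new document. A minimal Python sketch, assuming you supply two hypothetical callables: `upload` (for example a wrapper around the `cloudsearchdomain` client's `upload_documents`) and `is_searchable` (for example a wrapper around its `search` method that returns True once the new document appears). The measurement resolution is limited by `poll_interval`.

```python
import time
from typing import Callable, Optional


def time_to_searchable(upload: Callable[[], None],
                       is_searchable: Callable[[], bool],
                       poll_interval: float = 0.5,
                       timeout: float = 60.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Return seconds from upload until is_searchable() is True,
    or None if the document is still not found after `timeout` seconds."""
    start = clock()
    upload()
    while True:
        found = is_searchable()
        elapsed = clock() - start   # measured after the search returns
        if found:
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(poll_interval)
```

Injecting `clock` and `sleep` is only there so the loop can be exercised with a fake clock; with the defaults it measures real wall-clock time.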



Source: https://stackoverflow.com/questions/37131035/cloudsearch-performance-with-frequent-updates-of-small-batches
