How can I retrieve all searchable (not deleted) documents in Amazon CloudSearch?

Submitted by 不打扰是莪最后的温柔 on 2019-12-08 19:46:27

Question


I want to retrieve all my searchable documents from CloudSearch.

I tried a negative search like this:

search-[mySearchEndPoint].cloudsearch.amazonaws.com/2011-02-01/search?bq=(not keywords: '!!!testtest!!!')

It works, but it also returns all the deleted documents.

So how can I get only the active documents?


Answer 1:


The key thing to know is that CloudSearch doesn't really delete. Instead, the "delete" function retains IDs in the index, but clears all fields in those deleted docs, including setting uint fields to 0. This works fine for positive queries, which will match no text in the cleared, "deleted" docs.

A workaround is to add a uint field to your docs, called 'updated' below, to use as a filter for queries that might return deleted IDs, such as negative queries.

(The samples below use the Boto interface library for CloudSearch, with many steps omitted for brevity.)

When you add docs, set the field to the current timestamp:

doc['updated'] = now_utc  # unix time in seconds; useful for 'version' also.
doc_service.add(id, now_utc, doc)
conn.commit()

When you delete, CloudSearch sets uint fields to 0:

doc_service.delete(id, now_utc)
conn.commit()
# CloudSearch sets doc's 'updated' field = 0
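
To make the behaviour described above easier to picture, here is a toy in-memory model. It is not CloudSearch or Boto code: the `index` dict and the functions `add`, `delete`, `negative_search` and `active_negative_search` are all hypothetical names invented for this sketch, and it only mimics the behaviour this answer describes (the ID is kept, text fields are cleared, uint fields become 0).

```python
import time

# Toy stand-in for a search index: doc ID -> dict of fields (an assumption, not the real service).
index = {}

def add(doc_id, fields):
    """Store a doc and stamp it with a non-zero 'updated' uint field."""
    doc = dict(fields)
    doc["updated"] = int(time.time())  # unix time in seconds
    index[doc_id] = doc

def delete(doc_id):
    """Mimic the described 'delete': keep the ID, blank text fields, zero uint fields."""
    index[doc_id] = {
        name: (0 if isinstance(value, int) else "")
        for name, value in index[doc_id].items()
    }

def negative_search(field, term):
    """IDs of docs whose field does NOT contain term; cleared docs match too."""
    return sorted(i for i, d in index.items() if term not in d.get(field, ""))

def active_negative_search(field, term):
    """Same negative search, restricted to docs with updated >= 1."""
    return sorted(
        i for i, d in index.items()
        if d["updated"] >= 1 and term not in d.get(field, "")
    )

add("a", {"title": "hello world"})
add("b", {"title": "another doc"})
delete("b")

all_ids = negative_search("title", "foobar")
active_ids = active_negative_search("title", "foobar")
```

In this model the plain negative search still returns the deleted ID "b", because an empty title does not contain the term, while the variant filtered on `updated` returns only "a".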

Now you can distinguish between deleted and active docs in a negative query. The samples below are searching a test index with 86 docs, about half of them deleted.

# negative query that shows both active and deleted IDs
neg_query = "title:'-foobar'"
results = search_service.search(bq=neg_query)
results.hits  # 86 docs in a test index

# deleted items
deleted_query = "updated:0"
results = search_service.search(bq=deleted_query)
results.hits  # 46 of them have been deleted

# negative, filtered query that lists only active
filtered_query = "(and updated:1.. title:'-foobar')"
results = search_service.search(bq=filtered_query)
results.hits  # 40 active docs
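
The filter in the last sample can be factored into a small helper so that any boolean query is wrapped the same way. This is only a sketch: `active_only` and `search_url` are hypothetical helper names, the endpoint host is a placeholder, and the `http` scheme is an assumption (the URL in the question shows none). The `2011-02-01` path and the `bq` parameter are taken from the question.

```python
from urllib.parse import urlencode

def active_only(bq, field="updated"):
    """Wrap a boolean query so it only matches docs whose uint field is >= 1."""
    return "(and {}:1.. {})".format(field, bq)

def search_url(endpoint, bq, api_version="2011-02-01"):
    """Build a search URL with the bq parameter percent-encoded.

    The scheme is an assumption; adjust it to whatever your endpoint requires.
    """
    return "http://{}/{}/search?{}".format(endpoint, api_version, urlencode({"bq": bq}))

# Same negative query as above, restricted to active docs.
filtered_query = active_only("title:'-foobar'")

# Placeholder host; substitute your own search endpoint.
url = search_url("search-example.cloudsearch.amazonaws.com", filtered_query)
print(url)
```

Encoding the query matters when calling the endpoint directly, since the boolean syntax contains spaces, quotes and parentheses.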



Answer 2:


I think you can do that like this:

search-[mySearchEndPoint].cloudsearch.amazonaws.com/2011-02-01/search?bq=-impossibleTermToSearch

Note the '-' at the beginning of the term.



Source: https://stackoverflow.com/questions/14566522/how-can-i-retrieve-all-searchable-not-deleted-documents-in-amazon-cloudsearch
