Why is it possible to get duplicate results from Azure Search when paging?

Posted by 此生再无相见时 on 2019-12-22 09:57:52

Question


Sometimes when using Azure Search's paging there may be duplicate documents in the results. Here is an example of a paging request:

GET /indexes/myindex/docs?search=*&$top=15&$skip=15&$orderby=rating desc

Why is this possible? How can it happen? Are there any consistency guarantees when paging?


Answer 1:


The results of paginated queries are not guaranteed to be stable if the underlying index is changing, or if you are relying on sorting by relevance score. Paging simply changes the value of $skip for each page, but each query is independent and operates on the current view of the data (that is, there is no snapshotting or other consistency mechanism like you’d find in a general-purpose database).
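To make the "each query is independent" point concrete, here is a minimal Python sketch that builds the query string for a given page. The helper name page_query and the zero-based page parameter are assumptions for illustration, not part of any Azure SDK, and a real request would need more than is shown here (for example an API version and a key), just as the example request in the question omits them.

```python
from urllib.parse import quote, urlencode

def page_query(page, page_size, order_by="rating desc"):
    """Build the query string for one page (hypothetical helper, zero-based page)."""
    params = {
        "search": "*",
        "$top": page_size,
        "$skip": page * page_size,  # the only value that changes between pages
        "$orderby": order_by,
    }
    # Keep '*' and '$' readable; spaces are percent-encoded as %20.
    return urlencode(params, safe="*$", quote_via=quote)

first = page_query(0, 15)
second = page_query(1, 15)
print(first)
print(second)
```

Nothing in either string ties the second request to the state the first one saw, so the service evaluates each page against whatever the index contains at that moment.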

Here is an example of how you might get duplicates. Assume an index with four documents:

  1. { "id": "1", "rating": 5 }
  2. { "id": "2", "rating": 3 }
  3. { "id": "3", "rating": 2 }
  4. { "id": "4", "rating": 1 }

Now assume you want to page through the results with a page size of two, ordered by rating. You’d execute this query to get the first page:

$top=2&$skip=0&$orderby=rating desc

And get these results:

  1. { "id": "1", "rating": 5 }
  2. { "id": "2", "rating": 3 }

Now you insert a fifth document into the index:

{ "id": "5", "rating": 4 }

Shortly thereafter, you execute a query to fetch the second page of results:

$top=2&$skip=2&$orderby=rating desc

And get these results:

  1. { "id": "2", "rating": 3 }
  2. { "id": "3", "rating": 2 }

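Notice that you’ve fetched document 2 twice. This is because the new document 5 has a greater value for rating, so it sorts before document 2 and lands on the first page. That pushes document 2 down from the second position to the third, which is the first slot of the second page.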
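The walkthrough above can be reproduced with a small in-memory simulation in Python. This is only a sketch: fetch_page is a hypothetical stand-in for a query with $top, $skip and $orderby=rating desc, and a plain list plays the role of the index. No Azure Search call is made.

```python
def fetch_page(index, top, skip):
    """Sort the current index state by rating (descending), then slice one page."""
    ordered = sorted(index, key=lambda doc: doc["rating"], reverse=True)
    return ordered[skip:skip + top]

index = [
    {"id": "1", "rating": 5},
    {"id": "2", "rating": 3},
    {"id": "3", "rating": 2},
    {"id": "4", "rating": 1},
]

# First page, fetched before the insert.
page1 = fetch_page(index, top=2, skip=0)

# A new document arrives between the two page requests.
index.append({"id": "5", "rating": 4})

# Second page, evaluated against the changed index.
page2 = fetch_page(index, top=2, skip=2)

page1_ids = [doc["id"] for doc in page1]
page2_ids = [doc["id"] for doc in page2]
duplicates = sorted(set(page1_ids) & set(page2_ids))
print(page1_ids, page2_ids, duplicates)
```

Because the sort runs again on every call, the second slice is taken from a different ordering than the first, and document 2 appears in both pages.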

In situations where you're relying on document score (either you don't use $orderby or you're using $orderby=search.score()), paging can return duplicate results because each query might be handled by a different replica, and that replica may have different term and document frequency statistics -- enough to change the relative ordering of documents at page boundaries.
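The replica effect can be illustrated the same way. In the Python sketch below, the two dictionaries are made-up relevance scores for the same three documents on two replicas. The numbers and the names replica_a, replica_b and fetch_page are invented for illustration and do not come from a real service.

```python
# Hypothetical scores: the replicas disagree slightly about documents "b" and "c".
replica_a = {"a": 1.20, "b": 1.10, "c": 1.05}
replica_b = {"a": 1.20, "b": 1.04, "c": 1.06}

def fetch_page(scores, top, skip):
    """Rank document ids by score (descending) on one replica, then slice a page."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[skip:skip + top]

# The first page is served by replica A, the second by replica B.
page1 = fetch_page(replica_a, top=2, skip=0)
page2 = fetch_page(replica_b, top=2, skip=2)

seen = set(page1) | set(page2)
missing = sorted(set(replica_a) - seen)
print(page1, page2, missing)
```

With these assumed scores, document "b" is returned on both pages while document "c" is never returned at all, so a small ranking difference at a page boundary can cause omissions as well as duplicates.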

For these reasons, it’s important to think of Azure Search as a search engine (because it is), and not a general-purpose database.



Source: https://stackoverflow.com/questions/42428473/why-is-it-possible-to-get-duplicate-results-from-azure-search-when-paging
