Solr proximity ordered vs unordered

Submitted by 不羁岁月 on 2019-12-21 05:27:04

Question


In Solr you can perform an ordered proximity search using the syntax

"word1 word2"~10

By ordered, I mean word1 always comes before word2 in the document. I would like to know whether there is an easy way to perform an unordered proximity search, i.e. word1 and word2 occur within 10 words of each other and it doesn't matter which comes first.

One way to do this would be:

"word1 word2"~10 OR "word2 word1"~10

The above will work but I'm looking for something simpler, if possible.


Answer 1:


Slop is the number of position moves allowed to turn the query phrase into what the document contains. So "a b" is going to behave differently from "b a", because the two orderings need a different number of moves to match the same text.

  • The document a foo b has positions (a,1), (foo,2), (b,3). Matching the phrase (a,1), (b,2) requires one move: (b,2) => (b,3).
  • Matching the reversed phrase (b,1), (a,2) requires (a,2) => (a,1) and (b,1) => (b,3), for a total of three position moves.

In general, if "a b"~n matches something, then "b a"~(n+2) will match it too.
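The move counting above can be checked with a small sketch. It is a simplified model of a two-term sloppy phrase, not Lucene's actual scorer: each term's document position is shifted back by its offset in the query, and the required slop is the gap between the shifted positions. The function name `required_slop` is made up for illustration.

```python
def required_slop(doc_tokens, first, second):
    """Smallest slop at which the phrase "first second" matches doc_tokens.

    Simplified two-term model (an assumption, not Lucene's real code):
    `first` sits at query offset 0 and `second` at query offset 1, so a
    pair of document positions (f, s) costs |(s - 1) - f| moves.
    Returns None if either term is missing from the document.
    """
    firsts = [i for i, tok in enumerate(doc_tokens) if tok == first]
    seconds = [i for i, tok in enumerate(doc_tokens) if tok == second]
    if not firsts or not seconds:
        return None
    # Try every pairing of occurrences and keep the cheapest one.
    return min(abs((s - 1) - f) for f in firsts for s in seconds)


doc = "a foo b".split()
in_order = required_slop(doc, "a", "b")   # "a b" against "a foo b"
reversed_ = required_slop(doc, "b", "a")  # "b a" against "a foo b"
print(in_order, reversed_)
```

In this model the reversed phrase always needs two more moves than the in-order phrase when the terms appear in the order a ... b, which is where the n+2 rule comes from.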

EDIT: I guess I never gave an answer. I see two options:

  1. If you want a slop of n, increase it to n+2
  2. Manually OR the two orderings together, as you suggested

I think #2 is probably better, unless your slop is very large to begin with.
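For option 2, the query string can be generated rather than typed by hand. The helper below is a hypothetical convenience function, not part of any Solr client library, and it assumes both words are plain terms that need no query-syntax escaping.

```python
def unordered_proximity(word1, word2, slop):
    """Build a query that ORs both orderings of a two-word sloppy phrase.

    Assumes word1 and word2 contain no characters that are special in
    the Solr/Lucene query syntax (no escaping is done here).
    """
    forward = f'"{word1} {word2}"~{slop}'
    backward = f'"{word2} {word1}"~{slop}'
    return f"{forward} OR {backward}"


print(unordered_proximity("word1", "word2", 10))
```

The resulting string can be passed as the q parameter exactly like the hand-written version in the question.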




Answer 2:


Are you sure it doesn't already work like that? Nothing in the documentation says that it is 'ordered':

A proximity search can be done with a sloppy phrase query. The closer together the two terms appear in the document, the higher the score will be. A sloppy phrase query specifies a maximum "slop", or the number of positions tokens need to be moved to get a match.

This example for the standard request handler will find all documents where "batman" occurs within 100 words of "movie":

http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_search_for_one_term_near_another_term_.28say.2C_.22batman.22_and_.22movie.22.29




Answer 3:


Since Solr 4, this is possible with the SurroundQueryParser.

E.g. to do an ordered search (where "phrase two" follows "phrase one" no more than 3 words later):

3W(phrase W one, phrase W two)

To do an unordered search (where "phrase two" is within 5 words of "phrase one", in either order):

5N(phrase W one, phrase W two)
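To send such an expression to Solr, the parser has to be selected for the request, for example with the `{!surround}` local-params prefix. The sketch below only builds the request URL. The host, core name (`mycore`), and default field (`text`) are placeholders, and `surround_url` is a made-up helper name.

```python
from urllib.parse import urlencode


def surround_url(select_url, expression, default_field):
    """Return a select URL whose q parameter uses the surround parser.

    select_url and default_field are placeholders for your own setup;
    urlencode takes care of escaping braces, spaces, and parentheses.
    """
    query = "{!surround df=%s}%s" % (default_field, expression)
    return select_url + "?" + urlencode({"q": query})


# Hypothetical host and core name, shown only to illustrate the shape.
url = surround_url(
    "http://localhost:8983/solr/mycore/select",
    "5N(phrase W one, phrase W two)",
    "text",
)
print(url)
```

Swapping 5N for 3W in the expression gives the ordered variant.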


Source: https://stackoverflow.com/questions/4079388/solr-proximity-ordered-vs-unordered
