Query for documents where two fields are equal?

有刺的猬 2021-01-13 16:07

There are two text fields in Solr; both are whitespace-tokenized and have a lowercase filter. Below is the schema:



        
3 Answers
  • 2021-01-13 16:29

    For how to correctly query Solr for equality between two fields, see Nicholas DiPiazza's answer.

    Given that the question asks to compare the full contents of two text (that is, analyzed) fields, I don't think function queries and the like will work well, so here are two approaches:

    • Rethink what you are trying to do, or change the index structure. Should those fields be strings instead of text? If so, change them, then refer, as above, to Nicholas DiPiazza's answer.

    • (Original answer) A simple way to accomplish this is to perform the comparison at index time and store the result in the index. That is, if you have field1 and field2, create a field 1_equals_2 and, when adding each document, index it as true if the two fields are equal according to your comparison. Then you can simply search for 1_equals_2:true.
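A minimal sketch of the index-time approach, assuming documents are prepared in Python before being POSTed as a JSON array to the collection's /update endpoint, and assuming 1_equals_2 is declared as a boolean field in the schema. It approximates the schema's analysis (whitespace tokenizer plus lowercase filter) by splitting on whitespace and lowercasing each token; that normalization, the helper names and the sample documents are hypothetical, so align them with your real analyzer chain.

```python
import json


def normalize(text):
    """Approximate a whitespace tokenizer followed by a lowercase filter.

    Assumption: this is close enough to the schema's analysis chain for an
    equality check. A missing field is treated as an empty token list.
    """
    return [token.lower() for token in (text or "").split()]


def add_equality_flag(doc, left="field1", right="field2", flag="1_equals_2"):
    """Return a copy of doc with the equality flag computed at index time."""
    flagged = dict(doc)
    flagged[flag] = normalize(doc.get(left)) == normalize(doc.get(right))
    return flagged


# Hypothetical sample documents.
docs = [
    {"id": "1", "field1": "Hello  World", "field2": "hello world"},
    {"id": "2", "field1": "Hello World", "field2": "Goodbye World"},
]

flagged_docs = [add_equality_flag(d) for d in docs]

# JSON body you could POST to /solr/<collection>/update
# (collection name and commit policy are up to you).
payload = json.dumps(flagged_docs)

# At query time the equality check becomes a plain term query.
query = "1_equals_2:true"

print(payload)
print(query)
```

The design trade-off is the usual one for precomputation: queries become cheap term lookups, but the flag must be recomputed whenever either field changes, and it only reflects whatever comparison you chose when indexing.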

  • 2021-01-13 16:43

    Method 1 - frange parser

    As mentioned by @dduo, you can use the Function Range Query Parser (https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-FunctionRangeQueryParser). Here's how Trey Grainger (one of the authors of Solr in Action) said to do it:

    q=*:*&fq={!frange l=1 u=1 v=$equals}&equals=if(eq(field1,field2),1,0)
    

    I tested this on a collection with 140 million documents: the query took about 10 seconds and returned 600,000 documents.

    So this works, but it's kinda slow.
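For reference, here is one way to assemble the Method 1 request using only Python's standard library. The host, collection name and helper name are hypothetical; the q, fq and equals parameters are the ones from the query above (fq dereferences the function through v=$equals), and urlencode takes care of escaping characters such as "{", "!" and "$".

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_frange_url(base_url, collection, left="field1", right="field2"):
    """Build a /select URL that keeps only documents where left == right.

    The fq uses parameter dereferencing: v=$equals pulls the function
    from the separate 'equals' request parameter.
    """
    params = {
        "q": "*:*",
        "fq": "{!frange l=1 u=1 v=$equals}",
        "equals": f"if(eq({left},{right}),1,0)",
    }
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection name.
url = build_frange_url("http://localhost:8983", "your_collection_name")
print(url)

# Decoding the query string shows the parameters Solr will receive.
print(parse_qs(urlsplit(url).query))
```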

    Method 2 - Use a streaming expression

    The following expression seems to do what we're looking for here:

    having(search(your_collection_name, q="*:*", fl="id,field1,field2", sort="id asc"), eq(field1, field2))
    

    This seems to perform much better, returning results almost instantly. So if you can use streaming expressions, this is probably a faster way to get what you're looking for.
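A small sketch of sending the streaming expression from Python, assuming it is submitted as the expr parameter of the collection's /stream handler. The host, collection name and helper name are hypothetical. The sketch passes an fl list, which search() expects alongside q and sort, and it includes both compared fields so that having() can evaluate eq() on each tuple.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_stream_url(base_url, collection, left="field1", right="field2"):
    """Build a /stream URL whose expression keeps tuples where left == right."""
    expr = (
        f'having(search({collection}, q="*:*", fl="id,{left},{right}", '
        f'sort="id asc"), eq({left}, {right}))'
    )
    return f"{base_url}/solr/{collection}/stream?{urlencode({'expr': expr})}"


# Hypothetical host and collection name.
url = build_stream_url("http://localhost:8983", "your_collection_name")
print(url)

# The decoded expression, as the /stream handler would see it.
print(parse_qs(urlsplit(url).query)["expr"][0])
```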

  • 2021-01-13 16:46

    Have you tried the 'strdist' function with the 'frange' range query parser? A query like this should help:

    {!frange l=1 u=1}strdist(field1, field2, edit)
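To see why l=1 u=1 isolates exact matches, here is a plain-Python model of the score. This is not Solr's code: it assumes the 'edit' measure is normalized Levenshtein distance, 1 - distance / max(length), which is my understanding of how Lucene's LevenshteinDistance scores a pair. Under that assumption only identical strings score exactly 1.0, so the frange bounds keep just those.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Map edit distance into [0, 1], where 1.0 means identical.

    Assumption: modeled on the 'edit' measure of strdist, which divides the
    distance by the longer string's length; empty strings only match empties.
    """
    if not a or not b:
        return 1.0 if a == b else 0.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


pairs = [("foo bar", "foo bar"), ("foo bar", "foo baz")]
for left, right in pairs:
    score = edit_similarity(left, right)
    # Mirrors {!frange l=1 u=1}: keep only scores within [1, 1].
    print(left, "|", right, "|", score, "| kept:", 1.0 <= score <= 1.0)
```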
