There are two text fields in Solr. Both are whitespace-tokenized and have a lowercase filter. Below is the schema:
For how to correctly query Solr on equality between two fields, please see Nicholas DiPiazza's answer.
Given that the question specifies comparing the full contents of two text (that is, analyzed) fields, I believe that won't work well with function queries and the like, so here are two approaches:
Rethink what you are trying to do, or change the index structure. Should those fields be strings instead of text? If so, make that change and then refer, as above, to Nicholas DiPiazza's answer.
(Original Answer here) A simple way to accomplish this would be to perform the comparison at index time and store the result in the index. That is, if you have field1 and field2, create a field named 1_equals_2 and, when adding the document, index it with true if the two fields are equal according to your comparison. Then you can simply search for 1_equals_2:true.
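As a rough sketch of the index-time approach, the comparison could be done in the client before the document is sent to Solr. The helper names below are hypothetical, and the normalization only approximates a whitespace tokenizer followed by a lowercase filter; it is not guaranteed to match what Solr's analysis chain produces for every input.

```python
def normalize(text):
    # Approximation of whitespace tokenization plus lowercasing.
    # Assumption: this is close enough to the field's analysis chain.
    return text.lower().split()


def add_equality_flag(doc, left="field1", right="field2", flag="1_equals_2"):
    # Return a copy of the document with a boolean flag field added.
    # The field names are the ones used in the answer above.
    enriched = dict(doc)
    enriched[flag] = normalize(doc.get(left, "")) == normalize(doc.get(right, ""))
    return enriched


if __name__ == "__main__":
    print(add_equality_flag({"id": "1", "field1": "Foo  Bar", "field2": "foo bar"}))
```

The enriched documents can then be indexed as usual, and the filter 1_equals_2:true is an ordinary term lookup at query time.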
As mentioned by @dduo, you can use the Function Range Query Parser (https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-FunctionRangeQueryParser). Here's the way Trey Grainger (one of the authors of Solr in Action) said to do it:
q=*:*&fq={!frange l=1 u=1 v=$equals}&equals=if(eq(field1,field2),1,0)
I tested this on a collection with 140 million documents: the query took about 10 seconds and returned 600,000 documents in the result set.
So this works, but it's kind of slow.
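If you are building this request from code, the braces, `$`, and spaces in the fq parameter need to be URL-encoded. Here is a minimal sketch using only the Python standard library; the base URL and function names are hypothetical, and only the three parameters shown in the query above are assumed.

```python
from urllib.parse import urlencode


def build_equality_params(left, right):
    # Same three parameters as the query above; $equals is dereferenced by Solr.
    return {
        "q": "*:*",
        "fq": "{!frange l=1 u=1 v=$equals}",
        "equals": f"if(eq({left},{right}),1,0)",
    }


def build_select_url(base_url, left, right):
    # base_url is a hypothetical collection URL, e.g.
    # http://localhost:8983/solr/your_collection_name
    return f"{base_url}/select?{urlencode(build_equality_params(left, right))}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr/your_collection_name",
                           "field1", "field2"))
```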
The following streaming expression seems to do what we are looking for here:
having(search(your_collection_name, q="*:*", sort="id asc"), eq(field1, field2))
This seems to perform much better, as it returns results instantly. So if you can use streaming expressions, this is probably a faster way to get what you are looking for.
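For reference, here is a small sketch that assembles the same expression from its parts so the collection and field names can be swapped out. The function name is hypothetical. The resulting string would typically be sent as the expr parameter to the collection's /stream handler, but check that against your Solr version.

```python
def build_having_expression(collection, left, right, sort_field="id"):
    # Reproduces the having(search(...), eq(...)) expression shown above.
    return (
        f'having(search({collection}, q="*:*", sort="{sort_field} asc"), '
        f'eq({left}, {right}))'
    )


if __name__ == "__main__":
    print(build_having_expression("your_collection_name", "field1", "field2"))
```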
Have you tried the function 'strdist' and range query 'frange'? A range query like this would help:
{!frange l=1 u=1}strdist(field1, field2, edit)
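To see why the range l=1 u=1 selects equal values, here is a pure-Python sketch of a normalized edit-distance similarity. It assumes that strdist with the edit measure behaves like 1 minus the Levenshtein distance divided by the longer string's length, so that identical strings score exactly 1. This illustrates the idea and is not Solr's actual implementation.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    # Assumed normalization: 1.0 means identical, lower means more different.
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(edit_similarity("foo bar", "foo bar"))
    print(edit_similarity("foo bar", "foo baz"))
```

Under that assumption, only documents whose two values are identical reach a score of 1, which is what the frange bounds keep.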