Question
I have some documents in Solr 4.0. I want the most relevant records to be displayed first, followed by the less relevant ones.
For example, I have 3 documents with the following titles:
- Towards Income Distribution Policy
- Income distribution and economic policies
- Income Distribution Policy in Developing Countries
Now when I run a query such as q=title:Income Distribution Policy, I would like document 3 to show up first (its first 3 words are an exact match), then document 1 (everything except "Towards" matches), and then document 2 (it has other words in between).
My schema.xml looks like this:
<types>
  <fieldType name="search" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German2" />
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German2" />
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
<fields>
  <field name="title" type="search" indexed="true" stored="true"/>
</fields>
EDIT 1: Debug output
"rawquerystring": "title:Income Distribution Policy",
"querystring": "title:Income Distribution Policy",
"parsedquery": "title:incom title:distribut title:polici",
"parsedquery_toString": "title:incom title:distribut title:polici"
EDIT 2: Modified the fieldType
I tried the following combinations, but the output is still the same.
- StandardTokenizerFactory - autoGeneratePhraseQueries(not present) - PorterStemFilterFactory.
- StandardTokenizerFactory - autoGeneratePhraseQueries="true" - PorterStemFilterFactory.
- StandardTokenizerFactory - autoGeneratePhraseQueries(not present).
- StandardTokenizerFactory - autoGeneratePhraseQueries="true".
- WhitespaceTokenizerFactory - autoGeneratePhraseQueries(not present) - PorterStemFilterFactory.
- WhitespaceTokenizerFactory - autoGeneratePhraseQueries="true" - PorterStemFilterFactory.
- WhitespaceTokenizerFactory - autoGeneratePhraseQueries(not present).
- WhitespaceTokenizerFactory - autoGeneratePhraseQueries="true".
Answer 1:
If you don't sort by anything else, results are sorted by relevance (the similarity score). So if the results are not coming back in the order you expect, you may need to adjust how weights are assigned and which query parser you use.
I assume you are using eDismax with a boost on the title field. Also have a look at mm (minimum should match) and pf (phrase fields) for boosting.
You may also want to test with the autoGeneratePhraseQueries attribute set on your fieldType.
And, of course, debugQuery=true on the queries will help you see what is going on. You may find that adding debug.explain.structured=true is also useful the first couple of times you read the debug output.
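As a rough illustration of the parameters mentioned above, here is a minimal Python sketch that only builds the request URL (it does not contact Solr). The endpoint URL, the core name collection1, the boost value 10, and the mm value 2 are assumptions chosen for illustration and are not taken from the question; defType, qf, pf, mm, debugQuery, and debug.explain.structured are the standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_edismax_url(user_query, field="title", phrase_boost=10, min_match="2"):
    """Build a select URL that uses eDisMax with a phrase boost on one field."""
    params = [
        ("q", user_query),                     # plain user input, no field prefix
        ("defType", "edismax"),                # switch to the eDisMax query parser
        ("qf", field),                         # field(s) the individual terms are matched against
        ("pf", f"{field}^{phrase_boost}"),     # extra score when the terms occur as a phrase
        ("mm", min_match),                     # minimum number of terms that should match
        ("fl", f"{field},score"),              # return the score to inspect the ranking
        ("debugQuery", "true"),                # include parsed query and explain output
        ("debug.explain.structured", "true"),  # explain output as a nested structure
        ("wt", "json"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_edismax_url("Income Distribution Policy"))
```

With pf set, documents in which the query terms appear next to each other in the title should score higher than documents where they are spread out; the boost value usually needs tuning against real data.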
Answer 2:
I tried putting double quotes ("") around the query string and it worked.
For example: q=title:"Income Distribution Policy" OR title:Income Distribution Policy
This returned document 1, then document 3, and then document 2. Not perfect, but close enough.
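The same phrase-plus-terms query can be assembled programmatically. The following is a small Python sketch; the helper name phrase_or_terms_query and the optional phrase_boost argument are hypothetical and not part of the original answer. It prefixes every term with the field name explicitly, because in title:Income Distribution Policy only the first word is bound to title and the rest fall back to the default search field (which appears to be title here, judging by the debug output). It assumes the words are plain alphanumeric terms and only escapes backslashes and double quotes inside the phrase.

```python
def phrase_or_terms_query(field, text, phrase_boost=None):
    """Return a Lucene-syntax query: exact phrase OR the individual terms."""
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    phrase = f'{field}:"{escaped}"'
    if phrase_boost is not None:
        # Optional boost so phrase matches weigh more than loose term matches.
        phrase += f"^{phrase_boost}"
    # One fielded clause per whitespace-separated word.
    terms = " ".join(f"{field}:{word}" for word in text.split())
    return f"{phrase} OR ({terms})"


if __name__ == "__main__":
    print(phrase_or_terms_query("title", "Income Distribution Policy"))
```

Documents that contain the exact phrase match both clauses, which is why they tend to rank above documents that only match the individual terms.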
Source: https://stackoverflow.com/questions/14554850/solrj-query-get-the-most-relevant-record-first