Solr highlighting gives field/snippets with ANY term, instead of those that satisfy the query fully

Posted by 爷,独闯天下 on 2019-12-12 02:13:53

Question


I'm using Solr 5.x with the standard highlighter, and I'm getting snippets that match only one of the search terms, even when I specify q.op=AND. I need ONLY the fields and snippets that match ALL the terms (unless I specify q.op=OR or omit it), i.e. the field/snippet must satisfy the query. Solr does return the field/snippet that has all the terms, but it also returns many others.

I'm using hl.fl=* to get only the fields that contain the terms, and I'm searching against the default field ('text', which contains the full document). I need to use * because I have multiple dynamic fields. Most fields are of type 'text_general' (for search and highlighting), and some are of type 'string' for faceting.
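For reference, the request described above might be assembled roughly as follows. This is only a sketch: the search terms "foo bar" are placeholders, and hl=true and df=text are assumed to be passed explicitly rather than set in the request handler defaults.

```python
from urllib.parse import urlencode

# Parameters for the highlighting request described in the question.
# "foo bar" is a placeholder query; hl=true and df=text are assumptions
# about how highlighting and the default field are configured.
params = {
    "q": "foo bar",
    "q.op": "AND",   # require all terms to match
    "df": "text",    # default search field holding the full document
    "hl": "true",    # enable highlighting
    "hl.fl": "*",    # highlight every field, including dynamic ones
}

# URL-encoded query string to append to the select handler URL.
query_string = urlencode(params)
```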

If it's not possible for the snippets to contain all the terms, I MUST at least get only the fields that satisfy the query fully. (This question talks mostly about matching all the terms, but the search query can become arbitrarily complex, so the fields/snippets should match the query as a whole.)

Next, I also need snippets highlighted for proximity-based searches. What should I do or use for this? The fields returned in highlighting in this scenario should also satisfy the proximity query (as opposed to getting a field that contains any term, without regard to proximity constraints, other query terms, etc.).

Thanks for your help.


Answer 1:


I've encountered the same problem with highlighting. In my case, a query like

(foo AND bar) OR eggs

highlighted eggs and foo even though bar was not present in the document. I didn't manage to come up with a proper solution, but I devised a dirty workaround.

I use the following query:

id:highlighted_document_id AND text:(my_original_query)

with debugQuery set to true. Then I parse the explain text for highlighted_document_id. That text contains the terms from the query that contributed to the score. The terms that should not be highlighted are not present in the explanation.
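A minimal sketch of how the parameters for that explain request could be built. The helper name build_explain_params, the document id "doc42", and the use of wt=json are illustrative assumptions, not part of the original workaround.

```python
from urllib.parse import urlencode

def build_explain_params(doc_id, original_query, field="text"):
    """Build select parameters that restrict the original query to one document.

    Hypothetical helper: the name and the wt=json choice are assumptions.
    """
    # Same query shape as in the answer: id:<doc> AND text:(<original query>)
    q = "id:{} AND {}:({})".format(doc_id, field, original_query)
    return {
        "q": q,
        "debugQuery": "true",  # ask Solr to include the score explanation
        "wt": "json",
    }

params = build_explain_params("doc42", "(foo AND bar) OR eggs")
query_string = urlencode(params)
```

Document ids that contain characters special to the query parser (such as ':' or spaces) would need to be escaped before being placed in the query.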

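The Python regular expressions I use to extract the terms (valid for Solr 5.2.1):

import re

# Plain terms, taken from "weight(text:<term> in ..." lines of the explanation
term_regex = re.compile(r'weight\(text:(.+) in')
# Wildcard terms, taken from "text:<term>, product ..." lines
wildcard_term_regex = re.compile(r'text:(.+), product')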

Then I simply search for the highlight markers in the highlighted text and remove them if the term doesn't match any of the terms extracted by term_regex and wildcard_term_regex.
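The filtering step could look roughly like the sketch below. It assumes the default <em>...</em> highlight markers and compares the lowercased highlighted word directly against the extracted terms, which ignores stemming and does not expand wildcard terms. The sample explain text is made up to illustrate the shape of the output, not copied from a real Solr response.

```python
import re

# Regexes from the answer (reported as valid for Solr 5.2.1 explain output).
term_regex = re.compile(r'weight\(text:(.+) in')
wildcard_term_regex = re.compile(r'text:(.+), product')

def contributing_terms(explain_text):
    """Collect the terms that appear in the score explanation."""
    terms = set(term_regex.findall(explain_text))
    terms.update(wildcard_term_regex.findall(explain_text))
    return terms

def strip_unmatched_highlights(snippet, terms, pre="<em>", post="</em>"):
    """Remove highlight markers around words that did not contribute to the score."""
    pattern = re.compile(re.escape(pre) + r"(.+?)" + re.escape(post))

    def keep_or_strip(match):
        word = match.group(1)
        # Assumption: exact comparison after lowercasing is good enough here.
        return match.group(0) if word.lower() in terms else word

    return pattern.sub(keep_or_strip, snippet)

# Hypothetical explain text for the query (foo AND bar) OR eggs on a
# document that contains foo and eggs but not bar.
sample_explain = """
0.2 = sum of:
  0.2 = weight(text:eggs in 12) [DefaultSimilarity], result of:
"""

terms = contributing_terms(sample_explain)
cleaned = strip_unmatched_highlights("<em>foo</em> with <em>eggs</em>", terms)
```

With this sample, only eggs keeps its markers, because foo does not appear in the explanation.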

The solution is probably pretty limited, but it works for me.



Source: https://stackoverflow.com/questions/31198161/solr-highlighting-gives-field-snippets-with-any-term-instead-of-those-that-sati
