Reducing similar top results in Solr result output

Submitted by 徘徊边缘 on 2019-12-09 07:11:30

Question


I have a Solr search that returns about 1,500 documents, which are essentially products. For example, my dataset contains a wide variety of women's shoes, but it also contains some very similar entries, such as size 11 women's Nike trainers, size 10 women's Nike trainers, and so on. When I search for women's shoes, Solr's scoring causes a set of these very similar results to bubble to the top. For instance, all the colors of one particular shoe model might come to the top. They are definitely different products, but I would prefer a wider variety of results than just every color of Nike trainers.

Does anyone have any suggestions? Note that I don't want to eliminate the individually colored products. When someone searches for blue women's Nike trainers, I want the blue model to be the top result. I'm using the dismax query parser for my main query. What I would like is essentially a boost based on some kind of "uniqueness of name compared to other results" factor.


Answer 1:


You could collapse results on a field such as color:

http://wiki.apache.org/solr/FieldCollapsing
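
As a rough illustration of field collapsing, the sketch below builds the request parameters for a grouped dismax query. The `group`, `group.field` and `group.limit` parameters are Solr's result grouping parameters; the field name `model_name`, the host, and the core name `products` are assumptions made up for this example, not something taken from the question.

```python
from urllib.parse import urlencode


def build_grouped_query(user_query, group_field="model_name", rows=20):
    """Return Solr request parameters that collapse results on one field."""
    return {
        "q": user_query,
        "defType": "dismax",         # the query parser mentioned in the question
        "group": "true",             # turn on result grouping
        "group.field": group_field,  # hypothetical field: documents sharing a value form one group
        "group.limit": 1,            # keep only the top document of each group
        "rows": rows,                # with grouping enabled, this counts groups
        "wt": "json",
    }


params = build_grouped_query("womens shoes")
# Hypothetical host and core name; adjust to your own installation.
url = "http://localhost:8983/solr/products/select?" + urlencode(params)
print(url)
```

Since each group is represented by its best-scoring document, a specific query such as blue womens nike trainers should still be able to surface the blue variant, although this is worth checking against real data.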

Alternatively, you can use near-duplicate detection at index time:

http://wiki.apache.org/solr/Deduplication

http://karussell.wordpress.com/2010/12/23/detect-stolen-and-duplicate-tweets-with-solr/

The latter algorithm is implemented in Jetwick for tweets, so it should also work for titles. It is not performant enough for big documents, though, so it is only suitable for plagiarism detection on short strings. For long text you'll need locality-sensitive hashing:

http://en.wikipedia.org/wiki/Locality_sensitive_hashing
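
To give a feel for the idea, here is a minimal MinHash sketch, one common family of locality-sensitive hashing. It is not the Jetwick algorithm and not anything built into Solr. All function names, the shingle size, and the number of hash functions are assumptions chosen for illustration. A full LSH setup would also split the signatures into bands and bucket them so that candidate pairs can be found without comparing every pair.

```python
import hashlib


def shingles(text, k=2):
    """Return the set of k-word shingles of a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash value over all items."""
    signature = []
    for seed in range(num_hashes):
        salt = str(seed).encode() + b":"
        signature.append(min(
            int.from_bytes(hashlib.md5(salt + item.encode()).digest()[:8], "big")
            for item in items
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, which estimates the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Made-up product titles in the spirit of the question.
title_a = "size 11 womens nike trainers blue"
title_b = "size 10 womens nike trainers blue"
title_c = "mens leather dress boots brown"

sig_a = minhash_signature(shingles(title_a))
sig_b = minhash_signature(shingles(title_b))
sig_c = minhash_signature(shingles(title_c))

print(estimated_similarity(sig_a, sig_b))
print(estimated_similarity(sig_a, sig_c))
```

The two trainer titles share several shingles and should score well above the unrelated boot title, so a threshold on the estimated similarity could be used to flag near duplicates.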



Source: https://stackoverflow.com/questions/5122788/reducing-similar-top-results-in-solr-result-output
