Solr MoreLikeThis boosting query fields

匿名 (未验证) 提交于 2019-12-03 08:54:24

问题:

I am experimenting with Solr's MoreLikeThis feature.

My schema deals with articles, and I'm looking for similarities between articles within three fields: articletitle, articletext and topic.

The following query works well:

q=id:(2e2ec74c-7c26-49c9-b359-31a11ea50453) &rows=100000000&mlt=true &mlt.fl=articletext,articletitle,topic&mlt.boost=true&mlt.mindf=1&mlt.mintf=1 

But I would like to experiment with boosting different query fields - i.e. putting more weight on similarities in the articletitle, for instance.

The documentation (http://wiki.apache.org/solr/MoreLikeThis) suggests that this can be achieved by including the mlt.qf property, with some boosting.

My attempt at such a query is as follows:

q=id:(2e2ec74c-7c26-49c9-b359-31a11ea50453)&rows=100000000&mlt=true &mlt.fl=articletext,articletitle,topic&mlt.boost=true &mlt.mindf=1&mlt.mintf=1 &mlt.qf=articletext^0.1 articletitle^100 topic^0.1 

However, the boosts seem to have no affect - no matter what boosts I supply, the recommendations remain the same (I would except the above query to heavily favour similarities in the titles, but this doesn't seem to be happening)

I can't find any examples in the documentation that use MoreLikeThis in this way, which leads me to believe I've got something wrong.

Has anyone managed to achieve something like this?

回答1:

The MLT component is useful if you have simple recommendation requirements where you have only one field to match on, or several of equal importance. But any time you want to vary the relative importances of the different fields, or need to do something more specific like include an inverse distance boost, then you will probably want to write your own pseudo MLT handler. All the MLT handler does is to generate the top terms from the fields specified based on their tf.idf scores from the source document. You can easily emulate that functionality in some code that generates a custom SOLR OR query. You will lose the advantage of the termvectors, but so long as your queries are reasonably sized (say < 20 terms) it will probably perform pretty well. We have a small index and so generate our own MLT queries with several hundred terms and it executes in an acceptable amount of time (a few ms). However, I have seen this behavior deteriorate somewhat on large indexes with a few 100 million documents and larger fields, and in those cases you need to restrict your query to a small number of top terms. Using your own code in place of MLT is more work, but you gain a lot more in flexibility.



回答2:

even iam facing with same problem in my case i have to find the similarity between documents using more like this on two fields 1) description and 2) columnValue where columnValue where columnValue is given more weightage than description in the final score. Since solr support only string type similarity match and it doesnot support for type double, so i have converted the columnValue field to string type.(i.e the double values are now of type string ex: 231.0 is now "231.0"). here is the schema:

enter image description here schema.xml

and the query which i am using

http://hostname:8983/solr/collection3/mlt?q= &wt=xml&indent=true&mlt=true&mlt.fl=description,columnValue &fq=versionId:1068383519&mlt.count=4000&mlt.mindf=1&mlt.mintf=1 &fl=tableId,tableIndex,score,versionId,columnId,description,columnValue, refVersionId &mlt.qf=description^0.4+columnValue^0.6

here "id" is composite key on refVersionId,VersionId,TableId,TableIndex,ColumnId

but problem is that columnValue boosting is not effective and i dont find any change in the response even if i remove the columnValue from mlt.fl and mlt.qf, columnValue is not participating in the similarity match. According to me the mlt is working only on single field i.e description. Do u have any suggestion or any solution to fix this issue.



标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!