Similarity function for Mahout boolean user-based recommender

Submitted by 柔情痞子 on 2019-12-07 22:38:01

Question


I am using Mahout to build a user-based recommendation system which operates with boolean data.

I use GenericBooleanPrefUserBasedRecommender and NearestNUserNeighborhood, and am now trying to decide on the most suitable user similarity function.

It was suggested that I use either LogLikelihoodSimilarity or TanimotoCoefficientSimilarity. I tried both and am getting [subjectively evaluated] meaningful results in both cases. However, the RMSE for the same data set is better with LogLikelihoodSimilarity. The number of "no recommendation" results is similar in both cases.

Can anyone recommend which of these similarity functions is most suitable for this case?


Answer 1:


(I'm the developer.) If I was stranded on a desert island with just one similarity metric for data without ratings/prefs, it would be log-likelihood. I would generally expect it to be the better similarity metric.
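To make the comparison concrete, here is a minimal sketch of the two measures computed on plain sets of item IDs. It does not use Mahout's classes. The class name `BooleanSimilaritySketch`, the `numItems` parameter, and the final squashing of the log-likelihood ratio into [0, 1) via `1 - 1 / (1 + LLR)` are assumptions made for illustration, so treat the numbers as indicative rather than as what the library returns.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: the two formulas on plain item-ID sets, not Mahout's own classes. */
final class BooleanSimilaritySketch {

    /** Tanimoto (Jaccard) coefficient: |A and B| / |A or B|. */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        int both = intersectionSize(a, b);
        int union = a.size() + b.size() - both;
        return union == 0 ? 0.0 : (double) both / union;
    }

    /**
     * Log-likelihood ratio over the 2x2 table of item counts, mapped into [0, 1).
     * numItems is the total number of distinct items and must be at least |A or B|.
     * The mapping 1 - 1 / (1 + LLR) is an assumption of this sketch.
     */
    static double logLikelihood(Set<Long> a, Set<Long> b, int numItems) {
        long k11 = intersectionSize(a, b);                 // items both users have
        long k12 = b.size() - k11;                         // items only user B has
        long k21 = a.size() - k11;                         // items only user A has
        long k22 = numItems - a.size() - b.size() + k11;   // items neither user has
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp at zero to absorb tiny negative values caused by rounding.
        double llr = Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matrixEntropy));
        return 1.0 - 1.0 / (1.0 + llr);
    }

    private static int intersectionSize(Set<Long> a, Set<Long> b) {
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy of a list of counts: sum*ln(sum) minus the sum of c*ln(c). */
    private static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    public static void main(String[] args) {
        Set<Long> a = Set.of(1L, 2L, 3L);
        Set<Long> b = Set.of(2L, 3L, 4L);
        System.out.println("Tanimoto: " + tanimoto(a, b));   // 2 shared of 4 total = 0.5
        System.out.println("Log-likelihood: " + logLikelihood(a, b, 10));
    }
}
```

One difference worth noticing: Tanimoto only looks at the overlap between the two users, while the log-likelihood version also takes into account how many items exist overall, so an overlap that could easily happen by chance scores close to zero.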

The problem with the test you're doing is that, perhaps not at all obviously, it's not meaningful for this kind of recommender / data. RMSE is root-mean-square-error, and it's comparing the actual vs predicted rating for held-out test data. But you have no ratings. They're all "1.0". Really, RMSE is always 0!

In practice it's not exactly 0, since to have anything to rank on, these recommenders rank by some meaningful function of the similarities. But they are not estimating ratings / prefs at all. So, RMSE means squat here.
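The RMSE argument can be illustrated with a few lines of plain Java. This is only a sketch: the class name `BooleanRmseSketch` and the ranking scores are made up for illustration and do not come from any real recommender run.

```java
/** Illustrative only: shows what RMSE measures when every held-out "rating" is 1.0. */
final class BooleanRmseSketch {

    /** Root-mean-square error between actual and estimated values of equal length. */
    static double rmse(double[] actual, double[] estimated) {
        double sumSquares = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - estimated[i];
            sumSquares += diff * diff;
        }
        return Math.sqrt(sumSquares / actual.length);
    }

    public static void main(String[] args) {
        double[] heldOut = {1.0, 1.0, 1.0};        // boolean data: every "rating" is 1.0
        double[] alwaysOne = {1.0, 1.0, 1.0};      // trivial predictor that always answers 1.0
        double[] rankingScores = {0.9, 0.4, 0.7};  // hypothetical similarity-based ranking scores

        System.out.println(rmse(heldOut, alwaysOne));      // prints 0.0
        System.out.println(rmse(heldOut, rankingScores));  // distance of the scores from 1.0
    }
}
```

The second number only says how far the ranking scores happen to sit from 1.0. Rescaling the scores changes it without changing the order of the recommendations, which is why it says nothing about recommendation quality.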

The only metric you can really use in this case is a precision/recall test, I think. Even that is problematic. This and more fun topics are covered in a book which I will shamelessly promote: Mahout in Action.
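A precision/recall test for boolean data can be sketched as follows: hide some of a user's items, ask for the top N recommendations, and count how many hidden items come back. The code below is a plain-Java illustration with hypothetical names (`PrecisionRecallSketch`, `precisionAt`, `recallAt`) and made-up item IDs. Mahout has its own evaluator for this (`GenericRecommenderIRStatsEvaluator`), which this sketch does not call or try to reproduce.

```java
import java.util.List;
import java.util.Set;

/** Illustrative only: precision and recall at N for one user's held-out items. */
final class PrecisionRecallSketch {

    /** Fraction of the considered top-n recommendations that appear in the held-out set. */
    static double precisionAt(int n, List<Long> recommended, Set<Long> heldOut) {
        int considered = Math.min(n, recommended.size());
        if (considered == 0) {
            return 0.0;
        }
        return (double) hits(considered, recommended, heldOut) / considered;
    }

    /** Fraction of the held-out items that appear among the top-n recommendations. */
    static double recallAt(int n, List<Long> recommended, Set<Long> heldOut) {
        if (heldOut.isEmpty()) {
            return 0.0;
        }
        int considered = Math.min(n, recommended.size());
        return (double) hits(considered, recommended, heldOut) / heldOut.size();
    }

    private static int hits(int considered, List<Long> recommended, Set<Long> heldOut) {
        int count = 0;
        for (Long item : recommended.subList(0, considered)) {
            if (heldOut.contains(item)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Long> recommended = List.of(10L, 20L, 30L, 40L);  // ranked, best first
        Set<Long> heldOut = Set.of(20L, 40L, 50L);             // items hidden from the recommender
        System.out.println("precision@4 = " + precisionAt(4, recommended, heldOut));  // 2 of 4 = 0.5
        System.out.println("recall@4    = " + recallAt(4, recommended, heldOut));     // 2 of 3
    }
}
```

Averaging these two numbers over many users gives one figure per similarity function, which is a fairer basis for comparing LogLikelihoodSimilarity with TanimotoCoefficientSimilarity than RMSE.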



Source: https://stackoverflow.com/questions/7529333/similarity-function-for-mahout-boolean-user-based-recommender
