Question
I am using Mahout to build a user-based recommendation system which operates with boolean data.
I use GenericBooleanPrefUserBasedRecommender with NearestNUserNeighborhood, and am now trying to decide on the most suitable user similarity function.
It was suggested that I use either LogLikelihoodSimilarity or TanimotoCoefficientSimilarity. I tried both and am getting [subjectively evaluated] meaningful results in both cases. However, the RMSE for the same data set is better with LogLikelihoodSimilarity. The number of "no recommendation" results is similar in both cases.
Can anyone recommend which of these similarity functions is most suitable for this case?
Answer 1:
(I'm the developer.) If I were stranded on a desert island with just one similarity metric for data without ratings/prefs, it would be log-likelihood. I would generally expect it to be the better similarity metric.
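For readers wondering what the two metrics compute on boolean data, here is a minimal plain-Java sketch. It does not use Mahout classes; the names BooleanSimilaritySketch, tanimoto and logLikelihoodSimilarity are made up for illustration. Tanimoto is the size of the intersection of two users' item sets divided by the size of their union. Log-likelihood is Dunning's G² statistic on the 2x2 table of item overlap. The final mapping 1 - 1/(1 + LLR) is an assumption about how the raw ratio is squashed into a similarity; check the Mahout source for the exact form.

```java
import java.util.HashSet;
import java.util.Set;

final class BooleanSimilaritySketch {

    // Tanimoto (Jaccard) coefficient: |A intersect B| / |A union B|.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    // x * ln(x), with the convention 0 * ln(0) = 0.
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of a list of counts: N ln N - sum(c ln c).
    private static double entropy(long... counts) {
        long sum = 0;
        double sumXLogX = 0.0;
        for (long c : counts) {
            sumXLogX += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - sumXLogX;
    }

    // Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        // Guard against tiny negative values caused by rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy));
    }

    // Similarity in [0, 1) from the overlap of two users' item sets.
    // numItems is the total number of distinct items in the data set.
    static double logLikelihoodSimilarity(Set<Long> a, Set<Long> b, long numItems) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        long both = intersection.size();
        long onlyA = a.size() - both;
        long onlyB = b.size() - both;
        long neither = numItems - a.size() - b.size() + both;
        double llr = logLikelihoodRatio(both, onlyA, onlyB, neither);
        // Assumed squashing of the raw ratio into a bounded similarity.
        return 1.0 - 1.0 / (1.0 + llr);
    }

    public static void main(String[] args) {
        Set<Long> userA = Set.of(1L, 2L, 3L);
        Set<Long> userB = Set.of(2L, 3L, 4L);
        System.out.println("Tanimoto:       " + tanimoto(userA, userB));
        System.out.println("Log-likelihood: " + logLikelihoodSimilarity(userA, userB, 100));
    }
}
```

One visible difference: Tanimoto only looks at the two users' own items, while the log-likelihood version also takes the total number of items into account, so the same overlap counts for more in a large catalog than in a small one.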
The problem with the test you're doing is that, perhaps not at all obviously, it's not meaningful for this kind of recommender and data. RMSE is root-mean-square error, and it compares the actual vs. predicted rating for held-out test data. But you have no ratings; they're all "1.0". Strictly speaking, RMSE ought to always be 0!
In practice it's not, because to have anything to rank on, these recommenders rank by some meaningful function of the similarities. But they are not estimating ratings / prefs at all. So RMSE means squat here.
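To make the RMSE point concrete, here is a small plain-Java sketch. The class name BooleanRmseSketch and the ranking scores are hypothetical values chosen for illustration, not output from Mahout. With boolean data every held-out "actual" value is 1.0, so RMSE only measures how far the recommender's internal ranking score happens to sit from 1.0.

```java
final class BooleanRmseSketch {

    // Root-mean-square error between actual and predicted values.
    static double rmse(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSquaredError = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sumSquaredError += diff * diff;
        }
        return Math.sqrt(sumSquaredError / actual.length);
    }

    public static void main(String[] args) {
        // Boolean data: every held-out preference is just "1.0".
        double[] actual = {1.0, 1.0, 1.0, 1.0};

        // A trivial "predictor" that always answers 1.0 scores perfectly.
        double[] alwaysOne = {1.0, 1.0, 1.0, 1.0};

        // Hypothetical ranking scores derived from similarities; these are
        // useful for ordering items but are not rating estimates.
        double[] rankingScores = {0.9, 0.4, 0.7, 0.2};

        System.out.println("RMSE, constant 1.0:   " + rmse(actual, alwaysOne));
        System.out.println("RMSE, ranking scores: " + rmse(actual, rankingScores));
    }
}
```

A lower RMSE for one similarity therefore only says its scores land closer to 1.0, which says nothing about whether its ranking of items is better.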
I think the only metric you can really use in this case is a precision/recall test, and even that is problematic. This and more fun topics are covered in a book which I will shamelessly promote: Mahout in Action.
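As a rough illustration of a precision/recall test on boolean data, here is a plain-Java sketch. It assumes you have already held out some of a user's items and produced a ranked recommendation list from the rest; the class name PrecisionRecallSketch and the item IDs are hypothetical. Mahout ships its own evaluator for this (GenericRecommenderIRStatsEvaluator), which this sketch does not use.

```java
import java.util.List;
import java.util.Set;

final class PrecisionRecallSketch {

    // Counts how many of the top-n recommended items were held out for this user.
    static int hitsAtN(List<Long> recommended, Set<Long> heldOut, int n) {
        int cutoff = Math.min(n, recommended.size());
        int hits = 0;
        for (int i = 0; i < cutoff; i++) {
            if (heldOut.contains(recommended.get(i))) {
                hits++;
            }
        }
        return hits;
    }

    // Precision at n: hits divided by the number of items actually recommended
    // (at most n). Returns 0 when nothing was recommended.
    static double precisionAtN(List<Long> recommended, Set<Long> heldOut, int n) {
        int cutoff = Math.min(n, recommended.size());
        return cutoff == 0 ? 0.0 : (double) hitsAtN(recommended, heldOut, n) / cutoff;
    }

    // Recall at n: hits divided by the number of held-out items.
    static double recallAtN(List<Long> recommended, Set<Long> heldOut, int n) {
        return heldOut.isEmpty() ? 0.0 : (double) hitsAtN(recommended, heldOut, n) / heldOut.size();
    }

    public static void main(String[] args) {
        // Ranked recommendations for one user, best first (hypothetical IDs).
        List<Long> recommended = List.of(10L, 20L, 30L, 40L);
        // Items removed from that user's data before recommending.
        Set<Long> heldOut = Set.of(20L, 40L, 50L);

        System.out.println("precision@3 = " + precisionAtN(recommended, heldOut, 3));
        System.out.println("recall@3    = " + recallAtN(recommended, heldOut, 3));
    }
}
```

One reason such a test stays problematic: a recommended item that is missing from the held-out set is counted as a miss, even though the user may simply never have come across it.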
Source: https://stackoverflow.com/questions/7529333/similarity-function-for-mahout-boolean-user-based-recommender