Why does ALS.trainImplicit give better predictions for explicit ratings?

Question

Edit: I tried a standalone Spark application (instead of PredictionIO) and my observations are the same. So this is not a PredictionIO issue, but it is still confusing.

I am using PredictionIO 0.9.6 and the Recommendation template for collaborative filtering. The ratings in my data set are numbers between 1 and 10. When I first trained a model with defaults from the template (using ALS.train), the predictions were horrible, at least subjectively. Scores ranged up to 60.0 or so but the recommendations seemed totally random.

Somebody suggested that ALS.trainImplicit did a better job, so I changed src/main/scala/ALSAlgorithm.scala accordingly:

val m = ALS.trainImplicit(  // instead of ALS.train
  ratings = mllibRatings,
  rank = ap.rank,
  iterations = ap.numIterations,
  lambda = ap.lambda,
  blocks = -1,
  alpha = 1.0,  // also added this line
  seed = seed)

Scores are much lower now (below 1.0) but the recommendations are in line with the personal ratings. Much better, but also confusing. PredictionIO defines the difference between explicit and implicit this way:

explicit preference (also referred as "explicit feedback"), such as "rating" given to item by users. implicit preference (also referred as "implicit feedback"), such as "view" and "buy" history.

and:

By default, the recommendation template uses ALS.train() which expects explicit rating values which the user has rated the item.

source

Is the documentation wrong? I still think that explicit feedback fits my use case. Maybe I need to adapt the template with ALS.train in order to get useful recommendations? Or did I just misunderstand something?

Answer 1:

A lot depends on how the data was gathered. Ratings that look explicit can often behave like implicit feedback. For instance, suppose users can only rate items that they have already purchased or used. Then the very fact that someone spent time evaluating a particular item suggests that the item is of high quality, while items of poor quality are not rated at all because people do not even bother to use them. So even though the dataset is intended to be explicit, you may get better results if you treat the ratings as implicit feedback. Again, this varies significantly with how the data was obtained.
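To make the point above concrete, here is a minimal sketch of how the implicit-feedback formulation (the Hu, Koren and Volinsky model that MLlib's implicit ALS is based on) reinterprets a rating. The object and function names are hypothetical and are not part of the MLlib API. The sketch only shows the transformation: every rated item gets a preference of 1, every unrated item a preference of 0, and the rating value sets only how confident the model is in that preference. Because the factors are fit to these 0/1 preferences rather than to the raw 1 to 10 values, this would also be consistent with the predicted scores falling below 1.0.

```scala
// Sketch only: hypothetical helpers illustrating the implicit-feedback view of a rating.
// Nothing here calls Spark; it just mirrors the preference/confidence definitions.
object ImplicitView {
  // Binary preference: any positive rating means "the user interacted with this item".
  def preference(r: Double): Double = if (r > 0.0) 1.0 else 0.0

  // Confidence in that preference grows linearly with the rating value.
  def confidence(r: Double, alpha: Double): Double = 1.0 + alpha * math.abs(r)

  def main(args: Array[String]): Unit = {
    val alpha = 1.0 // same value as in the question's trainImplicit call
    // 0.0 stands in for an item the user never rated.
    for (r <- Seq(0.0, 1.0, 5.0, 10.0)) {
      println(f"rating=$r%.1f preference=${preference(r)}%.1f confidence=${confidence(r, alpha)}%.1f")
    }
  }
}
```

Under this view, a rating of 1 and a rating of 10 both count as "the user chose this item"; the 10 just carries more weight. If most low-quality items in the data are never rated at all, that presence signal can carry more information than the rating value itself.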

Source: https://stackoverflow.com/questions/38007724/why-does-als-trainimplicit-give-better-predictions-for-explicit-ratings

Tags

machine-learning

apache-spark-mllib

recommendation-engine

collaborative-filtering