I'm exploring some of the code examples in Mahout in Action in more detail. I have built a small test that computes the RMS of various algorithms applied to my data.
Of course, multiple parameters impact the RMS, but I don't understand the "unable to recommend in ... cases" message that is generated while running an evaluation.
Looking at StatsCallable.java, this is generated when an evaluator encounters a NaN response, perhaps because there is not enough data in the training set or in the user's prefs to provide a recommendation.
It seems like the RMS score isn't impacted by a very large set of "unable to recommend" cases. Is that assumption correct? Should I be evaluating my algorithm not only on RMS but also the ratio of "unable to recommend" cases versus my overall training set?
I'd appreciate any feedback.
Yes, this essentially means there was no data at all on which to base an estimate. That's generally a symptom of data sparseness. It should be rare, and happen only for users with data that's very small or disconnected from others'.
I personally think it's not such a big deal unless it's a really significant percentage (20%+?). I'd worry more if you couldn't generate any recs at all for many users.
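To put a number on that percentage, here is a minimal sketch in plain Java with no Mahout dependency. It assumes, as the question suggests, that NaN estimates are left out of the error and only counted. The class `CoverageAwareEvaluation`, its `Result` holder, and the sample values are hypothetical names and data made up for illustration; they are not part of Mahout.

```java
// Hypothetical helper, not part of Mahout: reports RMS error over the
// estimates that could be made, plus the share of "unable to recommend" cases.
final class CoverageAwareEvaluation {

    // Simple holder for the two numbers worth tracking together.
    static final class Result {
        final double rmse;          // RMS error over non-NaN estimates only
        final int noEstimateCount;  // number of NaN ("unable to recommend") estimates
        final double unableRatio;   // noEstimateCount divided by total test cases

        Result(double rmse, int noEstimateCount, double unableRatio) {
            this.rmse = rmse;
            this.noEstimateCount = noEstimateCount;
            this.unableRatio = unableRatio;
        }
    }

    static Result evaluate(float[] estimated, float[] actual) {
        if (estimated.length != actual.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sumOfSquares = 0.0;
        int estimatedCount = 0;
        int noEstimateCount = 0;
        for (int i = 0; i < estimated.length; i++) {
            if (Float.isNaN(estimated[i])) {
                // No estimate was possible: count it, but keep it out of the error.
                noEstimateCount++;
            } else {
                double diff = estimated[i] - actual[i];
                sumOfSquares += diff * diff;
                estimatedCount++;
            }
        }
        // If nothing could be estimated, the error itself is undefined (NaN).
        double rmse = estimatedCount == 0
                ? Double.NaN
                : Math.sqrt(sumOfSquares / estimatedCount);
        double unableRatio = estimated.length == 0
                ? 0.0
                : (double) noEstimateCount / estimated.length;
        return new Result(rmse, noEstimateCount, unableRatio);
    }

    public static void main(String[] args) {
        // Made-up sample data: two of the four test preferences have no estimate.
        float[] estimated = {4.0f, Float.NaN, 2.0f, Float.NaN};
        float[] actual = {3.0f, 5.0f, 4.0f, 1.0f};
        Result r = evaluate(estimated, actual);
        System.out.println("RMS over estimated cases: " + r.rmse);
        System.out.println("Unable to recommend: " + r.noEstimateCount
                + " (ratio " + r.unableRatio + ")");
    }
}
```

Reporting both numbers side by side makes it obvious when a low RMS comes from a recommender that simply skipped most of the test cases.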
Source: https://stackoverflow.com/questions/10355031/evaluating-recommenders-unable-to-recommend-in-x-cases