Utilizing multiple, weighed data models for a Mahout recommender

十年热恋 提交于 2019-12-04 13:57:13

I think you're thinking of it in the right way. Yes you want a bit more expressiveness than a simple exists/doesn't exist for subscriptions and articles since they mean somewhat different things. I would suggest picking weights that reflect their relative frequency. For example if users have read 100K articles over all time, and made 10000 subscriptions, then you might pick a subscription weight to be "10" and a read weight to be "1".

This doesn't quite work if you treat those values as preference scores, for a number of reasons. It works better if you use an approach that treats them like what they are, which are linear weights.

I would point you to the ALS-WR algorithm, which is specifically designed for this type of input. For example: Collaborative Filtering for Implicit Feedback Datasets

This is implemented in Mahout as ParallelALSFactorizationJob on Hadoop. It works nicely though requires Hadoop. (I can't take credit for that, though I did write most of the recommender code in Mahout.)

Advertisement: I'm working on commercializing a "next generation" system, evolved from my work in Mahout, as Myrrix. It is an implementation of ALS-WR and is ideal for your kind of input. It's quite easy to download and run, and doesn't need Hadoop.

Given that it may be directly suitable for your problem I don't mind plugging it here.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!