Question
How much data is needed for User CF or Item CF to give a recommendation?
I manually created a small dataset so that I could understand how the algorithms work.
I found that on this small dataset Slope-One can give a recommendation, but User CF and Item CF cannot.
What is the reason behind this?
Is there a threshold on the amount of data?
Answer 1:
In user-based and item-based CF, the dataset can be really small. What matters is how often items and users are linked to each other in the dataset. If a user appears in the dataset only once, user-based CF will most probably not give recommendations, because a single common item does not provide enough similarity for two users to become neighbors. This is just one example case. For a small dataset of around 1,000 records, both recommenders will return results for the most-similar-items and recommend methods. For much smaller datasets, however, it is useful to check the data manually to see whether there is enough information about the queried user or item ID. In this link you can find a very small controlled dataset used to build an item-based CF, along with how it works. I hope this answer is helpful.
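To make the neighbor argument concrete, here is a minimal sketch in plain Python. It is not the code of any particular library: the toy ratings, the function names, and the choice of Pearson correlation with a `min_sim` cutoff are all assumptions made for illustration. In this data every pair of users shares exactly one item, so no user similarity can be computed, while Slope One only needs one user who rated both items of a pair.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Every pair of users
# shares exactly one item, mimicking a very sparse hand-made dataset.
ratings = {
    "u1": {"A": 5.0, "B": 3.0},
    "u2": {"A": 4.0, "C": 2.0},
    "u3": {"B": 4.0, "C": 5.0},
}

def pearson(r1, r2):
    """Pearson correlation over co-rated items, or None if undefined."""
    common = set(r1) & set(r2)
    if len(common) < 2:  # one shared item carries no correlation signal
        return None
    n = len(common)
    m1 = sum(r1[i] for i in common) / n
    m2 = sum(r2[i] for i in common) / n
    num = sum((r1[i] - m1) * (r2[i] - m2) for i in common)
    den = sqrt(sum((r1[i] - m1) ** 2 for i in common)
               * sum((r2[i] - m2) ** 2 for i in common))
    return num / den if den else None

def user_cf_predict(user, item, min_sim=0.0):
    """Similarity-weighted average over neighbors; None if no neighbors."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        sim = pearson(ratings[user], r)
        if sim is None or sim <= min_sim:
            continue  # not similar enough to count as a neighbor
        num += sim * r[item]
        den += sim
    return num / den if den else None

def slope_one_predict(user, item):
    """Weighted Slope One: average rating difference between item pairs."""
    num = den = 0.0
    for j, r_uj in ratings[user].items():
        # Differences from every user who rated both `item` and `j`.
        diffs = [r[item] - r[j] for r in ratings.values()
                 if item in r and j in r]
        if diffs:
            dev = sum(diffs) / len(diffs)
            num += (dev + r_uj) * len(diffs)
            den += len(diffs)
    return num / den if den else None

print(user_cf_predict("u1", "C"))    # None
print(slope_one_predict("u1", "C"))  # 3.5
```

Adding a second co-rated item between two users is enough for the sketch's user-based predictor to start considering them as potential neighbors.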
Answer 2:
The MovieLens, Netflix, Jester, and KDD Cup datasets are all open to everyone. If you have trouble getting a dataset, check http://code.google.com/p/recsyscode/wiki/dataset
Answer 3:
For a small dataset, User CF and Item CF may perform about the same, but for large datasets where the number of users is larger than the number of items (e.g. the Netflix dataset and the Yahoo KDD Cup 2011 dataset), Item CF is much faster than User CF.
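One way to see where the speed difference could come from is to count the pairwise similarities each approach has to consider. The sketch below assumes a naive all-pairs similarity computation, and the counts are illustrative round numbers at roughly the scale of the Netflix Prize data, not measurements.

```python
def pair_count(n):
    """Number of unordered pairs among n entities: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Illustrative counts (assumption): many more users than items.
n_users, n_items = 480_000, 17_770

user_pairs = pair_count(n_users)  # similarities User CF would need
item_pairs = pair_count(n_items)  # similarities Item CF would need

print(f"user-user pairs: {user_pairs:,}")
print(f"item-item pairs: {item_pairs:,}")
print(f"ratio: {user_pairs / item_pairs:.0f}x")
```

With these counts the user-user side has several hundred times more pairs, which is consistent with Item CF being the cheaper option when users outnumber items.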
For Top-N recommendation, the accuracy of User CF and Item CF is the same, but their coverage differs: User CF is good at recommending long-tail items, while Item CF gives better diversity.
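Coverage can be measured in several ways. A minimal sketch of one common definition, the fraction of the catalog that appears in at least one user's Top-N list, is shown below. The function name and the sample lists are hypothetical and only illustrate the metric.

```python
def catalog_coverage(top_n_lists, catalog):
    """Fraction of catalog items recommended to at least one user."""
    recommended = set()
    for items in top_n_lists.values():
        recommended.update(items)
    return len(recommended & set(catalog)) / len(catalog)

# Hypothetical Top-2 lists produced by some recommender.
top_n = {"u1": ["A", "B"], "u2": ["A", "C"]}
catalog = ["A", "B", "C", "D"]

print(catalog_coverage(top_n, catalog))  # 0.75
```

Computing this for the Top-N output of both recommenders on the same data gives a direct way to compare their coverage.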
Source: https://stackoverflow.com/questions/5470768/how-much-data-is-needed-for-user-based-cf-or-item-based-cf-to-give-recommendatio