Project:
crab, a recommender system that is still under development
Installation:
( easy_install is preferred )
* numpy
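Q: The build fails because vcvarsall.bat is missing
A: If some version of Visual Studio (vs20xx) is already installed, set an environment variable that points the build at it, for example (with VS2010 installed)
SET VS90COMNTOOLS=%VS100COMNTOOLS%
Q: Various compilation errors
A: Use the prebuilt Windows installer instead, and make sure it matches your Python version
http://sourceforge.net/projects/numpy/files/NumPy/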
* Scipy
Use the prebuilt Windows installer, and make sure it matches your Python version
http://sourceforge.net/projects/scipy/files/Scipy
* scikits.learn
Depends on numpy
* matplotlib
Needed to build the documentation and some of the example code
It has many dependencies of its own, so it can be skipped
* crab
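Installing with easy_install does not work well: import scikits.crab reports that the module cannot be found
Instead, download the source package from https://github.com/muricoca/crab and install it with python setup.py install
(the same applies on Linux)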
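Once everything is installed, a quick import check can confirm which packages are actually usable. The sketch below is only an illustration: the helper name check_imports is hypothetical, and the module names are assumed to be the import names of the packages listed above (scikits.learn in particular is the import name used by old releases).

```python
import importlib


def check_imports(module_names):
    """Return a dict mapping each module name to True if it imports, else False."""
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Assumed import names for the packages in the installation list.
    wanted = ["numpy", "scipy", "scikits.learn", "matplotlib", "scikits.crab"]
    for name, ok in sorted(check_imports(wanted).items()):
        print("%-15s %s" % (name, "OK" if ok else "MISSING"))
```

A MISSING entry for scikits.crab after an easy_install run is the symptom described above, and installing from source is the workaround.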
----------------------------------
* Testing
Save the following as a file and run it directly with python
#!/usr/bin/env python
#coding=utf-8
def base_demo():
    # Base data: the sample dataset bundled with crab
    from scikits.crab import datasets
    movies = datasets.load_sample_movies()
    # print(movies.data)
    # print(movies.user_ids)
    # print(movies.item_ids)

    # Build the model
    from scikits.crab.models import MatrixPreferenceDataModel
    model = MatrixPreferenceDataModel(movies.data)

    # Build the similarity, using pearson_correlation as the metric
    from scikits.crab.metrics import pearson_correlation
    from scikits.crab.similarities import UserSimilarity
    similarity = UserSimilarity(model, pearson_correlation)

    # User-based recommendation
    from scikits.crab.recommenders.knn import UserBasedRecommender
    recommender = UserBasedRecommender(model, similarity, with_preference=True)
    print(recommender.recommend(5))  # recommend items for user 5 (Toby)

    # Item-based recommendation (same data, seen from the item side).
    # An item-based recommender needs an item similarity rather than
    # the user similarity built above.
    from scikits.crab.recommenders.knn import ItemBasedRecommender
    from scikits.crab.similarities.basic_similarities import ItemSimilarity
    item_similarity = ItemSimilarity(model, pearson_correlation)
    recommender = ItemBasedRecommender(model, item_similarity, with_preference=True)
    print(recommender.recommend(5))  # recommend items for user 5 (Toby)


def itembase_demo():
    from scikits.crab.models.classes import MatrixPreferenceDataModel
    from scikits.crab.recommenders.knn.classes import ItemBasedRecommender
    from scikits.crab.similarities.basic_similarities import ItemSimilarity
    from scikits.crab.recommenders.knn.item_strategies import ItemsNeighborhoodStrategy
    from scikits.crab.metrics.pairwise import euclidean_distances

    # Ratings as {user: {item: rating}}
    movies = {
        'Marcel Caraciolo': {
            'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
            'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
        'Paola Pow': {
            'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
            'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
        'Leopoldo Pires': {
            'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
            'Superman Returns': 3.5, 'The Night Listener': 4.0},
        'Lorena Abreu': {
            'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 'The Night Listener': 4.5,
            'Superman Returns': 4.0, 'You, Me and Dupree': 2.5},
        'Steve Gates': {
            'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'Just My Luck': 2.0,
            'Superman Returns': 3.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
        'Sheldom': {
            'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'The Night Listener': 3.0,
            'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
        'Penny Frewman': {
            'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0},
        'Maria Gabriela': {},
    }
    model = MatrixPreferenceDataModel(movies)
    items_strategy = ItemsNeighborhoodStrategy()
    similarity = ItemSimilarity(model, euclidean_distances)
    recsys = ItemBasedRecommender(model, similarity, items_strategy)

    # Return the items most similar to the given item
    print(recsys.most_similar_items('Lady in the Water'))
    # Return the recommendations for the given user
    print(recsys.recommend('Leopoldo Pires'))
    # Return 2 explanations for the given recommendation
    print(recsys.recommended_because('Leopoldo Pires', 'Just My Luck', 2))
    # Estimate the rating the user would give the item
    print(recsys.estimate_preference('Leopoldo Pires', 'Lady in the Water'))


if __name__ == '__main__':
    base_demo()
    itembase_demo()
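
To make the pearson_correlation step in base_demo less of a black box, here is a minimal pure-Python sketch of a Pearson similarity between two users, plus a similarity-weighted rating estimate built on it. This is not crab's implementation: the function names pearson and estimate are made up for illustration, and skipping users with non-positive similarity is a simplifying assumption.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = [item for item in ratings_a if item in ratings_b]
    n = len(common)
    if n < 2:
        return 0.0
    xs = [float(ratings_a[item]) for item in common]
    ys = [float(ratings_b[item]) for item in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def estimate(prefs, user, item):
    """Similarity-weighted average of other users' ratings for item, or None."""
    num = den = 0.0
    for other, ratings in prefs.items():
        if other == user or item not in ratings:
            continue
        sim = pearson(prefs[user], ratings)
        if sim <= 0:
            continue  # assumption: ignore uncorrelated or anti-correlated users
        num += sim * ratings[item]
        den += sim
    return num / den if den else None


if __name__ == "__main__":
    # A subset of the ratings used in itembase_demo above.
    prefs = {
        'Leopoldo Pires': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                           'Superman Returns': 3.5, 'The Night Listener': 4.0},
        'Lorena Abreu': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                         'The Night Listener': 4.5, 'Superman Returns': 4.0},
        'Paola Pow': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                      'Just My Luck': 1.5, 'Superman Returns': 5.0,
                      'The Night Listener': 3.0},
    }
    print(pearson(prefs['Leopoldo Pires'], prefs['Lorena Abreu']))
    print(estimate(prefs, 'Leopoldo Pires', 'Just My Luck'))
```

The user-based recommender follows the same general idea: users whose ratings move together with the target user's get more say in the predicted rating.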
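Recommendation algorithms:
The algorithms themselves are not examined in depth here; only the concepts are introduced, to make crab's implementation easier to follow
* kNN
A simple classification algorithm: find the k records in the training set that are closest to the new data point, then assign the new point the class that is most common among them.
Three main factors: the training set, the distance or similarity measure, and the value of k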
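
The three factors map directly onto the arguments of a small classifier. The following is a minimal sketch rather than anything taken from crab; the names euclidean and knn_classify and the toy data are invented for illustration.

```python
from collections import Counter
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, query, k=3, distance=euclidean):
    """training is a list of (vector, label) pairs.

    Returns the most common label among the k training vectors closest to query.
    """
    nearest = sorted(training, key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy training set: two well-separated groups of 2-D points.
    training = [
        ((1.0, 1.0), 'a'), ((1.2, 0.8), 'a'), ((0.9, 1.1), 'a'),
        ((5.0, 5.0), 'b'), ((5.2, 4.9), 'b'), ((4.8, 5.1), 'b'),
    ]
    print(knn_classify(training, (1.1, 1.0)))
    print(knn_classify(training, (5.0, 4.8)))
```

Swapping the distance argument for a similarity-based measure, or changing k, changes which neighbours get a vote, which is why those two choices matter as much as the training set itself.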
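* SVD
A collaborative, latent-factor approach: from the existing ratings, it infers how much each rater likes each factor and how strongly each movie exhibits each factor, then uses those results in reverse to predict ratings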
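
The factor idea can be sketched with a small matrix factorization trained by stochastic gradient descent. Note that this is a commonly used stand-in for SVD in recommenders, not a literal singular value decomposition, and it is not crab's code. All names (train_factors, predict, rmse) and hyperparameters (n_factors, epochs, lr, reg) are assumptions chosen for illustration.

```python
import random


def predict(p, q, user, item):
    """Predicted rating: dot product of the user's and the item's factor vectors."""
    return sum(pu * qi for pu, qi in zip(p[user], q[item]))


def train_factors(ratings, n_factors=2, epochs=200, lr=0.01, reg=0.02, seed=0):
    """ratings is a list of (user, item, rating) triples.

    Returns (p, q): dicts mapping users and items to lists of n_factors floats.
    """
    rng = random.Random(seed)
    users = sorted(set(u for u, _, _ in ratings))
    items = sorted(set(i for _, i, _ in ratings))
    p = dict((u, [rng.uniform(0.0, 0.5) for _ in range(n_factors)]) for u in users)
    q = dict((i, [rng.uniform(0.0, 0.5) for _ in range(n_factors)]) for i in items)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(p, q, u, i)
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on the squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def rmse(p, q, ratings):
    """Root mean squared error of the predictions over the given ratings."""
    total = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return (total / float(len(ratings))) ** 0.5


if __name__ == "__main__":
    # A few ratings taken from the movies dict in itembase_demo.
    ratings = [
        ('Marcel Caraciolo', 'Lady in the Water', 2.5),
        ('Marcel Caraciolo', 'Snakes on a Plane', 3.5),
        ('Marcel Caraciolo', 'Just My Luck', 3.0),
        ('Paola Pow', 'Lady in the Water', 3.0),
        ('Paola Pow', 'Snakes on a Plane', 3.5),
        ('Paola Pow', 'Just My Luck', 1.5),
        ('Leopoldo Pires', 'Lady in the Water', 2.5),
        ('Leopoldo Pires', 'Snakes on a Plane', 3.0),
    ]
    p, q = train_factors(ratings)
    print(rmse(p, q, ratings))
    # Leopoldo Pires has not rated this item; the factors give an estimate.
    print(predict(p, q, 'Leopoldo Pires', 'Just My Luck'))
```

Here p plays the role of "how much each rater likes each factor" and q the role of "how strongly each movie exhibits each factor"; the prediction is simply their dot product.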
Source: oschina
Link: https://my.oschina.net/u/216880/blog/260749