mahout

Evaluating recommenders - unable to recommend in x cases

心不动则不痛 submitted on 2019-12-05 15:22:58
I'm exploring some of the code examples in Mahout in Action in more detail. I have built a small test that computes the RMS of various algorithms applied to my data. Of course, multiple parameters impact the RMS, but I don't understand the "unable to recommend in ... cases" message that is generated while running an evaluation. Looking at StatsCallable.java, this is generated when an evaluator encounters a NaN response, perhaps because there is not enough data in the training set or in the user's preferences to produce a recommendation. It seems like the RMS score isn't impacted by a very large set of "unable to
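That would explain the behaviour: test pairs for which the recommender cannot produce an estimate are counted and reported, but they do not enter the RMS average. A minimal evaluation sketch using Mahout's Taste API (the preference file name, neighborhood size, and the 90/10 split are placeholders chosen for illustration):

    import java.io.File;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class RmsEvaluation {
      public static void main(String[] args) throws Exception {
        // Preference file with userID,itemID,rating lines (path is a placeholder)
        DataModel model = new FileDataModel(new File("prefs.csv"));

        RecommenderBuilder builder = new RecommenderBuilder() {
          @Override
          public Recommender buildRecommender(DataModel dataModel) throws TasteException {
            UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, dataModel);
            return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
          }
        };

        // Train on 90% of each user's preferences, test on the held-out 10%;
        // test pairs the recommender cannot estimate are skipped, not averaged in.
        RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
        double rms = evaluator.evaluate(builder, null, model, 0.9, 1.0);
        System.out.println("RMS = " + rms);
      }
    }

A small neighborhood or a sparse dataset makes the skipped cases more frequent, which is worth keeping in mind when comparing RMS values across configurations.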

Datasets for Apache Mahout

大兔子大兔子 submitted on 2019-12-05 14:28:49
I am looking for datasets that can be used to implement a recommendation-system use case with Apache Mahout. I only know of the MovieLens data sets from the GroupLens Research group. Does anyone know of other datasets that can be used for a recommendation-system implementation? I am particularly interested in item-based data sets, though other datasets are most welcome. This is Sebastian from Mahout. There is a dataset from a Czech dating website available that might be of interest to you: http://www.occamslab.com/petricek/data/ By the way, the term item-based refers to a special collaborative filtering approach, not
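Whichever dataset you end up with, Mahout's non-distributed recommenders all consume the same simple preference format, so converting a new dataset is usually a small one-off step. A minimal loading sketch, assuming a hypothetical ratings.csv with userID,itemID,rating lines:

    import java.io.File;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.model.DataModel;

    public class LoadDataset {
      public static void main(String[] args) throws Exception {
        // Each line: userID,itemID,rating (the rating can be omitted for boolean preferences)
        DataModel model = new FileDataModel(new File("ratings.csv"));
        System.out.println("users: " + model.getNumUsers() + ", items: " + model.getNumItems());
      }
    }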

Using mahout in eclipse WITHOUT USING MAVEN

自古美人都是妖i submitted on 2019-12-05 12:32:45
I really don't want to use Maven because it seems like a massive hassle. Is there any way to just download Mahout and use it in my Eclipse project? All I get from using Maven is build-path errors and millions of warnings. I have searched for a way to do this, but people seem pretty set on using Maven all the time. I'm not set on it; I hate Maven. The problem you'll have with Mahout is that they've decided to use it. If that's the case, you're stuck with it, too. Freya Ren: Actually, I don't think you can use Mahout without Maven, because Mahout is a Maven project! In Eclipse, you could install

how can I compile/using mahout for hadoop 2.0?

五迷三道 submitted on 2019-12-05 12:01:02
The latest release, Mahout 0.9, is only built against Hadoop 1.x (mvn clean install). How can I compile Mahout for Hadoop 2.0.x? When I run the command: hadoop jar mahout-examples-0.9-SNAPSHOT-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURENCE -i test -o result I always get the error message IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected. Thanks! To compile Mahout to work with 2.x, since it isn't released in a package that is compatible with Hadoop 2.x: mvn clean install -Dhadoop2

Mahout: adjusted cosine similarity for item based recommender

有些话、适合烂在心里 submitted on 2019-12-05 07:48:43
Question: For an assignment I'm supposed to test different types of recommenders, which I have to implement first. I've been looking around for a good library to do that (I had thought about Weka at first) and stumbled upon Mahout. I must therefore put forward that: a) I'm completely new to Mahout, b) I do not have a strong background in recommenders or their algorithms (otherwise I wouldn't be taking this class...), and c) sorry, but I'm far from being the best developer in the world ==> I'd appreciate
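If adjusted cosine turns out not to be among Mahout's built-in similarities, one way forward is to compute it directly: mean-center each user's ratings, then take the cosine over the users who rated both items. A self-contained sketch in plain Java (the rating maps and IDs are made-up illustration data, not part of the Mahout API):

    import java.util.HashMap;
    import java.util.Map;

    // Adjusted cosine similarity: cosine over user-mean-centered ratings,
    // computed across the users who rated both items.
    public class AdjustedCosine {

      // ratings: userID -> (itemID -> rating)
      public static double similarity(Map<Long, Map<Long, Double>> ratings, long itemA, long itemB) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map<Long, Double> userRatings : ratings.values()) {
          Double ra = userRatings.get(itemA);
          Double rb = userRatings.get(itemB);
          if (ra == null || rb == null) {
            continue; // only users who rated both items contribute
          }
          // Center by this user's mean rating over everything they rated.
          double sum = 0.0;
          for (double r : userRatings.values()) {
            sum += r;
          }
          double mean = sum / userRatings.size();
          double da = ra - mean;
          double db = rb - mean;
          dot += da * db;
          normA += da * da;
          normB += db * db;
        }
        if (normA == 0.0 || normB == 0.0) {
          return Double.NaN; // undefined when a centered vector is all zeros
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
      }

      public static void main(String[] args) {
        Map<Long, Map<Long, Double>> ratings = new HashMap<Long, Map<Long, Double>>();
        Map<Long, Double> u1 = new HashMap<Long, Double>();
        u1.put(10L, 5.0); u1.put(20L, 3.0); u1.put(30L, 2.0);
        Map<Long, Double> u2 = new HashMap<Long, Double>();
        u2.put(10L, 3.0); u2.put(20L, 4.0);
        ratings.put(1L, u1);
        ratings.put(2L, u2);
        System.out.println(similarity(ratings, 10L, 20L));
      }
    }

To use such a measure inside Mahout's item-based recommender, the computation would be wrapped in an implementation of the ItemSimilarity interface and handed to GenericItemBasedRecommender.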

Getting an IOException when running a sample code in “Mahout in Action” on mahout-0.6

隐身守侯 submitted on 2019-12-05 05:04:20
I'm learning Mahout and reading "Mahout in Action". When I tried to run the sample code in chapter 7, SimpleKMeansClustering.java, an exception popped up: Exception in thread "main" java.io.IOException: wrong value class: 0.0: null is not class org.apache.mahout.clustering.WeightedPropertyVectorWritable at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1874) at SimpleKMeansClustering.main(SimpleKMeansClustering.java:95) This code ran successfully on mahout-0.5, but on mahout-0.6 I see this exception. Even after I changed the directory name from clusters-0 to clusters-0-final, I'm still facing
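That message from SequenceFile.Reader means the value object passed to next() does not match the value class recorded in the file, which the exception names as WeightedPropertyVectorWritable. A hedged reading sketch under that assumption (the output path and the IntWritable key type follow the usual k-means clusteredPoints layout and may need adjusting for your run):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.mahout.clustering.WeightedPropertyVectorWritable;

    public class ReadClusteredPoints {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Placeholder path; adjust to your k-means output directory
        Path path = new Path("output/clusteredPoints/part-m-00000");

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        IntWritable key = new IntWritable();                                      // cluster id
        WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable(); // class named in the exception
        while (reader.next(key, value)) {
          System.out.println(value.toString() + " belongs to cluster " + key.toString());
        }
        reader.close();
      }
    }

If the value class in your file differs again on another Mahout version, matching the class you pass to next() against the one reported in the exception is the quickest fix.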

A Deep Dive into Recommendation Engine Algorithms (深入推荐引擎相关算法)

时光毁灭记忆、已成空白 submitted on 2019-12-05 03:07:34
Among today's recommendation techniques and algorithms, the most widely recognized and adopted approach is recommendation based on collaborative filtering. Thanks to its simple model, low data dependency, easily collected data, and good recommendation quality, it has become the "No. 1" recommendation algorithm in many people's eyes. This article takes you deep into how collaborative filtering works and presents an efficient implementation of collaborative filtering algorithms based on Apache Mahout. Apache Mahout is a relatively new ASF open-source project that originated from Lucene, is built on top of Hadoop, and focuses on efficient implementations of classic machine learning algorithms over massive data sets.
Read the full article at: http://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/
Source: oschina Link: https://my.oschina.net/u/129540/blog/14612
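As a small taste of what the full article covers, a minimal item-based collaborative filtering sketch with Mahout's Taste API (the preference file name and user ID are placeholders for illustration):

    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

    public class ItemCfExample {
      public static void main(String[] args) throws Exception {
        // Preference file with userID,itemID,rating lines (path is a placeholder)
        DataModel model = new FileDataModel(new File("prefs.csv"));
        ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
        GenericItemBasedRecommender recommender = new GenericItemBasedRecommender(model, similarity);

        // Top 5 recommendations for user 1
        List<RecommendedItem> items = recommender.recommend(1L, 5);
        for (RecommendedItem item : items) {
          System.out.println(item.getItemID() + " : " + item.getValue());
        }
      }
    }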

Source Code Analysis of the Slope One Recommendation Algorithm in Apache Mahout (Apache Mahout中推荐算法Slope one源码分析)

℡╲_俬逩灬. submitted on 2019-12-05 03:05:33
About recommendation engines: On today's Internet, whether in e-commerce or social networking, the demand for data mining keeps growing, and recommendation engines are a perfect embodiment of data mining. By analyzing a user's historical behavior and pushing content they are likely to enjoy, a site can deliver a very good user experience; that is what a recommendation engine does.
How the Slope One algorithm works: Slope One is an item-based collaborative filtering algorithm (item-based recommendation). Briefly (corrections welcome if my understanding is off; I am just a beginner): it groups products according to how users rate them. A simple example: suppose there are 10 users, 9 of whom like both product A and product B, but only 2 of whom like product C; we can then infer that products A and B belong to the same group, while product C probably does not.
Enough talk; let's look at Slope One. Slope One uses users' ratings of each product to compute a rating difference between pairs of products. The difference comes from fitting the linear regression f(x) = ax + b with the slope fixed at a = 1, hence the name Slope One. Explicit user ratings are therefore indispensable to Slope One. Here is an example of how the computation works; below is a table of users' book ratings:

              Book 1       Book 2       Book 3
    User A    5            3            2
    User B    3            4            not rated
    User C    not rated    2            5

Whether Book 1 is a good recommendation for User C is decided by a value that Slope One computes
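A self-contained worked example of the weighted Slope One prediction for the table above (plain Java arithmetic, not Mahout's implementation), predicting User C's rating for Book 1 from the average deviations dev(1,2) and dev(1,3):

    // Weighted Slope One prediction for the rating table above.
    // dev(1,2) is averaged over users A and B (the users who rated both books),
    // dev(1,3) only over user A; the prediction weights each deviation by that count.
    public class SlopeOneExample {
      public static void main(String[] args) {
        // Average deviations of Book 1 relative to Books 2 and 3
        double dev12 = ((5 - 3) + (3 - 4)) / 2.0;   // users A and B -> 0.5
        double dev13 = (5 - 2) / 1.0;               // user A only   -> 3.0
        int c12 = 2;                                 // number of users backing dev12
        int c13 = 1;                                 // number of users backing dev13

        // User C rated Book 2 = 2 and Book 3 = 5; predict Book 1
        double prediction = ((2 + dev12) * c12 + (5 + dev13) * c13) / (double) (c12 + c13);
        System.out.println("Predicted rating of Book 1 for User C: " + prediction); // ~4.33
      }
    }

The unweighted variant would simply average (2 + 0.5) and (5 + 3) to get 5.25; the weighted form above gives more influence to dev(1,2) because two users back it.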

Calculate TF-IDF of documents using HBase as the datasource

℡╲_俬逩灬. submitted on 2019-12-04 19:34:51
I want to calculate the TF (term frequency) and the IDF (inverse document frequency) of documents that are stored in HBase. I also want to save the calculated TF in one HBase table and the calculated IDF in another HBase table. Can you guide me through it? I have looked at BayesTfIdfDriver from Mahout 0.4, but I am not getting a head start. The outline of a solution is pretty straightforward: do a word count over your HBase tables, storing both term frequency and document frequency for each word in your reduce phase; aggregate the term frequency and document frequency for each word. Given a
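The scoring step itself does not depend on where the counts live. A minimal sketch of the final computation, assuming you have already aggregated per-document term counts and per-term document frequencies (from HBase or anywhere else) and using the common tf * log(N / df) weighting; the map contents are made-up illustration data:

    import java.util.HashMap;
    import java.util.Map;

    public class TfIdf {

      // termCounts: term -> occurrences in one document; docLength: total terms in that document
      // docFreq:   term -> number of documents containing the term; numDocs: corpus size
      public static Map<String, Double> tfIdf(Map<String, Integer> termCounts, int docLength,
                                              Map<String, Integer> docFreq, long numDocs) {
        Map<String, Double> scores = new HashMap<String, Double>();
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
          double tf = e.getValue() / (double) docLength;
          Integer df = docFreq.get(e.getKey());
          double idf = Math.log(numDocs / (double) (df == null ? 1 : df));
          scores.put(e.getKey(), tf * idf);
        }
        return scores;
      }

      public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("hbase", 3);
        counts.put("mahout", 1);
        Map<String, Integer> df = new HashMap<String, Integer>();
        df.put("hbase", 10);
        df.put("mahout", 50);
        System.out.println(tfIdf(counts, 100, df, 1000L));
      }
    }

In the MapReduce outline above, the word-count pass produces the two count maps and a second pass (or a write back into the two HBase tables) applies this formula.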

ClassNotFoundException org.apache.mahout.math.VectorWritable

穿精又带淫゛_ submitted on 2019-12-04 19:21:37
I'm trying to turn a CSV file into sequence files so that I can train and run a classifier across the data. I have a Java job file that I compile and then jar into the Mahout job jar. When I try to run hadoop jar with my job in the Mahout jar, I get a java.lang.ClassNotFoundException: org.apache.mahout.math.VectorWritable. I'm not sure why, because if I look in the Mahout jar, that class is indeed present. Here are the steps I'm doing:

    # get a new copy of the mahout jar
    rm iris.jar
    cp /home/stephen/home/libs/mahout-distribution-0.7/core/target/mahout-core-0.7-job.jar iris.jar
    javac -cp :/home