mahout

How to solve the “log4j:WARN No appenders could be found for logger” error in the Twenty Newsgroups classification example

£可爱£侵袭症+ submitted on 2019-12-20 06:30:50
Question: I am trying to run the 20 newsgroups classification example in Mahout. I have set MAHOUT_LOCAL=true, but the classifier doesn't display the confusion matrix and gives the following warnings:

    ok. You chose 1 and we'll use cnaivebayes
    creating work directory at /tmp/mahout-work-cloudera
    + echo 'Preparing 20newsgroups data'
    Preparing 20newsgroups data
    + rm -rf /tmp/mahout-work-cloudera/20news-all
    + mkdir /tmp/mahout-work-cloudera/20news-all
    + cp -R /tmp/mahout-work-cloudera/20news-bydate/20news-bydate
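One generic way to clear this warning, independent of the example scripts, is to hand log4j a default console appender before any Mahout class creates its logger. A minimal sketch using the log4j 1.x API (the class name and level choice here are assumptions, not taken from the original question):

    import org.apache.log4j.BasicConfigurator;
    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class Log4jBootstrap {
        public static void main(String[] args) {
            // Attach a simple console appender to the root logger so
            // "No appenders could be found" no longer fires.
            BasicConfigurator.configure();
            Logger.getRootLogger().setLevel(Level.INFO);
            // ...invoke the Mahout driver from here...
        }
    }

Alternatively, placing a log4j.properties file on the classpath achieves the same thing without code changes.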

How can I use Mahout's SequenceFile API from code?

白昼怎懂夜的黑 submitted on 2019-12-19 09:47:04
Question: Mahout has a command for creating sequence files: bin/mahout seqdirectory -c UTF-8 -i <input address> -o <output address>. I want to use this command from the Java API.

Answer 1: You can do something like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path
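Since the answer's snippet is cut off above, here is a complementary sketch that drives the same tool programmatically. It assumes the org.apache.mahout.text.SequenceFilesFromDirectory driver (the class behind bin/mahout seqdirectory) implements Hadoop's Tool interface via Mahout's AbstractJob; the input/output paths are illustrative:

    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.text.SequenceFilesFromDirectory;

    public class SeqDirectoryFromCode {
        public static void main(String[] args) throws Exception {
            // Equivalent to: bin/mahout seqdirectory -c UTF-8 -i <input> -o <output>
            ToolRunner.run(new SequenceFilesFromDirectory(), new String[] {
                "-c", "UTF-8",
                "-i", "/path/to/input",
                "-o", "/path/to/output"
            });
        }
    }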

K-means with really large matrix

落爺英雄遲暮 submitted on 2019-12-18 15:48:33
Question: I have to perform k-means clustering on a really huge matrix (about 300,000 × 100,000 values, which is more than 100 GB). I want to know whether I can do this with R or with Weka. My computer is a multiprocessor machine with 8 GB of RAM and hundreds of GB of free disk space. I have enough space for the calculations, but loading such a matrix seems to be a problem for R (I don't think the bigmemory package would help me, and a big matrix automatically uses all my RAM and then my swap file if that is not enough
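A quick size estimate (assuming dense double-precision storage, which the question does not specify) shows why this cannot be held in 8 GB of RAM:

    300,000 rows × 100,000 columns × 8 bytes/double ≈ 2.4 × 10^11 bytes ≈ 240 GB

At that scale the usual options are a sparse representation (if most entries are zero), an out-of-core or mini-batch k-means, or a distributed implementation such as Mahout's MapReduce k-means.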

Dropwizard application crashes in AbstractJAXBProvider

戏子无情 submitted on 2019-12-14 03:55:17
Question: I have a server application implemented with Dropwizard, using Gradle as the build system. Now I want to integrate Apache Mahout for some recommender-system work. After adding the Mahout dependency and trying to run, I get exceptions. My initial dependencies look like:

    dependencies {
        compile 'io.dropwizard:dropwizard-core:0.9.1'
        compile 'io.dropwizard:dropwizard-jdbi:0.9.1'
        compile 'mysql:mysql-connector-java:5.1.37'
        compile 'redis.clients:jedis:2.8.0'
        compile 'com.google.guava:guava:18.0'
        compile
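AbstractJAXBProvider is a Jersey 1.x class, and Dropwizard 0.9.x runs on Jersey 2.x, so this crash is typically a transitive-dependency clash. A sketch of one common fix, excluding Jersey 1.x from the Mahout dependency (the Mahout artifact name and version here are placeholders to check against the output of gradle dependencies):

    dependencies {
        compile('org.apache.mahout:mahout-core:0.9') {
            // Jersey 1.x classes such as AbstractJAXBProvider clash with
            // Dropwizard's Jersey 2.x; keep them off the classpath.
            exclude group: 'com.sun.jersey'
        }
    }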

Model creation for user-user collaborative filtering

陌路散爱 submitted on 2019-12-14 02:55:41
Question: I want to do a form of user-user collaborative filtering in which the users in the user-item matrix are a selected subset of all users in the database. This selected set is refreshed regularly with newly selected users' preferences. New users shouldn't be added to the matrix. For a new user, based on his preferences, we need to recommend items from the user-item matrix (which contains only the selected subset of users). I do not want to add the new anonymous users to the matrix. Explored in
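Mahout's Taste API has a data model designed for exactly this situation. A sketch using PlusAnonymousUserDataModel to serve a transient user without ever adding them to the underlying model (the file name, neighborhood size, and item IDs are illustrative assumptions):

    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
    import org.apache.mahout.cf.taste.impl.model.PlusAnonymousUserDataModel;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.PreferenceArray;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;

    public class AnonymousUserRecommender {
        public static void main(String[] args) throws Exception {
            // The base model holds only the selected subset of users.
            PlusAnonymousUserDataModel model = new PlusAnonymousUserDataModel(
                new FileDataModel(new File("selected-users.csv")));
            PearsonCorrelationSimilarity similarity = new PearsonCorrelationSimilarity(model);
            GenericUserBasedRecommender recommender = new GenericUserBasedRecommender(
                model, new NearestNUserNeighborhood(10, similarity, model), similarity);

            // Preferences of the new, anonymous user (never persisted in the matrix).
            PreferenceArray prefs = new GenericUserPreferenceArray(2);
            prefs.setUserID(0, PlusAnonymousUserDataModel.TEMP_USER_ID);
            prefs.setItemID(0, 123L); prefs.setValue(0, 4.5f);
            prefs.setItemID(1, 456L); prefs.setValue(1, 3.0f);

            synchronized (model) {  // one anonymous user at a time
                model.setTempPrefs(prefs);
                List<RecommendedItem> recs =
                    recommender.recommend(PlusAnonymousUserDataModel.TEMP_USER_ID, 5);
                model.clearTempPrefs();
                System.out.println(recs);
            }
        }
    }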

K-means clustering in Mahout

最后都变了- submitted on 2019-12-13 10:29:54
Question: I am trying to cluster a sample dataset in CSV format. But when I give the command below:

    user@ubuntu:/usr/local/mahout/trunk$ bin/mahout kmeans -i /root/Mahout/temp/parsedtext-seqdir-sparse-kmeans/tfidf-vectors/ -c /root/Mahout/temp/parsedtext-kmeans-clusters -o /root/Mahout/reuters21578/root/Mahout/temp/parsedtext-kmeans -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 2 -k 1 -ow --clustering -cl

I get the following error, saying there is no input clusters
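For reference, the same invocation can be issued from Java. A sketch, assuming KMeansDriver implements Hadoop's Tool interface via Mahout's AbstractJob, and reusing the flags from the question (with the duplicated clustering switch collapsed to -cl):

    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.clustering.kmeans.KMeansDriver;

    public class RunKMeans {
        public static void main(String[] args) throws Exception {
            // With -k given, Mahout samples k random input vectors into the -c
            // directory as initial centroids, so -c must be a writable path.
            ToolRunner.run(new KMeansDriver(), new String[] {
                "-i", "/root/Mahout/temp/parsedtext-seqdir-sparse-kmeans/tfidf-vectors/",
                "-c", "/root/Mahout/temp/parsedtext-kmeans-clusters",
                "-o", "/root/Mahout/temp/parsedtext-kmeans",
                "-dm", "org.apache.mahout.common.distance.CosineDistanceMeasure",
                "-x", "2", "-k", "1", "-ow", "-cl"
            });
        }
    }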

Mahout 0.9 and Hadoop 2.2.0 - Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

為{幸葍}努か submitted on 2019-12-13 00:37:25
Question: Where did my code go wrong? When I searched, I found a similar post, but I couldn't adapt it to my problem.

    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
        at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
        at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run
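This IncompatibleClassChangeError is the classic symptom of running Mahout jars compiled against Hadoop 1.x (where JobContext was a concrete class) on Hadoop 2.x (where it became an interface). One commonly cited remedy is rebuilding Mahout from source against Hadoop 2; a sketch, assuming Mahout 0.9's hadoop2 build property (verify the exact profile and property name against the checkout's pom.xml):

    mvn clean install -DskipTests -Dhadoop2.version=2.2.0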

How to vectorize a text file in Mahout?

早过忘川 submitted on 2019-12-12 11:43:10
Question: I have a text file with labels and tweets:

    positive,I love this car
    negative,I hate this book
    positive,Good product.

I need to convert each line into a vector value. If I use the seq2sparse command, the whole document gets converted to one vector, but I need each line converted to a vector, not the whole document. For example: key: positive, value: vectorvalue(tweet). How can we achieve this in Mahout?

Here is what I have done so far:

    StringTokenizer str = new StringTokenizer(line, ",");
    String label = str
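One way to get one vector per tweet is to write each line as its own document into a SequenceFile and then run seq2sparse over the result. A sketch (the file names are illustrative; the /label/id key layout mirrors the convention Mahout's classifiers use for deriving labels from keys):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class TweetsToSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Writer writer = new SequenceFile.Writer(
                fs, conf, new Path("tweets-seq/chunk-0"), Text.class, Text.class);
            BufferedReader in = new BufferedReader(new FileReader("tweets.csv"));
            try {
                String line;
                int id = 0;
                while ((line = in.readLine()) != null) {
                    int comma = line.indexOf(',');
                    String label = line.substring(0, comma);
                    String tweet = line.substring(comma + 1);
                    // One key/value pair per line; seq2sparse then emits one vector per tweet.
                    writer.append(new Text("/" + label + "/" + id++), new Text(tweet));
                }
            } finally {
                in.close();
                writer.close();
            }
        }
    }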

Is Hadoop 2.2.0 compatible with Mahout 0.8?

本秂侑毒 submitted on 2019-12-12 08:02:40
Question: I have a Hadoop cluster running version 2.2.0 with Mahout 0.8; are they compatible? Whenever I run this command:

    bin/mahout recommenditembased --input mydata.dat --usersFile user.dat --numRecommendations 2 --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION

it gives me this error:

    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.mahout.common.HadoopUtil
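This is the same incompatibility as in the Mahout 0.9 question above: Mahout 0.8 jars were compiled against Hadoop 1.x, where org.apache.hadoop.mapreduce.JobContext was a concrete class, while Hadoop 2.x turned it into an interface. A tiny probe (a sketch; run it with the cluster's Hadoop jars on the classpath) shows which API flavor is loaded:

    public class JobContextProbe {
        public static void main(String[] args) {
            // Prints "interface" on Hadoop 2.x, "class" on Hadoop 1.x.
            System.out.println(org.apache.hadoop.mapreduce.JobContext.class.isInterface()
                ? "interface" : "class");
        }
    }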

Error when setting mapred.map.tasks in pseudo-distributed mode

亡梦爱人 submitted on 2019-12-12 06:54:22
Question: As suggested here, I am running Hadoop in pseudo-distributed mode with the following mapred-site.xml file. The job is running on a 4-core machine.

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
      <property>
        <name>mapred.map.tasks</name>
        <value>4</value>
      </property>
      <property>
        <name>mapred.reduce.tasks</name>
        <value>4</value>
      </property>
    </configuration>

I am getting the following error:

    The ratio of reported blocks 1.0000 has reached the
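A side note that may help when reading such configs: in classic MapReduce, mapred.map.tasks is only a hint (the actual number of map tasks is determined by the input splits), while mapred.reduce.tasks is honored as given. The same contrast in the old mapred Java API, as a sketch:

    import org.apache.hadoop.mapred.JobConf;

    public class TaskCounts {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setNumMapTasks(4);    // a hint only; the splits decide the real map count
            conf.setNumReduceTasks(4); // honored as given
        }
    }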