mahout

How to solve the “log4j:WARN No appenders could be found for logger” error in the Twenty Newsgroups classification example

£可爱£侵袭症+ submitted on 2019-12-20 06:30:50
Question: I am trying to run the 20 newsgroups classification example in Mahout. I have set MAHOUT_LOCAL=true, but the classifier doesn't display the confusion matrix and gives the following warnings:

    ok. You chose 1 and we'll use cnaivebayes
    creating work directory at /tmp/mahout-work-cloudera
    + echo 'Preparing 20newsgroups data'
    Preparing 20newsgroups data
    + rm -rf /tmp/mahout-work-cloudera/20news-all
    + mkdir /tmp/mahout-work-cloudera/20news-all
    + cp -R /tmp/mahout-work-cloudera/20news-bydate/20news-bydate
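One generic way to clear this warning, independent of the example scripts, is to hand log4j a default console appender before any Mahout class creates its logger. A minimal sketch using the log4j 1.x API (the class name and level choice here are assumptions, not taken from the original question):

    import org.apache.log4j.BasicConfigurator;
    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class Log4jBootstrap {
        public static void main(String[] args) {
            // Attach a simple console appender to the root logger so
            // "No appenders could be found" no longer fires.
            BasicConfigurator.configure();
            Logger.getRootLogger().setLevel(Level.INFO);
            // ...invoke the Mahout driver from here...
        }
    }

Alternatively, placing a log4j.properties file on the classpath achieves the same thing without code changes.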

How can I use Mahout's SequenceFile API from code?

白昼怎懂夜的黑 submitted on 2019-12-19 09:47:04
Question: Mahout has a command for creating sequence files: bin/mahout seqdirectory -c UTF-8 -i <input address> -o <output address>. I want to use this command from the Java API.

Answer 1: You can do something like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path
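Since the answer's snippet is cut off above, here is a complementary sketch that drives the same tool programmatically. It assumes the org.apache.mahout.text.SequenceFilesFromDirectory driver (the class behind bin/mahout seqdirectory) implements Hadoop's Tool interface via Mahout's AbstractJob; the input/output paths are illustrative:

    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.text.SequenceFilesFromDirectory;

    public class SeqDirectoryFromCode {
        public static void main(String[] args) throws Exception {
            // Equivalent to: bin/mahout seqdirectory -c UTF-8 -i <input> -o <output>
            ToolRunner.run(new SequenceFilesFromDirectory(), new String[] {
                "-c", "UTF-8",
                "-i", "/path/to/input",
                "-o", "/path/to/output"
            });
        }
    }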

K-means with really large matrix

落爺英雄遲暮 submitted on 2019-12-18 15:48:33
Question: I have to perform k-means clustering on a really huge matrix (about 300,000 × 100,000 values, which is more than 100 GB). I want to know whether I can do this with R or with Weka. My computer is a multiprocessor machine with 8 GB of RAM and hundreds of GB of free disk space. I have enough space for the calculations, but loading such a matrix seems to be a problem for R (I don't think the bigmemory package would help me, and a big matrix automatically uses all my RAM and then my swap file if that is not enough
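A quick size estimate (assuming dense double-precision storage, which the question does not specify) shows why this cannot be held in 8 GB of RAM:

    300,000 rows × 100,000 columns × 8 bytes/double ≈ 2.4 × 10^11 bytes ≈ 240 GB

At that scale the usual options are a sparse representation (if most entries are zero), an out-of-core or mini-batch k-means, or a distributed implementation such as Mahout's MapReduce k-means.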

Dropwizard application crashes in AbstractJAXBProvider

戏子无情 submitted on 2019-12-14 03:55:17
Question: I have a server application implemented with Dropwizard, using Gradle as the build system. Now I want to integrate Apache Mahout for some recommender-system work. After adding the Mahout dependency and trying to run, I get exceptions. My initial dependencies look like:

    dependencies {
        compile 'io.dropwizard:dropwizard-core:0.9.1'
        compile 'io.dropwizard:dropwizard-jdbi:0.9.1'
        compile 'mysql:mysql-connector-java:5.1.37'
        compile 'redis.clients:jedis:2.8.0'
        compile 'com.google.guava:guava:18.0'
        compile
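AbstractJAXBProvider is a Jersey 1.x class, and Dropwizard 0.9.x runs on Jersey 2.x, so this crash is typically a transitive-dependency clash. A sketch of one common fix, excluding Jersey 1.x from the Mahout dependency (the Mahout artifact name and version here are placeholders to check against the output of gradle dependencies):

    dependencies {
        compile('org.apache.mahout:mahout-core:0.9') {
            // Jersey 1.x classes such as AbstractJAXBProvider clash with
            // Dropwizard's Jersey 2.x; keep them off the classpath.
            exclude group: 'com.sun.jersey'
        }
    }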

Model creation for user-user collaborative filtering

陌路散爱 submitted on 2019-12-14 02:55:41
Question: I want to do a form of user-user collaborative filtering in which the users in the user-item matrix are a selected subset of all users in the database. This selected set is refreshed regularly with newly selected users' preferences. New users shouldn't be added to the matrix. For a new user, based on his preferences, we need to recommend items from the user-item matrix (which contains only the selected subset of users). I do not want to add the new anonymous users to the matrix. Explored in
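Mahout's Taste API has a data model designed for exactly this situation. A sketch using PlusAnonymousUserDataModel to serve a transient user without ever adding them to the underlying model (the file name, neighborhood size, and item IDs are illustrative assumptions):

    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
    import org.apache.mahout.cf.taste.impl.model.PlusAnonymousUserDataModel;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.PreferenceArray;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;

    public class AnonymousUserRecommender {
        public static void main(String[] args) throws Exception {
            // The base model holds only the selected subset of users.
            PlusAnonymousUserDataModel model = new PlusAnonymousUserDataModel(
                new FileDataModel(new File("selected-users.csv")));
            PearsonCorrelationSimilarity similarity = new PearsonCorrelationSimilarity(model);
            GenericUserBasedRecommender recommender = new GenericUserBasedRecommender(
                model, new NearestNUserNeighborhood(10, similarity, model), similarity);

            // Preferences of the new, anonymous user (never persisted in the matrix).
            PreferenceArray prefs = new GenericUserPreferenceArray(2);
            prefs.setUserID(0, PlusAnonymousUserDataModel.TEMP_USER_ID);
            prefs.setItemID(0, 123L); prefs.setValue(0, 4.5f);
            prefs.setItemID(1, 456L); prefs.setValue(1, 3.0f);

            synchronized (model) {  // one anonymous user at a time
                model.setTempPrefs(prefs);
                List<RecommendedItem> recs =
                    recommender.recommend(PlusAnonymousUserDataModel.TEMP_USER_ID, 5);
                model.clearTempPrefs();
                System.out.println(recs);
            }
        }
    }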

K-means clustering in Mahout

最后都变了- submitted on 2019-12-13 10:29:54
Question: I am trying to cluster a sample dataset in CSV format. But when I give the command below:

    user@ubuntu:/usr/local/mahout/trunk$ bin/mahout kmeans -i /root/Mahout/temp/parsedtext-seqdir-sparse-kmeans/tfidf-vectors/ -c /root/Mahout/temp/parsedtext-kmeans-clusters -o /root/Mahout/reuters21578/root/Mahout/temp/parsedtext-kmeans -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 2 -k 1 -ow --clustering -cl

I get the following error, saying there is no input clusters
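For reference, the same invocation can be issued from Java. A sketch, assuming KMeansDriver implements Hadoop's Tool interface via Mahout's AbstractJob, and reusing the flags from the question (with the duplicated clustering switch collapsed to -cl):

    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.clustering.kmeans.KMeansDriver;

    public class RunKMeans {
        public static void main(String[] args) throws Exception {
            // With -k given, Mahout samples k random input vectors into the -c
            // directory as initial centroids, so -c must be a writable path.
            ToolRunner.run(new KMeansDriver(), new String[] {
                "-i", "/root/Mahout/temp/parsedtext-seqdir-sparse-kmeans/tfidf-vectors/",
                "-c", "/root/Mahout/temp/parsedtext-kmeans-clusters",
                "-o", "/root/Mahout/temp/parsedtext-kmeans",
                "-dm", "org.apache.mahout.common.distance.CosineDistanceMeasure",
                "-x", "2", "-k", "1", "-ow", "-cl"
            });
        }
    }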

Mahout 0.9 and Hadoop 2.2.0 - Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

為{幸葍}努か submitted on 2019-12-13 00:37:25
Question: Where did my code go wrong? When I searched, I found a similar post, but I couldn't adapt it to my problem.

    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
        at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
        at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run
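This IncompatibleClassChangeError is the classic symptom of running Mahout jars compiled against Hadoop 1.x (where JobContext was a concrete class) on Hadoop 2.x (where it became an interface). One commonly cited remedy is rebuilding Mahout from source against Hadoop 2; a sketch, assuming Mahout 0.9's hadoop2 build property (verify the exact profile and property name against the checkout's pom.xml):

    mvn clean install -DskipTests -Dhadoop2.version=2.2.0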

How to vectorize a text file in Mahout?

早过忘川 submitted on 2019-12-12 11:43:10
Question: I have a text file with labels and tweets:

    positive,I love this car
    negative,I hate this book
    positive,Good product.

I need to convert each line into a vector value. If I use the seq2sparse command, the whole document gets converted to one vector, but I need each line converted to a vector, not the whole document. For example: key: positive, value: vectorvalue(tweet). How can we achieve this in Mahout?

Here is what I have done so far:

    StringTokenizer str = new StringTokenizer(line, ",");
    String label = str
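One way to get one vector per tweet is to write each line as its own document into a SequenceFile and then run seq2sparse over the result. A sketch (the file names are illustrative; the /label/id key layout mirrors the convention Mahout's classifiers use for deriving labels from keys):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class TweetsToSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Writer writer = new SequenceFile.Writer(
                fs, conf, new Path("tweets-seq/chunk-0"), Text.class, Text.class);
            BufferedReader in = new BufferedReader(new FileReader("tweets.csv"));
            try {
                String line;
                int id = 0;
                while ((line = in.readLine()) != null) {
                    int comma = line.indexOf(',');
                    String label = line.substring(0, comma);
                    String tweet = line.substring(comma + 1);
                    // One key/value pair per line; seq2sparse then emits one vector per tweet.
                    writer.append(new Text("/" + label + "/" + id++), new Text(tweet));
                }
            } finally {
                in.close();
                writer.close();
            }
        }
    }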

Is Hadoop 2.2.0 compatible with Mahout 0.8?

本秂侑毒 submitted on 2019-12-12 08:02:40
Question: I have a Hadoop cluster running version 2.2.0 with Mahout 0.8; are they compatible? Whenever I run this command:

    bin/mahout recommenditembased --input mydata.dat --usersFile user.dat --numRecommendations 2 --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION

it gives me this error:

    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.mahout.common.HadoopUtil
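This is the same incompatibility as in the Mahout 0.9 question above: Mahout 0.8 jars were compiled against Hadoop 1.x, where org.apache.hadoop.mapreduce.JobContext was a concrete class, while Hadoop 2.x turned it into an interface. A tiny probe (a sketch; run it with the cluster's Hadoop jars on the classpath) shows which API flavor is loaded:

    public class JobContextProbe {
        public static void main(String[] args) {
            // Prints "interface" on Hadoop 2.x, "class" on Hadoop 1.x.
            System.out.println(org.apache.hadoop.mapreduce.JobContext.class.isInterface()
                ? "interface" : "class");
        }
    }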

Error when setting mapred.map.tasks in pseudo-distributed mode

亡梦爱人 submitted on 2019-12-12 06:54:22
Question: As suggested here, I am running Hadoop in pseudo-distributed mode with the following mapred-site.xml file. The job is running on a 4-core machine.

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
      <property>
        <name>mapred.map.tasks</name>
        <value>4</value>
      </property>
      <property>
        <name>mapred.reduce.tasks</name>
        <value>4</value>
      </property>
    </configuration>

I am getting the following error:

    The ratio of reported blocks 1.0000 has reached the
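A side note that may help when reading such configs: in classic MapReduce, mapred.map.tasks is only a hint (the actual number of map tasks is determined by the input splits), while mapred.reduce.tasks is honored as given. The same contrast in the old mapred Java API, as a sketch:

    import org.apache.hadoop.mapred.JobConf;

    public class TaskCounts {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setNumMapTasks(4);    // a hint only; the splits decide the real map count
            conf.setNumReduceTasks(4); // honored as given
        }
    }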