mahout | 易学教程

Hadoop Mahout Clustering

Question: I am trying to apply canopy clustering in Mahout. I have already converted a text file into a sequence file, but I cannot view the sequence file. Anyway, I tried applying canopy clustering with the following command:
hduser@ubuntu:/usr/local/mahout/trunk$ mahout canopy -i /user/Hadoop/mahout_seq/seqdata -o /user/Hadoop/clustered_data -t1 5 -t2 3
I got the following error:
16/05/10 17:02:03 INFO mapreduce.Job: Task Id : attempt_1462850486830_0008_m_000000_1, Status : FAILED
Error: java.lang

Mahout IntDoubleProcedure NoClassDefFoundError

Question: I'm using my school's server, which already has Hadoop and Mahout installed. I need to parse CSV into vectors, so I tried someone else's code from Git, but I got the following exception, which I can't resolve:
dcmac04:dir username$ java -jar BigDataNaiveBayes_fat.jar
May 30, 2015 1:48:17 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
May 30, 2015 1:48:17 AM org.apache.hadoop.io.compress

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector

Question: I wrote Java code to convert a CSV file to vectors for use in a classification task with the random forest algorithm. I use Mahout 0.10.0, Hadoop 2.6.0, and Eclipse. I then tried to run this code from the command line with:
hadoop jar /path to my jar/CSVToVector.jar com.classification.csvtovector.CSVToVector
But I got this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)

Running Mahout Locally getting ClassNotFoundException for MahoutDriver

Question: I am trying to run Mahout locally (without Hadoop) on a Windows 8 machine. I realize this is not the optimal setup, but that's what I've got to work with. When I try to run bin/mahout, I get the following error:
$ bin/mahout
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
no HADOOP_HOME set, running locally
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/driver/MahoutDriver
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.driver

Datasets for Apache Mahout

Question: I am looking for datasets that can be used to implement a recommendation system use case with Apache Mahout. I know only of the MovieLens data sets from the GroupLens Research group. Does anyone know of other datasets that can be used for implementing a recommendation system? I am particularly interested in item-based data sets, though other datasets are most welcome.
Answer 1: This is Sebastian from Mahout. There is a dataset from a Czech dating website available that might be of interest to you: http://www

How can I compile/use Mahout for Hadoop 2.0?

Question: The latest release, Mahout 0.9, is built only against Hadoop 1.x (mvn clean install). How can I compile Mahout for Hadoop 2.0.x? When I ran the command
hadoop jar mahout-examples-0.9-SNAPSHOT-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURENCE -i test -o result
I always got the error message IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected. Thanks!
Answer 1: To compile Mahout to work with 2.x

ClassNotFoundException org.apache.mahout.math.VectorWritable

Question: I'm trying to turn a CSV file into sequence files so that I can train and run a classifier on the data. I have a job Java file that I compile and then add to the Mahout job jar. When I try to run my job from the Mahout jar with hadoop jar, I get a java.lang.ClassNotFoundException: org.apache.mahout.math.VectorWritable. I'm not sure why, because if I look in the Mahout jar, that class is indeed present. Here are the steps I'm following:
#get new copy of mahout jar
rm iris.jar
cp /home

Mahout runs out of heap space

Question: I am running NaiveBayes on a set of tweets using Mahout. There are two files, one of 100 MB and one of 300 MB. I changed JAVA_HEAP_MAX to JAVA_HEAP_MAX=-Xmx2000m (earlier it was 1000). Even then, Mahout ran for a few hours (2, to be precise) before it reported a heap space error. What should I do to resolve this? Some more info, if it helps: I am running on a single node, my laptop in fact, and it has only 3 GB of RAM. Thanks. EDIT: I ran it a third time with <1/2 of the data that I used the first time

Why is Maven trying to compile my code as -source 1.3?

Question: I get this error when running mvn -e package on Ubuntu 12.04:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project HadoopSkeleton: Compilation failure: Compilation failure:
[ERROR] /home/jesvin/dev/hadoop/HadoopMahoutSkeleton-master/src/main/java/HadoopSkeleton/App.java:[22,8] error: generics are not supported in -source 1.3
[ERROR]
[ERROR] (use -source 5 or higher to enable generics)
[ERROR] /home/jesvin/dev/hadoop/HadoopMahoutSkeleton

Run cvb in mahout 0.8

Question: The current Mahout 0.8-SNAPSHOT includes a Collapsed Variational Bayes (cvb) implementation for topic modeling and has removed the Latent Dirichlet Allocation (lda) approach, because cvb can be parallelized much better. Unfortunately, the only documentation on how to run an example and generate meaningful output is for lda. Thus, I want to: (1) preprocess some texts correctly; (2) run the cvb0_local version of cvb; (3) inspect the results by looking at the top n words in each of the generated topics.
Answer 1: So here are