I would like to run this code, which I found in Mahout in Action:
package org.help;
import java.io.IOException;
import java.util.ArrayList;
import java.util.
You need to use the "job" JAR file provided by Mahout, which packages up all of its dependencies. You need to add your own classes to it too; this is how all the Mahout examples work. You shouldn't put the Mahout JARs in the Hadoop lib directory, since that effectively "installs" the program too deeply into Hadoop.
If you take the example code from the https://github.com/tdunning/MiA repository, it contains a ready-to-use pom.xml file for Maven. When you compile the code with mvn package, it creates mia-0.1-job.jar in the target directory. This archive contains all dependencies except Hadoop's, so you can run it on a Hadoop cluster without problems.
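As a rough sketch of that workflow, assuming the build puts a single *-job.jar into the target directory: the helper name find_job_jar is made up for illustration, and the main class and arguments in the commented commands are placeholders you would replace with your own.

```shell
# Print the path of the first *-job.jar found in a build directory.
# find_job_jar is a hypothetical helper name, not part of Maven or Mahout.
find_job_jar() {
  local dir="$1"
  local jar
  for jar in "$dir"/*-job.jar; do
    if [ -e "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no *-job.jar found in $dir (did 'mvn package' succeed?)" >&2
  return 1
}

# Typical sequence, run from the project root (commented out so the
# snippet can be sourced without Maven or Hadoop installed):
# mvn package
# hadoop jar "$(find_job_jar target)" <your.main.Class> <args...>
```

Because the job JAR already bundles the non-Hadoop dependencies, passing it to hadoop jar should be enough without touching HADOOP_CLASSPATH.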
What I did was set HADOOP_CLASSPATH to include my JAR and all the Mahout JAR files, as shown below.
export HADOOP_CLASSPATH=/home/xxx/my.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-core-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-core-0.7-cdh4.3.0-job.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-examples-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-examples-0.7-cdh4.3.0-job.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-integration-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-math-0.7-cdh4.3.0.jar
Then I was able to run:
hadoop com.mycompany.mahout.CSVtoVector iris/nb/iris1.csv iris/nb/data/iris.seq
So you have to include all your own JARs and the Mahout JARs in HADOOP_CLASSPATH, and then you can run your class with:
hadoop <classname>
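Typing every JAR path by hand is error-prone, so here is a minimal sketch that builds the same kind of colon-separated list from a directory. The function name build_classpath and the variable MAHOUT_LIB_DIR are hypothetical, and the sketch assumes every .jar file in that directory should go on the classpath.

```shell
# Join every .jar file in a directory into a colon-separated classpath string.
# build_classpath and MAHOUT_LIB_DIR are hypothetical names for illustration.
build_classpath() {
  local dir="$1"
  local cp=""
  local jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # skip the unexpanded pattern when no jars match
    cp="${cp:+$cp:}$jar"        # add a ':' separator only between entries
  done
  printf '%s\n' "$cp"
}

# Example usage (commented out; point MAHOUT_LIB_DIR at your Mahout lib directory):
# export HADOOP_CLASSPATH="/home/xxx/my.jar:$(build_classpath "$MAHOUT_LIB_DIR")"
# hadoop com.mycompany.mahout.CSVtoVector iris/nb/iris1.csv iris/nb/data/iris.seq
```

Note that this picks up everything in the directory, including both the plain and the -job variants of each Mahout JAR, which matches the export line above.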
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-math</artifactId>
  <version>0.7</version>
</dependency>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-collections</artifactId>
  <version>1.0</version>
</dependency>