Running Mahout from the command line (CLASSPATH)

假装没事ソ posted on 2019-12-06 03:40:45

This is better asked at user@mahout.apache.org.

Your classpath is missing compiled code in Mahout's examples module, which is where this class lives.

Better yet, have a look at this walkthrough: https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation

pferrel

If you put $MAHOUT_HOME/examples/target/classes in the Java CLASSPATH (as Sean mentions), this will work when running locally, but you'll probably have to try the method below for a Hadoop cluster deployment.
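
For the local case, a minimal sketch of that CLASSPATH change might look like the following. It assumes MAHOUT_HOME points at a Mahout source checkout that has already been built; the /opt/mahout fallback is only a hypothetical placeholder.

```shell
# Assumption: MAHOUT_HOME is a built Mahout checkout; /opt/mahout is a placeholder.
MAHOUT_HOME="${MAHOUT_HOME:-/opt/mahout}"

# Compiled classes of the examples module, as named in the answer above.
examples_classes="$MAHOUT_HOME/examples/target/classes"

# Prepend the directory, keeping any entries already on CLASSPATH.
export CLASSPATH="$examples_classes${CLASSPATH:+:$CLASSPATH}"

echo "$CLASSPATH"
```

The `${CLASSPATH:+:$CLASSPATH}` expansion avoids a stray trailing colon when CLASSPATH was previously empty.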

I found the following post very illuminating about how to get the right classes in various configurations of Mahout/Hadoop.

http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

The mahout script does not accept Hadoop job parameters (like -libjars) in all cases, although I hope it will in the future, especially where a parameter to the job is a class name (seq2sparse, for instance).

What I had to do was copy my custom jar into $HADOOP_HOME/lib on the master node. Evidently a symlink does not work; it appears you have to copy each jar you want into the directory.

Then don't forget to stop and start Hadoop because, as the Cloudera reference says, it packages the libs at startup.
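
The copy step could be sketched roughly as below. The install_jar helper and the example paths are hypothetical, not part of Mahout or Hadoop, and the restart commands depend on how your cluster was installed.

```shell
# Hypothetical helper: copy (not symlink) a jar into a lib directory.
install_jar() {
  src="$1"
  lib_dir="$2"
  # Refuse to continue if the source jar does not exist.
  [ -f "$src" ] || { echo "no such jar: $src" >&2; return 1; }
  cp "$src" "$lib_dir/"
}

# Assumed usage on the master node (paths are placeholders):
#   install_jar /home/xxx/my.jar "$HADOOP_HOME/lib"
# Afterwards, stop and start Hadoop with whatever your installation provides
# (on older tarball installs, the stop-all.sh and start-all.sh scripts).
```

Using cp rather than ln -s follows the observation above that a symlink did not work.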

What I did was set HADOOP_CLASSPATH to include my jar and all the Mahout jar files, as shown below.

export HADOOP_CLASSPATH=/home/xxx/my.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-core-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-core-0.7-cdh4.3.0-job.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-examples-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-examples-0.7-cdh4.3.0-job.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-integration-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-math-0.7-cdh4.3.0.jar

Then I was able to run

hadoop com.mycompany.mahout.CSVtoVector iris/nb/iris1.csv iris/nb/data/iris.seq

So you have to include all your jars and the Mahout jars in HADOOP_CLASSPATH, and then you can just run your class with

hadoop <classname>
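
Rather than typing each jar path by hand, HADOOP_CLASSPATH can be assembled with a small loop. This is only a sketch: join_jars, MAHOUT_LIB and MY_JAR are hypothetical names, and the loop picks up every .jar in the directory, which may be more than the six Mahout jars listed in the export above.

```shell
# Hypothetical helper: join every *.jar in a directory into a
# colon-separated classpath string.
join_jars() {
  dir="$1"
  result=""
  for jar in "$dir"/*.jar; do
    # Skip the unexpanded pattern when the directory has no jars.
    [ -e "$jar" ] || continue
    result="${result:+$result:}$jar"
  done
  printf '%s\n' "$result"
}

# Assumptions: both defaults are placeholders taken from the example above.
MAHOUT_LIB="${MAHOUT_LIB:-/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout}"
MY_JAR="${MY_JAR:-/home/xxx/my.jar}"

mahout_jars="$(join_jars "$MAHOUT_LIB")"
export HADOOP_CLASSPATH="$MY_JAR${mahout_jars:+:$mahout_jars}"

echo "$HADOOP_CLASSPATH"
```

With HADOOP_CLASSPATH exported this way, the hadoop <classname> invocation above is run from the same shell session.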
