I'm trying out Spring Data Hadoop for executing MR code on a remote cluster from my local machine's IDE (Hadoop 1.1.2, Spring 3.2.4, Spring-Data-Hadoop 1.0.0).
You can try the jar-by-class attribute to locate the jar and let Hadoop upload it to the TaskTracker:
<hdp:job id="wc-job"
    mapper="com.hadoop.basics.WordCounter.WCMapper"
    reducer="com.hadoop.basics.WordCounter.WCReducer"
    input-path="${wordcount.input.path}"
    output-path="${wordcount.output.path}"
    jar-by-class="com.hadoop.basics.WordCounter"
    user="bigdata" />
Finally, WCMapper and WCReducer must be static nested classes; otherwise Hadoop cannot instantiate them.
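For reference, a minimal sketch of what the WordCounter driver class could look like with static nested classes (the package and class names come from the configuration above; the word-count logic and the key/value types are assumptions):

package com.hadoop.basics;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCounter {

    // Nested classes must be static so Hadoop can instantiate them
    // via reflection without an enclosing WordCounter instance.
    public static class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    public static class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}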
I had the same problem with my very first Hadoop job, and the solution was in the way the jar file is exported in Eclipse.
When you export a Java project as a jar file in Eclipse, two options are available:
Extract required libraries into generated JAR
Package required libraries into generated JAR
The first option solved my problem and will probably solve yours as well.
Right-click your project, then Build Path -> Configure Build Path. Select the Libraries tab, click Add External JARs, and add all the required JAR files from your Hadoop directory. This should solve your problem.
I was getting the same issue. Separating the mapper and reducer into their own top-level classes worked for me, with the following configuration in applicationContext.xml:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:util="http://www.springframework.org/schema/util"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
        http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
        http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-4.2.xsd">

    <context:property-placeholder location="classpath:application.properties" />

    <hdp:configuration namenode-principal="hdfs://xx.yy.com" rm-manager-uri="xx.yy.com"
        security-method="kerb" user-keytab="location" rm-manager-principal="username"
        user-principal="username">
        fs.default.name=${fs.default.name}
        mapred.job.tracker=${mapred.job.tracker}
    </hdp:configuration>

    <hdp:job id="wordCountJobId" input-path="${input.path}"
        output-path="${output.path}" jar-by-class="com.xx.poc.Application"
        mapper="com.xx.poc.Map" reducer="com.xx.poc.Reduce" />

    <hdp:job-runner id="wordCountJobRunner" job-ref="wordCountJobId"
        run-at-startup="true" />

</beans>
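For reference, a minimal sketch of the separated top-level mapper and reducer classes referenced above (the package and class names are taken from the configuration; the word-count logic itself is an assumption):

// Map.java
package com.xx.poc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Top-level class referenced by the mapper attribute of <hdp:job>.
public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
        }
    }
}

// Reduce.java
package com.xx.poc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Top-level class referenced by the reducer attribute of <hdp:job>.
public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}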