I have a custom partitioner like below:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
public class SignaturePartitioner extends Partitioner<Text, Text> {
    // Partition by the first space-separated token of the key (split takes a String regex)
    public int getPartition(Text key, Text value, int numReduceTasks) {
        return (key.toString().split(" ")[0].hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
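The partitioning rule itself does not depend on Hadoop, so it can be checked in plain Java. A minimal sketch, using a hypothetical class name `FirstTokenPartition` and a `String` standing in for `Text`: keys that share their first space-separated token always land in the same partition, and masking with `Integer.MAX_VALUE` keeps the result non-negative even when `hashCode()` is negative.

```java
public class FirstTokenPartition {

    // Same arithmetic as getPartition above, on a plain String instead of Text.
    // Hypothetical helper for illustration only; not part of Hadoop.
    static int partitionFor(String key, int numReduceTasks) {
        String firstToken = key.split(" ")[0];
        // Clear the sign bit so a negative hashCode() cannot produce a negative partition
        return (firstToken.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int n = 4;
        // Keys with the same first token share a partition
        System.out.println(partitionFor("sig123 a", n) == partitionFor("sig123 b", n)); // true
        // Each result is a reducer index in [0, n)
        for (String k : new String[] {"alpha x", "beta y", "gamma z"}) {
            System.out.println(k + " -> " + partitionFor(k, n));
        }
    }
}
```

Masking the sign bit is used instead of `Math.abs` because `Math.abs(Integer.MIN_VALUE)` is still negative.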
I set the Hadoop Streaming parameters like this:
-file SignaturePartitioner.java \
-partitioner SignaturePartitioner \
Then I get a "Class Not Found" error.
Do you know what the problem is?
I faced the same issue and managed to solve it after a lot of research.
The root cause is that hadoop-streaming-2.6.0.jar uses the old mapred API, not the newer mapreduce API. So implement the org.apache.hadoop.mapred.Partitioner interface instead of extending the org.apache.hadoop.mapreduce.Partitioner class. The following worked for me:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;
public class Mypartitioner implements Partitioner<Text, Text> {
    // Required because mapred.Partitioner extends JobConfigurable; nothing to configure here
    public void configure(JobConf job) {}
    // Keys starting with "a" go to reducer 0, all other keys go to reducer 1
    public int getPartition(Text pkey, Text pvalue, int pnumparts) {
        if (pkey.toString().startsWith("a"))
            return 0;
        else
            return 1;
    }
}
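Mypartitioner ignores `pnumparts` and only ever returns 0 or 1, so it is meant for exactly two reducers (`-D mapreduce.job.reduces=2`); with more reducers, the extra ones would receive no keys. A hypothetical generalization (the class name `PrefixPartition` and the hashing of the remaining keys are assumptions, not part of the original answer) keeps "a" keys on partition 0 and spreads everything else over partitions 1 to numParts-1. It is shown as plain Java with `String` in place of `Text`; in a job, the same body would go inside `getPartition` of a class implementing `org.apache.hadoop.mapred.Partitioner`.

```java
public class PrefixPartition {

    // Hypothetical generalization of Mypartitioner's rule (String stands in for Text):
    // keys starting with "a" go to partition 0, all other keys are hashed
    // across partitions 1..numParts-1. Assumes numParts >= 2.
    static int partitionFor(String key, int numParts) {
        if (key.startsWith("a")) {
            return 0;
        }
        // Clear the sign bit, then shift the result into the range [1, numParts)
        return 1 + (key.hashCode() & Integer.MAX_VALUE) % (numParts - 1);
    }

    public static void main(String[] args) {
        // With 2 partitions this reproduces Mypartitioner exactly
        System.out.println(partitionFor("apple", 2));  // 0
        System.out.println(partitionFor("banana", 2)); // 1
        // With 4 partitions, non-"a" keys spread over partitions 1, 2 and 3
        for (String k : new String[] {"banana", "cherry", "date"}) {
            System.out.println(k + " -> " + partitionFor(k, 4));
        }
    }
}
```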
Compile Mypartitioner, package it into a jar, and then run:
bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
  -libjars /home/sanjiv/hadoop-2.6.0/Mypartitioner.jar \
  -D mapreduce.job.reduces=2 \
  -files /home/sanjiv/mymapper.sh,/home/sanjiv/myreducer.sh \
  -input indir -output outdir -mapper mymapper.sh \
  -reducer myreducer.sh -partitioner Mypartitioner
-file SignaturePartitioner.java -partitioner SignaturePartitioner
The -file option makes the file available on all the required nodes, but the -partitioner value is a class name, so -file needs to ship the compiled class rather than the .java source file.
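The -partitioner option takes a Java class name, and that name has to be resolvable by a class loader; a .java source file is never compiled by the job, which is why shipping only SignaturePartitioner.java ends in "Class Not Found". A rough plain-Java illustration of loading a class by name (this is not Hadoop Streaming's own lookup code; `ClassLookup` and `loadOrNull` are hypothetical names):

```java
public class ClassLookup {

    // Returns the loaded class, or null when no class with that name is visible
    // to the current class loader (a compiled .class file or jar is required).
    static Class<?> loadOrNull(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A class on the classpath resolves fine
        System.out.println(loadOrNull("java.lang.String") != null); // true
        // false unless a compiled SignaturePartitioner is on the classpath
        System.out.println(loadOrNull("SignaturePartitioner") != null);
    }
}
```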