Question:
I have a custom partitioner like below:
import java.util.*;
import org.apache.hadoop.mapreduce.*;
public class SignaturePartitioner extends Partitioner<Text, Text>
{
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks)
    {
        // Java's String method is split (lowercase) and takes a String regex,
        // not Split(' ') as originally written.
        return (key.toString().split(" ")[0].hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
I set the Hadoop Streaming parameters like below:
-file SignaturePartitioner.java \
-partitioner SignaturePartitioner \
Then I get an error: Class Not Found.
Do you know what the problem is?
Best Regards,
Answer 1:
I faced the same issue, but managed to solve it after a lot of research.
The root cause is that hadoop-streaming-2.6.0.jar uses the mapred API, not the mapreduce API. You also need to implement the Partitioner interface rather than extend the Partitioner class. The following worked for me:
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Partitioner;
import org.apache.hadoop.mapred.JobConf;

public class Mypartitioner implements Partitioner<Text, Text> {
    public void configure(JobConf job) {}

    public int getPartition(Text pkey, Text pvalue, int pnumparts)
    {
        if (pkey.toString().startsWith("a"))
            return 0;
        else
            return 1;
    }
}
Compile Mypartitioner, create a jar, and then run:
bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
  -libjars /home/sanjiv/hadoop-2.6.0/Mypartitioner.jar \
  -D mapreduce.job.reduces=2 \
  -files /home/sanjiv/mymapper.sh,/home/sanjiv/myreducer.sh \
  -input indir -output outdir -mapper mymapper.sh \
  -reducer myreducer.sh -partitioner Mypartitioner
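The asker's original goal was to partition on the first space-separated field of the key. That hash logic is plain Java and can be sketched independently of Hadoop; a mapred-style Partitioner's getPartition(Text, Text, int) would simply delegate to it. The class and method names below (SignatureHash, partitionFor) are illustrative helpers, not part of any Hadoop API.

```java
// Hypothetical helper: computes the reducer index from the first
// space-separated token of a key. A mapred Partitioner could delegate to
// partitionFor(key.toString(), numReduceTasks) from its getPartition method.
public class SignatureHash {
    public static int partitionFor(String key, int numReduceTasks) {
        // split (lowercase) takes a regex String; this grabs the first field.
        String firstField = key.split(" ")[0];
        // Mask the sign bit so negative hashCodes still map to a valid bucket
        // in [0, numReduceTasks).
        return (firstField.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Keys sharing a first field must land in the same partition.
        int p1 = partitionFor("abc 123", 4);
        int p2 = partitionFor("abc 456", 4);
        System.out.println(p1 == p2);          // same first field, same reducer
        System.out.println(p1 >= 0 && p1 < 4); // index stays within range
    }
}
```

Keeping the hash logic in a plain static method like this also makes it easy to unit-test without a Hadoop cluster or its jars on the classpath.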
Answer 2:
-file SignaturePartitioner.java -partitioner SignaturePartitioner
The -file option makes the file available on all the required nodes through the Hadoop framework. The -partitioner option, however, needs to point to a class name, not a Java source file name.
Source: https://stackoverflow.com/questions/13191468/how-to-specify-the-partitioner-for-hadoop-streaming