Question
I'm new to both Java and Hadoop. I'm trying to write a very simple program that finds frequent word pairs.
For example:
Input: My name is Foo. Foo is student.
Intermediate output:
Map:
(my, name): 1
(name, is): 1
(is, Foo): 2 // (is, Foo) = (Foo, is)
(is, student): 1
So in the end it should report that the most frequent pair is (is, Foo).
The pseudocode looks like this:
Map(key: line_num, value: line)
    words = split_words(line)
    for each w in words:
        for each neighbor x of w:
            emit((w, x), 1)
Here my key is not a single word; it's a pair. While going through the documentation, I read that every new key type has to implement WritableComparable.
I'm confused about that, and I'm not sure it's really true. If someone could explain this interface, that would be great. Then I can figure out on my own how to do it!
I don't want any code, neither a mapper nor anything else. I just want to understand what WritableComparable does. Which method of WritableComparable actually compares keys? I can see equals and compareTo, but I cannot find any explanation of them. Please, no code! Thanks.
EDIT 1: In compareTo I return 0 when pair (a, b) = (b, a), but the two still don't go to the same reducer. Is there any way, inside the compareTo method, to reset key (b, a) to (a, b) or to generate a totally new key?
EDIT 2: I don't know about generating a new key, but after changing the logic in compareTo, it worked fine! Thanks, everyone!
Answer 1:
WritableComparable is an interface that makes the implementing class two things. It is Writable, meaning it can be serialized, written to the network or disk, and read back. This is necessary for any key or value, because keys and values are sent between Hadoop nodes. It is also Comparable, which means the class must provide a compareTo method that defines how one object of the class is ordered relative to another. Hadoop uses this ordering to sort and group keys before they reach the Reducer.
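To make the two halves of the interface concrete, here is a minimal sketch of a pair key. It is written against plain java.io so that it runs without Hadoop on the classpath; WordPair, its fields, and roundTrip are hypothetical names, and in a real job the class would additionally declare "implements WritableComparable<WordPair>". The write and readFields methods use the signatures that Hadoop's Writable declares, and compareTo is the Comparable half. The constructor stores the two words in sorted order, which is one possible way (an assumption, not the only option) to make (is, Foo) and (Foo, is) the same key.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Objects;

// Hypothetical pair key. In a Hadoop job it would additionally declare
// "implements WritableComparable<WordPair>"; plain Comparable is used here
// so that the sketch compiles without Hadoop.
final class WordPair implements Comparable<WordPair> {
    private String first = "";
    private String second = "";

    // Hadoop creates key objects reflectively, so a no-argument constructor is needed.
    WordPair() { }

    WordPair(String a, String b) {
        // Store the words in sorted order so that (a, b) and (b, a) are one key.
        if (a.compareTo(b) <= 0) { first = a; second = b; }
        else { first = b; second = a; }
    }

    // "Writable" half: serialize the fields ...
    public void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeUTF(second);
    }

    // ... and read them back in the same order.
    public void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readUTF();
    }

    // "Comparable" half: defines the sort order of keys.
    @Override
    public int compareTo(WordPair other) {
        int c = first.compareTo(other.first);
        return c != 0 ? c : second.compareTo(other.second);
    }

    // equals and hashCode are kept consistent with compareTo.
    @Override
    public boolean equals(Object o) {
        return o instanceof WordPair && compareTo((WordPair) o) == 0;
    }

    @Override
    public int hashCode() { return Objects.hash(first, second); }

    @Override
    public String toString() { return "(" + first + ", " + second + ")"; }

    // Sends a key through bytes and back, as happens between map and reduce.
    static WordPair roundTrip(WordPair key) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        key.write(new DataOutputStream(bytes));
        WordPair copy = new WordPair();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        WordPair a = new WordPair("is", "Foo");
        WordPair b = roundTrip(new WordPair("Foo", "is"));
        System.out.println(a + " " + b + " equal=" + a.equals(b));
    }
}
```

This also suggests why returning 0 from compareTo alone does not send (a, b) and (b, a) to the same reducer: with Hadoop's default HashPartitioner the reducer is chosen from the key's hashCode(), so both orderings have to hash (and serialize) identically. Normalizing the order when the key is built takes care of that.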
This interface is necessary when you want to use your own class as a key. Writing such a class, with correct serialization, ordering, and hashing, can be rather difficult (in my experience), especially if you're new to both Java and Hadoop. Note that a custom key does not by itself require a custom InputFormat; the InputFormat only controls how the input is read.
So if I were you, I wouldn't bother with that, as there's a much simpler way. I would use TextInputFormat, which is conveniently both the default InputFormat and pretty easy to use and understand. You could simply emit each key as a Text object, which is pretty similar to a String. There is a caveat, though: as you mentioned, "is Foo" and "Foo is" need to evaluate to the same key. So for every pair of words you pull out, order the two words alphabetically with String.compareTo before building the key. That way both orderings of a pair are guaranteed to produce the same key.
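The sorting step can be sketched in plain Java roughly as follows. PairKeys, canonical, and countPairs are hypothetical names, and the in-memory map only stands in for what the shuffle and a summing reducer would do; in a mapper you would wrap the string returned by canonical in a Text. The sketch assumes that sentences end with a period and that pairs do not cross sentence boundaries, which matches the example in the question.

```java
import java.util.Map;
import java.util.TreeMap;

final class PairKeys {
    // Builds one key for a pair of words, regardless of the order they appear in.
    static String canonical(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "," + b : b + "," + a;
    }

    // Counts adjacent-word pairs; a local stand-in for map + shuffle + reduce.
    static Map<String, Integer> countPairs(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Assumption: a period ends a sentence and pairs stay within a sentence.
        for (String sentence : text.split("\\.")) {
            String[] words = sentence.trim().split("\\s+");
            for (int i = 0; i < words.length - 1; i++) {
                counts.merge(canonical(words[i], words[i + 1]), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countPairs("My name is Foo. Foo is student."));
    }
}
```

Keep in mind that String.compareTo is case-sensitive and orders uppercase letters before lowercase ones, so "Foo" sorts before "is". If "My" and "my" should count as the same word, lowercase the words before building the key.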
Answer 2:
Here is a mapper class for your problem. The logic for picking the frequent pairs is not implemented; I guess you weren't looking for that.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MR {

    // Emits (pair, 1) for every two adjacent words on a line.
    public static class PairMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text pair = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] words = value.toString().trim().split("\\s+");
            for (int i = 0; i < words.length - 1; i++) {
                String a = words[i];
                String b = words[i + 1];
                // Order the two words so that (is, Foo) and (Foo, is) become the same key.
                pair.set(a.compareTo(b) <= 0 ? a + "," + b : b + "," + a);
                context.write(pair, ONE);
            }
        }
    }

    // Sums the counts per pair. Deciding which pairs count as "frequent" is still left out.
    public static class PairReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // On old Hadoop versions without Job.getInstance, use new Job(conf, "Word Job") instead.
        Job job = Job.getInstance(new Configuration(), "Word Job");
        job.setJarByClass(MR.class);
        job.setMapperClass(PairMapper.class);
        job.setReducerClass(PairReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        // Without these, the job expects the default key/value types and fails on the Text key.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Source: https://stackoverflow.com/questions/12625669/mapreduce-writablecomparables