Question
I recently installed the new Hadoop 2.2. I had previously written a simple WordCount MapReduce program that ran without trouble on CDH4, but now all of my org.apache.hadoop.mapreduce imports fail to resolve. Can someone tell me exactly which JAR to add to the build path to fix these imports? The code is below, in case someone needs to point out changes required to make it run on Hadoop 2.2.
import java.io.IOException;
import java.lang.InterruptedException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapRWordCount {

    private final static IntWritable ONE = new IntWritable(1);
    private final static Pattern WORD = Pattern.compile("\\w+");

    public static class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String valueString = value.toString();
            Matcher matcher = WORD.matcher(valueString);
            while (matcher.find()) {
                word.set(matcher.group().toLowerCase());
                context.write(word, ONE);
            }
        }
    }

    public static class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable totalCount = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            totalCount.set(sum);
            context.write(key, totalCount);
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {

        if (args.length != 2) {
            System.err.println("Usage: MapRWordCount <input_path> <output_path>");
            System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(MapRWordCount.class);
        job.setJobName("MapReduce Word Count");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
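For what it's worth, the mapper's tokenization logic can be checked on its own without any Hadoop JARs at all. This is a minimal standalone sketch (the class name TokenizeDemo is made up) that applies the same \w+ matching and lowercasing the mapper uses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeDemo {
    // Same pattern the mapper compiles: runs of word characters.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Mirrors the mapper's loop: find each \w+ run and lowercase it.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            out.add(m.group().toLowerCase());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [hello, hadoop, world, hello]
        System.out.println(tokens("Hello, Hadoop World! hello"));
    }
}
```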
Answer 1:
I found the JARs in the following locations:
$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar
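With those two JARs on the classpath, a command-line compile and submit might look like this (a sketch with illustrative paths; it assumes HADOOP_HOME points at a Hadoop 2.2.0 installation and the hadoop CLI is on PATH):

```shell
# Compile against the two JARs listed above
CP="$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar"
CP="$CP:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar"
javac -classpath "$CP" MapRWordCount.java

# Package the classes; 'hadoop jar' supplies the full runtime classpath
jar cf wordcount.jar MapRWordCount*.class
hadoop jar wordcount.jar MapRWordCount /input /output
```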
Answer 2:
In Maven, I had to add the following to pom.xml and then do a clean build before the Mapper and Reducer classes would resolve in Java:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.2.0</version>
</dependency>
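If the project has no pom.xml yet, a minimal one carrying just these two dependencies might look like this (the project's own groupId/artifactId/version are illustrative placeholders):

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>wordcount</artifactId>
    <version>1.0</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.2.0</version>
        </dependency>
    </dependencies>
</project>
```

After saving it, a plain "mvn clean compile" should pull the JARs and build the classes.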
Now the following don't throw errors:
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
Answer 3:
If you're just looking for the location of the appropriate JARs in Hadoop 2.2, then look under share/hadoop/{common,hdfs,mapreduce}. The files ending in -2.2.0.jar are likely what you are looking for.
This should be the same as in CDH4, unless you installed the "MR1" version, which matches the Hadoop 1.x structure.
Answer 4:
Use this link to find whatever JAR files you need.
Download them, then right-click your project and go to Build Path > Configure Build Path > Add External JARs.
Source: https://stackoverflow.com/questions/19436361/issue-with-org-apache-hadoop-mapreduce-imports-in-apache-hadoop-2-2