How to specify the KeyValueTextInputFormat separator in the Hadoop 0.20 API?


In the new API (org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat), how do I specify a separator (delimiter) other than tab (the default) to separate the key and the value?

Samp


7 answers

北海茫月

2020-12-08 06:04

Example

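```
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.PropertyConfigurator;

public class KeyValueTextInput extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        String log4jConfPath = "log4j.properties";
        PropertyConfigurator.configure(log4jConfPath);
        int res = ToolRunner.run(new KeyValueTextInput(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = this.getConf();

        // Old (mapred) API property name, kept for reference:
        //conf.set("key.value.separator.in.input.line", ",");

        // New API property: split each line at ',' instead of the default tab.
        // Set it before Job.getInstance(), because the Job copies the configuration.
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

        Job job = Job.getInstance(conf, "WordCountSampleTemplate");
        job.setJarByClass(KeyValueTextInput.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        //job.setMapOutputKeyClass(Text.class);
        //job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Delete the output directory if it already exists, so the job can be re-run.
        Path outputPath = new Path(args[1]);
        FileSystem fs = FileSystem.get(new URI(outputPath.toString()), conf);
        fs.delete(outputPath, true);
        FileOutputFormat.setOutputPath(job, outputPath);
        return job.waitForCompletion(true) ? 0 : 1;
    }
}

// Identity mapper: KeyValueTextInputFormat has already split each line into key and value.
class Map extends Mapper<Text, Text, Text, Text> {
    @Override
    public void map(Text k1, Text v1, Context context) throws IOException, InterruptedException {
        context.write(k1, v1);
    }
}

// Concatenates all values for a key, separated by " || ".
class Reduce extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        StringBuilder sum = new StringBuilder(" || ");
        for (Text value : values) {
            sum.append(value.toString()).append(" || ");
        }
        context.write(key, new Text(sum.toString()));
    }
}
```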


南方客

2020-12-08 06:07
It's a matter of ordering.

The call conf.set("key.value.separator.in.input.line", ",") must come before you create the Job instance. So:
```
conf.set("key.value.separator.in.input.line", ","); 
Job job = new Job(conf);
```
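Why the order matters: Job works on its own copy of the Configuration it is given, so a conf.set(...) made after new Job(conf) never reaches the job. If the Job already exists, setting the property on job.getConfiguration() works instead. The toy model below illustrates only that copy-on-construct behaviour, in plain Java so it runs without Hadoop; OrderingDemo, ToyJob and separatorSeenByJob are hypothetical names, not Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not Hadoop code) of a job that snapshots its
// configuration when it is constructed, as Hadoop's Job does.
final class OrderingDemo {
    static final String SEP_KEY =
        "mapreduce.input.keyvaluelinerecordreader.key.value.separator";

    // Hypothetical stand-in for Job: copies the settings map on construction.
    static final class ToyJob {
        private final Map<String, String> conf;

        ToyJob(Map<String, String> conf) {
            this.conf = new HashMap<>(conf); // later changes to the caller's map are not seen
        }

        String get(String key, String defaultValue) {
            return conf.getOrDefault(key, defaultValue);
        }
    }

    // Returns the separator the toy job sees when the property is set
    // before (true) or after (false) the job is created.
    static String separatorSeenByJob(boolean setBeforeJob) {
        Map<String, String> conf = new HashMap<>();
        if (setBeforeJob) {
            conf.put(SEP_KEY, ",");
        }
        ToyJob job = new ToyJob(conf);
        if (!setBeforeJob) {
            conf.put(SEP_KEY, ","); // too late: the job already holds its own copy
        }
        return job.get(SEP_KEY, "\t"); // "\t" stands in for the tab default
    }

    public static void main(String[] args) {
        System.out.println("set before Job: " + separatorSeenByJob(true).equals(","));
        System.out.println("set after Job:  " + separatorSeenByJob(false).equals(","));
    }
}
```

In the toy, only the "before" ordering makes the job see ",", which is the same reason the real conf.set(...) has to precede new Job(conf).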
情歌与酒

2020-12-08 06:07

First, the new API was not finished in 0.20.*, so if you want to use the new API in 0.20.*, you have to implement this feature yourself. For example, you can use TextInputFormat (a FileInputFormat subclass) instead: ignore the LongWritable key and split the Text value on the comma yourself.
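A minimal sketch of the per-record splitting this workaround needs, in plain Java so it runs without Hadoop. CommaKeyValue.split is a hypothetical helper: inside a Mapper<LongWritable, Text, Text, Text> you would ignore the LongWritable offset, call it on value.toString(), and emit the two parts wrapped in Text. Passing a limit of 2 to String.split breaks the line at the first comma only, so any further commas stay in the value.

```java
// Hypothetical helper: splits one input line into {key, value} at the first comma.
final class CommaKeyValue {
    static String[] split(String line) {
        // A limit of 2 splits on the first comma only and keeps a trailing empty value.
        String[] parts = line.split(",", 2);
        if (parts.length == 1) {
            // No comma on the line: treat the whole line as the key, with an empty value.
            return new String[] {parts[0], ""};
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"a,b,c", "no-comma", "k,"}) {
            String[] kv = split(line);
            System.out.println("key=[" + kv[0] + "] value=[" + kv[1] + "]");
        }
    }
}
```

Treating a comma-less line as key-only is a design choice here; it matches what KeyValueTextInputFormat does with a line that lacks its separator.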

无人共我

2020-12-08 06:12
Set the following in the driver code:
```
conf.set("key.value.separator.in.input.line", ",");
```
一生所求

2020-12-08 06:17
By default, the KeyValueTextInputFormat class uses a tab as the separator between key and value in the input text file.

If you want to split the input on a custom separator, you have to set it in the configuration under the property name that your API version reads.

For the new Hadoop API, the property name is different:
```
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ";");
```
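One detail worth knowing: as far as I can tell from KeyValueLineRecordReader, only the first character of this property is used (it is cast to a single byte), and each line is split at the first occurrence of that character; a line without it becomes the key with an empty value. So a setting of "::" behaves like ":". The plain-Java sketch below mimics those rules so you can preview how lines will be split. It is not Hadoop's code, KvSplitPreview is a hypothetical name, and because it works on Java chars rather than raw UTF-8 bytes it only matches Hadoop for single-byte (ASCII) separators.

```java
// Hypothetical preview of how KeyValueTextInputFormat splits a line,
// mimicking KeyValueLineRecordReader's rules; not Hadoop code.
final class KvSplitPreview {
    // separatorSetting == null stands for "property not set" (tab default).
    static String[] split(String line, String separatorSetting) {
        // Only the first character of the configured separator is used.
        char sep = (separatorSetting == null ? "\t" : separatorSetting).charAt(0);
        int pos = line.indexOf(sep);
        if (pos == -1) {
            // No separator: the whole line is the key and the value is empty.
            return new String[] {line, ""};
        }
        // Split at the first occurrence only; later separators stay in the value.
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"k;v1;v2", ";"},
            {"k::v", "::"},
            {"k\tv", null},
            {"just-a-key", ","},
        };
        for (String[] c : cases) {
            String[] kv = split(c[0], c[1]);
            System.out.println("key=[" + kv[0] + "] value=[" + kv[1] + "]");
        }
    }
}
```

The "k::v" case shows why a multi-character separator does not work as one might expect: the line is split at the first ':' and the value keeps the second one.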

隐瞒了意图╮

2020-12-08 06:18

In the newer API, use the mapreduce.input.keyvaluelinerecordreader.key.value.separator configuration property.

Here's an example:

```
Configuration conf = new Configuration();
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

Job job = new Job(conf);
job.setInputFormatClass(KeyValueTextInputFormat.class);
// next job set-up
```
