How to specify the KeyValueTextInputFormat separator in the Hadoop 0.20 API?

盖世英雄少女心 2020-12-08 05:57

In the new API (org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat), how do I specify a separator (delimiter) other than tab, which is the default, to separate the key and value?

Samp

7 answers
  • 2020-12-08 06:20

    With KeyValueTextInputFormat, each input line is expected to be a key/value pair separated by "\t" (tab) by default:

    Key1     Value1,Value2
    

    By changing the default separator, you can have lines split on a different character instead.
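To make the splitting rule concrete, here is a minimal plain-Java sketch that needs no Hadoop on the classpath. It assumes the reader splits each line at the first occurrence of a single separator character and treats a line without the separator as a key with an empty value. The class name SeparatorSketch and the sample lines are hypothetical and used only for illustration.

```java
public class SeparatorSketch {
    // Splits a line at the FIRST occurrence of sep: key before it, value after it.
    // If sep does not occur, the whole line is the key and the value is empty.
    public static String[] split(String line, char sep) {
        int pos = line.indexOf(sep);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // Hypothetical input lines using "," as the separator.
        String[] lines = { "one,first line", "Key1,Value1,Value2", "noSeparator" };
        for (String line : lines) {
            String[] kv = split(line, ',');
            System.out.println("key---> " + kv[0]);
            System.out.println("value---> " + kv[1]);
        }
    }
}
```

Because only the first separator counts, a line such as "Key1,Value1,Value2" keeps "Value1,Value2" together as the value.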

    For the new API, set the separator in the job configuration:

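    // New API (org.apache.hadoop.mapreduce)
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    Configuration conf = new Configuration();
    // Use "," instead of the default tab. On Hadoop 2.x this key is deprecated in
    // favor of "mapreduce.input.keyvaluelinerecordreader.key.value.separator".
    conf.set("key.value.separator.in.input.line", ",");
    Job job = new Job(conf);
    job.setInputFormatClass(KeyValueTextInputFormat.class);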
    

    Map

    public class Map extends Mapper<Text, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // With KeyValueTextInputFormat, both the key and the value arrive as Text.
        @Override
        public void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            System.out.println("key---> " + key);
            System.out.println("value---> " + value.toString());
            // ... (rest of the method is elided in the original answer)
        }
    }
    

    Output

    key---> one
    value---> first line
    key---> two
    value---> second line
    