Hadoop: key and value are tab-separated in the output file. How can I make them semicolon-separated?

轻奢々 2020-12-09 05:46

I think the title already explains my question. I would like to change

key (tab space) value

into

key;value

3 answers
  • 2020-12-09 05:54

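    Set the configuration property mapred.textoutputformat.separator to ";". On Hadoop 2 and later the property is named mapreduce.output.textoutputformat.separator; the old name is still accepted as a deprecated alias.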
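
    The following is a stdlib-only Java model of what TextOutputFormat does with that setting: it joins key and value with the configured separator, falling back to a tab. It is a sketch, not Hadoop's code. The class TextLineSketch and the method formatLine are hypothetical, and a plain Map stands in for Hadoop's Configuration. In a real job you only need the conf.set(...) call.

```java
import java.util.Map;

/**
 * Simplified, stdlib-only model of how Hadoop's TextOutputFormat turns a
 * (key, value) pair into one output line. Hypothetical sketch, not Hadoop's code.
 */
final class TextLineSketch {
    // Property name that TextOutputFormat reads in Hadoop 2+.
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    // Builds one output line: key, separator, value, newline.
    // A null key or value is left out together with the separator,
    // mirroring how TextOutputFormat treats null/NullWritable.
    static String formatLine(Map<String, String> conf, Object key, Object value) {
        if (key == null && value == null) {
            return ""; // nothing at all is written for such a pair
        }
        String separator = conf.getOrDefault(SEPARATOR_KEY, "\t"); // tab is the default
        StringBuilder line = new StringBuilder();
        if (key != null) {
            line.append(key);
        }
        if (key != null && value != null) {
            line.append(separator);
        }
        if (value != null) {
            line.append(value);
        }
        return line.append('\n').toString();
    }
}
```

    With an empty map, formatLine(conf, "key", "value") yields "key\tvalue\n". After conf.put(SEPARATOR_KEY, ";") it yields "key;value\n", which is the format asked for in the question.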

  • 2020-12-09 06:04

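    You can use the "KEY_VALUE_SEPERATOR" property of "KeyValueLineRecordReader" to specify a separator of your choice. Note that this property controls how KeyValueTextInputFormat splits key/value lines when reading input; it does not change the separator that TextOutputFormat writes.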
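
    On the reading side, the same idea applies when a follow-up job parses the semicolon-separated output. Below is a stdlib-only Java model of how KeyValueLineRecordReader splits a line, assuming its documented behaviour: only the first character of the configured separator is used, the line is split at its first occurrence, and a line without the separator becomes the key with an empty value. The class KeyValueSplitSketch is hypothetical. It works on characters rather than raw bytes as Hadoop does, which gives the same result for an ASCII separator such as ";".

```java
/**
 * Simplified, stdlib-only model of how Hadoop's KeyValueLineRecordReader
 * splits an input line into key and value. Hypothetical sketch, not Hadoop's code.
 */
final class KeyValueSplitSketch {
    // Property name behind KeyValueLineRecordReader.KEY_VALUE_SEPERATOR.
    static final String SEPARATOR_KEY =
        "mapreduce.input.keyvaluelinerecordreader.key.value.separator";

    // Returns {key, value}. Only the first character of the configured
    // separator is used, and the line is split at its first occurrence.
    // A line without the separator becomes the key, with an empty value.
    static String[] split(String line, String configuredSeparator) {
        char sep = configuredSeparator.charAt(0);
        int pos = line.indexOf(sep);
        if (pos == -1) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }
}
```

    For example, split("key;value;more", ";") gives key "key" and value "value;more", because only the first ";" separates the two.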

  • 2020-12-09 06:17

    In the absence of better documentation, here's what I've collected:

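        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public final class JobUtils { // hypothetical holder class so the method compiles
            public static void setTextOutputFormatSeparator(final Job job, final String separator) {
                // Job copies the Configuration it was created from, so set the keys on the job's own copy.
                final Configuration conf = job.getConfiguration();

                conf.set("mapred.textoutputformat.separator", separator);           // prior to Hadoop 2 (now a deprecated alias)
                conf.set("mapreduce.output.textoutputformat.separator", separator); // Hadoop 2+ (YARN), read by TextOutputFormat
                // Kept from the original collection; as far as I know TextOutputFormat reads none of these.
                conf.set("mapreduce.textoutputformat.separator", separator);
                conf.set("mapreduce.output.key.field.separator", separator);
                conf.set("mapred.textoutputformat.separatorText", separator);
            }
        }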
    