Hadoop - output key/value separator

Submitted by 眉间皱痕 on 2019-12-11 03:33:12

Question


I want to change the output separator to ; instead of a tab. I already tried the suggestion from "Hadoop: key and value are tab separated in the output file. how to do it semicolon-separated?", but my output is still

key (tab) value

I'm using the Cloudera Demo (CDH 4.1.3). Here is my code:

        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Driver <in> <out>");
            System.exit(2);
        }
        conf.set("mapreduce.textoutputformat.separator", ";");

        Path in = new Path(otherArgs[0]);
        Path out = new Path(otherArgs[1]);

        Job job= new Job(getConf());
        job.setJobName("MapReduce");

        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJarByClass(Driver.class);
        return job.waitForCompletion(true) ? 0 : 1;

I want

key;value

as my output.


Answer 1:


The property is called mapreduce.output.textoutputformat.separator, so you are basically missing the "output" segment in the key.

You can see that in the newest trunk source code found in the Apache SVN.
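
Why a wrong key produces no error can be sketched without Hadoop at all. The class below is a hypothetical stand-in, not Hadoop's API: a plain Map plays the role of the Configuration, and resolveSeparator mimics an output format that reads one specific key and falls back to a tab when it is absent. It assumes the key from this answer is the one your Hadoop version reads.

```java
import java.util.HashMap;
import java.util.Map;

public class SeparatorLookup {
    // Key named in the answer above (assumed to match your Hadoop version).
    public static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    // Stand-in for the output format's lookup: only SEPARATOR_KEY is consulted,
    // and a tab is used when that key is not set.
    public static String resolveSeparator(Map<String, String> conf) {
        return conf.getOrDefault(SEPARATOR_KEY, "\t");
    }

    // Builds one output line the way a text output writer would.
    public static String formatLine(Map<String, String> conf, String key, String value) {
        return key + resolveSeparator(conf) + value;
    }

    public static void main(String[] args) {
        // Key from the question: stored, but never read, so the tab default wins.
        Map<String, String> wrong = new HashMap<>();
        wrong.put("mapreduce.textoutputformat.separator", ";");

        // Key from this answer: read by the lookup, so the semicolon is used.
        Map<String, String> right = new HashMap<>();
        right.put(SEPARATOR_KEY, ";");

        System.out.println(formatLine(wrong, "key", "value"));
        System.out.println(formatLine(right, "key", "value"));
    }
}
```

A misspelled key is simply an unused entry, which is why the job runs cleanly and still writes tab-separated lines.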




Answer 2:


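You should use conf.set("mapred.textoutputformat.separator", ";"); instead of conf.set("mapreduce.textoutputformat.separator", ";");.

Note the prefix: mapred, not mapreduce. The key mapred.textoutputformat.separator belongs to the older property naming; newer Hadoop releases deprecate it in favor of mapreduce.output.textoutputformat.separator.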

Full code (this is working):

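    // Fragment of a driver method that returns an int (for example Tool.run).
    // Classes come from org.apache.hadoop.conf, org.apache.hadoop.fs, org.apache.hadoop.io,
    // org.apache.hadoop.util and org.apache.hadoop.mapreduce (plus lib.input / lib.output).
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: Driver <in> <out>");
        System.exit(2);
    }
    // Set the separator before the job is created.
    conf.set("mapred.textoutputformat.separator", ";");

    Path in = new Path(otherArgs[0]);
    Path out = new Path(otherArgs[1]);

    // Build the job from the conf that was just modified, not from getConf().
    Job job = new Job(conf);
    job.setJobName("MapReduce");

    // Identity mapper and reducer.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);

    job.setJarByClass(Driver.class);
    return job.waitForCompletion(true) ? 0 : 1;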



Answer 3:


In 2019, it's getConf().set(TextOutputFormat.SEPARATOR, ";"); (thanks @AsheshKumarSingh)

Using the native constant gives better maintainability and fewer surprises, I believe.

Important: this property must be set before Job.getInstance(getConf()) / new Job(getConf()), because the job copies the configuration when it is created and ignores any later modifications to the original conf.
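
The ordering rule above can be illustrated with a minimal sketch that does not use Hadoop. FakeJob is a hypothetical stand-in, not a Hadoop class: it copies the map it receives in its constructor, which is the behavior this answer describes for the real job. The key string is the one named in Answer 1 and is only there for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {
    static final String KEY = "mapreduce.output.textoutputformat.separator";

    // Hypothetical stand-in for a job: it snapshots the configuration it is given.
    static final class FakeJob {
        private final Map<String, String> conf;

        FakeJob(Map<String, String> source) {
            this.conf = new HashMap<>(source); // copy taken at creation time
        }

        String separator() {
            return conf.getOrDefault(KEY, "\t"); // tab when the key is absent
        }
    }

    // Separator is set first, then the job is created: the copy contains it.
    public static String separatorWhenSetBefore() {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, ";");
        FakeJob job = new FakeJob(conf);
        return job.separator();
    }

    // Job is created first, then the separator is set: the copy never sees it.
    public static String separatorWhenSetAfter() {
        Map<String, String> conf = new HashMap<>();
        FakeJob job = new FakeJob(conf);
        conf.put(KEY, ";");
        return job.separator();
    }

    public static void main(String[] args) {
        System.out.println("set before: [" + separatorWhenSetBefore() + "]");
        System.out.println("set after:  [" + separatorWhenSetAfter() + "]");
    }
}
```

Only the first ordering yields the semicolon; in the second, the job keeps the tab default because its copy was taken before the value was set.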



Source: https://stackoverflow.com/questions/16614029/hadoop-output-key-value-separator
