Hadoop: How to output different format types in the same job?

Posted by 左心房为你撑大大i on 2019-12-07 14:07:14

Question


I want to output gzip and lzo formats at the same time in one job.

I used MultipleOutputs and added two named outputs like this:

MultipleOutputs.addNamedOutput(job, "LzoOutput", GBKTextOutputFormat.class, Text.class, Text.class);

GBKTextOutputFormat.setOutputCompressorClass(job, LzoCodec.class);

MultipleOutputs.addNamedOutput(job, "GzOutput", TextOutputFormat.class, Text.class, Text.class);

TextOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

(GBKTextOutputFormat is a class I wrote myself; it extends FileOutputFormat.)

They are used in the reducer like this:

multipleOutputs.write("LzoOutput", NullWritable.get(), value, "/user/hadoop/lzo/"+key.toString());

multipleOutputs.write("GzOutput", NullWritable.get(), value, "/user/hadoop/gzip/"+key.toString());

The result is:

I get output in both paths, but both are in gzip format.

Can someone help me? Thanks!

==========================================================================

More:

I just looked at the source code of setOutputCompressorClass in FileOutputFormat, which calls conf.setClass("mapred.output.compression.codec", codecClass, CompressionCodec.class);

It seems that mapred.output.compression.codec in the configuration is overwritten each time setOutputCompressorClass is called.

So the actual compression format is the one set last, and we cannot set two different compression formats in the same job? Or is there something I have overlooked?
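The overwrite behaviour can be reproduced without a cluster. What follows is only a minimal sketch: a plain HashMap stands in for the Hadoop job configuration, and the class CodecOverwriteDemo and its helper method are hypothetical names that mimic the single conf.setClass(...) call quoted from FileOutputFormat. It is not the Hadoop implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class CodecOverwriteDemo {
    // Key quoted from the FileOutputFormat source mentioned in the question.
    static final String CODEC_KEY = "mapred.output.compression.codec";

    // Hypothetical stand-in for FileOutputFormat.setOutputCompressorClass:
    // it writes one job-wide key and knows nothing about named outputs.
    static void setOutputCompressorClass(Map<String, String> conf, String codecClassName) {
        conf.put(CODEC_KEY, codecClassName);
    }

    // Replays the two driver calls from the question in the same order
    // and returns the codec that is left in the configuration.
    static String effectiveCodec() {
        Map<String, String> conf = new HashMap<>();
        setOutputCompressorClass(conf, "LzoCodec");   // intended for "LzoOutput"
        setOutputCompressorClass(conf, "GzipCodec");  // intended for "GzOutput", overwrites the first
        return conf.get(CODEC_KEY);
    }

    public static void main(String[] args) {
        System.out.println(effectiveCodec()); // prints GzipCodec
    }
}
```

Because both calls write the same key, only the last codec survives, which matches the observation that both output paths end up in gzip format.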


Answer 1:


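As a possible workaround, try setting the correct output compressor class directly in the configuration

context.getConfiguration().setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);

just before your write call to each of the outputs (using LzoCodec.class before the LzoOutput write). Note that Configuration itself has no setOutputCompressorClass method, so the key has to be set with setClass, which is what FileOutputFormat.setOutputCompressorClass does internally. It looks like MultipleOutputs does not handle output format configuration parameters other than the key class, value class and output path well, so we may have to write a bit of code to compensate for that oversight.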



Source: https://stackoverflow.com/questions/12953010/hadoop-how-to-output-different-format-types-in-the-same-job
