Generating Multiple Output Files with Hadoop 0.20+

隐瞒了意图╮ 2021-01-03 04:52

I am trying to output the results of my reducer to multiple files. The data results are all contained in one file, and the rest of the results are split into separate files based on a category.

2 Answers
  • 2021-01-03 05:35

    You can do this in Hadoop 0.20, but as mentioned you have to use the older API (org.apache.hadoop.mapred).

    There's some very rough code to do so in http://github.com/orngejaket/Info_Moist_1_Splicer/tree/master/src/contrib/streaming/src/java/org/infochimps/hadoop/mapred/lib/

    The resulting jar writes each record to a file named after its (sanitized) key.
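
The linked code is not reproduced here, so the following is only a minimal sketch of what "sanitizing" a key into a file name might look like. The class name KeyFileNames, the method sanitize, the allowed character set, and the "empty-key" fallback are all assumptions made for illustration, not taken from that repository. The sketch also avoids names that begin with "." or "_", since Hadoop's FileInputFormat skips such files when a later job reads the directory.

```java
// Hypothetical helper, not part of Hadoop or of the linked repository.
final class KeyFileNames {
    private KeyFileNames() {}

    /** Maps a record key to a string that is safe to use as a file name. */
    static String sanitize(String key) {
        // Assumed fallback for missing keys.
        if (key == null || key.isEmpty()) {
            return "empty-key";
        }
        // Replace anything outside letters, digits, '.', '_' and '-' with '_'.
        // This also removes path separators such as '/'.
        String cleaned = key.replaceAll("[^A-Za-z0-9._-]", "_");
        // Files starting with '.' or '_' are treated as hidden by
        // FileInputFormat, so prefix those names with a letter.
        char first = cleaned.charAt(0);
        if (first == '.' || first == '_') {
            cleaned = "k" + cleaned;
        }
        return cleaned;
    }
}

// With the old API, one way to wire this in would be to subclass
// org.apache.hadoop.mapred.lib.MultipleTextOutputFormat<Text, Text> and override
//   protected String generateFileNameForKeyValue(Text key, Text value, String name)
// so that it returns KeyFileNames.sanitize(key.toString()) + "/" + name;
```

With an override like the one described in the trailing comment, a record keyed "sports" would land under sports/part-00000 (or whatever leaf name the framework passes in as name), and the subclass would be registered on the job with JobConf.setOutputFormat.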

  • 2021-01-03 05:46

    MultipleOutputs isn't supported by the new API (org.apache.hadoop.mapreduce) in 0.20, so you will need to use the older API (org.apache.hadoop.mapred).

    It has been added in 0.21, which is currently unreleased, as org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.

    This thread on the mailing list talks about this problem.
