Generating Multiple Output Files with Hadoop 0.20+

I am trying to output the results of my reducer to multiple files. The data results are all contained in one file, and the rest of the results are split based on a category

2 Answers

青春惊慌失措

2021-01-03 05:35

You can do this in Hadoop 0.20; as mentioned, you just have to use the older API.

There's some very rough code to do so in http://github.com/orngejaket/Info_Moist_1_Splicer/tree/master/src/contrib/streaming/src/java/org/infochimps/hadoop/mapred/lib/

The resulting jar writes each record to a file named after its (sanitized) key.
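To illustrate the "file named after its (sanitized) key" idea, here is a minimal sketch in plain Java with no Hadoop dependency. The class name KeyFileNames, the sanitization rule, and the EMPTY_KEY fallback are all assumptions made for this example, not taken from the linked code. In the old API, the usual place for this kind of logic would be an override of generateFileNameForKeyValue in a MultipleTextOutputFormat subclass, though whether the linked code does it that way is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps record keys to safe file names and groups values by file.
final class KeyFileNames {

    // Assumed rule: keep ASCII letters, digits, '_' and '-'; replace anything else with '_'.
    static String sanitize(String key) {
        if (key == null || key.isEmpty()) {
            return "EMPTY_KEY"; // assumed fallback name for missing keys
        }
        StringBuilder sb = new StringBuilder(key.length());
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_' || c == '-';
            sb.append(safe ? c : '_');
        }
        return sb.toString();
    }

    // Simulates what the output format does: each {key, value} pair is routed
    // to the bucket (file) named after its sanitized key.
    static Map<String, List<String>> groupByFileName(List<String[]> records) {
        Map<String, List<String>> files = new TreeMap<String, List<String>>();
        for (String[] record : records) {
            String fileName = sanitize(record[0]);
            List<String> lines = files.get(fileName);
            if (lines == null) {
                lines = new ArrayList<String>();
                files.put(fileName, lines);
            }
            lines.add(record[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<String[]>();
        records.add(new String[] {"sports/2010", "first record"});
        records.add(new String[] {"news", "second record"});
        records.add(new String[] {"sports/2010", "third record"});
        for (Map.Entry<String, List<String>> e : groupByFileName(records).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Note that two different keys can sanitize to the same name (for example "a b" and "a/b" under this rule), in which case their records end up in the same file.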

忘掉有多难

2021-01-03 05:46

MultipleOutputs isn't supported by the new API (org.apache.hadoop.mapreduce) in 0.20, so you will need to use the older API.

MultipleOutputs has been added to 0.21, which is currently unreleased, as org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.

This thread on the mailing list talks about this problem.
