How to use Hadoop Streaming with LZO-compressed Sequence Files?

抹茶落季 2021-01-13 05:20

I'm trying to play around with the Google ngrams dataset using Amazon's Elastic MapReduce. There's a public dataset at http://aws.amazon.com/datasets/8172056142375670, a

4 answers
  •  北海茫月
    2021-01-13 05:57

    I got weird results when using LZO, and my problem was resolved by switching to a different codec:

    -D mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
    -D mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
    

    Then things just work. You don't need to (and probably shouldn't) change the -inputformat.

    Version: 0.20.2-cdh3u4, 214dd731e3bdb687cb55988d3f47dd9e248c5690
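
For context, here is a rough sketch of where those two -D flags might sit in a full streaming invocation. It is a dry run that only prints the command rather than submitting a job. The jar location, the input and output paths, and the use of cat as mapper and reducer are all hypothetical placeholders, not taken from the original post.

```shell
#!/bin/sh
# Dry-run sketch: assemble a Hadoop Streaming command line and print it.
# Nothing is submitted to a cluster here.

# Hypothetical placeholders; replace with values from your own setup.
STREAMING_JAR="/path/to/hadoop-streaming.jar"
INPUT="/path/to/input"
OUTPUT="/path/to/output"

# Codec named in the answer above.
CODEC="org.apache.hadoop.io.compress.SnappyCodec"

# Generic -D options are placed before the streaming-specific options.
# Assumption: compression itself is already switched on elsewhere
# (e.g. mapred.compress.map.output and mapred.output.compress);
# the codec settings only choose which codec is used.
# -inputformat is deliberately left unset, as the answer suggests.
set -- hadoop jar "$STREAMING_JAR" \
  -D "mapred.map.output.compression.codec=$CODEC" \
  -D "mapred.output.compression.codec=$CODEC" \
  -input "$INPUT" \
  -output "$OUTPUT" \
  -mapper cat \
  -reducer cat

CMD_LINE="$*"
echo "$CMD_LINE"
```

Once the printed command looks right for your cluster, it can be run directly instead of echoed.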
    
