I'm trying to play around with the Google Ngrams dataset using Amazon's Elastic MapReduce. There's a public dataset at http://aws.amazon.com/datasets/8172056142375670, a
I got weird results using LZO, and my problem was resolved by switching to a different codec:
-D mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
-D mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
With those settings, things just work. You don't need to (and probably shouldn't) change the -inputformat.
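As a sketch of how the two -D settings above might be passed to a Hadoop Streaming job, the Python snippet below just assembles the command line; it does not run anything. The jar path, input/output locations, and mapper/reducer script names are hypothetical placeholders, and it assumes Snappy is available on your cluster. Following the advice above, it does not set -inputformat. Generic -D options are placed before the streaming-specific options, which Hadoop Streaming requires.

```python
import shlex

# Codec settings from the answer (Hadoop 0.20-era property names).
SNAPPY = "org.apache.hadoop.io.compress.SnappyCodec"
CODEC_PROPERTIES = {
    "mapred.map.output.compression.codec": SNAPPY,
    "mapred.output.compression.codec": SNAPPY,
}


def build_streaming_command(jar, input_path, output_path, mapper, reducer):
    """Return an argv list for a Hadoop Streaming job that uses Snappy instead of LZO.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    cmd = ["hadoop", "jar", jar]
    # Generic -D options must come before the streaming options.
    for key, value in CODEC_PROPERTIES.items():
        cmd += ["-D", f"{key}={value}"]
    # Note: no -inputformat here; the default input format is left unchanged.
    cmd += ["-input", input_path, "-output", output_path,
            "-mapper", mapper, "-reducer", reducer]
    return cmd


if __name__ == "__main__":
    cmd = build_streaming_command(
        "hadoop-streaming.jar",           # hypothetical jar path
        "s3://example-bucket/ngrams/",    # hypothetical input location
        "s3://example-bucket/output/",    # hypothetical output location
        "mapper.py", "reducer.py")        # hypothetical scripts
    print(shlex.join(cmd))  # shell-quoted command you could paste into a terminal
```

Keeping the codec settings in one dictionary makes it easy to swap Snappy for another codec if it is not installed on your nodes.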
Version: 0.20.2-cdh3u4, 214dd731e3bdb687cb55988d3f47dd9e248c5690