I have started to look into Hadoop. If my understanding is right, I could process a very big file and it would get split over different nodes. However, if the file is compressed, can it still be split, or does it need to be processed by a single node (which would undo the advantage of running MapReduce over a cluster)?
Consider using LZO compression. It's splittable (once an index has been built for the file), which means a big .lzo file can be processed in parallel by many mappers. Bzip2 is also splittable, but it's slow.

Cloudera has an introduction about it. For MapReduce, LZO strikes a good balance between compression ratio and compress/decompress speed.
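For reference, here is a minimal sketch of a job driver that reads splittable LZO input. It assumes the hadoop-lzo library (the `com.hadoop.*` classes) is installed on the cluster, since LZO support ships separately from Hadoop because of LZO's GPL license; the job name and paths are placeholders.

```java
// Minimal sketch: a MapReduce driver reading splittable LZO input.
// Assumes the hadoop-lzo library (com.hadoop.* classes) is on the
// classpath; it is not bundled with Hadoop itself.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import com.hadoop.mapreduce.LzoTextInputFormat;

public class LzoJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lzo-input-example");
        job.setJarByClass(LzoJobDriver.class);

        // LzoTextInputFormat creates one split per indexed LZO block,
        // so a single large .lzo file fans out across many mappers.
        job.setInputFormatClass(LzoTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // The identity mapper/reducer are the defaults; a real job
        // would set its own Mapper and Reducer classes here.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that a plain .lzo file only becomes splittable after you build an index for it, e.g. with the indexer that comes with hadoop-lzo: `hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer big_file.lzo`. Without the accompanying .index file, the whole file still goes to a single mapper.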