Very basic question about Hadoop and compressed input files

耶瑟儿~ 2021-02-04 03:15

I have started to look into Hadoop. If my understanding is right, I could process a very big file and it would get split over different nodes. However, if the file is compressed, then it cannot be split and would need to be processed by a single node (effectively destroying the advantage of running MapReduce over a cluster of parallel machines). Is my understanding correct, and is there a way to keep compressed input splittable?

4 Answers
  •  难免孤独
    2021-02-04 03:41

    Consider using LZO compression. It's splittable (once the file has been indexed), which means one big .lzo file can be processed by many mappers in parallel. Bzip2 is splittable too, but its decompression is slow.

    Cloudera has an introduction to it. For MapReduce, LZO offers a good balance between compression ratio and compression/decompression speed.
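
    As an illustration (not part of the original answer), here is a minimal sketch of a MapReduce driver that reads splittable LZO input. It assumes the third-party hadoop-lzo library, which provides com.hadoop.mapreduce.LzoTextInputFormat; class and path names here are illustrative. It also assumes the .lzo files have been indexed first; without an index, each .lzo file is still read as a single split.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import com.hadoop.mapreduce.LzoTextInputFormat; // from hadoop-lzo

        public class LzoInputJob {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                Job job = Job.getInstance(conf, "lzo input example");
                job.setJarByClass(LzoInputJob.class);

                // LzoTextInputFormat consults the .index file that sits next
                // to each .lzo file and creates one split per indexed block,
                // so a single large file can be handled by many mappers.
                job.setInputFormatClass(LzoTextInputFormat.class);
                LzoTextInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));

                // No mapper/reducer set: the identity mapper emits
                // (byte offset, line) pairs, which is enough to demonstrate
                // that the compressed input is split and read in parallel.
                job.setOutputKeyClass(LongWritable.class);
                job.setOutputValueClass(Text.class);

                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }

    Before running the job, the .lzo files need to be indexed once, for example with the LzoIndexer tool shipped with hadoop-lzo (jar location and input path below are placeholders):

        hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer /data/big.lzo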
