Data block size in HDFS, why 64MB?

既然无缘 2020-11-28 21:43

The default data block size of HDFS/Hadoop is 64 MB, while the block size on disk is generally 4 KB. What does a 64 MB block size mean? Does it mean that the smallest unit read from disk is 64 MB?
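For reference, the 64 MB figure being asked about is an HDFS-level, per-file setting (dfs.block.size, later dfs.blocksize), separate from the disk's block size. A minimal sketch, assuming the standard Hadoop Java client is on the classpath, that reads back the block size a file was written with (the path used here is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();     // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path p = new Path("/user/me/bigfile.dat");    // hypothetical file
            FileStatus status = fs.getFileStatus(p);
            long blockSize = status.getBlockSize();       // HDFS block size for this file, in bytes
            System.out.println(p + " block size: " + (blockSize >> 20) + " MB");
        }
    }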

8 answers
  • 2020-11-28 22:32
    1. If the block size were set to much less than 64 MB, there would be a huge number of blocks throughout the cluster, forcing the NameNode to manage an enormous amount of metadata.
    2. Since we need a mapper for each block, there would be a lot of mappers, each processing only a tiny piece of data, which isn't efficient (a rough numeric sketch follows below).
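
    To make these two points concrete, here is a back-of-the-envelope sketch. The 1 TB data set and the ~150 bytes of NameNode heap per block are illustrative assumptions, not figures from this answer:

        public class BlockCountSketch {
            public static void main(String[] args) {
                long dataSize  = 1L << 40;    // 1 TB of file data (assumed for illustration)
                long diskBlock = 4L << 10;    // 4 KB, a typical disk block size
                long hdfsBlock = 64L << 20;   // 64 MB, the old HDFS default

                long blocksSmall = dataSize / diskBlock;   // 268,435,456 blocks
                long blocksLarge = dataSize / hdfsBlock;   // 16,384 blocks

                // Rough rule of thumb (assumption): the NameNode keeps on the order of
                // 150 bytes of in-memory metadata per block.
                long metaPerBlock = 150;
                System.out.printf("4 KB blocks : %,d blocks, ~%d GB of NameNode heap%n",
                        blocksSmall, (blocksSmall * metaPerBlock) >> 30);
                System.out.printf("64 MB blocks: %,d blocks, ~%d MB of NameNode heap%n",
                        blocksLarge, (blocksLarge * metaPerBlock) >> 20);
                // With one map task per block, the 4 KB case would also mean hundreds of
                // millions of tiny mappers, each doing almost no useful work.
            }
        }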
  • 2020-11-28 22:32

    Hadoop chose 64 MB because Google chose 64 MB, and Google chose 64 MB because of a Goldilocks argument.

    Having a much smaller block size would cause seek overhead to increase.

    Having a moderately smaller block size makes map tasks run fast enough that the cost of scheduling them becomes comparable to the cost of running them.

    Having a significantly larger block size begins to decrease the available read parallelism and may ultimately make it hard to schedule tasks local to the data.

    See Google Research Publication: MapReduce http://research.google.com/archive/mapreduce.html
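
    As a rough illustration of that trade-off, here is a sketch using assumed ballpark figures (~10 ms per seek, ~100 MB/s sequential transfer; these numbers are illustrative, not from the MapReduce paper):

        public class SeekOverheadSketch {
            static void report(long blockBytes) {
                double seekMs     = 10.0;                                        // one seek per block (assumed)
                double transferMs = blockBytes / (100.0 * 1024 * 1024) * 1000.0; // at ~100 MB/s (assumed)
                double overhead   = seekMs / (seekMs + transferMs) * 100.0;
                System.out.printf("%,12d-byte blocks: %8.1f ms transfer, %5.1f%% seek overhead%n",
                        blockBytes, transferMs, overhead);
            }

            public static void main(String[] args) {
                report(4L << 10);   // 4 KB  -> ~0.04 ms transfer, seek time swamps the read
                report(64L << 20);  // 64 MB -> ~640 ms transfer, ~1.5% seek overhead
                report(1L << 30);   // 1 GB  -> overhead is negligible, but a 1 TB input now
                                    //          splits into only ~1024 blocks, capping parallelism
            }
        }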
