Data block size in HDFS, why 64MB?

既然无缘 2020-11-28 21:43

The default data block size of HDFS/Hadoop is 64 MB, while the block size on disk is generally 4 KB. What does a 64 MB block size mean? Does it mean that the smallest unit read from disk is 64 MB?
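For reference, the 64 MB figure being asked about is an HDFS-level, per-file setting (dfs.block.size, later dfs.blocksize), separate from the disk's block size. A minimal sketch, assuming the standard Hadoop Java client is on the classpath, that reads back the block size a file was written with (the path used here is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();     // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path p = new Path("/user/me/bigfile.dat");    // hypothetical file
            FileStatus status = fs.getFileStatus(p);
            long blockSize = status.getBlockSize();       // HDFS block size for this file, in bytes
            System.out.println(p + " block size: " + (blockSize >> 20) + " MB");
        }
    }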

8 answers
  • 2020-11-28 22:32
    1. If the block size were set to much less than 64 MB, there would be a huge number of blocks throughout the cluster, forcing the NameNode to manage an enormous amount of metadata.
    2. Since we need a mapper for each block, there would be a lot of mappers, each processing only a tiny piece of data, which isn't efficient (a rough numeric sketch follows below).
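
    To make these two points concrete, here is a back-of-the-envelope sketch. The 1 TB data set and the ~150 bytes of NameNode heap per block are illustrative assumptions, not figures from this answer:

        public class BlockCountSketch {
            public static void main(String[] args) {
                long dataSize  = 1L << 40;    // 1 TB of file data (assumed for illustration)
                long diskBlock = 4L << 10;    // 4 KB, a typical disk block size
                long hdfsBlock = 64L << 20;   // 64 MB, the old HDFS default

                long blocksSmall = dataSize / diskBlock;   // 268,435,456 blocks
                long blocksLarge = dataSize / hdfsBlock;   // 16,384 blocks

                // Rough rule of thumb (assumption): the NameNode keeps on the order of
                // 150 bytes of in-memory metadata per block.
                long metaPerBlock = 150;
                System.out.printf("4 KB blocks : %,d blocks, ~%d GB of NameNode heap%n",
                        blocksSmall, (blocksSmall * metaPerBlock) >> 30);
                System.out.printf("64 MB blocks: %,d blocks, ~%d MB of NameNode heap%n",
                        blocksLarge, (blocksLarge * metaPerBlock) >> 20);
                // With one map task per block, the 4 KB case would also mean hundreds of
                // millions of tiny mappers, each doing almost no useful work.
            }
        }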
  • 2020-11-28 22:32

    Hadoop chose 64 MB because Google chose 64 MB, and Google chose 64 MB because of a Goldilocks argument.

    Having a much smaller block size would cause seek overhead to increase.

    Having a moderately smaller block size makes map tasks run fast enough that the cost of scheduling them becomes comparable to the cost of running them.

    Having a significantly larger block size begins to decrease the available read parallelism and may ultimately make it hard to schedule tasks local to the data.

    See Google Research Publication: MapReduce http://research.google.com/archive/mapreduce.html
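
    As a rough illustration of that trade-off, here is a sketch using assumed ballpark figures (~10 ms per seek, ~100 MB/s sequential transfer; these numbers are illustrative, not from the MapReduce paper):

        public class SeekOverheadSketch {
            static void report(long blockBytes) {
                double seekMs     = 10.0;                                        // one seek per block (assumed)
                double transferMs = blockBytes / (100.0 * 1024 * 1024) * 1000.0; // at ~100 MB/s (assumed)
                double overhead   = seekMs / (seekMs + transferMs) * 100.0;
                System.out.printf("%,12d-byte blocks: %8.1f ms transfer, %5.1f%% seek overhead%n",
                        blockBytes, transferMs, overhead);
            }

            public static void main(String[] args) {
                report(4L << 10);   // 4 KB  -> ~0.04 ms transfer, seek time swamps the read
                report(64L << 20);  // 64 MB -> ~640 ms transfer, ~1.5% seek overhead
                report(1L << 30);   // 1 GB  -> overhead is negligible, but a 1 TB input now
                                    //          splits into only ~1024 blocks, capping parallelism
            }
        }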
