Why Is a Block in HDFS So Large?


野性不改

Can somebody explain this calculation and give a lucid explanation?

A quick calculation shows that if the seek time is around 10 ms and the transfer r

3 Answers

别那么骄傲

2020-12-30 04:53

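Since the 100 MB is divided into 10 blocks, you have to do 10 seeks, and each 10 MB block takes 10 MB / (100 MB/s) = 0.1 s to transfer. The total is (10 ms × 10) + (0.1 s × 10) = 0.1 s + 1 s = 1.1 s, which is greater than the 1.01 s needed for a single 100 MB block (one 10 ms seek plus 1 s of transfer).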
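The arithmetic above can be checked with a small sketch. It assumes a single disk read sequentially, one seek per block, a 10 ms seek time and a 100 MB/s transfer rate; the function name `read_time_seconds` and its parameters are made up for illustration and are not part of any Hadoop API.

```python
def read_time_seconds(file_mb, num_blocks, seek_ms=10.0, rate_mb_per_s=100.0):
    """Time to read a file: one seek per block plus streaming the whole file."""
    seek_total = num_blocks * seek_ms / 1000.0   # seconds spent seeking
    transfer_total = file_mb / rate_mb_per_s     # seconds spent transferring
    return seek_total + transfer_total


# Assumed scenario: the same 100 MB stored as 1 block or as 10 blocks.
one_block = read_time_seconds(100, 1)
ten_blocks = read_time_seconds(100, 10)

print(f"1 block:   {one_block:.2f} s")   # 1.01 s
print(f"10 blocks: {ten_blocks:.2f} s")  # 1.10 s
```

The transfer time is the same in both cases; only the seek total grows with the number of blocks.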

你的背包

2020-12-30 04:58

Since the 100 MB is divided among 10 blocks, each block holds only 10 MB, as this is HDFS. Then it should be 10 × 10 ms + 10 MB / (100 MB/s) = 0.1 s + 0.1 s = 0.2 s, or even less.

醉话见心

2020-12-30 05:03

A block will be stored as a contiguous piece of information on the disk, which means that the total time to read it completely is the time to locate it (seek time) + the time to read its content without doing any more seeks, i.e. sizeOfTheBlock / transferRate = transferTime.

If we keep the ratio seekTime / transferTime small (close to 0.01 in the text), we are reading data from the disk almost as fast as the physical limit imposed by the disk, with minimal time spent looking for information.
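As a rough illustration of that ratio, the sketch below computes seekTime / transferTime for a few block sizes and then inverts the formula to find the block size that gives a target ratio. The 10 ms seek time and 100 MB/s transfer rate are assumed figures, and the helper names `seek_overhead` and `block_size_for_ratio` are hypothetical.

```python
def seek_overhead(block_mb, seek_ms=10.0, rate_mb_per_s=100.0):
    """Return seekTime / transferTime for one block of the given size."""
    transfer_s = block_mb / rate_mb_per_s
    return (seek_ms / 1000.0) / transfer_s


def block_size_for_ratio(target_ratio, seek_ms=10.0, rate_mb_per_s=100.0):
    """Return the block size (MB) at which seekTime / transferTime equals target_ratio."""
    transfer_s = (seek_ms / 1000.0) / target_ratio
    return transfer_s * rate_mb_per_s


for size_mb in (1, 10, 100):
    print(f"{size_mb:>4} MB block: seek/transfer = {seek_overhead(size_mb):.2f}")

print(f"Block size for a 1% ratio: {block_size_for_ratio(0.01):.0f} MB")
```

Under these assumptions a 1 MB block spends as long seeking as transferring, while a 100 MB block brings the seek cost down to about 1% of the transfer time.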

This is important because in MapReduce jobs we typically traverse (read) the whole data set, represented by an HDFS file, a folder, or a set of folders, and run some logic on it. Since we have to spend the full transferTime anyway to get all the data off the disk, we try to minimise the time spent on seeks by reading in big chunks, hence the large size of the data blocks.

In more traditional disk access software, we typically do not read the whole data set every time, so we would rather do plenty of seeks on smaller blocks than lose time transferring a lot of data that we won't need.
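The trade-off between the two access patterns can be sketched with a toy cost model. It assumes that a block is always read in full (a simplification of what real file systems do), a 10 ms seek, a 100 MB/s transfer rate, and a 1000 MB data set; all names and figures are illustrative assumptions.

```python
SEEK_S = 0.010          # assumed seek time in seconds
RATE_MB_PER_S = 100.0   # assumed transfer rate


def block_read_s(block_mb):
    """Time to seek to one block and read it completely."""
    return SEEK_S + block_mb / RATE_MB_PER_S


def full_scan_s(dataset_mb, block_mb):
    """Time to read every block of the data set, one after another."""
    num_blocks = dataset_mb / block_mb
    return num_blocks * block_read_s(block_mb)


def selective_read_s(num_lookups, block_mb):
    """Time to fetch a few records, each lookup reading one whole block."""
    return num_lookups * block_read_s(block_mb)


for block_mb in (1, 100):
    scan = full_scan_s(1000, block_mb)
    lookups = selective_read_s(5, block_mb)
    print(f"{block_mb:>3} MB blocks: full scan {scan:.2f} s, 5 lookups {lookups:.2f} s")
```

In this model large blocks roughly halve the full-scan time (10.1 s against 20 s), while small blocks make the five lookups far cheaper (0.1 s against 5.05 s), which matches the reasoning in the two paragraphs above.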
