How the data is split in Hadoop


不知归路 2021-01-31 12:17

Does Hadoop split the data based on the number of mappers set in the program? That is, given a data set of size 500 MB, if the number of mappers is 200 (assuming that the Ha

5 answers

遥遥无期 (original poster)

2021-01-31 12:26

I just ran a sample MR program based on your question, and here are my findings.

Input: a file smaller than the block size.

Case 1: Number of mappers = 1. Result: 1 map task launched. The input split size for each mapper (in this case only one) is the same as the input file size.

Case 2: Number of mappers = 5. Result: 5 map tasks launched. The input split size for each mapper is one fifth of the input file size.

Case 3: Number of mappers = 10. Result: 10 map tasks launched. The input split size for each mapper is one tenth of the input file size.

So, based on the above, for a file smaller than the block size:

split size = total input file size / number of map tasks launched.

Note: keep in mind that the number of map tasks is determined by the number of input splits.
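To make the arithmetic concrete, here is a minimal Python sketch of the rule observed above. It is not Hadoop code. The function name `estimated_split_size`, the 100 MB file, and the 128 MB block size are all hypothetical values chosen for illustration. The cap at the block size and the minimum-size floor are assumptions modeled on how the older `mapred` `FileInputFormat` is usually described, so treat the requested mapper count as a hint rather than a guarantee.

```python
MB = 1024 * 1024

def estimated_split_size(total_bytes, requested_maps, block_bytes=128 * MB, min_bytes=1):
    """Estimate the input split size for a single file (illustrative only)."""
    # Goal size: the file divided evenly among the requested map tasks.
    goal = total_bytes // max(requested_maps, 1)
    # Assumption: the split is capped at the block size and floored at a minimum size.
    return max(min_bytes, min(goal, block_bytes))

file_size = 100 * MB  # hypothetical input file, smaller than one block
for maps in (1, 5, 10):
    # Prints the requested mapper count and the resulting split size in bytes.
    print(maps, estimated_split_size(file_size, maps))
```

For a file smaller than one block, the cap never applies, so the result reduces to the file size divided by the number of mappers, which matches the three cases above.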
