How the data is split in Hadoop

不知归路 2021-01-31 12:17

Does Hadoop split the data based on the number of mappers set in the program? That is, given a data set of size 500 MB, if the number of mappers is 200 (assuming that the Ha

5 answers
  •  遥遥无期
    2021-01-31 12:26

    I just ran a sample MR program based on your question, and here are my findings.

    Input: a file smaller than the block size.

    Case 1: Number of mappers = 1. Result: 1 map task launched. The input split size for each mapper (in this case only one) is the same as the input file size.

    Case 2: Number of mappers = 5. Result: 5 map tasks launched. The input split size for each mapper is one fifth of the input file size.

    Case 3: Number of mappers = 10. Result: 10 map tasks launched. The input split size for each mapper is one tenth of the input file size.

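    So, based on the above, for a file smaller than the block size:

    split size = total input file size / number of map tasks launched

    Note: keep in mind that the number of map tasks is ultimately determined by the number of input splits, so the number of mappers you set acts as a hint rather than a guarantee.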
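The arithmetic behind these three cases can be sketched in plain Java. This is only an illustration, assuming the job uses the classic `org.apache.hadoop.mapred.FileInputFormat`, whose split size is computed as `max(minSize, min(goalSize, blockSize))` with `goalSize = totalSize / requested maps`. The class `SplitSizeSketch` and the method `splitSizeFor` are hypothetical names, and the 128 MB block size and 100 MB file size are assumed values, not taken from the test run above.

```java
public class SplitSizeSketch {

    // Same formula as the old-API FileInputFormat:
    // the goal size is capped by the block size and floored by the minimum split size.
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Hypothetical helper: derives the goal size from the requested number of maps.
    public static long splitSizeFor(long totalSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(requestedMaps, 1);
        return computeSplitSize(goalSize, minSize, blockSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb; // assumed HDFS block size
        long fileSize = 100 * mb;  // hypothetical input file, smaller than one block
        long minSize = 1;          // assumed minimum split size in bytes

        for (int maps : new int[] {1, 5, 10}) {
            long split = splitSizeFor(fileSize, maps, minSize, blockSize);
            System.out.println(maps + " requested maps -> split size " + split + " bytes");
        }
    }
}
```

Under the same assumptions, a file larger than the block size behaves differently when few mappers are requested: the goal size exceeds the block size, so the split size is capped at one block and more map tasks are launched than were asked for.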
