Does Hadoop split the data based on the number of mappers set in the program? That is, given a data set of size 500MB, if the number of mappers is 200 (assuming that the Ha
I just ran a sample MR program based on your question, and here are my findings.
Input: a file smaller than the block size.
Case 1: Number of mappers = 1. Result: 1 map task launched. The input split size for each mapper (in this case only one) is the same as the input file size.
Case 2: Number of mappers = 5. Result: 5 map tasks launched. The input split size for each mapper is one fifth of the input file size.
Case 3: Number of mappers = 10. Result: 10 map tasks launched. The input split size for each mapper is one tenth of the input file size.
So, based on the above, for a file smaller than the block size:
split size = total input file size / number of map tasks launched
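To make the arithmetic concrete, here is a minimal sketch in Python (not Hadoop code). It assumes the rule used by the old-API `org.apache.hadoop.mapred.FileInputFormat`, where the split size is max(minSize, min(goalSize, blockSize)) and goalSize is the total input size divided by the requested number of map tasks. The function name `split_size`, its parameters, and the 50 MB / 64 MB figures are made up for illustration.

```python
MB = 1024 * 1024

def split_size(total_size, requested_maps, block_size, min_size=1):
    """Approximate split size in bytes (simplified model, an assumption).

    goal_size is what each mapper would get if the input were divided
    evenly; the block size caps it, and min_size is a lower bound.
    """
    goal_size = total_size // max(requested_maps, 1)
    return max(min_size, min(goal_size, block_size))

# Hypothetical input: a 50 MB file on a cluster with 64 MB blocks.
file_size = 50 * MB
block = 64 * MB

one_mapper = split_size(file_size, 1, block)
five_mappers = split_size(file_size, 5, block)
ten_mappers = split_size(file_size, 10, block)
```

With these assumed numbers the three cases above come out as the whole file, one fifth of it, and one tenth of it, because the goal size never exceeds the block size.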
Note: Keep in mind that the number of map tasks is ultimately determined by the number of input splits. The number of mappers you set is only a hint to the framework.
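As a rough illustration of that note, the sketch below counts the splits (and therefore the map tasks) for a single file. It is a simplified model and not Hadoop code: it assumes the old-API `FileInputFormat` behaviour, including its 1.1 "slop" factor that lets the last split be up to 10% larger than the others. The name `count_splits` and the 500 MB / 64 MB figures are assumptions for the example. As far as I know, the newer `mapreduce` API ignores the requested map count and relies on the minimum and maximum split size settings instead, so the results may differ there.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # last split may be up to 10% larger than split_size

def count_splits(file_size, requested_maps, block_size, min_size=1):
    """Count the input splits for one file (simplified model)."""
    goal_size = file_size // max(requested_maps, 1)
    size = max(min_size, min(goal_size, block_size))

    splits = 0
    remaining = file_size
    # Carve off full-size splits while clearly more than one split is left.
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    # Whatever is left becomes the final (possibly short) split.
    if remaining > 0:
        splits += 1
    return splits

# Hypothetical 500 MB file with 64 MB blocks.
hint_of_200 = count_splits(500 * MB, 200, 64 * MB)
hint_of_1 = count_splits(500 * MB, 1, 64 * MB)
```

Under these assumptions, asking for 200 mappers yields 200 small splits, while asking for 1 mapper still yields 8 splits, because the block size caps the split size. That is why the requested number is only a hint.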