Does Hadoop split the data based on the number of mappers set in the program? That is, for a data set of 500 MB, if the number of mappers is 200 (assuming that the Ha
If 200 map tasks are running for 500 MB of data, check the size of each individual input file. If a file is smaller than the HDFS block size (64 MB by default), Hadoop runs a separate map task for that file, so 200 small files yield 200 mappers regardless of the number set in the program.
Normally we merge the small files into larger files (larger than the block size) to reduce the number of map tasks.
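To make the arithmetic concrete, here is a rough sketch of how the map task count falls out of the input files. It assumes the common FileInputFormat behavior (at least one split per file, and roughly one split per block for larger files); the function name and the exact split logic are illustrative, not Hadoop's actual code:

```python
import math

def estimate_map_tasks(file_sizes_mb, block_mb=64):
    """Rough estimate of map tasks: one split per block,
    and at least one split for every file, however small."""
    return sum(max(1, math.ceil(size / block_mb))
               for size in file_sizes_mb)

# 200 small files of 2.5 MB each -> one mapper per file
print(estimate_map_tasks([2.5] * 200))   # 200

# The same 500 MB merged into a single file -> one mapper per 64 MB block
print(estimate_map_tasks([500]))         # 8
```

This is why merging matters: the same 500 MB of data goes from 200 map tasks down to about 8 once it sits in a single file split by block size.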