How the data is split in Hadoop
Question: Does Hadoop split the data based on the number of mappers set in the program? That is, given a data set of 500 MB and 200 mappers (assuming the Hadoop cluster allows 200 mappers to run simultaneously), is each mapper given 2.5 MB of data? Also, do all the mappers run simultaneously, or might some of them run serially?

Answer 1: I just ran a sample MapReduce program based on your question, and here is what I found. Input: a file smaller than the block size. Case 1: Number of mapper
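The question's arithmetic (500 MB / 200 mappers = 2.5 MB) can be compared against the split-size rule used by Hadoop's `FileInputFormat`. The sketch below is a Python re-statement of that Java logic, not Hadoop code. The function names are hypothetical, the 128 MB block size is an assumption (older releases default to 64 MB), and it ignores non-splittable inputs such as gzip files, block locations, and multi-file inputs. Under those assumptions, the newer `org.apache.hadoop.mapreduce` API derives the split size only from the block size and the min/max split settings, while the older `org.apache.hadoop.mapred` API treats the requested number of map tasks as a hint that sets a "goal" split size.

```python
# Hedged sketch: re-states Hadoop FileInputFormat split arithmetic in Python.
# Function names are hypothetical; sizes are illustrative assumptions.
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # a trailing chunk up to 10% larger than split_size is kept whole


def split_sizes(file_len, split_size):
    """Cut one splittable file into chunks of split_size, Hadoop-style."""
    sizes = []
    remaining = file_len
    while remaining / split_size > SPLIT_SLOP:
        sizes.append(split_size)
        remaining -= split_size
    if remaining != 0:
        sizes.append(remaining)  # last (possibly slightly larger) chunk
    return sizes


def new_api_split_size(block_size, min_size=1, max_size=2**63 - 1):
    # mapreduce API: the requested number of mappers plays no role here
    return max(min_size, min(max_size, block_size))


def old_api_split_size(total_size, num_splits, block_size, min_size=1):
    # mapred API: the requested mapper count is only a hint for a goal size,
    # which is still capped by the block size
    goal_size = total_size // max(num_splits, 1)
    return max(min_size, min(goal_size, block_size))


if __name__ == "__main__":
    total = 500 * MB
    block = 128 * MB  # assumed block size
    new_parts = split_sizes(total, new_api_split_size(block))
    print(len(new_parts), [p // MB for p in new_parts])  # 4 [128, 128, 128, 116]
    old_parts = split_sizes(total, old_api_split_size(total, 200, block))
    print(len(old_parts), old_parts[0] / MB)  # 200 2.5
```

In this model each split becomes one map task, so with the newer API the 500 MB file yields four map tasks regardless of the mapper count requested, whereas with the older API the hint of 200 produces 200 splits of 2.5 MB, because the goal size is smaller than the block size.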