The input data is generated by MapReduce and then handed over to PySpark for processing. Because it is only 20 MB in size, it occupies only one block. But I need to increase the
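The "only one block" observation follows from simple arithmetic: a file occupies ceil(file size / block size) blocks. Here is a minimal sketch. It assumes the HDFS default block size of 128 MiB (`dfs.blocksize` in Hadoop 2.x and later). Your cluster may use a different value, and `hdfs_block_count` is a hypothetical helper for illustration only.

```python
import math

# Assumption: HDFS default block size of 128 MiB (dfs.blocksize, Hadoop 2.x+).
# Check your cluster's configuration; it may differ.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def hdfs_block_count(file_size_bytes: int, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Hypothetical helper: number of blocks a single file occupies.

    An empty file has no blocks. Any non-empty file needs at least one.
    """
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / block_size)


input_size = 20 * 1024 * 1024  # the ~20 MB MapReduce output
print(hdfs_block_count(input_size))  # -> 1
```

Because 20 MB is well below the block size, the whole file fits in a single block.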