I am generating 10M rows of fake data (to learn PySpark). Here is the code I want to parallelise: (link to complete code: https://gist.github.com/aialenti/cfd4e213