I am saving a dataframe to a CSV file in PySpark using the statement below:
    df_all.repartition(1).write.csv("xyz.csv", header=True, mode='overwrite')
The error is very clear: the module 'app' is not available on the executors. Your Python code runs on the driver, but your UDF runs in the executors' Python VMs. When you call the UDF, Spark serializes the create_emi_amount function and sends it to the executors. So, somewhere in your function create_emi_amount, you use or import the app module.
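
For illustration, here is a minimal sketch of how this failure arises; the body of create_emi_amount and the app.finance import are hypothetical stand-ins for whatever your actual function does:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.getOrCreate()

    def create_emi_amount(principal):
        # Hypothetical import: if 'app' is only installed in the driver's
        # environment, every task that runs this UDF fails with
        # ModuleNotFoundError: No module named 'app'.
        from app.finance import monthly_rate
        return principal * monthly_rate()

    # The function is pickled on the driver and unpickled by the Python
    # worker on each executor, together with everything it references.
    emi_udf = udf(create_emi_amount, DoubleType())

Note that merely defining the UDF succeeds; the import only runs (and fails) when executors actually evaluate it, which is why the error surfaces at the write.csv action rather than where the UDF is defined.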
A solution to your problem is to use the same Python environment on both the driver and the executors: in spark-env.sh, point both PYSPARK_DRIVER_PYTHON=... and PYSPARK_PYTHON=... at the same Python virtualenv.
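
For example, assuming your shared virtualenv lives at /opt/venv (the path is an assumption, adjust to yours), spark-env.sh would contain:

    # spark-env.sh -- both interpreters come from the same virtualenv
    export PYSPARK_DRIVER_PYTHON=/opt/venv/bin/python
    export PYSPARK_PYTHON=/opt/venv/bin/python

Both variables must point at an interpreter that can import every module your UDF touches, and on a multi-node cluster that virtualenv has to exist at the same path on every worker.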