I am saving a dataframe to a CSV file in PySpark using the statement below:
    df_all.repartition(1).write.csv("xyz.csv", header=True, mode='overwrite')
The error is very clear: the module 'app' is not available on the executors. Your Python code runs on the driver, but your UDF runs in the executors' Python VMs. When you call the UDF, Spark serializes the create_emi_amount function and sends it to the executors. So, somewhere in your function create_emi_amount, you use or import the app module.
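
For illustration, here is a minimal sketch of how this failure arises; the body of create_emi_amount and the app.finance import are hypothetical stand-ins for whatever your actual function does:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.getOrCreate()

    def create_emi_amount(principal):
        # Hypothetical import: if 'app' is only installed in the driver's
        # environment, every task that runs this UDF fails with
        # ModuleNotFoundError: No module named 'app'.
        from app.finance import monthly_rate
        return principal * monthly_rate()

    # The function is pickled on the driver and unpickled by the Python
    # worker on each executor, together with everything it references.
    emi_udf = udf(create_emi_amount, DoubleType())

Note that merely defining the UDF succeeds; the import only runs (and fails) when executors actually evaluate it, which is why the error surfaces at the write.csv action rather than where the UDF is defined.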
A solution to your problem is to use the same Python environment on both the driver and the executors: in spark-env.sh, point both PYSPARK_DRIVER_PYTHON=... and PYSPARK_PYTHON=... at the same Python virtualenv.
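
For example, assuming your shared virtualenv lives at /opt/venv (the path is an assumption, adjust to yours), spark-env.sh would contain:

    # spark-env.sh -- both interpreters come from the same virtualenv
    export PYSPARK_DRIVER_PYTHON=/opt/venv/bin/python
    export PYSPARK_PYTHON=/opt/venv/bin/python

Both variables must point at an interpreter that can import every module your UDF touches, and on a multi-node cluster that virtualenv has to exist at the same path on every worker.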